An upper-level course for CS majors on formal languages theory and compilers.

Overview

CS4100 Introduction to Formal Languages and Compilers

Spring 2022

An upper-level course for CS majors on formal languages theory and compilers.

Topics (subject to revision): regular expressions; finite automata; context-free grammars; predictive parsing; LR parsing; abstract syntax; type systems and type-checking; stack layout and activation records; intermediate representations; control-flow graphs; dataflow/liveness analysis; register allocation; garbage collection/runtimes; virtual machines; assemblers. Over the course of the semester, students will implement a full functioning compiler for a small programming language, targeting a bespoke virtual machine. The course requires a significant amount of programming.

Details
Lecture Tu/Th 3:05-4:25pm in Morton Hall 127
Instructors Chang Liu ([email protected])
Alexander Bagnall ([email protected])
Office Hours Tu 8-9:40am (email or Teams)
TA/GA "Kanieski, William" [email protected]
Lab Hours TBD

Texbook

There's no one textbook that covers everything we'll be talking about in this course. Instead, we'll assign readings each week from the following sources (and maybe others):

  • "Compilers: Principles, Techniques, and Tools, 2nd edition" by by Alfred Aho, Monica Lam, et al. Aug 31, 2006 (The purple dragon book)
  • Modern Compiler Implementation in ML, Andrew W. Appel (The tiger book)
  • Types and Programming Languages, Benjamin Pierce
  • The Rust Book
  • Crafting Interpreters, Nystrom

Course Difficulty

This is a demanding course that requires extensive programming work, in the form of a series of (often increasingly) difficult assignments. Expect to put in at least 10 hours (sometimes much more) per programming assignment.

Course Structure

The course consists of weekly lectures (Tu/Th), attendance at which is required. The programming assignments for this course are extensive and time consuming, so be prepared!

In addition to biweekly homework assignments, there will be a midterm exam (Week 9, approximately 15% of your grade) and a final (approximately 25%). The homeworks (programming assignments) are worth approximately 40%. We'll also have simple Blackboard quizzes (10%) and programming exercises (10%).

Grade Breakdown

Component Percentage
Programming exercises 10%
Project assignments 40%
Quizzes 10%
Midterm exam 15%
Final exam 25%

Blackboard will be used to report grades and to post lecture notes, homeworks, and reading material.

Schedule

The schedule is subject to revision.

Week Topic Reading Assignment
Week 1 (11 Jan) Intro. to the course, compilers, Rust The Rust Book 1-3 Q0/PE1 (Sat. 15 Jan)
Week 2 (18 Jan) Rust contd. The Rust Book 4-6, 8 Q1/PE2 (Sat. 22 Jan)
Week 3 (25 Jan) Rust contd. The Rust Book 9, 10.1, 10.2 PA0: Intro. to Rust (Sat. 29 Jan)
Week 4 (1 Feb) Virtual machines, bytecode, assemblers Crafting Interpreters 14, 15 Q2/PE3 (Sat. 5 Feb)
Week 5 (8 Feb) Garbage collection, concurrency Appel 13 PA1: Assembler (12 Feb)
Week 6 (15 Feb) Regular languages, regular expressions Appel 2 (through 2.2) Q3 (19 Feb)
Week 7 (22 Feb) DFAs, NFAs, lexers and lexer generators Appel 2.3-2.5 Q4 (26 Feb)
Week 8 (1 Mar) Context-free languages, pushdown automata Appel 3 PA2: VM (5 Mar)
Week 9 (15 Mar) Midterm review Midterm Exam (Thursday)
Week 10 (22 Mar) Recursive descent and predictive parsing, parser generators Appel 3.2-3.3 PA3: GC (27 Mar)
Week 11 (29 Mar) Abstract syntax trees, type systems, typechecking TAPL 3, 8 Q5 (2 Apr)
Week 12 (5 Apr) Intermediate representations, code generation Intermediate Representations, Code Generation No quiz -- work on PA4!
Week 13 (12 Apr) IR/codegen continued, control-flow graphs Appel 7.1 PA4: IR (16 Apr)
Week 14 (19 Apr) Dataflow/liveness analysis Appel 10.1 No quiz -- study for finals!
Apr 25 - Apr 29 FINAL EXAM (TBD) PA5 (23 Apr)

Assignments are due in Blackboard at 11:59pm unless otherwise specified. Q0, Q1, etc., denote quizzes in Blackboard, generally due on the Fridays of weeks with no due programming assignments (PAs).

Homework and Collaboration Policies

Acceptable Collaboration Matrix

Instructor/GA Noninstructor (e.g., Another Student)
You All collaboration allowed High-level discussion (of the problems, not your code!) allowed but only after you've started the assignment; must be documented in README as described below

Unless otherwise noted, homeworks are due Saturdays by 11:59 p.m. Late homework assignments will be penalized according to the following formula:

  • Up to 24 hours late: no deduction, for a max 2 late homeworks per student across the entire semester
  • Homeworks later than 24 hours, or from students who have already turned in 2 late homeworks, will receive 0 points.

You may discuss the homework with other students in the class, but only after you've attempted the problems on your own first. If you do discuss the homework problems with others, write the names of the students you spoke with, along with a brief summary of what you discussed, in a README comment at the top of each submission. Example:

(* README Alex Bagnall, Assn #1 
I worked with X and Y. We swapped tips regarding the use of pattern-matching in Rust. *)

However, under no circumstances are you permitted to share or directly copy code or other written homework material, except with course instructors. The code and proofs you turn in must be your own. Remember: homework is there to give you practice in the new ideas and techniques covered by the course; it does you no good if you don't engage!

That said, if we find that you have cheated on an assignment in this course, you will immediately:

  • Be referred to the Office of Community Standards (which may take disciplinary action against you, possibly expulsion); and
  • Flunk the course (receive a final grade of F).

Students in EECS courses such as this one must adhere to the Russ College of Engineering and Technology Honor Code, and to the OU Student Code of Conduct. If you haven't read these policies, do so now.

Students with Disabilities

If you suspect you may need an accommodation based on the impact of a disability, please contact me privately to discuss your specific needs. If you're not yet registered as a student with a disability, contact the Office of Student Accessibility Services first.

Student Outcomes vs. Course Learning Outcomes

  1. Analyze a complex computing problem and to apply principles of computing and other relevant disciplines to identify solutions.
  • Students will be able to appraise the tradeoffs, in terms of asymptotic complexity and precision, of distinct algorithms used in compiler construction (e.g., for garbage collection).
  1. Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.
  • Students will be able to construct a compiler, over the course of a series of course assignments, for a small programming language.
  1. Apply computer science theory and software development fundamentals to produce computing-based solutions.
  • Students will be able to determine whether a given language is recognizable (e.g., by a regular expression, deterministic finite automaton, or context-free grammar).
  • Students will be able to construct a finite state machine to recognize a given language.
  • Students will be able to apply computer science theory to determine whether a given grammar is parseable by recursive descent.
You might also like...
A stupid macro that compiles and executes Rust and spits the output directly into your Rust code

inline-rust This is a stupid macro inspired by inline-python that compiles and executes Rust and spits the output directly into your Rust code. There

This is a Discord bot written in Rust to translate to and from the Bottom Encoding Standard using bottom-rs and Serenity.
This is a Discord bot written in Rust to translate to and from the Bottom Encoding Standard using bottom-rs and Serenity.

bottom-bot This is a Discord bot written in Rust to translate to and from the Bottom Encoding Standard using bottom-rs and Serenity. Ever had this pro

An implementation of Code Generation and Factoring for Fast Evaluation of Low-order Spherical Harmonic Products and Squares

sh_product An implementation of Code Generation and Factoring for Fast Evaluation of Low-order Spherical Harmonic Products and Squares (paper by John

lightweight and customizable rust s-expression (s-expr) parser and printer

s-expr Rust library for S-expression like parsing and printing parser keeps track of spans, and representation (e.g. number base) number and decimal d

Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks.
Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks.

Crates Registry Description Crates Registry is a tool for serving and publishing crates and serving rustup installation in offline networks. (like Ver

Simplify temporary email management and interaction, including message retrieval and attachment downloads, using Rust.

Tempmail The Tempmail simplifies temporary email management and interaction, including message retrieval and attachment downloads, using the Rust prog

The simplest way to de-Google your life and business: Inbox, Calendar, Files, Contacts & much more
The simplest way to de-Google your life and business: Inbox, Calendar, Files, Contacts & much more

Bloom The all-in-one private workspace Try it for free! You no longer trust tech monopolies with your data? You are done with your privacy invaded by

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine
MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine

MeiliSearch is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box. For more information about features go to our documentation.

A job queue built on sqlx and PostgreSQL.

sqlxmq A job queue built on sqlx and PostgreSQL. This library allows a CRUD application to run background jobs without complicating its deployment. Th

Owner
null
A collection of compilers based around compiling a high level language to a Brainfuck dialect.

tf A collection of compilers based around compiling a high level language to a Brainfuck dialect. Built at, and for, the VolHacks V hackathon during O

adam mcdaniel 6 Nov 25, 2021
Milho (corn in portuguese) is a toy dialect of Lisp written as a way to learn more about compilers

Milho (corn in portuguese) is a toy dialect of Lisp written as a way to learn more about compilers. There are implementations in rust and go

Celso Bonutti 27 May 4, 2022
Solutions of Advent of Code 2021 in Rust, and some other languages.

advent-of-rust Solutions of Advent of Code 2021 in Rust, and some other languages. Puzzles Puzzle Stars Languages Day 1: Sonar Sweep ⭐ ⭐ Rust Python D

rene-d 6 Jan 7, 2023
Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Frame Language Transpiler v0.5.1 Hi! So very glad you are interested in Frame. Frame system design markdown language for software architects and engin

Mark Truluck 35 Dec 31, 2022
A high-level Rust crate around the Discord API, aimed to be easy and straight-forward to use.

rs-cord A high-level Rust crate around the Discord API, aimed to be easy and straight-forward to use. Documentation • Crates.io • Discord Navigation M

Jay3332 4 Sep 24, 2022
High-level PortMidi bindings and wrappers for Rust

High-level PortMidi bindings and wrappers for Rust

Philippe Delrieu 69 Dec 1, 2022
Simple tray application which shows battery level for HyperX Cloud Flight Wireless Headset.

HyperX Cloud Flight Battery Monitoring Introduction Simple tray application which shows battery level for HyperX Cloud Flight Wireless Headset. Screen

Stefan Kondinski 18 Dec 27, 2022
A toy-level BLE peripheral stack

bleps - A toy-level BLE peripheral stack This is a BLE peripheral stack in Rust. (no-std / no-alloc) To use it you need an implementation of embedded-

Björn Quentin 4 Oct 17, 2022
High-level, optionally asynchronous Rust bindings to llama.cpp

llama_cpp-rs Safe, high-level Rust bindings to the C++ project of the same name, meant to be as user-friendly as possible. Run GGUF-based large langua

Binedge.ai 4 Nov 21, 2023
A Rust proc-macro crate which derives functions to compile and parse back enums and structs to and from a bytecode representation

Bytecode A simple way to derive bytecode for you Enums and Structs. What is this This is a crate that provides a proc macro which will derive bytecode

null 4 Sep 3, 2022