The benefits of building a compiler are manifold; here are two important ones. First, you get a much better understanding of how to translate higher-level languages into assembly language. Second, you get to build and maintain a ball of code that’s somewhat larger than what you handle in many other courses (between 2K and 10K lines of code, depending on your language of implementation).
As a result, this course lives at the intersection of Programming Languages, Architecture, and Software Engineering. With luck, you’ll learn something about each of these.
implement and maintain a project of up to 10,000 lines of code, including tracking bugs, developing tests, and documenting code,
implement a compiler from a modern, high-level programming language to a low-level assembly language, and
present and explain code and design work.
Students taking this course must be familiar with the core principles of programming languages, including the evaluation of Abstract Syntax Trees, and the implementation of function calls, environments, and closures.
Students must also be comfortable writing larger programs (~2 KLOC), and be able to manage the development process efficiently.
Finally, students must be able to write programs in assembly language.
John Clements, aoeuclements @ brinckerhoff.org
Lecture: 14:10–15:00, MWF, room 20-129
Lab: 15:10–16:00, MWF, room 14-301
See my home page for my calendar. You can add it to your calendar, if that makes your life easier.
Office hours also appear on this calendar; you may find them easier to see if you click on the "week" tab of the calendar.
This is the course web page; its link is https://www.brinckerhoff.org/clements/2194-csc431/.
I think that an interactive and lively classroom is a better learning environment. In particular, I will almost certainly learn everyone’s name, and I’m likely to notice if you’re missing. My experience is that if you come to class reliably, you’re extremely likely to pass the class—there’s a reason that we conduct classes face-to-face; it keeps you engaged, and ensures that you’re connected to the other students in the class.
In addition, I’m likely to call on you at points during the lecture where I want to see if you’re following what’s going on. If you don’t know the answer, it’s totally fine to say "no, I have no idea." That is probably evidence that I’m going too fast or not explaining things well. However, I try to respect the wishes of students for whom this technique is disruptive. Please let me know if you don’t want me to call on you.
Finally, my experience standing in front of classes and more especially my experience of sitting behind classes has convinced me that laptops are useful for note-taking in approximately 1% of cases. Essentially, never.
Indeed, there’s now a mountain of evidence indicating that laptops are distracting to students and to those around them, and that even when these distractions are eliminated, taking notes on laptops fails to create learning in effective ways. I’ll just cite this one paper, because it’s got copious references to other sources.
For this reason, I do not allow the use of laptops in class without special dispensation. If you need to use a laptop to take notes, please come and talk to me; otherwise, just put it away and take notes on paper.
You will be able to complete the work in this class in one of a number of different programming languages. One of them is Racket. Others will certainly include C, C++, and Java. If students or teams are interested in using other languages (Python? Rust?) we can discuss extending this list.
This class allows teams of size two. You are not required to work with a partner, but I strongly encourage it. Since your work through the quarter will all be on a single project, you will be working with the same partner for the duration of the class, though it is possible to change partners after the first project.
The work in this class will consist principally of one quarter-long development project (hint: it’ll be a compiler). Milestones are TBA.
We’re going to use GitHub Classroom for assignments in this class. The first lab contains an invitation link.
This class might have a totally awesome textbook from Jeremy Siek and Ryan Newton, entitled Essentials of Compilation. The link is at the top of the page.
In addition, there are many classic textbooks that give a broad overview of compilation techniques, including Cooper and Torczon’s “Engineering a Compiler, 2nd edition”. The first chapter is a great overview, and the whole book provides detailed information and (perhaps) a different perspective on the process of building a compiler.
This class will use Piazza. It will be my principal means of notifying you of deadlines, organizational updates, and changes to assignments. If you’re not keeping up with the group, you’re going to miss important information.
It’s also the best way for you to direct questions to me and/or the class. Feel free to e-mail me with personal questions, but use the Piazza group as your main means of communication. It’s possible to post anonymously, if you like.
You should already have received an invitation to the Piazza group; let me know if you need an invite.
Don’t post your code or test cases to the group; anything else is fair game.
Also, please keep in mind that I (and everyone else) judge you based in part on your written communication. Spelling, complete sentences, and evidence of forethought are important in all of your posts & e-mails. One easy rule of thumb: just read over what you’ve written before clicking post or send, and imagine others in the class reading it.
In the programming assignments, you may not copy another student’s code (including test cases). You may not share code with other students in the class, during or after completing the class. That is, you may not allow another student to see the code you write for the class, deliberately or through obvious negligence.
I will use an automated tool to compare student submissions and identify dishonesty.
Students believed to be cheating–that is, both parties involved in the transfer of code–will receive a failing grade in the class.
Programming assignments will be due at 8:00 PM. You must submit your assignments by pushing to your GitHub repo. Remember that committing isn’t the same as pushing! Late assignments will not be accepted.
From time to time, we may examine student code in lecture. Try to ensure that the code you submit is something you’d be proud to show to the others in the class.
Late Policy: Except in exceptional circumstances, late assignments will be given 0 points.
I will be grading your code repeatedly in this class. On most assignments, your score will consist of a part (usually 20 points) based on your performance on a set of test cases automatically administered by the handin server, and a part (usually 6 points) based on my opinion of your code’s clarity, organization, and adherence to rules about purpose statements and contracts (in short: you’ve got to have them). As a rule, my "eyeball" score rubric runs something like this:
6 points – I simply can’t find anything wrong with your code.
5 points – some inelegant parts, or one or two purpose statements or contracts missing
4 points – an actual misunderstanding, or a widespread lack of purpose statements and contracts
3 points – a serious misunderstanding–you didn’t understand some major part of the assignment
2 points – your program is seriously incomplete, doesn’t compile, or has widespread major problems
1 point – you didn’t make any apparent progress on the program at all.
Finally, please note that I will place comments in some of your submissions indicating errors or stylistic requests. These will all begin with the string ;;> (in Racket) or ##> (in Python), so you can search for these in the e-mail that you get with your final assignment grade.
Here’s the most important thing to know about code in this class:
I do actually read it.
This means that getting your code to work is not the end of the process; after you get your code to work, you have to clean it up, put nice headers on the various parts, collect the test cases, document strange things that you did, and clarify the code.
You should begin with a single-paragraph comment that describes how far you got: did you finish, or did you get stuck on something? If you got stuck, describe what’s done and what’s still left to implement.
As a rule, I like to read code in a "top-down" way. This means that the definition of the top-level, important functions should come first, and the supporting functions should come later. I want to have a good understanding of the big picture before getting into the details. My experience is that if interp makes sense, then add-to-env will probably not present any difficulty.
Another part of cleaning up the code is collecting the test cases in a place that’s sensible and doesn’t interrupt the flow of reading the code. It’s probably best–after you’re done writing the code–to collect the test cases at the bottom of the file (or put them in another file altogether, if appropriate).
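The layout described above (a status comment, top-level functions first, supporting functions after, tests collected at the bottom) might look something like the following sketch. It is written in Python purely for illustration; the tiny expression language, the helper name apply_op, and the run_tests function are hypothetical and are not part of any assignment.

```python
# Status: finished. Both arithmetic operators are handled; nothing is left
# to implement. (Hypothetical example file; the tiny language and all names
# below are illustrative only.)

from typing import Union

# An expression is a number or a nested tuple such as ("+", 1, ("*", 2, 3)).
Expr = Union[int, float, tuple]


# --- Top-level function first ------------------------------------------

def interp(expr: Expr) -> float:
    """Evaluate an arithmetic expression tree and return its numeric value."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return apply_op(op, interp(left), interp(right))


# --- Supporting functions ----------------------------------------------

def apply_op(op: str, left: float, right: float) -> float:
    """Apply the binary operator named by op to two evaluated operands."""
    if op == "+":
        return left + right
    if op == "*":
        return left * right
    raise ValueError(f"unknown operator: {op}")


# --- Test cases, collected at the bottom -------------------------------

def run_tests() -> None:
    """Check interp on a literal, a single operation, and a nested tree."""
    assert interp(4) == 4
    assert interp(("+", 1, 2)) == 3
    assert interp(("*", ("+", 1, 2), 4)) == 12


if __name__ == "__main__":
    run_tests()
```

A reader can stop after interp and already know what the file does; the helper and the tests are there for anyone who wants the details.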
Whatever language you use, it’s likely to have a style guide. Here’s the one for Racket. You’re not required to follow any style guide, but anything that makes your code hard to read could hurt your score.
Finally: dead code is misleading and makes code hard to read. Delete it.
I reserve the right to assign bad scores to programs that work correctly; if I don’t think you’re doing a good job of programming, then you won’t receive a good score. "It works" isn’t a defense for bad code.
Good code is easy to read. I reserve the right to allocate a fixed period of time to grading a program submission. Don’t be surprised to see comments like "ran out of grading time here."
Naturally, all grades contain an element of subjectivity.
My experience suggests that frequent quizzes are a good way to ensure that you’re understanding what I’m teaching, and that I’m teaching things that you understand.
This class will have Wednesday quizzes every other week, starting in the second week. These quizzes will probably be fifteen minutes long, and will probably take place during lab.
Each group will submit a paper detailing the design and implementation of their compiler project. At least half of the grade for the paper will depend on the presentation of some performance analysis of the code generated by the group’s compiler.
This paper must:
Outline the overall architecture of the solution,
outline the representation of key data (e.g., control flow graphs and instructions),
outline optimizations implemented, and
provide a section detailing the performance of the code generated for the benchmarks. This section must contain graphs comparing the run-times of the generated code (with and without optimizations) and the C equivalent code compiled using gcc or clang (with and without optimizations).
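One possible way to collect the run-time numbers behind those graphs is sketched below in Python. It is only a starting point under stated assumptions: the function names time_command and compare are made up for this example, taking the median of several runs is one reasonable choice rather than a course requirement, and the commands in the usage section are placeholders for your own benchmark executables.

```python
import statistics
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> float:
    """Run cmd `runs` times; return the median wall-clock time in seconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the benchmark exits with a nonzero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(variants: Dict[str, Sequence[str]], runs: int = 5) -> Dict[str, float]:
    """Time each named build of one benchmark; return {name: median seconds}."""
    return {name: time_command(cmd, runs) for name, cmd in variants.items()}


if __name__ == "__main__":
    # Placeholder command: replace each entry with an executable built from
    # the same benchmark (your compiler with and without optimizations, and
    # gcc or clang builds of the C equivalent with and without optimizations).
    placeholder = [sys.executable, "-c", "pass"]
    results = compare({"unoptimized": placeholder, "optimized": placeholder}, runs=3)
    for name, seconds in results.items():
        print(f"{name}: {seconds:.4f} s")
```

The resulting dictionary can be fed to whatever plotting tool you prefer. Note that wall-clock timings include process start-up, so very short benchmarks may need larger inputs to give meaningful comparisons.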
Grades will be determined by performance on programming projects, your final submission, the quizzes, and the final paper. A small fraction of the grade is determined by the labs, and by the instructor’s whim. The breakdown of the grade is as follows:
Final Submission: 40%