Want to write a compiler? Just read these two papers (2008)
407 points - today at 9:41 AM
SourceComments
I disagree with the author's point: the "Dragon book"'s ("Compilers: Principles, Techniques, and Tools" by Aho et al.) Chapter 2 is a self-sufficient introduction into compilers from end to end, and it can be read on its own, ignoring the rest of the excellent book.
Another fantastic intro to compiler writing is the short little book "Compilers" by Niklaus Wirth, which explains and contains the surprisingly short source code of a complete compiler (the whole book is highly understandable - pristine clarity, really) and all in <100 pages total (99).
(I learned enough from these two sources to write a compiler in high school.)
Abdulaziz Ghuloum
http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
Abstract
Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, “better write an interpreter instead.”
The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter. The compiler we construct accepts a large subset of the Scheme programming language and produces assembly code for the Intel-x86 architecture, the dominant architecture of personal computing. The development of the compiler is broken into many small incremental steps. Every step yields a fully working compiler for a progressively expanding subset of Scheme. Every compiler step produces real assembly code that can be assembled then executed directly by the hardware. We assume that the reader is familiar with the basic computer architecture: its components and execution model. Detailed knowledge of the Intel-x86 architecture is not required.
The development of the compiler is described in detail in an extended tutorial. Supporting material for the tutorial such as an automated testing facility coupled with a comprehensive test suite are provided with the tutorial. It is our hope that current and future implementors of Scheme find in this paper the motivation for developing high-performance compilers and the means for achieving that goal.
[1] Ometa https://tinlizzie.org/VPRIPapers/tr2007003_ometa.pdf
[2] Other ometa papers https://tinlizzie.org/IA/index.php/Papers_from_Viewpoints_Re...
[3] Adaptive compilation https://youtu.be/CfYnzVxdwZE?t=4575
the PhD thesis https://www.researchgate.net/publication/309254446_Adaptive_...
[4] Is it really "Complex"? Or did we just make it "Complicated"? Alan Kay https://youtu.be/ubaX1Smg6pY?t=3605
But, to also be fair, the above random access method does not work when you don't know what you don't know. So I understand why having a light, but good introduction to the topic is important, and I believe that's what the author is pointing out.
The Nanopass paper link doesn’t work.
I have ignored all the stuff about parsing theory, parser generators, custom DSL's, formal grammers etc. and instead have just been using the wonderful Megaparsec parser combinator library. I can easily follow the parsing logic, it's unambiguous (only one successful parse is possible, even if it might not be what you intended), it's easy to compose and re-use parser functions (was particularly helpful for whitespace sensitive parsing/line-fold handling), and it removes the tedious lexer/parser split you get with traditional parsing approaches.
What taught me how to write an optimizer was a Stanford summer course taught by Ullman and Hennessy.
The code generator was my own concoction, and is apparently quite unlike any other one out there!
I have the Dragon Book, but have never actually read it. So sue me.
Google search points me to https://github.com/cesarghali/PL241-Compiler/blob/master/DLX... for a description of the architecture and possibly https://bernsteinbear.com/assets/img/linear-scan-ra-context-... for the register allocation algorithm
In fact, inventing new programming languages and writing compilers for them used to be so much of a trend that people created YACC (Yet Another Compiler Compiler) to make it easier.
https://web.archive.org/web/20190712115536/http://home.iae.n...
I think that the nanopass architecture is especially well suited for compilers implemented by LLMs as they're excellent at performing small and well defined pieces of work. I'd love to see Anthropic try their C compiler experiment again but with a Nanopass framework to build on.
I've recently been looking in to adding Nanopass support to Langkit, which would allow for writing a Nanopass compiler in Ada, Java, Python, or a few other languages [3].
[1]: https://andykeep.com/pubs/dissertation.pdf
Right, I've heard of that...
> , which started in 1988.
... Oh. Huh.
(Staring at the red dragon book on my bookshelf, which was my course textbook in the early 00s.)
I quite like "understanding and writing compilers" by Richard Bornat - written in the 1970s using BCPL as the implementation language, so rather old-fashioned, but it gives a friendly gentle overview of how to do it, without excessive quantities of parsing theory.
PD: Klong's intro to statisticks, even if the compiler looks like a joke, it isn't. It can be damn useful. Far easier than Excel. And it comes with a command to output a PS file with your chart being embedded.
Intro to statistics with Klong
https://t3x.org/klong/klong-intro.txt.html
https://t3x.org/klong/klong-ref.txt.html
On S9, well, it has Unix, Curses, sockets and so on support with an easy API. So it's damn easy to write something if you know Scheme/Ncurses and try stuff in seconds. You can complete the "Concrete Abstractions" book with it, and just adapt the graphic functions to create the (frame) one for SICP (and a few more).
And as we are doing compilers... with SICP you create from some simulator to some Scheme interpreter in itself.