by v7engine on 4/4/23, 4:10 PM with 48 comments
by ksherlock on 4/4/23, 4:21 PM
V2: (PDP-11 Unix) Kernel is written in assembly. C compiler is written in assembly.
V3: Kernel is written in assembly. C compiler is written in C.
V4: Kernel is written in C. C compiler is written in C.
https://www.tuhs.org/cgi-bin/utree.pl?file=PDP7-Unix
https://www.tuhs.org/cgi-bin/utree.pl?file=V2
by tetha on 4/4/23, 5:26 PM
- Pick a language that's simple enough. A subset of ML would be good, but if you want to actually finish, I'd recommend a simple LISP. This is your new language. In the Unix story, this is C.
- Use a language you know and like to implement a compiler for this language. This is your bootstrap language. Compile this for example into C--, ASM or LLVM, depending on what you know. This is the target language. As a recommendation, keep this compiler as simple as possible so you have a reference for the next step. For C, both the bootstrap and the target language were ASM.
- And now iterate on extending the stdlib your language has, until you can implement a compiler for your new language in your new language. Again, keep this compiler simple without optimization or passes, just generate the most trivial machine code possible. This usually takes a bit of back-and-forth. You'll need function evaluation and expression evaluation first (this is where a Lisp can be an advantage, as those are the same), then function definitions, then filesystem interactions, and so on. You kinda discover what you need as you implement.
- Once you have all of that, (i) compile the new compiler's source with the compiler you wrote in your bootstrap language, and (ii) compile the new compiler's source again using the result of (i). If you want to verify the results, (iii) compile the compiler once more with the output of (ii) and check whether (ii) and (iii) differ; they should be identical.
- Your new language is now self-hosted.
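A rough sketch of that last check, in case it helps. Everything here is made up for illustration: the "new language" is just Python with `fn` instead of `def`, the "target language" is plain Python, and `COMPILER_SRC`, `bootstrap_compile` and `load` are hypothetical names. The bootstrap compiler deliberately emits slightly different output (an extra header comment), so stage (i) differs from stage (ii) while (ii) and (iii) come out identical.

```python
import re

# Source of the new compiler, written in the toy "new language"
# (Python with `fn` in place of `def`; an assumption for this sketch).
COMPILER_SRC = r'''fn compile_source(src):
    out = []
    for line in src.split("\n"):
        if line.startswith("fn "):
            line = "def " + line[3:]
        out.append(line)
    return "\n".join(out)
'''


def bootstrap_compile(src: str) -> str:
    """Throwaway compiler written in the bootstrap language (Python here)."""
    translated = re.sub(r"^fn ", "def ", src, flags=re.MULTILINE)
    # Emits an extra header line, so its output is not byte-identical
    # to what the self-hosted compiler produces.
    return "# built by the bootstrap compiler\n" + translated


def load(binary: str):
    """'Run' a compiled compiler: exec the target code, return its entry point."""
    namespace = {}
    exec(binary, namespace)
    return namespace["compile_source"]


# (i) compile the new compiler with the bootstrap compiler
stage1 = bootstrap_compile(COMPILER_SRC)
# (ii) compile the new compiler with the result of (i)
stage2 = load(stage1)(COMPILER_SRC)
# (iii) compile the new compiler with the result of (ii)
stage3 = load(stage2)(COMPILER_SRC)

print("stage1 == stage2:", stage1 == stage2)  # False: different code generators
print("stage2 == stage3:", stage2 == stage3)  # True: the compiler reproduces itself
```

Comparing (ii) with (iii) rather than (i) with (ii) matters because (i) was produced by a different code generator; only from (ii) onward is the compiler building itself.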
This was fun, because it was accompanied by other courses, like how processor microcode implements processor instructions, how different kinds of assembly are mapped onto processor instructions, and then how higher-level languages are compiled down into assembly. All of this across 4-6 semesters resulted in a pretty good understanding of how Java ends up being flip-flop operations.
EDIT - got target & bootstrap mixed up in first part.
by anon25783 on 4/4/23, 4:48 PM
by athorax on 4/4/23, 5:13 PM
https://www.youtube.com/watch?v=lJf2i87jgFA&list=RDLVlJf2i87...
by starkparker on 4/4/23, 4:17 PM
pre-C Unix was written in assembly
by AnimalMuppet on 4/4/23, 4:19 PM
by jzellis on 4/4/23, 4:55 PM
(I'm just teasing, as you were.)
by hawski on 4/4/23, 5:51 PM
Now imagine how most things around you were made, how higher tech was made with lower tech. How did they make high-precision tools when only lower-precision tools were available? For example: how do you make a caliper precise to 0.001 mm when all you have is one precise to 0.1 mm? There were a lot of challenges like that, and we still run into new ones. I just wonder what general term is used for things like that.
by abraxas on 4/4/23, 4:44 PM
by JohnFen on 4/4/23, 4:48 PM
by MarkusWandel on 4/4/23, 5:08 PM
Generally it's an incremental process where the compiler for an early/subset version of the language is written in another, existing language (or, absent one, in assembly code).
Once it's possible to rewrite the compiler in its own subset language, it becomes self-hosting. Then you can add a feature to the language, and once it works, enhance the compiler to use it, and so on.
Eventually the language and compiler go hand in hand: the only way to compile the language is with the compiler, and the only way to compile the compiler is with itself. This leads to interesting thought experiments such as:
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
by js2 on 4/4/23, 4:45 PM
by coliveira on 4/4/23, 5:03 PM
by snvzz on 4/4/23, 5:28 PM
UNIX is just a UNICS rewrite in C, and was done by the authors of UNICS and C.
by HeckFeck on 4/4/23, 5:14 PM
by simonblack on 4/4/23, 10:10 PM
There were Cs for MS-DOS, Cs for CP/M, even Cs for Windows, etc.
by remram on 4/4/23, 5:15 PM
by bitwize on 4/4/23, 4:45 PM
by moralestapia on 4/4/23, 5:54 PM
Doesn't that already imply that C precedes UNIX? The question doesn't make sense.
by _dain_ on 4/4/23, 4:45 PM