by upmind on 1/22/25, 9:54 AM with 27 comments
by Timwi on 1/24/25, 7:42 AM
I just wish more tooling existed that was language-agnostic so that it's easier to get off the ground with something “serious”. I'm talking debuggers, parse-tree-aware diffs, autocompletion like IntelliSense, etc.
by norir on 1/24/25, 5:13 PM
Once you master the principles of parsing, it is straightforward to write a parser by hand in any language, with good performance. It is easy to write a function that takes a range string, such as "_a-zA-Z", and returns a table of size 256 with entries 65-90, 95 and 97-122 set to 1 and the rest set to 0. You can build your parser easily on top of these tables (this is not necessarily optimal, but it is more than good enough when you are starting).
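As a rough illustration of the range-string table described above, here is a minimal sketch in Python. The names `char_class_table` and `scan_identifier` are made up for this example, and the treatment of a leading or trailing "-" as a literal character is an assumption, not something the comment specifies.

```python
def char_class_table(spec: str) -> list[int]:
    """Build a 256-entry lookup table from a range string such as "_a-zA-Z".

    Entry i is 1 if byte value i belongs to the class, else 0.
    """
    table = [0] * 256
    i = 0
    while i < len(spec):
        lo = ord(spec[i])
        # "x-y" is an inclusive range. A "-" with nothing after it
        # (or at the very start) is treated as a literal: an assumption.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            hi = ord(spec[i + 2])
            i += 3
        else:
            hi = lo
            i += 1
        if lo > hi or hi > 255:
            raise ValueError(f"bad range in {spec!r}")
        for code in range(lo, hi + 1):
            table[code] = 1
    return table


# Hypothetical character classes for a C-like identifier.
IDENT_START = char_class_table("_a-zA-Z")
IDENT_CONT = char_class_table("_a-zA-Z0-9")


def scan_identifier(src: bytes, pos: int) -> int:
    """Return the end index of the identifier starting at pos, or pos if none."""
    if pos >= len(src) or not IDENT_START[src[pos]]:
        return pos
    end = pos + 1
    while end < len(src) and IDENT_CONT[src[end]]:
        end += 1
    return end
```

A hand-written scanner can then be a handful of functions like `scan_identifier`, each one a table lookup in a loop.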
For the backend, you can target another language that you already know. It could be JavaScript, C or something else. This will be much easier than targeting LLVM.
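To make the "target another language" idea concrete, here is a small sketch of a backend that turns an expression tree into JavaScript source text. The node types (`Num`, `Str`, `Var`, `BinOp`, `Call`) and `emit_js` are invented for this example; a real language would need statements, scoping, name mangling and so on.

```python
import json
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Str:
    value: str


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Call:
    func: str
    args: list


def emit_js(node) -> str:
    """Emit JavaScript source for a (hypothetical) expression node."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Str):
        # A JSON string literal is also a valid JavaScript string literal.
        return json.dumps(node.value)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, BinOp):
        # Always parenthesize so the host language's precedence never matters.
        return f"({emit_js(node.left)} {node.op} {emit_js(node.right)})"
    if isinstance(node, Call):
        args = ", ".join(emit_js(a) for a in node.args)
        return f"{node.func}({args})"
    raise TypeError(f"unknown node: {node!r}")


# "Hello world" for the backend: one call expression.
hello = Call("console.log", [Str("hello world")])
print(emit_js(hello) + ";")
```

The host language's runtime then supplies garbage collection, closures and a JIT for free, which is the bulk of the work saved by not writing a native backend first.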
Start with hello world and build up the functionality you need. In no more than 10k lines of code (and typically significantly less), you can achieve self-hosting. Then you can rewrite the compiler in itself. By this point you have identified the parts of the language you like and the parts that you don't like or that are not carrying their weight. This is the point at which it may make sense to target LLVM or write a custom assembly backend.
The beauty of this approach is you keep the language small from the beginning without relying on a huge pile of dependencies. For a small compiler, you do not need concurrency or other fancy features because the programs it compiles are almost definitionally small.
Now you have a language that is optimized for writing compilers that you understand intimately. You can use it to design a new language that has the fancy things you want like data race protection. Repeat the same process as before except this time you start with concurrent hello world and build in the data race protection from the beginning.
by atan2 on 1/24/25, 10:43 AM
by dunham on 1/24/25, 5:25 PM
It's perhaps not a "proper" language because I targeted JavaScript, so I didn't have to write the back half of the compiler. Since it's dependently typed, I had plenty of work to do with dependent pattern matching, solving implicits, a typeclass-like mechanism, etc.
Next I may do a proper backend, or I may concentrate on the front end stuff (experiment with tighter editor integration, add LSP instead of the ad hoc extension that I currently have, or maybe turn it into a query-based compiler). Lots of directions I could go in.
At the moment, I'm looking into lambda-lifting the `where` clauses (I had punted lambda lifting to JS), and adding tail call optimization. I lost Idris' TCO when I self-hosted, so I currently have to run the self-hosted version in `bun` (JavaScriptCore does TCO).
by pyrale on 1/22/25, 10:36 AM
by mistrial9 on 1/24/25, 3:20 PM
by gjadi on 1/24/25, 11:23 AM
by fjfaase on 1/25/25, 7:47 AM
Once you have the grammar, you can use it with IParse developed in C++, which produces an abstract parse tree.
IParse has a built-in scanner for C-like terminals, which are now used in many languages. You can also implement your own scanner. IParse also has an unparser, which allows you to generate pretty-printed output with just some annotations in the grammar.