from Hacker News

I wrote my own “proper” programming language (2020)

by upmind on 1/22/25, 9:54 AM with 27 comments

by Timwi on 1/24/25, 7:42 AM
I'm so excited to see that the idea of creating new programming languages is getting more popular. There is definitely a lot of space in which to explore more creativity; we haven't even remotely begun to scratch the surface of what's possible!
I just wish more tooling existed that was language-agnostic so that it's easier to get off the ground with something “serious”. I'm talking debuggers, parse-tree-aware diffs, autocompletion like Intellisense, etc.
by norir on 1/24/25, 5:13 PM
This is one way to write a compiler. One of the large tradeoffs made is extensive use of libraries on both the front end and the back end. It's a practical choice, but it also means that the compiler itself is likely somewhat slow. Targeting LLVM alone is a choice that will guarantee that code gen is slow (you pay for a lot of unneeded complexity in LLVM with every compile).
When you master the principles of parsing, it is straightforward to do by hand in any language with good performance. It is easy to write a function in any language that takes a range string, such as "_a-zA-Z", and returns a table of size 256 with 65-90, 95 and 97-122 set to 1 and the rest set to 0. You can build your parser easily on top of these tables (this is not necessarily optimal, but it is more than good enough when you are starting).
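A minimal sketch of such a table builder in Python, offered as an illustration rather than as the commenter's own code. The function names `char_class_table` and `scan_identifier` are hypothetical, and the sketch assumes byte-oriented input and a range string with no escaping (a `-` between two characters always means an inclusive range).

```python
def char_class_table(spec: str) -> list[int]:
    """Build a 256-entry lookup table from a range string such as "_a-zA-Z".

    Assumption: every character in spec has a code point below 256.
    """
    table = [0] * 256
    i = 0
    while i < len(spec):
        lo = ord(spec[i])
        # A '-' between two characters denotes an inclusive range.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            hi = ord(spec[i + 2])
            i += 3
        else:
            hi = lo
            i += 1
        for code in range(lo, hi + 1):
            table[code] = 1
    return table


# Hypothetical character classes for a C-like identifier.
IDENT_START = char_class_table("_a-zA-Z")
IDENT_CONT = char_class_table("_a-zA-Z0-9")


def scan_identifier(src: bytes, pos: int) -> int:
    """Return the end offset of the identifier starting at pos, or pos if none."""
    if pos >= len(src) or not IDENT_START[src[pos]]:
        return pos
    end = pos + 1
    while end < len(src) and IDENT_CONT[src[end]]:
        end += 1
    return end
```

Each scanning step is then a single table lookup per byte, which is what makes hand-written scanners of this kind cheap.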
For the backend, you can target another language that you already know. It could be JavaScript, C or something else. This will be much easier than targeting LLVM.
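To illustrate how small such a backend can start out, here is a sketch that emits JavaScript source text from a tiny tuple-based AST. The AST shape and the `emit_js` name are assumptions made up for this example; a real compiler would also need statements, scoping and name mangling.

```python
import json


def emit_js(node) -> str:
    """Translate a tiny tuple-based AST (hypothetical shape) to JavaScript text."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "str":
        # json.dumps produces a double-quoted, escaped string literal.
        return json.dumps(node[1])
    if kind == "var":
        return node[1]
    if kind == "binop":
        _, op, left, right = node
        # Always parenthesize so operator precedence never has to be tracked.
        return f"({emit_js(left)} {op} {emit_js(right)})"
    if kind == "call":
        _, fn, args = node
        return f"{emit_js(fn)}({', '.join(emit_js(a) for a in args)})"
    if kind == "let":
        _, name, value = node
        return f"const {name} = {emit_js(value)};"
    raise ValueError(f"unknown node kind: {kind!r}")


# "Hello world" in the source language becomes one line of JavaScript.
hello = ("call", ("var", "console.log"), [("str", "hello world")])
print(emit_js(hello))
```

Because the output is ordinary source text, the target language's own runtime and tooling can be used to run and debug it.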
Start with hello world and build up the functionality you need. In no more than 10k lines of code (and typically significantly less), you can achieve self-hosting. Then you can rewrite the compiler in itself. By this point you have identified the parts of the language you like and the parts that you don't like or that are not carrying their weight. This is the point at which it may make sense to target LLVM or write a custom assembly backend.
The beauty of this approach is you keep the language small from the beginning without relying on a huge pile of dependencies. For a small compiler, you do not need concurrency or other fancy features because the programs it compiles are almost definitionally small.
Now you have a language that is optimized for writing compilers that you understand intimately. You can use it to design a new language that has the fancy things you want like data race protection. Repeat the same process as before except this time you start with concurrent hello world and build in the data race protection from the beginning.
by atan2 on 1/24/25, 10:43 AM
I probably won't create a "proper" programming language but this topic fascinates me. As someone that never even took a compilers class in college I was really happy with the content I found at pikuma.com. The course really helped me understand how a simple programming language works. I'm sure others might benefit from it too.
by dunham on 1/24/25, 5:25 PM
I wrote my own language last year[1], ending the year by doing Advent of Code in it, and then translated it to itself in early January (so it's now self-hosted). I wanted to see if I could learn how to write a dependently typed language, and I wanted it to be self-hosted and able to run in a browser.
It's perhaps not a "proper" language because I targeted JavaScript, so I didn't have to write the back half of the compiler. Since it's dependently typed, I had plenty of work to do with dependent pattern matching, solving implicits, a typeclass-like mechanism, etc.
Next I may do a proper backend, or I may concentrate on the front end stuff (experiment with tighter editor integration, add LSP instead of the ad hoc extension that I currently have, or maybe turn it into a query-based compiler). Lots of directions I could go in.
At the moment, I'm looking into lambda-lifting the `where` clauses (I had punted lambda lifting to JS), and adding tail call optimization. I lost Idris' TCO when I self-hosted, so I currently have to run the self-hosted version in `bun` (JavaScriptCore does TCO).
[1]: https://github.com/dunhamsteve/newt
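For readers unfamiliar with the tail call problem mentioned above: on a runtime without TCO, deep tail recursion overflows the stack. One common workaround is a trampoline, sketched below in Python. This is a generic illustration, not a description of how newt handles it; `TailCall`, `trampoline` and `count_down` are names invented for the example.

```python
class TailCall:
    """A deferred call: the function and arguments to invoke next."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Run fn, repeatedly invoking any TailCall it returns, in a flat loop."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def count_down(n, acc):
    """Sum the integers 1..n; the recursive step is in tail position."""
    if n == 0:
        return acc
    # Instead of calling count_down directly, return a description of the call.
    return TailCall(count_down, n - 1, acc + n)


# 100000 nested direct calls would exceed Python's default recursion limit;
# the trampoline keeps the stack depth constant.
print(trampoline(count_down, 100000, 0))  # 5000050000
```

A compiler can alternatively rewrite self-recursive tail calls into loops, which avoids the allocation per call that a trampoline incurs.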
by pyrale on 1/22/25, 10:36 AM
I'd be interested to understand the design choices behind using protobuf as an interface with LLVM: in my reasoning, it may be more performant, but that serialization step is a very small part of compute, and the serialization format is unusable by humans. For debug purposes, it'd have been nice to have a more human-friendly format. Did the project have other constraints?
by mistrial9 on 1/24/25, 3:20 PM
There was a guy on a large science team who wrote the "programming language" for that system. It had a system of dispatch for "verbs", and it was large; there were maybe 20 full-time engineers building other parts, all the time. The guy who wrote the "language" was a sports guy with a swim background. For years, five days a week, he would get up before dawn and train swimming, then arrive at work at 9am and work on the code system. Every day, for years. It was admirable in a way, but also slavish.
by gjadi on 1/24/25, 11:23 AM
His progression is wild. Going from top Cambridge graduate in 2021 to Staff Eng at Meta in 3 years. Nice!
by fjfaase on 1/25/25, 7:47 AM
If you want to design your own language, you might want to start with IParse Studio, an interactive online parser that parses the input according to a grammar at every keystroke and returns a parse tree if the input can be parsed according to the grammar.
Once you have the grammar, you can use it with IParse, which is developed in C++ and produces an abstract parse tree.
IParse has a built-in scanner for C-like terminals, which are now used in many languages. You can also implement your own scanner. IParse also has an unparser, which allows you to generate pretty-printed output with just some annotations in the grammar.