from Hacker News

Show HN: I built a hardware processor that runs Python

by hwpythonner on 4/28/25, 11:44 AM with 265 comments

Hi everyone, I built PyXL — a hardware processor that executes a custom assembly generated from Python programs, without using a traditional interpreter or virtual machine. It compiles Python -> CPython Bytecode -> Instruction set designed for direct hardware execution.

I’m sharing an early benchmark: a GPIO test where PyXL achieves a 480ns round-trip toggle, compared to 14-25 microseconds on a MicroPython Pyboard, even though PyXL runs at a lower clock (100MHz vs. 168MHz).

The design is stack-based, fully pipelined, and preserves Python's dynamic typing without static type restrictions. I independently developed the full stack — toolchain (compiler, linker, codegen), and hardware — to validate the core idea. Full technical details will be presented at PyCon 2025.

Demo and explanation here: https://runpyxl.com/gpio Happy to answer any questions

by zik on 4/29/25, 12:44 AM
This is a very cool project but I feel like the claim is overstated: "PyXL is a custom hardware processor that executes Python directly — no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon."
Reading further down the page it says you have to compile the python code using CPython, then generate binary code for its custom ISA. That's neat, but it doesn't "execute python directly" - it runs compiled binaries just like any other CPU. You'd use the same process to compile for x86, for example. It certainly doesn't "take regular python code and run it in silicon" as claimed.
A more realistic claim would be "A processor with a custom architecture designed to support python".
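To make the first stage of that pipeline concrete, CPython's standard `dis` module can show the bytecode the compiler emits for a function. This is only a sketch of the Python -> CPython bytecode step; the later translation to PyXL's own instruction set is not described in this thread and is not shown. The `add` function is a made-up example, and exact opcode names vary between CPython versions.

```python
import dis


def add(a, b):
    # A deliberately tiny function so the bytecode listing stays short.
    return a + b


# dis.get_instructions yields one Instruction object per bytecode instruction.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    # Print the byte offset, the opcode name, and a readable form of its argument.
    print(ins.offset, ins.opname, ins.argrepr)
```

Whatever the toolchain does next, this listing is the input it starts from, which is why the process looks like compiling for any other CPU.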
by Y_Y on 4/28/25, 1:06 PM
Are there any limitations on what code can run? (discounting e.g. memory limitations and OS interaction)
I'd love to read about the design process. I think the idea of taking bytecode aimed at the runtime of dynamic languages like Python or Ruby or even Lisp or Java and making custom processors for that is awesome and (recently) under-explored.
I'd be very interested to know why you chose to start this, why it was a good idea, and how you went about the implementation (in broad strokes if necessary).
by hwpythonner on 4/28/25, 11:44 AM
I built a hardware processor that runs Python programs directly, without a traditional VM or interpreter. Early benchmark: GPIO round-trip in 480ns — 30x faster than MicroPython on a Pyboard (at a lower clock). Demo: https://runpyxl.com/gpio
by jonjacky on 4/29/25, 12:50 AM
A much earlier (2012) attempt at a Python bytecode interpreter on an FPGA:
https://pycpu.wordpress.com/
"Running a very small subset of python on an FPGA is possible with pyCPU. The Python Hardware Processsor (pyCPU) is a implementation of a Hardware CPU in Myhdl. The CPU can directly execute something very similar to python bytecode (but only a very restricted instruction set). The Programcode for the CPU can therefore be written directly in python (very restricted parts of python) ..."
by boutell on 4/28/25, 2:53 PM
This is very, very cool. Impressive work.
I'm interested to see whether the final feature set will be larger than what you'd get by creating a type-safe language with a pythonic syntax and compiling that to native, rather than building custom hardware.
The background garbage collection thing is easier said than done, but I'm talking to someone who has already done something impressively difficult, so...
by obitsten on 4/28/25, 12:48 PM
Why is it not routine to "compile" Python? I understand that the interpreter is great for rapid iteration, cross compatibility, etc. But why is it accepted practice in the Python world to eschew all of the benefits of compilation by just dumping the "source" file in production?
by rthomas6 on 4/28/25, 12:35 PM
* What HDL did you use to design the processor?
* Could you share the assembly language of the processor?
* What is the benefit of designing the processor and making a Python bytecode compiler for it, vs making a bytecode compiler for an existing processor such as ARM/x86/RISCV?
by rkagerer on 4/28/25, 4:18 PM
Back when C# came out, I thought for sure someone would make a processor that would natively execute .Net bytecode. Glad to see it finally happened for some language.
by sunray2 on 4/28/25, 10:30 PM
Very interesting!
What are the fundamental physical limits here? Namely, timing precision, latency and jitter? How fast could PyXL bytecode react to an input?
For info, there is ARTIQ: vaguely similar thing that effectively executes Python code with 'embedded level' performance:
https://m-labs.hk/experiment-control/artiq/
ARTIQ is quite common in quantum physics labs. For that you need very precise and deterministic timing. Imagine you're interfering two photons as they reach a piece of glass, so that they can interact. It doesn't get faster than photons! That typically means nanosecond timing, sub-microsecond latency.
How ARTIQ does it is also interesting. The Python code is separate from the FPGA which actually executes the logic you want to do. In a hand-wavy way, you're then 'as fast' as the FPGA. How, though? The catch is, you have to get the Python code and FPGA gateware talking to each other, and that's technically difficult and has many gotchas. In comparison, although PyXL isn't as performant, if it makes it simpler for the user, that's a huge win for everyone.
Congrats once again!
by froh on 4/28/25, 12:38 PM
Do I get this right? this is an ASIC running a python-specific microcontroller which has python-tailored microcode? and together with that a python bytecode -> microcode compiler plus support infrastructure to get the compiled bytecode to the asic?
fun :-)
but did I get it right?
by thenobsta on 4/28/25, 1:56 PM
Amazing work! This is a great project!
Every time I see a project that has a great implementation on an FPGA, I lament the fact that Tabula didn’t make it, a truly innovative and fast FPGA.
<https://en.m.wikipedia.org/wiki/Tabula,_Inc.>
by asford on 5/2/25, 3:11 PM
The benchmark results presented in this page are extremely misleading; you're not comparing to the actual baseline gpio performance available in micropython.
MicroPython already exposes "viper", which transpiles byte code to machine instructions for highly timing- or performance-critical code paths. This is reasonably well explained in the MicroPython docs, which have an example explaining how to ... trigger a GPIO very rapidly.
https://docs.micropython.org/en/latest/reference/speed_pytho...
Viper runs on device and directly emits native machine code for decorated micropython functions. If you have serious timing requirements for gpio, then this is how you do it.
Of course, this is a restricted subset of the language compatible with direct native code gen, notably just supporting integer datatypes. However, I would be shocked if this project wasn't also restricted to a subset of the language functionality for your transpilation pipeline.
The benchmark should be rewritten to compare against a baseline in MicroPython using viper. Though this project is pretty neat, the overinflated performance claims would rapidly deflate against a strong baseline.
by Jean-Papoulos on 4/28/25, 12:51 PM
>PyXL is a custom hardware processor that executes Python directly — no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon.
So, no using C libraries. That takes out a huge chunk of pip packages...
by yanniszark on 4/28/25, 3:52 PM
Great work! :D I had a question about that though. Instead of compiling to PySM, why not compile directly to a real assembly like ARM? Is the PySM assembly very special to accommodate python features in a way that can't be done efficiently in existing architectures like ARM?
by JadoJodo on 4/28/25, 6:08 PM
I'd like to invite any Python devs to go on a tangent with me:
Can you give me the scoop on Python, the language? I see things like this project, and it seems very impressive, but being an outsider to the language, I don't "get" it. More specifically: I'm curious to hear thoughts on a) what made this difficult prior to now (with Python), b) why Python is useful for this, and c) what are your thoughts on Python itself?
To add some more context:
I know a lot of developers who work with Python (Flask); some love it, some hate it (as with any language). My experience has been mainly via homelab/OSS tools that all seem to embrace the language. And yet while the language itself seems very straightforward and easy to use, my experience with the Python _ecosystem_ (again, as an outsider) has been... difficult.
Python 2 vs 3, virtual environments, libraries for each version, etc. It feels as though anytime I've had to use it outside a pre-built Docker container, these issues result in throwing spaghetti at the wall trying to figure out how to even get it working at all. As a PHP/Go dev, it's one of the languages for which I could see myself having a real interest, but this has so far made me hesitant (and I don't want to be).
by bieganski on 4/28/25, 1:49 PM
it would be nice to have some peripheral drivers implemented (UART, eMMC etc).
having this, the next tempting step is to make `print` function work, then the filesystem wrapper etc.
btw - what i'm missing is clear information about the limitations. it's definitely not true that i can take any Python snippet and run it using PyXL (for example threads i suppose?)
by willvarfar on 4/28/25, 1:19 PM
Fantastic work! :D Must be super-satisfying to get it up and running! :D
Is it tied to a particular version of python?
by wodenokoto on 4/28/25, 12:51 PM
I can totally see a future where you can select “accelerated python” as an option for your AWS lambda code.
by swoorup on 4/28/25, 12:50 PM
How does garbage collection work here? Is it just a set of PySM code?
by kristianpaul on 4/28/25, 6:26 PM
This always made me think back to the J1 Forth CPU https://excamera.com/files/j1.pdf
by nynx on 4/28/25, 4:34 PM
This is cool for sure. I think you’ll ultimately find that this can’t really be faster than modern OoO cores because python instructions are so complex. To execute them OoO or even at a reasonable frequency (e.g. to reduce combinatorial latency), you’ll need to emit type-specialized microcode on the fly, but you can’t do that until the types are known — which is only the case once all the inputs are known for python.
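As a rough illustration of why a single Python instruction is "complex" in this sense, here is a simplified sketch of what a generic `a + b` has to do at run time before any arithmetic happens. The name `generic_add` is hypothetical and this is not PyXL's implementation; it also leaves out details of the real protocol, such as the priority given to a right operand whose type is a subclass of the left operand's type.

```python
def generic_add(a, b):
    """Simplified model of Python's dynamic dispatch for `a + b`."""
    result = NotImplemented

    # Step 1: look up __add__ on the *type* of the left operand.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)

    # Step 2: if the left operand declined, try the reflected method
    # on the type of the right operand.
    if result is NotImplemented:
        radd = getattr(type(b), "__radd__", None)
        if radd is not None:
            result = radd(b, a)

    # Step 3: nobody could handle the pair of types.
    if result is NotImplemented:
        raise TypeError(
            f"unsupported operand types: {type(a).__name__} and {type(b).__name__}"
        )
    return result


print(generic_add(1, 2))      # int + int
print(generic_add("a", "b"))  # str + str
print(generic_add(1, 2.5))    # int declines, float.__radd__ handles it
```

None of the lookups can be resolved until both operand types are known, which is the dependency the comment above describes.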
by jrexilius on 4/28/25, 12:36 PM
Amazing work! Is the primary goal here to allow more production use of python in an embedded context, rather than just prototyping?
by tgtweak on 4/28/25, 2:43 PM
Have you tested it on any faster FPGAs? I think Azure has instances with xilinx/AMD accelerators paired.
>Standard_NP10s instance, 1x AMD Alveo U250 FPGA (64GB)
Would be curious to see how this benchmarks on a faster FPGA since I imagine clock frequency is the latency dictator - while memory and tile can determine how many instances can run in parallel.
by IlikeKitties on 4/28/25, 2:05 PM
Is this running on an FPGA or were you able to fab a custom chip?
by fluorinerocket on 4/28/25, 3:55 PM
Makes me think of LabVIEW FPGA, where you could run LabVIEW code directly on FPGA, more like generate vhdl or verilog from LabVIEW, and do very high loop rate deterministic control systems. Very cool. Except with that you were locked down to the national instruments ecosystem and no one really used it.
by chippiewill on 4/29/25, 7:43 AM
Very cool. There's a similar project, Polyphony (https://github.com/polyphony-dev/polyphony) that translates Python directly into Verilog - no processor (A bit like what HLS does for C++). As part of my degree dissertation I tacked on AXI bus support to it to facilitate communication between the CPU and FPGA on a Zynq as a PoC of doing hardware/software co-design with Python.
I'd definitely be interested in how this project progresses, particularly if it adds support for integration to the CPU. Some tie-in to the Pynq project could be super fun.
by pjmlp on 4/28/25, 2:17 PM
This is kind of cool, basically a Python Machine. :)
by M4R5H4LL on 4/28/25, 5:35 PM
I love this kind of project, this is wonderful work. I guess the challenge is to now make it work for general purpose Python. In any case it looks very much like a marketable product already. I would seek financing to see how far this can go.
by hermitShell on 4/28/25, 12:37 PM
fantastic project. Do you envision this as living on FPGA's forever, or getting into silicon directly? Maybe an extension of RISC-V?
by igtztorrero on 4/28/25, 1:12 PM
Amazing, I'm sure many programmers would join to contribute to your great project, which could become as big as a Python-based operating system, which due to the simplicity of the code would advance very quickly.
by actinium226 on 4/28/25, 5:07 PM
So first of all, this is awesome and props to you for some great work.
I have what may be a dumb question, but I've heard that Lua can be used in embedded contexts, and that it can be used without dynamic memory allocation and other such things you don't want in real time systems. How does this project compare to that? And like I said it's likely a dumb question because I haven't actually used Lua in an embedded context but I imagine if there's something there you've probably looked at it?
by boxed on 4/28/25, 1:29 PM
How big a deal would it be to include the bytecode->PySM translation into the ISA? It seems like it would be even cooler if the CPU actually ran python bytecode itself.
by echoangle on 4/28/25, 4:57 PM
Would this be able to handle an exec()- or eval()-call? Is there a Python byte code compiler available as python byte code to include in this processor?
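For context on why this question matters: in CPython, `eval()` on a string implies running the compiler at run time. A minimal sketch using only built-ins (the expression string is a made-up example) shows the two steps separately:

```python
source = "1 + 2"

# Step 1: compile source text into a code object. This is the part that
# needs a compiler to be available wherever the program is running.
code = compile(source, "<string>", "eval")

# Step 2: execute the already-compiled code object.
result = eval(code)

print(type(code).__name__)  # the compiled artifact is a code object
print(len(code.co_code))    # size of its raw bytecode in bytes
print(result)
```

On an ahead-of-time toolchain, step 1 normally happens on the host, so supporting `exec()`/`eval()` on the device would require some way to perform it there.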
by tsukikage on 4/29/25, 10:30 AM
> A custom toolchain compiles a .py file into CPython ByteCode, translates it to a custom assembly, and produces a binary that runs on a pipelined processor built from scratch.
> Runs a subset of Python
What's the advantage of using a new custom toolchain, custom instruction set and custom processor over existing tools that compile a subset of Python for existing CPUs? - e.g. Cython, Nuitka etc?
by focusgroup0 on 4/28/25, 8:08 PM
Incredible work. This is a paradigm shift for ML and embedded workflows. And congratulations, you are going to ring the bell with this one.
by bluelightning2k on 4/28/25, 4:45 PM
I am a pretty smart person. But once in a while I see something like this which reminds me there's always someone far smarter.
Absolutely incredible.
by crest on 4/28/25, 6:57 PM
A "480ns GPIO roundtrip" @ 100MHz implies 48 cycles for a single GPIO access. I would understand one or two cycles, but what does it spend the other ~46 cycles on? Does Python really have a >40x overhead compared to assembler or C even on optimised hardware or is the benchmark code that bad?
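The cycle counts behind this question can be checked with a few lines of arithmetic. The figures are the ones quoted in the post (480ns at 100MHz for PyXL, 14-25 microseconds at 168MHz for the Pyboard); the helper name `cycles` is just for illustration.

```python
def cycles(duration_ns, clock_mhz):
    """Number of clock cycles that fit in duration_ns at clock_mhz."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return duration_ns * clock_mhz / 1000.0


pyxl = cycles(480, 100)              # 48.0 cycles
pyboard_low = cycles(14_000, 168)    # 2352.0 cycles
pyboard_high = cycles(25_000, 168)   # 4200.0 cycles

print("PyXL cycles per round trip:", pyxl)
print("Pyboard cycles per round trip:", pyboard_low, "to", pyboard_high)

# Wall-clock ratio (what the "30x" figure refers to) versus cycle ratio.
print("time ratio:", 14_000 / 480, "to", 25_000 / 480)
print("cycle ratio:", pyboard_low / pyxl, "to", pyboard_high / pyxl)
```

So the roughly 30x claim matches the low end of the wall-clock range, and the per-cycle gap is larger because the Pyboard runs at a higher clock.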
by tuetuopay on 4/28/25, 12:57 PM
So basically you took the idea of Jazelle extensions that can run Java bytecode natively, but for python?
This is amazing, great work!
by dec0dedab0de on 4/28/25, 3:08 PM
Congratulations!
This is so cool, I have dreamt about doing this but wouldn't know where to start. Do you have a plan for releasing it? What is your background? Was there anything that was way more difficult than you thought it would be? Or anything that was easier than you expected?
by hoistbypetard on 4/29/25, 12:56 AM
It seems worth noting that the board you're comparing it to costs <$30 where the dev board you're running on costs $250+.
That said... awesome work! I wish I could get to PyCon this year to see your talk.
Are you planning to post your core so others can replicate your work?
by _JamesA_ on 4/28/25, 4:39 PM
It would be interesting to see something like this that runs WASM as a universal bytecode.
by simonw on 4/28/25, 5:34 PM
This looks incredible.
Do you have any open source code available for this yet?
Are you planning to release this as open source? If not, do you have a rough idea for how you plan to commercially license this tech?
by ConanRus on 4/28/25, 4:13 PM
> the program is compiled to a CPython Bytecode and then compiled again to PyXL assembly. It is then linked together and a binary is generated.
why are we not doing this for a standard python? i think LLVM is just for that, no?
by jay-barronville on 4/28/25, 4:18 PM
This type of project is why I love HN. This work is brilliant!
Almost every question I had, you already answered in the comments. The only one remaining at the moment: How long exactly have you been working on PyXL?
by startupsfail on 4/28/25, 4:57 PM
Nice, next step could be rolling out that bytecode compiler in Python, so it’s self-contained. And a port to some LLM-on-silicon, so we could have it executing Python as the inference goes :-P
by freeone3000 on 4/28/25, 12:38 PM
This is amazing! Is the “microcode” compiled to final native on the host or the coprocessor?
I’m guessing due to the lack of JIT, it’s executed on the host?
by two_handfuls on 4/28/25, 3:40 PM
This is a one-person project? I'm impressed!
by zoobab on 4/28/25, 6:43 PM
To reflash ch32v003 chips, I need to create bits of 250ns, so with 480ns it's not enough. Is there a way to make it faster?
by gadys on 4/28/25, 12:18 PM
Looks impressive. How does this compare to PyPy?
by warble on 4/28/25, 5:11 PM
Wow, these FPGAs are not cheap. Don't they also have a couple of ARM cores attached on the SOC?
by davidkwast on 4/28/25, 12:48 PM
Wow. Congratz
by vrighter on 4/30/25, 12:20 PM
you created a custom processor and made a compiler for it. The source language happens to be python, but the generated bytecode is not what executes on the cpu. A custom ISA is not the python bytecode
by yeahwhatever10 on 4/28/25, 4:51 PM
How are you simulating the designs for the FPGA? Are you paying for ModelSim?
by globalnode on 4/28/25, 11:37 PM
Great idea and frankly I'm surprised it hasn't been done before. Probably because you would have to sell an awful lot of them to make $. But there would definitely be a market I think. For example if they were cheap, say much cheaper than a Pi, I'd go for something like this over a full Linux machine for dedicated projects. But then how would you do complex things like interfacing to cameras and leveraging encoders etc? Or is this sort of device just not for that type of project.
by rangerelf on 4/28/25, 7:31 PM
Incredible work :-)
Congratulations!!
by redox99 on 4/28/25, 4:14 PM
What's the logic behind going for stack based?
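For anyone unfamiliar with the term, a stack-based machine takes its operands from a stack rather than from named registers, which is also how CPython bytecode is organized. The toy evaluator below is only a sketch of that idea; the opcode names `PUSH`, `ADD` and `MUL` are invented for this example and are not PyXL's instruction set.

```python
def run(program):
    """Execute a list of (opcode, *args) tuples on a toy stack machine."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            # Place an immediate value on top of the stack.
            stack.append(args[0])
        elif op == "ADD":
            # Pop two operands, push their sum.
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            # Pop two operands, push their product.
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    # The result of the program is whatever is left on top.
    return stack.pop()


# (2 + 3) * 4 expressed without any register names.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```

Because instructions never name their operands, a translation from stack-oriented bytecode to a stack-oriented ISA can stay close to one-to-one.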
by jollyllama on 4/28/25, 4:05 PM
Name's a bit confusing when XLWings exists
by dcreater on 4/28/25, 7:17 PM
Very impressive! Can it run on RISC V?
by esseph on 4/29/25, 3:48 AM
This seems super, super cool!
by sneak on 4/28/25, 3:52 PM
How long did you work on this?
by UncleOxidant on 4/28/25, 3:17 PM
Is the source code available?
by HPsquared on 4/28/25, 12:47 PM
Not to be confused with openpyxl, a library for working with Excel files.
That then makes me wonder if someone could implement Excel in hardware! (Or something like it)
by psychip on 4/29/25, 12:33 PM
it was cool until i read the line "what is gpio"
by TickleSteve on 4/28/25, 1:09 PM
There is a long history of CPUs tailored to specific languages:
- Lisp/lispm
- Ada/iAPX
- C/ARM
- Java/Jazelle
Most don't really take off or go in different directions as the language goes out of fashion.
by brap on 4/28/25, 6:29 PM
Up next: a processor that will directly execute your prompt
by actinium226 on 4/28/25, 3:16 PM
This is awesome
by ingen0s on 4/28/25, 8:43 PM
That's great!
by jimbokun on 4/28/25, 3:10 PM
What's your development background that prepared you to take on a project like this?
Clearly you know a lot about both low level Python internals and a fair amount about hardware design to pull this off.
by ktimespi on 4/29/25, 6:51 PM
Kind of insane that you achieved this. Does your processor support all python bytecode at this point? How do you implement ref counting and garbage collection?
by hoseja on 4/28/25, 1:19 PM
I wonder if silicon can feel pain.
by igtztorrero on 4/28/25, 1:11 PM
Amazing,
by flmontpetit on 4/28/25, 1:10 PM
For a minute there I was imagining Python as the actual instruction set and my brain was segfaulting.
Very cool project still