by hwpythonner on 4/28/25, 11:44 AM with 265 comments
I’m sharing an early benchmark: a GPIO test where PyXL achieves a 480 ns round-trip toggle, compared to 14-25 microseconds on a MicroPython Pyboard, even though PyXL runs at a lower clock (100 MHz vs. 168 MHz).
The design is stack-based, fully pipelined, and preserves Python's dynamic typing without static type restrictions. I independently developed the full stack — toolchain (compiler, linker, codegen), and hardware — to validate the core idea. Full technical details will be presented at PyCon 2025.
Demo and explanation here: https://runpyxl.com/gpio Happy to answer any questions
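To put the quoted numbers on the same footing, here is a rough back-of-the-envelope sketch that converts each latency into clock cycles at the stated clock rate. It uses only the figures from the post; the helper name `cycles_for` is made up for illustration, and the result says nothing about how either system spends those cycles.

```python
# Convert the latencies quoted in the post into clock cycles.
# Assumption: the only inputs are the figures from the post
# (480 ns at 100 MHz, 14-25 us at 168 MHz). `cycles_for` is a
# hypothetical helper, not part of PyXL or MicroPython.

def cycles_for(latency_ns: int, clock_mhz: int) -> float:
    """Cycles elapsed during `latency_ns` at a clock of `clock_mhz`."""
    # 1 MHz = 1 cycle per 1000 ns, so cycles = ns * MHz / 1000.
    return latency_ns * clock_mhz / 1000

pyxl_cycles = cycles_for(480, 100)
pyboard_best = cycles_for(14_000, 168)
pyboard_worst = cycles_for(25_000, 168)

print(f"PyXL:    {pyxl_cycles:.0f} cycles")
print(f"Pyboard: {pyboard_best:.0f} to {pyboard_worst:.0f} cycles")
print(f"Cycle-count ratio: {pyboard_best / pyxl_cycles:.0f}x "
      f"to {pyboard_worst / pyxl_cycles:.1f}x")
```

Measured in cycles rather than wall-clock time, the gap is wider than the raw latency ratio, because the Pyboard runs at the higher clock.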
by zik on 4/29/25, 12:44 AM
Reading further down the page it says you have to compile the python code using CPython, then generate binary code for its custom ISA. That's neat, but it doesn't "execute python directly" - it runs compiled binaries just like any other CPU. You'd use the same process to compile for x86, for example. It certainly doesn't "take regular python code and run it in silicon" as claimed.
A more realistic claim would be "A processor with a custom architecture designed to support python".
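For readers unfamiliar with the first half of that pipeline, here is a minimal sketch of the CPython step only: compiling source text to a code object and listing its bytecode with the standard `dis` module. The `toggle` function is a made-up example, and the second step (translating this bytecode into PyXL's own ISA) is not shown because its details are not described in this thread.

```python
import dis

# Hypothetical example source; any small function would do.
SOURCE = """
def toggle(pin_state):
    return not pin_state
"""

# Step 1 of the described pipeline: CPython compiles source to bytecode.
module_code = compile(SOURCE, "<example>", "exec")

# Run the module body so the function object exists, then inspect it.
namespace = {}
exec(module_code, namespace)
toggle = namespace["toggle"]

# The exact opcode names vary between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(toggle)]
print(opnames)
```

The opcode list differs across CPython releases, so a toolchain that consumes it is presumably tied to particular versions; that is an inference, not something stated on the project page.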
by Y_Y on 4/28/25, 1:06 PM
I'd love to read about the design process. I think the idea of taking bytecode aimed at the runtime of dynamic languages like Python or Ruby or even Lisp or Java and making custom processors for that is awesome and (recently) under-explored.
I'd be very interested to know why you chose to start this, why it was a good idea, and how you went about the implementation (in broad strokes if necessary).
by hwpythonner on 4/28/25, 11:44 AM
by jonjacky on 4/29/25, 12:50 AM
"Running a very small subset of python on an FPGA is possible with pyCPU. The Python Hardware Processsor (pyCPU) is a implementation of a Hardware CPU in Myhdl. The CPU can directly execute something very similar to python bytecode (but only a very restricted instruction set). The Programcode for the CPU can therefore be written directly in python (very restricted parts of python) ..."
by boutell on 4/28/25, 2:53 PM
I'm interested to see whether the final feature set will be larger than what you'd get by creating a type-safe language with a pythonic syntax and compiling that to native, rather than building custom hardware.
The background garbage collection thing is easier said than done, but I'm talking to someone who has already done something impressively difficult, so...
by obitsten on 4/28/25, 12:48 PM
by rthomas6 on 4/28/25, 12:35 PM
* Could you share the assembly language of the processor?
* What is the benefit of designing the processor and making a Python bytecode compiler for it, vs making a bytecode compiler for an existing processor such as ARM/x86/RISCV?
by rkagerer on 4/28/25, 4:18 PM
by sunray2 on 4/28/25, 10:30 PM
What are the fundamental physical limits here? Namely, timing precision, latency and jitter? How fast could PyXL bytecode react to an input?
For info, there is ARTIQ: vaguely similar thing that effectively executes Python code with 'embedded level' performance:
https://m-labs.hk/experiment-control/artiq/
ARTIQ is quite common in quantum physics labs. For that you need very precise and deterministic timing. Imagine you're interfering two photons as they reach a piece of glass, so that they can interact. It doesn't get faster than photons! That typically means nanosecond timing, sub-microsecond latency.
How ARTIQ does it is also interesting. The Python code is separate from the FPGA which actually executes the logic you want to do. In a hand-wavy way, you're then 'as fast' as the FPGA. How, though? The catch is, you have to get the Python code and FPGA gateware talking to each other, and that's technically difficult and has many gotchas. In comparison, although PyXL isn't as performant, if it makes it simpler for the user, that's a huge win for everyone.
Congrats once again!
by froh on 4/28/25, 12:38 PM
fun :-)
but did I get it right?
by thenobsta on 4/28/25, 1:56 PM
Every time I see a project that has a great implementation on an FPGA, I lament the fact that Tabula didn’t make it, a truly innovative and fast FPGA.
by asford on 5/2/25, 3:11 PM
MicroPython already exposes "viper", which transpiles bytecode to machine instructions for highly timing- or performance-critical code paths. This is reasonably well explained in the MicroPython docs, which have an example explaining how to ... trigger a GPIO very rapidly.
https://docs.micropython.org/en/latest/reference/speed_pytho...
Viper runs on device and directly emits native machine code for decorated MicroPython functions. If you have serious timing requirements for GPIO, then this is how you do it.
Of course, this is a restricted subset of the language compatible with direct native code gen, notably just supporting integer datatypes. However, I would be shocked if this project wasn't also restricted to a subset of the language functionality for your transpilation pipeline.
The benchmark should be rewritten to compare against a baseline in MicroPython using viper. Though this project is pretty neat, the overinflated performance claims would rapidly deflate against a strong baseline.
by Jean-Papoulos on 4/28/25, 12:51 PM
So, no using C libraries. That takes out a huge chunk of pip packages...
by yanniszark on 4/28/25, 3:52 PM
by JadoJodo on 4/28/25, 6:08 PM
Can you give me the scoop on Python, the language? I see things like this project, and it seems very impressive, but being an outsider to the language, I don't "get" it. More specifically: I'm curious to hear thoughts on a) what made this difficult prior to now (with Python), b) why Python is useful for this, and c) what are your thoughts on Python itself?
To add some more context:
I know a lot of developers who work with Python (Flask); some love it, some hate it (as with any language). My experience has been mainly via homelab/OSS tools that all seem to embrace the language. And yet while the language itself seems very straightforward and easy to use, my experience with the Python _ecosystem_ (again, as an outsider) has been... difficult.
Python 2 vs 3, virtual environments, libraries for each version, etc. It feels as though anytime I've had to use it outside a pre-built Docker container, these issues result in throwing spaghetti at the wall trying to figure out how to even get it working at all. As a PHP/Go dev, it's one of the languages for which I could see myself having a real interest, but this has so far made me hesitant (and I don't want to be).
by bieganski on 4/28/25, 1:49 PM
having this, the next tempting step is to make `print` function work, then the filesystem wrapper etc.
btw - what i'm missing is clear information about the limitations. it's definitely not true that i can take any Python snippet and run it using PyXL (for example threads i suppose?)
by willvarfar on 4/28/25, 1:19 PM
Is it tied to a particular version of python?
by wodenokoto on 4/28/25, 12:51 PM
by swoorup on 4/28/25, 12:50 PM
by kristianpaul on 4/28/25, 6:26 PM
by nynx on 4/28/25, 4:34 PM
by jrexilius on 4/28/25, 12:36 PM
by tgtweak on 4/28/25, 2:43 PM
>Standard_NP10s instance, 1x AMD Alveo U250 FPGA (64GB)
Would be curious to see how this benchmarks on a faster FPGA since I imagine clock frequency is the latency dictator - while memory and tile can determine how many instances can run in parallel.
by IlikeKitties on 4/28/25, 2:05 PM
by fluorinerocket on 4/28/25, 3:55 PM
I
by chippiewill on 4/29/25, 7:43 AM
I'd definitely be interested in how this project progresses, particularly if it adds support for integration to the CPU. Some tie-in to the Pynq project could be super fun.
by pjmlp on 4/28/25, 2:17 PM
by M4R5H4LL on 4/28/25, 5:35 PM
by hermitShell on 4/28/25, 12:37 PM
by igtztorrero on 4/28/25, 1:12 PM
by actinium226 on 4/28/25, 5:07 PM
I have what may be a dumb question, but I've heard that Lua can be used in embedded contexts, and that it can be used without dynamic memory allocation and other such things you don't want in real time systems. How does this project compare to that? And like I said it's likely a dumb question because I haven't actually used Lua in an embedded context but I imagine if there's something there you've probably looked at it?
by boxed on 4/28/25, 1:29 PM
by echoangle on 4/28/25, 4:57 PM
by tsukikage on 4/29/25, 10:30 AM
> Runs a subset of Python
What's the advantage of using a new custom toolchain, custom instruction set and custom processor over existing tools that compile a subset of Python for existing CPUs? - e.g. Cython, Nuitka etc?
by focusgroup0 on 4/28/25, 8:08 PM
by bluelightning2k on 4/28/25, 4:45 PM
Absolutely incredible.
by crest on 4/28/25, 6:57 PM
by tuetuopay on 4/28/25, 12:57 PM
This is amazing, great work!
by dec0dedab0de on 4/28/25, 3:08 PM
This is so cool, I have dreamt about doing this but wouldn't know where to start. Do you have a plan for releasing it? What is your background? Was there anything that was way more difficult than you thought it would be? Or anything that was easier than you expected?
by hoistbypetard on 4/29/25, 12:56 AM
That said... awesome work! I wish I could get to PyCon this year to see your talk.
Are you planning to post your core so others can replicate your work?
by _JamesA_ on 4/28/25, 4:39 PM
by simonw on 4/28/25, 5:34 PM
Do you have any open source code available for this yet?
Are you planning to release this as open source? If not, do you have a rough idea for how you plan to commercially license this tech?
by ConanRus on 4/28/25, 4:13 PM
why are we not doing this for a standard python? i think LLVM is just for that, no?
by jay-barronville on 4/28/25, 4:18 PM
Almost every question I had, you already answered in the comments. The only one remaining at the moment: How long exactly have you been working on PyXL?
by startupsfail on 4/28/25, 4:57 PM
by freeone3000 on 4/28/25, 12:38 PM
I’m guessing due to the lack of JIT, it’s executed on the host?
by two_handfuls on 4/28/25, 3:40 PM
by zoobab on 4/28/25, 6:43 PM
by gadys on 4/28/25, 12:18 PM
by warble on 4/28/25, 5:11 PM
by davidkwast on 4/28/25, 12:48 PM
by vrighter on 4/30/25, 12:20 PM
by yeahwhatever10 on 4/28/25, 4:51 PM
by globalnode on 4/28/25, 11:37 PM
by rangerelf on 4/28/25, 7:31 PM
Congratulations!!
by redox99 on 4/28/25, 4:14 PM
by jollyllama on 4/28/25, 4:05 PM
by dcreater on 4/28/25, 7:17 PM
by esseph on 4/29/25, 3:48 AM
by sneak on 4/28/25, 3:52 PM
by UncleOxidant on 4/28/25, 3:17 PM
by HPsquared on 4/28/25, 12:47 PM
That then makes me wonder if someone could implement Excel in hardware! (Or something like it)
by psychip on 4/29/25, 12:33 PM
by TickleSteve on 4/28/25, 1:09 PM
- Lisp/lispm
- Ada/iAPX
- C/ARM
- Java/Jazelle
Most don't really take off or go in different directions as the language goes out of fashion.
by brap on 4/28/25, 6:29 PM
by actinium226 on 4/28/25, 3:16 PM
by ingen0s on 4/28/25, 8:43 PM
by jimbokun on 4/28/25, 3:10 PM
Clearly you know a lot about both low level Python internals and a fair amount about hardware design to pull this off.
by ktimespi on 4/29/25, 6:51 PM
by hoseja on 4/28/25, 1:19 PM
by igtztorrero on 4/28/25, 1:11 PM
by flmontpetit on 4/28/25, 1:10 PM
Very cool project still