by mattip on 7/31/23, 10:53 AM with 180 comments
[0] https://www.pypy.org/contact.html [1] https://www.pypy.org/posts/2022/11/pypy-and-conda-forge.html [2] https://www.pypy.org/download.html [3] https://www.pypy.org/contact.html
by ggm on 7/31/23, 11:52 AM
Moving to pypy definitely sped me up a bit. Not as much as I'd hoped; it's probably all down to string indexing into dicts and dict management. I may recode into a radix tree. It's hard to work out in advance how different that would be: people have optimised the core data structures pretty well.
Uplift from normal python was trivial. Most dev time was spent fixing pip3 for pypy on Debian, not knowing which apt packages to load, amid a lot of "stop using pip" messaging.
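For illustration, a minimal sketch of the kind of trie that could replace a string-keyed dict here (all names hypothetical; whether it actually beats PyPy's heavily optimised dict would need measuring):

```python
class TrieNode:
    """One node per character of the key."""
    __slots__ = ("children", "value", "has_value")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.value = None
        self.has_value = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default


t = Trie()
t.insert("cat", 1)
t.insert("car", 2)
print(t.get("cat"))   # -> 1
print(t.get("dog"))   # -> None
```

A true radix tree would additionally collapse single-child chains into string-labelled edges; this character-per-node version just shows the lookup shape.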
by reftel on 7/31/23, 12:25 PM
by macNchz on 7/31/23, 12:38 PM
Haven’t used it in a bit mostly because I’ve been working on projects that haven’t had the same bottleneck, or that rely on incompatible extensions.
Thank you for your work on the project!
by ADcorpo on 7/31/23, 11:50 AM
I am still working on it, but the main issue for now is psycopg support. I had to install psycopg2cffi in my test environment, and that will probably prevent me from using pypy to run our test suite, because psycopg2cffi does not have the same features and versions as psycopg2. This means we either switch our prod to pypy (not possible, since I am very new on this team and the others would see it as a big, risky change), or we keep in mind that the tests do not run on the exact same runtime as the production servers (which might let bugs go unnoticed into production, or fail tests that would otherwise pass in a live environment).
I think if I ever started a python project right now, I'd probably try to use pypy from the start, since (at least for web development) there do not seem to be any downsides to using it.
Anyways, thank you very much for your hard work!
by PaulHoule on 7/31/23, 11:40 AM
With CPython I was frustrated with how slow it was, and complained about it to the people I was working with. PyPy was a simple upgrade that sped up my code to the point where it was comfortable to work with.
by eigenvalue on 7/31/23, 3:04 PM
Create venv and activate it and install packages:
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install wheel
pip install -r requirements.txt
I wanted a similar one-liner that I could use on a fresh Ubuntu machine so I can try out PyPy easily in the same way. After a bit of fiddling, I came up with this monstrosity, which should work with both bash and zsh (though I only tested it on zsh). Create venv, activate it, and install packages using pyenv/pypy/pip:
if [ -d "$HOME/.pyenv" ]; then rm -Rf $HOME/.pyenv; fi && \
curl https://pyenv.run | bash && \
DEFAULT_SHELL=$(basename "$SHELL") && \
if [ "$DEFAULT_SHELL" = "zsh" ]; then RC_FILE=~/.zshrc; else RC_FILE=~/.bashrc; fi && \
if ! grep -q 'export PATH="$HOME/.pyenv/bin:$PATH"' $RC_FILE; then echo -e '\nexport PATH="$HOME/.pyenv/bin:$PATH"' >> $RC_FILE; fi && \
if ! grep -q 'eval "$(pyenv init -)"' $RC_FILE; then echo 'eval "$(pyenv init -)"' >> $RC_FILE; fi && \
if ! grep -q 'eval "$(pyenv virtualenv-init -)"' $RC_FILE; then echo 'eval "$(pyenv virtualenv-init -)"' >> $RC_FILE; fi && \
source $RC_FILE && \
LATEST_PYPY=$(pyenv install --list | grep -P '^ pypy[0-9\.]*-\d+\.\d+' | grep -v -- '-src' | tail -1) && \
LATEST_PYPY=$(echo $LATEST_PYPY | tr -d '[:space:]') && \
echo "Installing PyPy version: $LATEST_PYPY" && \
pyenv install $LATEST_PYPY && \
pyenv local $LATEST_PYPY && \
pypy -m venv venv && \
source venv/bin/activate && \
pip install --upgrade pip && \
pip install wheel && \
pip install -r requirements.txt
Maybe others will find it useful.
by pdw on 7/31/23, 12:27 PM
So the good: It apparently now supports Python 3.9? You might want to update your front page; it only mentions Python 3.7.
The bad: It only supports Python 3.9; we use newer features throughout our code, so it'd be painful to even try it out.
by mkl on 7/31/23, 11:26 AM
Personally I don't use PyPy for anything, though I have followed it with interest. Most of the things I need to go faster are numerical, so Numba and Cython seem more appropriate.
by q3k on 7/31/23, 11:31 AM
The biggest blocker for me for 'defaulting' to PyPy is a) issues when dealing with CPython extensions and how quite often it ends up being a significant effort to 'port' more complex applications to PyPy b) the muscle memory for typing 'python3' instead of 'pypy3'.
by cpburns2009 on 7/31/23, 2:04 PM
We use the PyPy provided downloads (Linux x86 64 bit) because it's easier to maintain multiple versions simultaneously on Ubuntu servers. The PyPy PPA does not allow this. I try to keep the various projects using the latest stable version of PyPy as they receive maintenance, and we're currently transitioning from 3.9/v7.3.10 to 3.10/v7.3.12.
Thank you for all of the hard work providing a JITed Python!
by v3ss0n on 7/31/23, 6:08 PM
PyPy should have become the standard implementation; it would have saved a lot of the investment in making Python fast.
I shill PyPy all the time, but thanks to the outdated website and the puzzling attachment to Heptapod (at least put something on GitHub for discoverability's sake), devs who won't bother to look any further than a GitHub page frown on me, thinking PyPy is an outdated and inactive project.
PyPy is one of the most ambitious projects in open-source history, and the lack of publicity makes me scream internally.
by rsecora on 7/31/23, 12:37 PM
Speedups of 30x-40x. The highest speedups were on transformations that require logic (lots of function calls, numerical operations, and dictionary lookups).
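A hedged illustration of the kind of transformation code that benefits, hot pure-Python loops full of function calls and dict lookups (the field names and rates here are invented for the example):

```python
RATES = {"EUR": 1.1, "GBP": 1.3, "USD": 1.0}


def convert(amount, currency):
    # dict lookup + arithmetic on every call
    return amount * RATES[currency]


def transform(record):
    # per-record logic: lookups plus a couple of function calls,
    # exactly the shape PyPy's JIT specialises well
    return {
        "id": record["id"],
        "total_usd": round(convert(record["amount"], record["currency"]), 2),
    }


rows = [{"id": i, "amount": i * 1.5, "currency": "EUR"} for i in range(3)]
print([transform(r) for r in rows])
```

CPython pays interpreter dispatch and hashing overhead on every call and lookup; PyPy traces the loop and compiles it, which is where multi-10x numbers like these can come from.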
by ghj on 7/31/23, 6:05 PM
PyPy is pretty well stress-tested by the competitive programming community.
https://codeforces.com/contests has around 20-30k participants per contest, with contests happening roughly twice a week. I would say around 10% of them use python, with the vast majority choosing pypy over cpython.
I would guesstimate at least 100k lines of pypy code are written per week just from these contests. This covers virtually every textbook algorithm you can think of, all automatically graded for correctness/speed/memory. Note that there's no special time multiplier for choosing a slower language, so if you're not within 2x the speed of the equivalent C++, your solution won't pass! (hence the popularity of pypy over cpython)
The sheer volume of advanced algorithms executed in pypy gives me huge amount of confidence in it. There was only one instance where I remember a contestant running into a bug with the jit, but it was fixed within a few days after being reported: https://codeforces.com/blog/entry/82329?#comment-693711 https://foss.heptapod.net/pypy/pypy/-/issues/3297.
New edit from that previous comment: there's now a Legendary Grandmaster (ELO rating > 3000, ranked 33rd out of hundreds of thousands) who almost exclusively uses pypy: https://codeforces.com/submissions/conqueror_of_tourist
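The flavour of code involved is tight pure-Python loops, e.g. a Sieve of Eratosthenes, which CPython runs slowly but PyPy's JIT gets close to native speed (a generic example, not taken from any particular contest):

```python
def sieve(n):
    """Return all primes <= n via the Sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"          # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # clear all multiples of p starting at p*p
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_prime[i]]


print(sieve(30))   # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Under CPython the inner slice tricks above are already a workaround for slow loops; under PyPy even a naive nested-loop version tends to pass the same time limits.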
by eigenvalue on 7/31/23, 3:22 PM
Also, you might want to flag the libraries that technically "work" but still require an extremely long and involved build process. For example, I recently started the process of installing Pandas with pip in a PyPy venv and it was stuck on `Getting requirements to build wheel ...` for a very long time, like 20+ minutes.
by Twirrim on 7/31/23, 3:00 PM
I rarely use python at work in places where PyPy would suit it (lots of python usage, but more on the order of short-run tools), but I'm always looking for chances, and I always use it for random little personal things.
by twp on 7/31/23, 1:05 PM
Thank you for your amazing work!
by ant6n on 7/31/23, 1:24 PM
Python is fun to work with (except classes…), but it's just sooo slow. Pypy can be a life saver.
[1] https://blog.transitapp.com/how-we-shrank-our-trip-planner-t... [2] https://blog.transitapp.com/how-we-built-the-worlds-pretties...
by wiz21c on 7/31/23, 11:45 AM
(nevertheless, PyPy is impressive :-) )
by oebs on 7/31/23, 4:02 PM
So we've made it configurable to run some instances with Pypy - which was able to work through the data in realtime, i.e. without generating a lag in the data stream. The downside of using pypy was increased memory usage (4-8x), which isn't really a problem. An actual problem that I didn't really track down was that the test suite (running pytest) took 2-3 times longer with Pypy than with CPython.
A few months ago I upgraded the system to run with CPython 3.11 and the performance improvements of 10-20% that come with that version now actually allowed us to drop Pypy and only run CPython. Which is more convenient and makes the deployment and configuration less complex.
by eslaught on 7/31/23, 4:36 PM
We eventually rewrote the profiler tool in Rust for additional speedups, but as mentioned for the verification engine, it's probably too complicated to ever do that so we really appreciate drop-in tools like PyPy that can speed up our code.
[1]: https://github.com/StanfordLegion/legion/blob/master/tools/l...
[2]: https://github.com/StanfordLegion/legion/blob/master/tools/l...
by waysa on 7/31/23, 11:52 AM
by t90fan on 7/31/23, 12:03 PM
The performance of PyPy over CPython saved us loads and loads time and thus $$$s, from what I can recall.
by tgbugs on 7/31/23, 6:26 PM
We also use pypy3 to accelerate rdflib parsing and serialization of various RDF formats. See for example [3].
Thanks to you and the whole PyPy team!
1. https://github.com/tgbugs/dockerfiles/blob/6f4ad5d873b7ab267...
2. https://github.com/tgbugs/dockerfiles/blob/6f4ad5d873b7ab267...
3. https://github.com/SciCrunch/sparc-curation/blob/0fdf393e26f...
by fragebogen on 8/1/23, 11:50 AM
Basically I'm using a SciPy exclusively for the optimization routine:
* minimize(method="SLSQP") [0]
* A list comprehension which calls ~10-500 pre-fitted PchipInterpolator [1] functions and stores the values as an np.array().
The Pchip functions (and their first derivatives) are used in the main opt function as well as in several constraints.
Most jobs took about 10 seconds, but the long tail might sometimes take up to 10 min. I tried pypy 3.8 (7.3.9) and saw similar compute times on the shorter jobs, but roughly ~2x slower compute times on the heavier jobs. This obviously was not what I expected, but I had very limited experience with pypy and didn't know how to debug further.
Eventually python 3.10 came around and gave 1.25x speed increase, and then 3.11 which gave another 1.6-1.7x increase which gave a decent ~2x cumulative speedup, but the occasional heavy jobs still stay in the 5 min range and would have been nicer in the 10-30s obviously.
Still, I would like to say that trying pypy out was quite a smooth experience; staying within scipy land, it took me half a day to switch and benchmark. But if anyone else has experience with pypy and scipy, and knows some obvious pitfalls, it would be much appreciated to hear.
[0] https://docs.scipy.org/doc/scipy/reference/optimize.minimize...
[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.i...
by Apreche on 7/31/23, 1:18 PM
That said, if I do ever run into a situation where I need my code to perform better, PyPy is high on my list of things to try. It’s nice to know it’s an option.
by cool-RR on 7/31/23, 6:08 PM
I'm currently doing multi-agent reinforcement learning research using RLlib, which is part of Ray. I tried to install a PyPy environment for it. It failed because Ray doesn't provide a wheel for it:
Could not find a version that satisfies the requirement ray (from versions: none)
My hunch is that even if Ray did provide that, there would have been some other roadblock that would have prevented me from using PyPy.
by oxmane on 7/31/23, 4:24 PM
FWIW, since I've seen it mentioned, we've also been using psycopg2cffi to access Postgres sources.
The product now lives (at least partially) as Datastream on GCP (https://cloud.google.com/datastream/docs/overview). I'm not sure though if it's still running on PyPy.
I could try and connect with the folks still working on it, if you're interested.
by lsferreira42 on 7/31/23, 1:31 PM
Also, in my day job we use pypy in all our python deployments. To be fair, until now I thought everybody would develop in python, test in pypy for an easy speed boost, and only go back to python if pypy turned out slower than cpython.
by _han on 7/31/23, 1:13 PM
I would be interested in seeing benchmarks where PyPy is compared with more recent versions of CPython. https://www.pypy.org/ currently shows a comparison with CPython 3.7, but recent releases of CPython (3.11+) put a lot of effort into performance which is important to take into account.
by wg0 on 7/31/23, 12:50 PM
by bofaGuy on 7/31/23, 1:17 PM
by IshKebab on 7/31/23, 12:32 PM
If I could just `pip3 install pypy` and then set an environment variable to use it or something like that then I'd give it a try. It does feel a bit like adding a jet pack to a rowing boat though. I know some people use Python in situations where the performance requirement isn't "I literally don't care" but surely not very many?
Obviously if it was the default that would be fantastic.
by btown on 7/31/23, 6:08 PM
Things like https://github.com/gevent/gevent/issues/676 and the fix at https://github.com/gevent/gevent/commit/f466ec51ea74755c5bee... indicate to me that there are subtleties on how PyPy's memory management interacts with low-level tweaks like gevent that have relied on often-implicit historical assumptions about memory management timing.
Not sure if this is limited to gevent, either - other libraries like Sentry, NewRelic, and OpenTelemetry also have low-level monkey-patched hooks, and it's unclear whether they're low-level enough that they might run into similar issues.
For a stack without any monkey-patching I'd be overjoyed to use PyPy - but between gevent and these monitoring tools, practically every project needs at least some monkey-patching, and I think that there's a lack of clarity on how battle-tested PyPy is with tools like these.
by RMPR on 7/31/23, 4:16 PM
by PartiallyTyped on 7/31/23, 1:21 PM
by saltcured on 8/1/23, 4:33 PM
1. The same naive deserialization and dict processing code ran much faster with PyPy.
2. Conveniently, PyPy also tolerated some broken surrogate pairs in Twitter's UTF8 stream, which threw exceptions when trying to decode the same events with the regular Python interpreter.
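For reference, the failure mode in point 2 looks roughly like this on a strict decoder, and `errors="replace"` is the usual stdlib workaround (a generic sketch, not the original pipeline code):

```python
# An unpaired UTF-16 surrogate (U+D83D) illegally encoded into a UTF-8 stream,
# the kind of broken data Twitter's firehose occasionally emitted.
bad = b"half a smiley: \xed\xa0\xbd!"

try:
    bad.decode("utf-8")                      # strict mode raises
except UnicodeDecodeError as e:
    print("strict decode failed:", e.reason)

# Lenient decoding substitutes U+FFFD and keeps the rest of the text.
print(bad.decode("utf-8", errors="replace"))
```

CPython's strict UTF-8 decoder rejects surrogate code points by design; PyPy evidently being more tolerant here was convenient, but relying on it is fragile compared to choosing an error handler explicitly.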
I've had some web service code where I wished I could easily swap to PyPy, but these were conservative projects using Apache + mod_wsgi daemons with SE-Linux. If there were a mod_wsgi_pypy that could be a drop-in replacement, I would have advocated for trials/benchmarking with the ops team.
Most other performance-critical work for me has been with combinations of numpy, PyOpenCL, PyOpenGL, and various imaging codecs like `tifffile` or piping numpy arrays in/out of ffmpeg subprocesses.
by ahallan on 7/31/23, 1:31 PM
I've deployed using the pypy:3.9 image on docker.
One thing I did notice is that it was significantly faster on my local machine vs when I tried to deploy it using an AWS lambda/fargate. I know this is because of virtualization/virtual-cpu, but there was not much I could do to improve it.
by danielpassy on 8/9/23, 1:39 PM
by wenc on 7/31/23, 3:23 PM
Two reasons for my hesitation:
1) Cpython is fast enough for most things I need to do. The speed improvement from Pypy is either not enough or not necessary.
2) Lingering doubts about subtle incompatibility (in terms of library support) that I might have to spend hours getting to the bottom of.
I already work long hours and don’t have bandwidth to tinker. With Cpython, although slow, I can be assured is the standard surface that everyone targets, and I can google solutions for.
It’s the subtle things that i waste a lot of time on. It’s analogous to an Ubuntu user trying to use Red Hat. They’re both Linuxes but the way things are done are different enough that they trip you up.
The only way to get out of this quandary is for Pypy to be a first class citizen. Guido will never endorse this so this means a bunch of us will always have hesitation putting it into production systems.
by comboy on 7/31/23, 3:58 PM
Quite often you just want to thank somebody, or say that you would prefer it another way and don't understand why it is this way, or that it would be cool to have this or that. But opening a ticket on github feels like wasting the maintainer's time, and feedback like "here's what I'd like to see" or "here's what I do and don't like" feels entitled, because, well, you can do it yourself, you can fork, etc.
It would need to be low friction for both sides. Preferably with no way to respond, so that there's zero pressure and little time wasted for maintainers.
Mail feels like you want something; it works for a thank-you, but it still feels bad on the receiving end when you just ignore them.
by CurriedHautious on 7/31/23, 5:07 PM
SQL Alchemy actually points to PyPy in its recommendations of things to try in ORM performance. https://docs.sqlalchemy.org/en/20/faq/performance.html#resul...
by landtuna on 7/31/23, 3:56 PM
by Qem on 7/31/23, 3:34 PM
But while programming as a hobby at home, mostly small-scale simulations, PyPy is my default interpreter for Python. PyPy seems to have a sweet spot for code written in a heavily OOP style, with a lot of method calls and self invocation. I consistently get 8-10x speed improvements.
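A hedged sketch of that style, lots of small method calls and self invocation that CPython pays dispatch overhead on but PyPy inlines after warm-up (the simulation and its names are invented for illustration):

```python
class Particle:
    def __init__(self, x, v):
        self.x, self.v = x, v

    def accelerate(self, a, dt):
        self.v += a * dt

    def advance(self, dt):
        self.x += self.v * dt

    def step(self, a, dt):
        # self invocation: each step is two more method calls
        self.accelerate(a, dt)
        self.advance(dt)


def simulate(steps, dt=0.001):
    p = Particle(0.0, 0.0)
    for _ in range(steps):       # hot loop of attribute access + dispatch
        p.step(9.81, dt)
    return p.x


print(simulate(10_000))
```

Under CPython every `p.step` costs dict lookups and frame setup; PyPy's tracing JIT specialises and inlines the whole chain, which is consistent with 8-10x wins on code shaped like this.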
by hnfong on 7/31/23, 5:14 PM
I was close to trying pypy on a production django deployment (which gets ~100k views a month), but given that the tiny AWS EC2 instance we're running it on is memory bound, the increased pypy memory usage made it impractical to do so.
by alfalfasprout on 7/31/23, 4:50 PM
Nowadays, to be honest, everything that I need to be fast in Python is largely numerical code which either calls out to C/C++ (via numpy or some ML library) or I use numba for. And these are either slower with PyPy or won't work.
HTTP web servers are notoriously slow in Python (even the fastest ones like falcon), but I found they either didn't play nicely with PyPy or weren't any faster. In large part because if the API does any kind of "heavy lifting" they can't be truly concurrent.
by claytonjy on 7/31/23, 1:48 PM
If we could use pypy, while still using those packages, I think it'd be the go-to interpreter. Why can't pypy optimize everything else, and leave the C stuff as-is?
How does pypy handle packages written in other languages, like rust? can I use pypy if I depend on Pydantic?
by ideasman42 on 7/31/23, 12:51 PM
by justinc-md on 7/31/23, 4:39 PM
by Aqueous on 7/31/23, 4:56 PM
by vogu66 on 7/31/23, 7:10 PM
by garyrob on 7/31/23, 11:54 AM
by qeternity on 7/31/23, 3:59 PM
by pyuser583 on 7/31/23, 1:16 PM
The big obstacle is that for a while we would have multiple execution environments. It's not like we could flip a switch and all Dockerfiles would be using PyPy.
Plus I don’t think AWS Lambda supports it.
If I could go back in time, we would use it from the beginning.
by kzrdude on 7/31/23, 12:18 PM
by radus on 7/31/23, 2:47 PM
by garashovb on 8/1/23, 6:34 AM
by zapregniqp on 8/1/23, 12:38 PM
by woopwoop24 on 7/31/23, 4:43 PM
by czbond on 7/31/23, 5:50 PM
by password4321 on 7/31/23, 5:41 PM
So... thanks for not doing that.
by ComplexSystems on 7/31/23, 3:58 PM
by nurettin on 8/1/23, 6:19 PM
by m_antis89 on 7/31/23, 2:44 PM
by andrewstuart on 7/31/23, 11:49 AM
I don’t use it.
Why would I use it, what’s the compelling benefit?
by ceeam on 7/31/23, 12:01 PM