I recently had to script reading a large Excel XLSB file. Using pyxlsb it took about two minutes. I found an alternative library with significantly better performance, python-calamine, but that one reads all the data into memory, consuming GBs of RAM, so it was a non-starter. Then I tried PyPy, and miraculously the same script with pyxlsb took 15 seconds.
I never really did much with PyPy. Do people mostly use it in a deployed application setting? I ask because, looking over at the PyPy Speed page...
https://speed.pypy.org/
Looks like Django is insanely faster under PyPy. Feels like a potential waste not to use PyPy on a deployed web app in most cases. I wonder how FastAPI scales with PyPy and other Python interpreters.
At least for me / my former $WORK, the answer was "definitely yes, we used it in deployed applications". We saved more than $1M annually (this was ~2015) in infrastructure costs by quite literally switching from CPython to PyPy, and never really looked back. Obviously, though, there are many considerations involved that are going to be specific to your (company's) application(s) and infrastructure.
Note they're comparing to CPython v3.7, and while https://speed.python.org doesn't go back to 3.7, the improvements from 3.8 to 3.12 are pretty massive.
I don't doubt PyPy is faster than CPython, but it would be very interesting to see latest PyPy compared to latest CPython.
I used it at my old job specifically to speed up a Django application.
After a bunch of profiling, I narrowed down a bunch of our performance problems to the ORM doing a lot of __setattr__ calls (or something like that, it was a long time ago).
We could have started rewriting everything to use .values() (or something similar, basically getting back tuples instead of Python objects), but switching to PyPy got the performance to good enough without having to sacrifice DX.
It's been a long time, so I'm a bit fuzzy on the details, but I definitely recommend PyPy for Django unless there's some reason you can't use it.
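A pure-Python sketch of the trade-off described above (the `Row` class here is hypothetical, not actual Django internals): materializing full objects pays a per-attribute `__setattr__` cost on every row, while a `.values()`-style tuple path skips that work entirely, and PyPy's JIT is good at optimizing exactly this kind of attribute-store overhead away.

```python
class Row(object):
    """Hypothetical ORM-like row: every field assignment goes through
    __setattr__, a stand-in for ORM bookkeeping (validation, dirty
    tracking, ...) that adds per-object overhead."""
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)

def as_objects(records):
    # full-object path: one attribute store per column per row
    out = []
    for pk, name in records:
        row = Row()
        row.pk = pk
        row.name = name
        out.append(row)
    return out

def as_tuples(records):
    # .values_list()-style path: no per-attribute work at all
    return list(records)

records = [(1, "a"), (2, "b")]
objs = as_objects(records)
print([(r.pk, r.name) for r in objs] == as_tuples(records))  # True: same data either way
```

Same data comes back either way; the difference is only in how much interpreter work each row costs, which is why a JIT can close the gap without a rewrite.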
I've put a bit of effort into trying to make it work with the Python apps we write at my company. We have a lot of electrical engineers who only write Python, and we're butting up against the limits of what the language can do for the data we need to process.
It's worked well in some cases, but the main situation where we needed it to work was a GUI application with no separation between computation and the GUI elements. I tried really hard, and got frustratingly close, but could not build a wxPython that would work under PyPy.
But it's super cool imo, as someone who has to write/maintain python code it's awesome to see some innovative work to make the language faster.
Gtk could be an option. It used to not work well at all, but since they improved their C binding support it has gotten better.
Good to know, thanks!
I haven't used it in a long time, but did you try Qt as well? PySide worked insanely nicely for me, although that wasn't with PyPy, to be fair.
Well, the thing is that it's code that's already been written, and honestly it's kind of a mess. It would take more effort than it's worth to try to rewrite it using a different GUI framework.
I much prefer Qt but this was written before I started here so it is what it is :/
The longer the test runs, the more data we end up losing, because they're looping over a dataframe that keeps growing and growing, and the program just can't finish that loop in the 2 seconds it has to run. But it's not a big enough deal for us to fix.
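A pure-Python analogy of that pattern (the exact loop isn't shown, so this is an assumed shape): re-scanning a structure that grows every iteration makes total work quadratic in the number of rows, while keeping a running result stays linear and finishes in a fixed time budget.

```python
def process_growing(rows):
    # anti-pattern: each new row triggers a full pass over everything
    # accumulated so far, so total work grows quadratically
    seen = []
    totals = []
    for r in rows:
        seen.append(r)
        totals.append(sum(seen))  # O(n) scan inside an O(n) loop
    return totals[-1]

def process_batched(rows):
    # linear alternative: maintain a running total instead of re-scanning
    total = 0
    for r in rows:
        total += r
    return total

rows = list(range(100))
print(process_growing(rows) == process_batched(rows))  # True: same answer, very different cost
```

Same final answer, but only the second version's runtime stays proportional to the data size, which is usually what a time-boxed loop needs.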
FWICS pyqtgraph has a TableWidget: https://pyqtgraph.readthedocs.io/en/latest/api_reference/wid... https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph... :
> Extends QTableWidget with some useful functions for automatic data handling and copy / export context menu. Can automatically format and display a variety of data types (see setData() for more information).
DataTreeWidget handles nested structs: https://pyqtgraph.readthedocs.io/en/latest/api_reference/wid...
SO says it's better to extend Qt QTableView and do pagination for larger datasets: https://stackoverflow.com/questions/61517220/pyqt-pandas-fas... https://www.pythonguis.com/tutorials/qtableview-modelviews-n...
pyqtgraph mentions CUDA and NumPy support, but not pandas. Hard to believe there's no specific support yet for drawing spreadsheets of pandas dataframes in Qt or pyqtgraph.
dask.DataFrame and dask-CuDF also support CUDA. sympy's lambdify function compiles readable symbolic algebra to fast code for various libraries and GPUs.
The Dataframe Protocol __dataframe__ interface spec's Purpose and Scope doc mentions the NumPy __array_interface__ protocol, which may do sufficient auto-casting to a NumPy array before one tries to draw a data table with formatting derived from the then-removed columnar metadata: https://data-apis.org/dataframe-protocol/latest/purpose_and_...
Practically, the Arrow dtype backend (e.g. pd.read_csv(..., dtype_backend="pyarrow") or df.convert_dtypes(dtype_backend="pyarrow")) may be a quick performance boost.
Jupyter kernels with Papermill or cron and templated parameters at the top of a notebook with a dated filename in a (repo2docker compatible) git repo also solve for interactive reports. To provision a temporary container for a user editing report notebooks, there's Voila on binderhub / jupyterhub, https://github.com/binder-examples/voila
or repo2jupyterlite in WASM with MathTex and Pyodide's NumPy/pandas/sympy: https://github.com/jupyterlite/repo2jupyterlite
But that's not a GUI, that's notebooks. For Jupyter integration, TIL pyqtgraph has jupyter_rfb, Remote Frame Buffer: https://github.com/vispy/jupyter_rfb
I run it in a production environment (side project). I also use it locally when developing when necessary.
It really does speed up loops by 5x or so.
So when you're trying to, say, test 100 million+ iterations of something, PyPy will run that in something like 2 minutes, versus the 15 minutes it can take me on CPython.
Honestly it's an amazing performance gain for 0 effort, and I have yet to run into a limitation with it.
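A minimal hot-loop benchmark of the kind that shows this gap (timings will vary by machine and interpreter; the ~5x figure is the commenter's observation, not a guarantee):

```python
import time

def hot_loop(n):
    # pure-Python arithmetic loop: exactly the shape of code PyPy's
    # tracing JIT compiles to machine code after a short warm-up
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
result = hot_loop(1_000_000)
elapsed = time.perf_counter() - start
print(result)             # same answer on CPython and PyPy
print("%.3fs" % elapsed)  # compare this number between interpreters
```

Run the same file under `python` and `pypy` and compare the printed times; no code changes are needed, which is the "0 effort" part.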
I used it in production for a while, but it caused instability with pandas and often froze so I had to take it out. It does have some serious speed benefits for simple / pure-Python without compiled libraries.
I think you have to start your Django project with PyPy; it's always a gamble which dependencies will work with it, and for a project already underway, the chances are that something won't work.
If you find something like that, it's worth reporting; there are fewer of them over time.
Awesome I will keep this in mind since I am working on a new Django project, thanks!
In my specific circumstance it behaved like a statically linked, one-.tar-and-ready mechanism to get Python onto Flatcar (née CoreOS, but not the modern one). So my case for it wasn't speed; it was the dead-simple deploy.
Still crazy to me that Python is this popular in all sorts of production uses without a JIT reference implementation.
A JIT is now available in CPython main. It's not that performant yet, so it won't be turned on by default for Python 3.13. The informational PEP is here (still being reviewed; check the Discourse thread for more details): https://peps.python.org/pep-0744/
Definitely. One hard-earned lesson from my Tcl days, at the startup that used an AOLServer-like product, was to never again use dynamic languages without some sort of JIT/AOT toolchain for production delivery of products, only for scripting stuff.
Apparently it took Microsoft and Facebook to actually change the minds of the CPython core team; however, what is coming in 3.13 is only the start.
Is there an overview of the user share of PyPy vs CPython? I have the feeling that PyPy usage has declined in recent years.
How well does PyPy work together with frameworks like PyTorch, JAX, TensorFlow, etc? I know there has been some work to support NumPy, but I guess all these other frameworks are much more relevant nowadays.
"Writing extension modules for pypy" https://doc.pypy.org/en/latest/extending.html ; CFFI, ~ ctypes (libffi), cppyy, or RPython
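As a concrete example of the ctypes route (which works on both CPython and PyPy because it goes through libffi rather than the CPython C API), a minimal call into the C math library:

```python
import ctypes
import ctypes.util

# find_library resolves the platform-specific name (libm.so.6, libm.dylib, ...);
# the "libm.so.6" fallback is an assumption for Linux systems where lookup fails
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature explicitly
libm.sqrt.restype = ctypes.c_double     # default restype is c_int, so set it
print(libm.sqrt(9.0))  # 3.0
```

CFFI follows the same idea with a nicer declaration syntax and is the PyPy team's preferred interface; ctypes has the advantage of being in the standard library.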
Does PyPy still release 2.7 because RPython is still based on it?
I was recently trying to play with RPython for the first time, and having to remember all the python 2 vs python 3 differences felt strange, and very retro.
Developing or contributing to PyPy may mean touching the 2.7-based toolchain used to implement it, but no: users have been able to use 3.x for a very, very long time now.
You'll note I said I was playing with RPython, not PyPy. In my case, I was playing with writing a small interpreter, and comparing the RPython toolchain with the Truffle/Graal framework.
Writing RPython code, even if one is not developing or contributing to PyPy, means writing within a subset of Python 2.
> RPython ("Restricted Python") is a subset of Python 2
https://www.pypy.org/posts/2022/04/how-is-pypy-tested.html
And RPython's translator specifically uses PyPy, and uses Python 2 syntax:
https://github.com/pypy/pypy/blob/main/rpython/bin/rpython#L...
... so getting the RPython toolchain working (even if one intends to improve the PyPy 3+ interpreters) requires setting up a PyPy 2 interpreter. Hence the question in my post.
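For concreteness, a minimal RPython translation target looks like plain Python restricted to statically inferable types (this is a sketch, not a tested target; `entry_point` and `target` are the conventional hook names the toolchain looks for):

```python
# target.py -- minimal RPython translation target (sketch)

def sum_to(n):
    # plain loop over ints: RPython requires types the translator can
    # infer statically, so we stick to simple control flow
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total

def entry_point(argv):
    print(sum_to(10))  # single-argument print() parses in both Python 2 and 3
    return 0           # entry_point must return an int exit code

def target(driver, args):
    # the RPython toolchain imports this module and calls target()
    # to obtain the entry point to translate
    return entry_point, None

if __name__ == "__main__":
    entry_point([])  # runs untranslated under a plain interpreter too
```

Translation would then be something like `rpython target.py` under a Python 2 / PyPy 2 interpreter, producing a standalone binary; the same file also runs untranslated for quick iteration.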
Too bad they don't compile to wasm. Shouldn't be too hard.
Why do they have 3.9 and 3.10? Is it their policy to keep two previous versions for every release?
We try to keep around useful versions of Python3, based on what wheels packagers make available. NumPy<2 provides PyPy3.9 wheels, and we have PyPy3.10 ready. Now that NumPy has moved to PyPy3.10 wheels, we will probably drop 3.9. Help is needed to move forward to 3.11/3.12.