CPython 3.13 went further with an experimental copy-and-patch JIT compiler -- a lightweight JIT that stitches together pre-compiled machine code templates instead of generating code from scratch. It's not a full optimizing JIT like V8's TurboFan or a tracing JIT like PyPy's;
Good news: Python 3.15 adapts PyPy's tracing approach to JIT, and there are real performance gains now:
I've been in the pandas (and now polars) world for the past 15 years. Staying in the sandbox gets most folks good-enough performance. (That's why Python is the language of data science and ML.)
I generally teach my clients to reach for numba first. Potentially lots of bang for little buck.
One overlooked area in the article is running on GPUs. Some numpy and pandas (and polars) code can get a big speedup from GPUs (same code, just an import change).
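To make the "reach for numba first" advice concrete, here is a minimal sketch of the pattern: decorate a tight numeric loop with `@njit` and numba compiles it to machine code on first call. This assumes the third-party numba package (`pip install numba`); the fallback lets the sketch run as plain Python without it.

```python
# Hedged sketch: assumes numba is installed; falls back to a no-op
# decorator so the example still runs as ordinary Python without it.
try:
    from numba import njit
except ImportError:
    def njit(func):  # stand-in when numba is unavailable
        return func

@njit
def dot(a, b):
    # A tight numeric loop: exactly the shape of code numba handles well.
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

In real use you would pass numpy arrays rather than lists; the point is that the Python source is unchanged apart from one decorator, which is the "lots of bang for little buck" trade-off.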
repple today at 3:34 PM
Significant AI smell in this write-up. As a result, my current reflex is to immediately stop reading. That's not a judgement on the actual analysis and human effort that went in; it's just that the other context is missing.
seanwilson today at 1:31 PM
> The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize. ...
> 4 bytes of number, 24 bytes of machinery to support dynamism. a + b means: dereference two heap pointers, look up type slots, dispatch to int.__add__, allocate a new PyObject for the result (unless it hits the small-integer cache), update reference counts.
Would Python be a lot less useful without being maximally dynamic everywhere? Are there domains/frameworks/packages that benefit from this where this is a good trade-off?
I can't think of cases in strong statically typed languages where I've wanted something like monkey patching, and when I see monkey patching elsewhere there's often some reasonable alternative or it only needs to be used very rarely.
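For readers who haven't run into it, the dynamism the quoted passage describes is all ordinary, supported Python; nothing below uses a special hook. This sketch shows why no method call can be statically resolved:

```python
# Everything here is plain, documented Python behavior.
class A:
    def who(self):
        return "A"

class B:
    def who(self):
        return "B"

obj = A()
assert obj.who() == "A"

# Monkey-patch a method at runtime: existing instances change behavior.
A.who = lambda self: "patched"
assert obj.who() == "patched"

# Even an instance's class can be swapped while the instance exists.
obj.__class__ = B
assert obj.who() == "B"
```

Because any of these mutations can happen at any moment, the interpreter must look up `who` dynamically on every call, which is exactly what makes ahead-of-time optimization so hard.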
elophanto_agent today at 5:16 PM
the optimization ladder is just the five stages of grief but for python developers. denial ("it's fast enough"), anger ("why is this so slow"), bargaining ("maybe if I use numpy"), depression ("I should rewrite this in rust"), acceptance ("actually cython is fine")
IshKebab today at 8:04 PM
Instead of just using a language that isn't dog slow, why not jump through these 5 different hoops? It's much easier!
rusakov-field today at 1:59 PM
Python is perfect as a "glue" language. "Inner loops" that have to run efficiently are not where it shines; I would write those in C or C++ and glue them together with Python for access to the huge library base.
This is the "two-language problem". (I would like to hear from people who have used Julia extensively, by the way; it claims to solve this problem. Does it really?)
pjmlp today at 3:18 PM
Kudos for going through all the existing JIT approaches, instead of reaching for rewrite into X straight away.
However if Rust with PyO3 is part of the alternatives, then Boost.Python, cppyy, and pybind11 should also be accounted for, given their use in HPC and HFT integrations.
blt today at 3:07 PM
Surprised Python is only 21x slower than C for tree traversal stuff. In my experience that's one of the most painful places to use Python. But maybe that's because I use numpy automatically when simple arrays are involved, and there's no easy path for trees.
markisus today at 4:49 PM
I wish there were more details on this part.
> Missing @cython.cdivision(True) inserts a zero-division check before every floating-point divide in the inner loop. Millions of branches that are never taken.
I thought never-taken branches were essentially free. Does this mean something in the loop is messing with the branch predictor?
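For reference, the directive the article mentions can be shown in Cython's "pure Python mode". This is a hedged sketch: it assumes the third-party cython package, and under plain CPython the decorator is a no-op; only when the module is compiled with Cython does `cdivision(True)` drop the zero-division check that would otherwise guard every divide in the loop.

```python
# Hedged sketch in Cython pure-Python mode; falls back to a no-op
# decorator so it runs without Cython installed.
try:
    import cython
    cdivision = cython.cdivision
except ImportError:
    def cdivision(flag):  # stand-in when Cython is unavailable
        def deco(func):
            return func
        return deco

@cdivision(True)
def mean_ratio(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x / y  # compiled with cdivision(True): no per-divide zero check
    return total / len(xs)

print(mean_ratio([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # 2.0
```

As for the cost question: a correctly predicted branch is cheap but not free. The compare-and-branch still occupies fetch and decode slots, and in a hot numeric loop the check can also prevent the compiler from vectorizing, which is often the larger effect.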
adsharma today at 4:41 PM
Missing: write static Python and transpile to Rust with PyO3, which is at the top of the ladder.
Some nuance: try transpiling to a garbage-collected, Rust-like language with fast compilation until you have millions of users.
Also, use a combination of neural and deterministic methods to transpile, depending on the complexity.
superlopuh today at 3:06 PM
Missing Muna[0][1]; I'm curious how it would compare on these benchmarks.
I love how in an article about making Python faster, the fastest option is to simply write Rust, lol
threethirtytwo today at 4:26 PM
>The usual suspects are the GIL, interpretation, and dynamic typing. All three matter, but none of them is the real story. The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize.
OK, I guess the harder question is: why isn't Python as fast as JavaScript?
jaharios today at 4:54 PM
json.loads is something you don't want to use in a loop if you care about performance at all. Just using orjson can give you a 3x speedup without the need to change anything else.
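The swap really is close to a one-liner for parsing. A hedged sketch: orjson is a third-party package (`pip install orjson`), so this falls back to stdlib json when it isn't installed.

```python
# Drop-in swap sketch: use orjson.loads when available, stdlib otherwise.
import json
try:
    import orjson
    loads = orjson.loads
except ImportError:
    loads = json.loads

# The hot loop the comment warns about: many small documents.
records = ['{"id": %d, "value": %d}' % (i, i * i) for i in range(1000)]
parsed = [loads(r) for r in records]
assert parsed[10] == {"id": 10, "value": 100}
```

One caveat when swapping the other direction: `orjson.dumps` returns bytes rather than str, so serialization is not quite as drop-in as parsing.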
zahlman today at 5:18 PM
The replacement of em dashes with double hyphens here is almost insulting. A look through the blog history suggests that the author has no issue writing in English normally, and nothing seems really off about the actual findings here (or even the speculation about causes, etc.), so I really can't understand the motivation for LLM-generated prose. (The author's usual writing style appears to have some arguable LLM-isms, but they make a lot more sense in context, and of course those patterns had to come from somewhere. The overall effect is quite different.)
Edit: it's strange to get downvoted while also getting replies that agree with me and don't seem to object.
(Also, I thought it wasn't supposed to be possible to edit after getting a reply?)
retsibsi today at 2:21 PM
A personal opinion: I would much prefer to read the rough, human version of this article than this AI-polished version. I'm interested in the content and the author clearly put thought and effort into it, but I'm constantly thrown out of it by the LLM smell. (I'm also a bit mad that `--` is now on the em dash treadmill and will soon be unusable.)
I'm not just saying this to vent. I honestly wonder if we could eventually move to a norm where people publish two versions of their writing and allow the reader to choose between them. Even when the original is just a set of notes, I would personally choose to make my own way through them.
kelvinjps10 today at 1:55 PM
Great post; saved it for when I need to optimize my Python code.
arlattimore today at 2:37 PM
What a great article!
Mawr today at 3:53 PM
Shockingly good article: correct identification of excessive dynamism as the root cause of the performance issues, and a ranking of the solutions based on the value/effort ratio. Excellent taste. Will keep this in my back pocket as a quick Python optimization reference.
It's just somewhat unfortunate that I have to question every number and fact presented since the writing was clearly at least somewhat AI-assisted with the author seemingly not being upfront about that at all.
perching_aix today at 4:24 PM
> language slow
> looks inside
> the reference implementation of language is slow
Despite its content, this blogpost also pushes this exact "language slow" thinking in its preamble. I don't think nearly enough people read past introductions for that to be a responsible choice or a good idea.
The only thing worse than this is when Python specifically is outright taught (!) as an "interpreted language", as if an implementation detail like that were somehow a language property. So grating.
skeledrew today at 4:36 PM
I must admit that I'm amused by the people who find the writeup useful but are turned off by the AI "smell". And look forward to the day when all valued content reeks of said "smell"; let's see what detractors-for-no-good-reason do then (yes I'm a bit ticked by the attitude).