TurboQuant is a restricted version of EDEN quantization (NeurIPS 21, ICML 22). It lacks the optimal scale derivations, which makes the TurboQuant variant considerably less accurate than those works. We show this thoroughly in a new note at https://arxiv.org/abs/2604.18555.
We were the first to introduce post-rotation distribution-aware quantization in 2021. It has since been adopted in many fields, including federated learning, vector retrieval, databases, inference engines, and KV-caches.
Some credit for that line of work would be appropriate. It is also baffling to see the name "TurboQuant" attached to this approach, given the many works published from 2021 onwards.
The blog post mentioned above essentially walks you through EDEN quantization but settles on a sub-optimal MSE-minimizing scale plus an unbiasing trick. That trick often costs a full bit more than DRIVE/EDEN need to reach the same accuracy with the unbiasing scale derived in the original 2021 paper.
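To make the distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from either paper) of the two scale choices for the simplest case: a 1-bit sign quantizer applied after a random rotation. The MSE-minimizing scale is the least-squares fit of the rotated vector onto its quantized signs; the unbiasing scale, as used in DRIVE/EDEN, is chosen so the reconstruction is an unbiased estimate of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = rng.normal(size=d)

# Random rotation via QR of a Gaussian matrix (illustrative stand-in
# for the structured rotations used in practice).
R, _ = np.linalg.qr(rng.normal(size=(d, d)))
z = R @ x
q = np.sign(z)  # 1-bit quantization of the rotated vector

# MSE-minimizing scale: least-squares projection of z onto q.
# This minimizes reconstruction error but the estimator is biased.
s_mse = np.dot(z, q) / np.dot(q, q)

# Unbiasing scale (the DRIVE/EDEN-style choice): makes the
# reconstruction an unbiased estimate of x, at slightly higher MSE.
s_unb = np.dot(x, x) / np.dot(z, q)

# Reconstructions: rotate the scaled signs back.
x_hat_mse = R.T @ (s_mse * q)
x_hat_unb = R.T @ (s_unb * q)
```

By Cauchy-Schwarz the unbiasing scale is always at least as large as the MSE-minimizing one; the point of the comment above is that the biased variant plus a separate unbiasing correction wastes bits that the original scale derivation avoids.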
mskkm today at 8:20 AM
The public comments on Openreview now include explicit allegations that the TurboQuant paper knowingly misrepresented RaBitQ and understated RaBitQ’s results. The RaBitQ authors also report in a technical note that several of TurboQuant’s runtime and recall numbers do not reproduce from the released code under the paper’s stated setup. In the note, TurboQuant generally loses to RaBitQ: https://arxiv.org/abs/2604.19528. If these public allegations hold up, then this is not just overhype or sloppy citation practice, but points to a distorted comparison and benchmark claims that do not survive reproduction.
linuxhansl today at 3:22 AM
I am fascinated by this and similar research (RotorQuant, etc.). It seems that by next year we will be able to run this year's largest models on last year's hardware. :)
Maybe we won't need as many data centers and as much power as we thought. Maybe we can run more powerful models locally.
treexs today at 6:49 AM
I feel like I've gotten really good at noticing which model generates what type of site, and this one oozes Codex.
gcr today at 11:33 AM
On TheTom’s llama-cpp fork, TurboQuant makes inference about five to ten times slower than vanilla (M1 Max, qwen3.6-35b-a3b). Seems like the productionization is still a ways away.
jarbus today at 4:18 AM
This is incredible. Interactive demos like this make mathematics 10x more accessible.
nafistiham today at 8:10 AM
Thanks a lot. It gave me a much more detailed view of TurboQuant than the few YouTube videos I watched. Also, the choice of colors is excellent, as it serves both light and dark modes. I'll try to use it in my sites. Kudos!