Leanstral 1.5: Proof abundance for all

327 points - yesterday at 10:33 PM

Comments

InsideOutSanta today at 6:40 AM

There's a lot of criticism of Mistral being unable to compete with large model, and that's fair. But I think it dismisses what Mistral is actually doing, which is making specific capabilities available at high quality in tiny models.

I do a lot of OCR, file analysis, stuff like that. I use Mistral for that. I put 100$ into my account, and it just runs for a year without any worries about the amount of requests I make, because the cost is minuscule. That's valuable, even if it doesn't compete with Opus 4.8.

boulos yesterday at 11:24 PM

This is nice work, but I found the bug finding example to be weird:

> One such bug was in the sign function for zigzag decoding of the datrs/varinteger library. On input Std.U64.MAX, the expression (value + 1) overflowed, causing crashes in debug mode and silent corruption in release mode—an edge case that testing and fuzzing would typically miss.

In what way would this boundary condition case be considered something that "testing [...] would typically miss"? It's certainly something that bad tests would miss or not think about, but I find that (a) careful people and (b) ML coding systems are actually really good at "oh, I should test the extreme values". Especially for things that parse user input.

I'm curious if they found other bugs that were more interesting, but found them too hard to explain quickly.

andai today at 1:30 AM

Halfway thru the article it shows a comparison with several frontier-ish LLMs. But they're all from half a year ago. "Our new model is better than all these Chinese models from 3 generations ago" is pretty funny to me.

Groxx today at 1:18 AM

>One such bug was in the sign function for zigzag decoding of the datrs/varinteger library. On input Std.U64.MAX, the expression (value + 1) overflowed, causing crashes in debug mode and silent corruption in release mode—an edge case that testing and fuzzing would typically miss.

that library is: https://github.com/datrs/varinteger

it seems probably correct, as there's an identical issue filed on that repo a week before this was published: https://github.com/datrs/varinteger/issues/8 (is this a leanstral employee? they have almost no info and only very sparse activity. or did leanstral perhaps just pick up this issue?)

it's a tiny, surprisingly-poorly tested, long-untouched (8y) library: https://github.com/datrs/varinteger/blob/master/tests/test.r... that has about 1k downloads per day: https://crates.io/crates/varinteger [1] which seems rather low.

I don't think I'd consider that such a smashing success that it's worth bringing up as the sole example tbh. though automated detection is certainly useful. or is this a noteworthy accomplishment for this sub-field? I haven't played with proof-writing LLMs, but given the paucity of training data I wouldn't be surprised if they're a bit rough compared to general coding.

1: https://crates.io/crates/varinteger lists it as https://github.com/mafintosh/varinteger-rs which redirects to https://github.com/datrs/varinteger , so despite looking different at a glance it does appear to be the same library

raphinou today at 6:09 AM

Can this be useful for someone with no prior knowledge of lean? I'd like to verify a software I'm working on, but I have no experience in formal verification. Can I get useful result with the spec, the code and some (limited) learning time on my side?

RossBencina today at 1:39 AM

Curious that they are pitching Lean 4 for formal verification. I thought that this was more the domain of Isabelle/HOL and TLA+. At least I would have expected a model trained at using all three. Maybe also Isabell/Isar, which seems preferable for forward derivations in linear algebra. Could anyone shed some light on this?

andai today at 1:22 AM

Discussed the other day:

https://news.ycombinator.com/item?id=48738938

easygenes today at 7:01 AM

Was fun to see their developers make nods to Le Chaton Fat in the announcements for this on Twitter.

I suspect a true "big new general-purpose" model is around the corner from them, whether or not they were in on Le Chaton Fat for real. They've mentioned it after the media circus. Hopefully more creatively named than just "Large 4".

satvikpendem yesterday at 11:34 PM

I also submitted the HuggingFace link itself here: https://news.ycombinator.com/item?id=48779902

henryrobbins00 today at 12:38 AM

Try out Leanstral 1.5 on the latest version of OpenATP! OpenATP is an open-source Python package and CLI for agentic automated theorem provers. It natively supports running provers locally in Docker or remotely in Modal sandboxes.

GitHub: https://github.com/henryrobbins/open-atp

Docs: https://open-atp.henryrobbins.com

strongly-typed today at 1:01 AM

Lean is such a wonderful language. So hyped by these releases.

nullc yesterday at 11:44 PM

It would be nice if special purpose models provided a some diverse examples of exactly the input required to get its expected performance on a mix of problem types. Maybe also a document intended for LLMs to read that advises on prompt construction.

I've found that you can get wildly different quality results from these sorts of models due to seemingly insignificant differences in prompt construction. It would be much easier to guess at what it wants if I could just see some RL transcripts -- and so the model author is in a much better position to provide initial advice.

ChrisArchitect today at 1:45 AM

[dupe] https://news.ycombinator.com/item?id=48738938

TokenLens today at 1:31 PM

[dead]

yashthakker today at 1:43 AM

[flagged]

moonset today at 12:37 AM

I gave Codex with GPT-5.5 High this prompt:

    Identify bugs in [datrs/varinteger](https://github.com/datrs/varinteger) . Do NOT look at the GitHub issues, just inspect the source

It also found the bug that Leanstral 1.5 found and the authors highlighted. I think this bug wasn't especially tricky; it's just a case of too few eyeballs on this repo.

Congrats on the release regardless! Excited for the direction Lean + automated AI proofs are headed.

Disclosure: I work at OpenAI.

zuzululu today at 2:20 AM

I applaud mistral's efforts but reading this release made me realize that Europe is far far behind and that once the gap is solidified I don't think its recoverable in the same way Canada's brain drain had on its economy

The best and the brightest from Europe have no incentive to build in Europe when they can do it in America and be compensated and treated far better