GPT-5.2 derives a new result in theoretical physics

265 points - today at 7:20 PM

outlace today at 7:36 PM
The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post: humans started off trying to solve a problem, it got complex, and GPT simplified it and found a solution using the simpler representation. It took GPT Pro 12 hours to do this. In my experience, LLMs can make new things when those things are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution from first principles yet.
Davidzheng today at 7:36 PM
"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."

Using GPT 5.2 Thinking Extended gave me the impression that it's consistent enough / has a low enough error rate (or enough error-correcting ability) to autonomously do math/physics for many hours if it were allowed to [but I guess Extended cuts off around the 30-minute mark, and Pro maybe at 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists and mathematicians at large will soon be able to play with tools that think on this time-scale and see how much capability these machines really have.

square_usual today at 8:20 PM
It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why it isn't actually a win for LLMs. Like with the novel solutions GPT 5.2 has been able to find for Erdős problems - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, LLMs have driven these proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
cpard today at 8:49 PM
AI can be an amazing productivity multiplier for people who know what they're doing.

This result reminded me of the C compiler case that Anthropic posted recently. Sure, the agents wrote the code for hours, but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work, and so on - in general, making sure the output actually works and that it's a story worth sharing with others.

The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value but also does a disservice to the actual researchers, engineers and humans in general, who do the hard work of problem formulation, validation and at the end, solving the problem using another tool in their toolbox.

nilkn today at 8:03 PM
It would be more accurate to say that humans using GPT-5.2 derived a new result in theoretical physics (or, if you're being generous, that humans and GPT-5.2 together derived it). The title makes it sound like GPT-5.2 produced a complete or near-complete paper on its own, but what it actually did was take human-derived datapoints, conjecture a generalization, then prove that generalization. Having scanned the paper, I think that's a significant enough contribution to warrant a legitimate author credit, but the title on its own is still an exaggeration.
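As a toy version of that workflow - purely illustrative, with made-up numbers and function names rather than anything from the paper - the loop is: exact values for small cases go in, a closed form gets conjectured, and the conjecture is then checked on cases that never informed the guess:

    # Toy sketch of "datapoints -> conjectured closed form -> check".
    # Both the datapoints and the formula are hypothetical stand-ins,
    # not the amplitudes from the actual paper.

    def known_value(n: int) -> int:
        # Stand-in for human-derived datapoints: brute-force 1 + 2 + ... + n.
        return sum(range(1, n + 1))

    def conjectured_formula(n: int) -> int:
        # Closed form "guessed" from the first few datapoints.
        return n * (n + 1) // 2

    # The guess was formed from n = 1..5 ...
    assert all(known_value(n) == conjectured_formula(n) for n in range(1, 6))
    # ... and it survives every held-out case we throw at it.
    assert all(known_value(n) == conjectured_formula(n) for n in range(6, 5000))
    print("conjecture matches all checked cases")

Passing checks still isn't a proof, though - the proof step is the part that earns the author credit.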
Insanity today at 7:28 PM
They also claimed ChatGPT solved novel Erdős problems when that wasn't the case. I'll take it with a grain of salt until there's more external validation. But very cool if true!
mym1990 today at 9:14 PM
Many innovations are built off cross-pollination between domains, and I think we are not too far off from a loop where multiple agents, each grounded very well in a specific domain, find intersections and optimizations by communicating with each other, especially if they are able to run for 12+ hours. The truth is that 99% of attempts at innovation will fail, but the 1% can yield something fantastic; the more attempts we can take, the faster progress will happen.
elashri today at 7:48 PM
Of all particle physics concepts, I would be less interested in scattering amplitudes as a test case, because the scattering amplitude has one of the most concise definitions and its solution is straightforward (not easy, of course). Once you have a good grasp of QM and of scattering, it is a matter of applying your knowledge of math to solve the problem. Usually the real problem is to actually define your parameters from your model and set up the tree-level calculations. For an LLM to solve those would be impressive, but here the researchers defined everything and came up with the workflow.

So I would read this (with more information available) with less emphasis on the LLM discovering a new result. The title is a little misleading, but "derives" is actually the operative word here, so it would be technically correct for people in the field.

another_twist today at 9:53 PM
That's great. I think we need to start researching how to get cheaper models to do math. I have a hunch it should be possible to get leaner models to achieve these results with the right sort of reinforcement learning.
vbarrielle today at 8:29 PM
I'm far from being an LLM enthusiast, but this is probably the right use case for this technology: conjectures which are hard to find, but whose proofs can be checked with automated theorem provers. Isn't that what AlphaProof does, by the way?
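For a flavor of what that checking looks like, here's a minimal sketch in Lean 4 with Mathlib - a toy identity I made up, nothing to do with the physics in the paper. The point is that once the conjecture is stated formally, the kernel either accepts the proof or it doesn't:

    import Mathlib.Tactic

    -- Toy conjecture: twice the sum 0 + 1 + ... + n equals n * (n + 1).
    def sumTo : ℕ → ℕ
      | 0 => 0
      | n + 1 => (n + 1) + sumTo n

    theorem sumTo_eq (n : ℕ) : 2 * sumTo n = n * (n + 1) := by
      induction n with
      | zero => rfl
      | succ k ih =>
        simp only [sumTo]
        calc 2 * ((k + 1) + sumTo k)
            = 2 * (k + 1) + 2 * sumTo k := by ring
          _ = 2 * (k + 1) + k * (k + 1) := by rw [ih]
          _ = (k + 1) * (k + 1 + 1) := by ring

As I understand it, that's roughly the AlphaProof setup: generate candidate proofs, and let Lean be the judge.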
emp17344 today at 8:22 PM
Cynically, I wonder if this was released at this particular time to ward off criticism over the failure of LLMs to solve the 1stproof problems.
getnormality today at 9:57 PM
I'll believe it when someone other than OpenAI says it.

Not saying they're lying, but I'm sure it's exaggerated in their own report.

pruufsocial today at 7:35 PM
All I saw was "gravitons" and I thought: we're finally here, the singularity has begun.
snarky123 today at 7:38 PM
So wait, GPT found a formula that humans couldn't, and then the humans proved it was right? That's either terrifying, or the model just got lucky. Probably the latter.
baalimago today at 8:10 PM
Well, anyone can derive a new result in anything. The question is most often whether the result makes any sense.
sfmike today at 9:00 PM
5.2 is the best model on the market.
PlatoIsADisease today at 8:44 PM
I'll read the article in a second, but let me guess ahead of time: induction.

Okay, read it: yep, induction. It already had the answer.

Don't get me wrong, I love induction... but we aren't getting any revolutions in understanding out of induction.

ares623 today at 8:21 PM
I guess the important question is, is this enough news to sustain OpenAI long enough for their IPO?
gaigalas today at 8:09 PM
I like the use of the word "derives". However, it gets outshined by "new result" in the public eye.

I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but no one had put them together).

In this case, the human authors did the thinking and also used the LLM, but this could happen without the original human author too (some guy posts a partial result on the internet, no one realizes it's novel knowledge, and it gets reused by AI later). It would be tremendously nice if credit were kept in such scenarios.

vonneumannstan today at 7:32 PM
Interesting, considering the recent Twitter froth about AI being incapable in principle of discovering anything.
mrguyorama today at 8:41 PM
Don't lend much credence to a preprint. I'm not insinuating fraud, but plenty of preprints turn out to amount to "actually, you have a math error here," or are retracted entirely.

Theoretical physics is throwing a lot of stuff at the wall and theory crafting to find anything that might stick a little. Generation might actually be good there, even generation that is "just" recombining existing ideas.

I trust physicists and mathematicians to mostly use tools because they provide benefit, rather than because they are in vogue. I assume they were approached by OpenAI for this, but I'm glad they found a way to benefit from it. Physicists have a lot of experience teasing useful results out of probabilistic and half-broken math machines.

If LLMs end up being solely tools for exploring symbolic math, that's a real benefit. I wish it didn't involve destroying all progress on climate change, platforming truly evil people, destroying our economy, exploiting already disadvantaged artists, destroying OSS communities, enabling yet another order-of-magnitude increase in spam profitability, destroying the personal computer market, stealing all our data, sucking the oxygen out of investment in real industry, and bald-faced lies to everyone about how these systems work.

Also, last I checked, MATLAB wasn't a trillion dollar business.

Interestingly, the OpenAI wrangler is last in the list of authors and acknowledgements. That somewhat implies the physicists don't think it deserves much credit. They could be biased against LLMs, like me.

When Victor Ninov (fraudulently) analyzed his team's accelerator data using an existing software suite to "find" a novel superheavy element, he got first billing on the author list. He probably contributed to the theory and some practical work, but he alone was literate in the GOOSY data tool. Author lists are often a political game as much as a record of credit, but Victor got top billing above famous names like his bosses. The guy who actually came up with the idea of how to create the element, with an innovative recipe that a lot of people doubted, was credited 8th:

https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...

brcmthrowaway today at 7:38 PM
End times approach...
longfacehorrace today at 8:00 PM
Car manufacturers need to step up their hype game...

New Honda Civic discovered Pacific Ocean!

New F150 discovers Utah Salt Flats!

Sure, it took humans engineering and operating our machines, but the car is the real contributor here!