A recent experience with ChatGPT 5.5 Pro
353 points - today at 2:41 AM
However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic I am discussing. For instance, in 3D Clifford algebras it repeatedly confuses exponentials of bivectors with exponentials of pseudoscalars.
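For concreteness, a minimal sketch of the distinction (assuming the 3D Euclidean algebra Cl(3,0), with B a unit bivector and I = e1e2e3 the pseudoscalar):

```latex
% Both B and I square to -1, so the exponentials look formally identical:
e^{\theta B} = \cos\theta + B\,\sin\theta,
\qquad
e^{\theta I} = \cos\theta + I\,\sin\theta .
% But B generates a rotation via the rotor (sandwich) product,
% while I is central in Cl(3,0), so its "rotor" leaves vectors unchanged:
R = e^{-\theta B/2}, \quad v \mapsto R\,v\,\widetilde{R}
  \ \ \text{(rotates $v$ by $\theta$ in the $B$-plane)},
\qquad
e^{-\theta I/2}\, v \, e^{\theta I/2} = v .
```

The two exponentials look identical on paper, which is presumably why the model keeps mixing them up, but only the bivector one actually generates a rotation.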
Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.
> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.
Training must start from the basics though. Of course, everybody's training in math starts with summing small integers, which calculators have been doing without mistakes for a long time.
The point is perhaps confirmed by another comment further down in the post:
> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don't if all you do is read other people's solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders.
People pay coders to build stuff that they will use to make money, and I can happily use an AI to deliver faster and keep being hired. I'm not sure there is a similar point with math. Again from the post:
> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here. As long as human-AI collaborations are producing the best results, there is a meaningful contribution by the humans, and people who are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats both humans and human-AI collaboration.
This made me a little sad.
Paying for Pro from any of my current academic budgets is completely out of the realm of possibility here -- all budgets tend to have restricted uses, and software payments fit into very few categories. Effectively, I'd have to ask for a brand new grant and hope the grant rules allow for large software payments and that I won't encounter an anti-AI reviewer; such a thing would take at least a year.
As a final nail in the coffin, I was recently "denied" Claude Opus entirely as part of Microsoft's clampdown on individual (and academic) use of Copilot.
(ChatGPT 5.5 Plus does not seem sufficient for any deeper investigation into new research topics; I've tried.)
Apologies for the rant.
At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).
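Something like the following minimal sketch is what I have in mind; the bag-of-words embed() is just a toy stand-in for a real embedding model, and all the names are made up for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real text-embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_literature(problem: str, papers: dict[str, str], top_k: int = 5) -> list[tuple[str, float]]:
    # Rank papers by similarity between a free-form problem description and their abstracts.
    q = embed(problem)
    scored = [(title, cosine(q, embed(abstract))) for title, abstract in papers.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_k]
```

The point is not the ranking method itself but that the query can be phrased in whatever vocabulary you happen to have, the way a mathoverflow question is.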
The question that keeps bothering me is: can an LLM generate an idea that is truly novel? How would or could that actually happen? But then that leads to the question - what are we actually doing when we think?
Perhaps it's as simple as the ability to just make mistakes that matters, the same things that powers evolution. As long as the LLM can make mistakes, it's capable of generating something genuinely novel. And it can make more mistakes much faster than we can.
5.5 Pro is amazing, but this implication might not be true & that is the core argument of this piece.
AI will prove all sorts of things - interesting, boring & incorrect.
Sorting them will be the task of the PhD.
> This reminds me of Antirez's "Don't fall into the anti-AI hype". In a sentence: These foundation models are really good at optimizing these extremely high level, extremely well defined problem spaces (ie multiply matrices faster). In Antirez's case, it's "make Redis faster".
The "non-trivial" is for human abilities. The weights lifted by a crane are also "non-trivial". People keep getting amazed at machine's abilities. Just like a radio telescope can see things humans can't, microscope can see the detail humans can't, we need not be amazed. The sensory perception of patterns is at different level for AI. It's a machine.
And certainly not to send it to a colleague to ask their opinion first.
LLMs are certainly becoming capable of coding, finding vulnerabilities, and solving mathematical problems, but we need to avoid putting their work into production, or in front of other humans, without assessing it by every possible means.
Otherwise tech leads, maintainers, and experts get overwhelmed, and this is how the "AI slop" fatigue begins.
To be clear I’m talking about this step:
> That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
Does the author know about CAISc 2026 [0]?
This is of enormous importance but is still being actively ignored by many professionals or dismissed as a minor issue.
Our emotional human brains are very enthusiastic about this new kind of "intelligent" product ("partner"), and we want so badly to believe that they are finally "there" that we tend to ignore how big a problem it is that LLMs carry a fundamental design flaw: they will keep producing errors even when we use a grotesque amount of resources to build "bigger" versions of them. The potential for errors will never go away with the current AI architecture.
This is a fundamental paradigm shift in computing. Instead of putting a lot of energy into building an architecture that produces reliable results, we are now going all in on a system / idea that will never give us 100% reliable results.
Basically it is just a marketing stunt. Probably the computer science guy building it knew very well that he would still need some fundamental breakthroughs to get to a real product, but the marketing guy saw that there was still potential to make a lot of money by selling a product that produces correct results only 80% of the time.
The marketing guy was right and marketing is now dominating science, but humanity will pay a big price for that.
Putting enormous amounts of money into a fundamentally flawed system that we cannot optimize to produce reliably error-free results is just stupid.
The big achievement of "classical" computing is that the results are reliably error-free. We still have some known issues, e.g. with floating-point math, bad blocks on disk, bit flips, etc., but these are observable and we can handle or avoid them. Generally, "non-AI computing" was made so reliable that we can depend on it for many very important things. This did not come about by accident but was created by a lot of people who put a lot of resources into research to achieve that result.
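To make the contrast concrete, here is the kind of error classical computing does have: deterministic, observable, and easy to guard against (a minimal sketch):

```python
import math

# A classic floating-point rounding artifact: deterministic and fully observable.
a = 0.1 + 0.2
print(a == 0.3)              # False: exact equality fails by a tiny, predictable margin
print(math.isclose(a, 0.3))  # True: the error is bounded, so we can detect and handle it
```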
LLMs introduce a level of uncertainty and unreliability into computing that makes them practically useless.
Because if you have enough knowledge to verify the result and AI is only quicker at producing it, what is the point of putting so many resources into it (besides making money by re-centralizing computing, of course)? Verifying a lot of results that were produced more quickly is still slow, so the people who are now just AI verifiers should just produce the results themselves; that makes the whole process quicker.
AI is only of value if it can produce results about things that you or your organization do not know anything about. But you cannot verify those results, and therefore potentially wrong results can be fatal for you, your organization, and all the people affected by actions based on these wrong results.
Many people have already been killed because decision makers are not able to follow that very simple logic.
So we can still create "interesting and enjoyable results", but ultimately it is a gigantic misallocation of resources of historic idiocy. It fits, of course, very well in a timeline where grifters are on top of societies around the world.
It is a fundamentally wrong path that should not be followed and scientists around the world should articulate exactly that instead of producing marketing blog posts for a system with such fatal inherent issues.
> Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.
This is exactly what leads me to believe that the real impact of LLMs in human history is yet to come. My work as a researcher was mostly spent on two classes of workloads: reading recently published papers to gather ideas and keep up with the state of the art, and working on a selection of ideas gathered from those papers to build my research upon. It turns out that LLMs excel at the most critical component of both workloads: parsing existing content and using it, when prompting the model, to generate additional content based on specific goals and constraints. I mean, papers are already a way to store and distribute context.
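A rough sketch of that workflow (the names here are hypothetical, and call_llm() stands in for whatever model API one actually uses):

```python
def build_prompt(papers: list[str], goal: str, constraints: list[str]) -> str:
    # Papers are the stored context; the goal and constraints steer what gets generated from them.
    context = "\n\n".join(f"[Paper {i + 1}]\n{text}" for i, text in enumerate(papers))
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Prior work:\n{context}\n\n"
        f"Goal: {goal}\n"
        f"Constraints:\n{rules}\n"
        "Building only on the prior work above, propose concrete next steps."
    )

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat/completions API call.
    raise NotImplementedError
```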
This comment about time is very interesting to me. I know it's "just" doing mathematical proofs but the possibilities of speeding up planning, proposals and decision making in the physical world should excite people.
This is as AGI as it needs to be to get my vote. And it's scary.
Maybe if you find AI to be doing stuff you find impressive, the stuff you were doing wasn't that impressive? Worth ruminating on your priors at least.
Creativity is connecting ideas from different domains and seeing if something from one field applies to another. I do think AI is generally overhyped; but a major benefit of AI could be that, after ingesting all existing human knowledge (something no single human can ever hope to achieve), it would "mix and connect" it and come up with novel insights.
Most published research sits ignored and unread; AI can uncover and use everything.
Anyone spotting the issue here? What did that really cost?
I am not against compute being used for scientific or other important problems. We did that before LLMs. However, the major LLM gatekeepers want to make all industries and companies dependent on their models. And, at some point, they need to charge them the actual, unsubsidized costs for the compute. In the meantime, companies restructure in the hopes that the compute costs remain cheap.