A recent experience with ChatGPT 5.5 Pro
353 points - today at 2:41 AM
However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic I am discussing. For instance, in 3D Clifford algebras it repeatedly confuses exponentials of bivectors with exponentials of pseudoscalars.
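For concreteness, a minimal sketch of the distinction (assuming the 3D Euclidean algebra Cl(3,0), with B a unit bivector and I = e1e2e3 the pseudoscalar):

```latex
% Both B and I square to -1, so the exponentials look formally identical:
e^{\theta B} = \cos\theta + B\,\sin\theta,
\qquad
e^{\theta I} = \cos\theta + I\,\sin\theta .
% But B generates a rotation via the rotor (sandwich) product,
% while I is central in Cl(3,0), so its "rotor" leaves vectors unchanged:
R = e^{-\theta B/2}, \quad v \mapsto R\,v\,\widetilde{R}
  \ \ \text{(rotates $v$ by $\theta$ in the $B$-plane)},
\qquad
e^{-\theta I/2}\, v \, e^{\theta I/2} = v .
```

The two exponentials look identical on paper, which is presumably why the model keeps mixing them up, but only the bivector one actually generates a rotation.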
Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.
> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.
Training must start from the basics though. Of course, everybody's training in math starts with summing small integers, which calculators have been doing without mistakes for a long time.
The point is perhaps confirmed by another comment further down in the post:
> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don't if all you do is read other people's solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders.
People pay coders to build stuff that they will use to make money, and I can happily use an AI to deliver faster and keep being hired. I'm not sure there is a similar point with math. Again from the post:
> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here. As long as human-AI collaborations are producing the best results, there is a meaningful contribution by the humans, and people who are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats both humans and human-AI collaboration.
This made me a little sad.
Paying for Pro from any of my current academic budgets is completely out of the realm of possibility here -- all budgets tend to have restricted uses, and software payments fit into very few categories. Effectively, I'd have to ask for a brand new grant and hope the grant rules allow for large software payments and that I won't encounter an anti-AI reviewer; such a thing would take at least a year.
As a final nail in the coffin, I was recently "denied" Claude Opus entirely as part of Microsoft's clampdown on individual (and academic) use of Copilot.
(ChatGPT 5.5 Plus does not seem sufficient for any deeper investigation into new research topics; I've tried.)
Apologies for the rant.
At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).
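Something like the following minimal sketch is what I have in mind; the bag-of-words embed() is just a toy stand-in for a real embedding model, and all the names are made up for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real text-embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_literature(problem: str, papers: dict[str, str], top_k: int = 5) -> list[tuple[str, float]]:
    # Rank papers by similarity between a free-form problem description and their abstracts.
    q = embed(problem)
    scored = [(title, cosine(q, embed(abstract))) for title, abstract in papers.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_k]
```

The point is not the ranking method itself but that the query can be phrased in whatever vocabulary you happen to have, the way a mathoverflow question is.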
The question that keeps bothering me is: can an LLM generate an idea that is truly novel? How would or could that actually happen? But then that leads to the question - what are we actually doing when we think?
Perhaps it's as simple as the ability to just make mistakes that matters, the same things that powers evolution. As long as the LLM can make mistakes, it's capable of generating something genuinely novel. And it can make more mistakes much faster than we can.
5.5 Pro is amazing, but this implication might not be true & that is the core argument of this piece.
AI will prove all sorts of things - interesting, boring & incorrect.
Sorting them will be the task of the PhD.
> This reminds me of Antirez's "Don't fall into the anti-AI hype". In a sentence: These foundation models are really good at optimizing these extremely high level, extremely well defined problem spaces (ie multiply matrices faster). In Antirez's case, it's "make Redis faster".
The "non-trivial" is for human abilities. The weights lifted by a crane are also "non-trivial". People keep getting amazed at machine's abilities. Just like a radio telescope can see things humans can't, microscope can see the detail humans can't, we need not be amazed. The sensory perception of patterns is at different level for AI. It's a machine.
And certainly not to send it to a colleague to ask their opinion first.
LLMs are certainly becoming capable of coding, finding vulnerabilities, and solving mathematical problems, but we need to avoid putting their work into production, or in front of other humans, without assessing it by every possible means.
Otherwise tech leads, maintainers, and experts get overwhelmed, and this is how the "AI slop" fatigue begins.
To be clear I’m talking about this step:
> That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
Does the author know about CAISc 2026 [0]?
This is of enormous importance but is still being actively ignored by many professionals or dismissed as a minor issue.
Our emotional human brains are very enthusiastic about this new kind of "intelligent" product ("partner"), and we want so badly to believe that they are finally "there" that we tend to ignore how big a problem it is that LLMs carry a fundamental design flaw: they will keep producing errors even when we use a grotesque amount of resources to build "bigger" versions of them. The potential for errors will never go away with the current AI architecture.
This is a fundamental paradigm shift in computing. Instead of putting a lot of energy into building an architecture that produces reliable results, we are now going all in on a system / idea that will never give us 100% reliable results.
Basically it is just a marketing stunt. Probably the computer science guy building it knew very well that he would still need some fundamental breakthroughs to get to a real product, but the marketing guy saw that there was still potential to make a lot of money by selling a product that produces correct results only 80% of the time.
The marketing guy was right and marketing is now dominating science, but humanity will pay a big price for that.
Putting enormous amounts of money into a fundamentally flawed system that we cannot optimize to produce reliably error-free results is just stupid.
The big achievement of "classical" computing is that the results are reliably error-free. We still have some known issues, e.g. with floating-point math, bad blocks on disk, bit flips, etc., but these are observable and we can handle or avoid them. Generally, "non-AI computing" was made so reliable that we can depend on it for many very important things. This did not come about by accident but was created by a lot of people who put a lot of resources into research to achieve that result.
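To make the contrast concrete, here is the kind of error classical computing does have: deterministic, observable, and easy to guard against (a minimal sketch):

```python
import math

# A classic floating-point rounding artifact: deterministic and fully observable.
a = 0.1 + 0.2
print(a == 0.3)              # False: exact equality fails by a tiny, predictable margin
print(math.isclose(a, 0.3))  # True: the error is bounded, so we can detect and handle it
```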
LLMs introduce a level of uncertainty and unreliability into computing that makes them practically useless.
Because if you have enough knowledge to verify the result and AI is only quicker at producing it, what is the point of putting so many resources into it (besides making money by re-centralizing computing, of course)? Verifying a lot of results that were produced more quickly is still slow, so the people who are now just AI verifiers should just produce the results themselves; that makes the whole process quicker.
AI is only of value if it can produce results about things that you or your organization do not know anything about. But you cannot verify those results, and therefore potentially wrong results can be fatal for you, your organization, and all the people affected by actions based on these wrong results.
Many people have already been killed because decision makers are not able to follow that very simple logic.
So we can still create "interesting and enjoyable results", but ultimately it is a gigantic misallocation of resources of historic idiocy. It fits, of course, very well in a timeline where grifters are on top of societies around the world.
It is a fundamentally wrong path that should not be followed and scientists around the world should articulate exactly that instead of producing marketing blog posts for a system with such fatal inherent issues.
> Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.
This is exactly what leads me to believe that the real impact of LLMs in human history is yet to come. My work as a researcher was mostly spent on two classes of workloads: reading recently published papers to gather ideas and keep up with the state of the art, and working on a selection of ideas gathered from those papers to build my research upon. It turns out that LLMs excel at the most critical component of both workloads: parsing existing content and using it, when prompting the model, to generate additional content based on specific goals and constraints. I mean, papers are already a way to store and distribute context.
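A rough sketch of that workflow (the names here are hypothetical, and call_llm() stands in for whatever model API one actually uses):

```python
def build_prompt(papers: list[str], goal: str, constraints: list[str]) -> str:
    # Papers are the stored context; the goal and constraints steer what gets generated from them.
    context = "\n\n".join(f"[Paper {i + 1}]\n{text}" for i, text in enumerate(papers))
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Prior work:\n{context}\n\n"
        f"Goal: {goal}\n"
        f"Constraints:\n{rules}\n"
        "Building only on the prior work above, propose concrete next steps."
    )

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat/completions API call.
    raise NotImplementedError
```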
This comment about time is very interesting to me. I know it's "just" doing mathematical proofs but the possibilities of speeding up planning, proposals and decision making in the physical world should excite people.
This is as AGI as it needs to be to get my vote. And it's scary.
Maybe if you find AI to be doing stuff you find impressive, the stuff you were doing wasn't that impressive? Worth ruminating on your priors at least.
Creativity is connecting ideas from different domains and seeing if something from one field applies to another. I do think AI is generally overhyped; but a major benefit of AI could be that, after ingesting all existing human knowledge (something no single human can ever hope to achieve), it would "mix and connect" it and come up with novel insights.
Most published research sits ignored and unread; AI can uncover and use everything.
Anyone spotting the issue here? What did that really cost?
I am not against compute being used for scientific or other important problems. We did that before LLMs. However, the major LLM gatekeepers want to make all industries and companies dependent on their models. And, at some point, they need to charge them the actual, unsubsidized costs for the compute. In the meantime, companies restructure in the hopes that the compute costs remain cheap.