The threat is comfortable drift toward not understanding what you're doing

729 points - today at 9:57 AM


Wowfunhappy today at 1:55 PM
> Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. [...] Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. [...] If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.

And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.

The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!

---

† "Useful" in this context means "helps you produce good science that benefits humanity".

sd9 today at 11:42 AM
The thing is, agents aren’t going away. So if Bob can do things with agents, he can do things.

I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.

To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.

So I think this article is partly right, Bob is not learning those skills which we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.

I don’t like it, but I’m trying to face up to it.

DavidPiper today at 12:50 PM
I've just started a new role as a senior SWE after 5 months off. I've been using Claude a bit in my time off; it works really well. But now that I've started using it professionally, I keep running into a specific problem: I have nothing to hold onto in my own mind.

How this plays out:

I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.

Except I can't.

For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge, and this review was simply something I didn't know about.

But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.

And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.

But wait, I could just use Claude for this! But for now I don't, because I've seen this before. Just a few moments ago. Using Claude has just made me significantly slower whenever I need to use my own knowledge and skills.

I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.

caxap today at 2:32 PM
If this article was written a year ago, I would have agreed. But knowing what I know today, I highly doubt that the outcomes of LLM/non-LLM users will be anywhere close to similar.

LLMs are exceptionally good at building prototypes. If the professor needs a month, Bob will be done with the basic prototype of that paper by lunch on the same day, and will have tried out dozens of hypotheses by the end of the day. He will not be chasing some error for two weeks; the LLM will very likely figure it out in a matter of minutes, or avoid making it in the first place. Instructing it to validate intermediate results and to profile along the way can do magic.

The article is correct that Bob will not have understood anything, but if he wants to, he can spend the rest of the year understanding what the LLM has built for him, after verifying in the first couple of weeks that the approach actually works. Even better, he can ask the LLM to train him to do the same, if he wishes: learn why things work the way they do, why something doesn't converge, etc.

Assuming that Bob is willing to do all that, he will progress way faster than Alice. LLMs won't take anything away if you are still willing to take the time to understand what it's actually building and why things are done that way.

5 years from now, Alice will be using LLMs just like Bob, or without a job if she refuses to, because the place will be full of Bobs, with or without understanding.

stavros today at 11:30 AM
I see this fallacy being committed a lot these days. "Because LLMs, you will no longer need a skill you don't need any more, but which you used to need, and handwaves that's bad".

Academia doesn't want to produce astrophysics (or any field) scientists just so the people who became scientists can feel warm and fuzzy inside when looking at the stars, it wants to produce scientists who can produce useful results. Bob produced a useful result with the help of an agent, and learned how to do that, so Bob had, for all intents and purposes, the exact same output as Alice.

Well, unless you're saying that astrophysics as a field literally does not matter at all, no matter what results it produces, in which case, why are we bothering with it at all?

turtletontine today at 7:54 PM
> Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

I don’t believe this. Totally plausible that someone would be able to produce passable work with LLMs at a similar pace to a curious and talented scientist. But if you, their advisor, are sitting down and talking with them every week? It’s obvious how much they care or understand; I can’t believe you wouldn’t be able to tell the difference between these students.

oncallthrow today at 11:52 AM
I think this article is largely, or at least directionally, correct.

I'd draw a comparison to high-level languages and language frameworks. Yes, 99% of the time, if I'm building a web frontend, I can live in React world and not think about anything that is going on under the hood. But, there is 1% of the time where something goes wrong, and I need to understand what is happening underneath the abstraction.

Similarly, I now produce 99% of my code using an agent. However, I still feel the need to thoroughly understand the code, in order to be able to catch the 1% of cases where it introduces a bug or does something suboptimally.

It's possible that in future, LLMs will get _so_ good that I don't feel the need to do this, in the same way that I don't think about the transistors my code is ultimately running on. When doing straightforward coding tasks, I think they're already there, but I think they aren't quite at that point when it comes to large distributed systems.

AlexWilkins12 today at 12:42 PM
Ironically, this article reeks of AI-generated phrases. Lots of "It's not X, it's Y". E.g.: "The failure mode isn't malice. It's convenience"; "You haven't saved time. You've forfeited the experience that the time was supposed to give you."; "But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding."

And indeed running it through a few AI text detectors, like Pangram (not perfect, by any means, but a useful approximation), returns high probabilities.

It would have felt more honest if the author had included a disclaimer that it was at least partly written with AI, especially given its length and subject matter.

steveBK123 today at 1:48 PM
For the people arguing that the output is the code and the faster we generate it the better..

I do wonder where all the novel products are: the 10x devs who are now 100x with LLMs, the “idea guys” who can now produce products from whole cloth without having to hire pesky engineers, the one-man 10-billion-dollar startups. We are 3-4 years into this mania and all I see on the other end of it is the LLMs themselves.

Why hasn’t anything gotten better?

beedeebeedee today at 7:12 PM
I don’t have kids, but suggested something years ago to my siblings when they started confronting similar issues: we should do a version of “ontogeny recapitulates phylogeny” for personal computers.

Kids should start off with Commodore 64s, then get late-’80s or early-’90s Macs, then Windows 95, then Debian and internet access (but only HTML). Finally, when they’re 18, be allowed an iPhone, Android and modern computing.

Parenting can’t prevent the use of LLMs in grad school, but a similar approach could be taken by grad departments: don’t allow LLMs for the first few years, and require pen and paper exams, as well as oral examinations for all research papers.

CharlieDigital today at 1:39 PM
I recently saw a preserved letterpress printing press in person and couldn't help but think of the parallels to the current shift in software engineering. The letterpress allowed the mass production of printed copies, exchanging the intensive human labor of manual copying for the labor of setting type on the press.

Yet one thing did not change in this process: the press only made the production of the text more efficient. The acts of writing, constructing a compelling narrative plot, and telling a story were not changed by this revolution.

Bad writers are still bad writers; good writers still have a superior understanding of how to construct a plot. The technological ability to produce text faster never really changed what we consider "good" and "bad" in written literature; it just allowed more people to produce it.

It is hard to tell if large language models can ever reach a state where they have "good taste" (I suspect not). They will always reflect the taste and skill of the operator to some extent. Just because they allow you to produce more code faster does not mean they allow you to create a better product or better code. You still need good taste to create the structure of the product or codebase; you still have to understand the limitations of one architectural decision over another when the output is operationalized and run in production.

The AI industry is a lot of hype right now because they need you to believe that this is no longer relevant. That Garry Tan producing 37,000 LoC/day somehow equates to producing value. That a swarm of agents can produce a useful browser or kernel compiler.

Yet if you just peek behind the curtains at the Claude Code repo and see the pile of unresolved issues, regressions, missing features, half-baked features, and so on, the limitations seem plainly obvious: Anthropic, with functionally unlimited tokens on frontier models, cannot use them to triage and fix their own product.

AI and coding agents are like the printing press in some ways. Yes, it takes some costs out of a labor intensive production process, but that doesn't mean that what is produced is of any value if the creator on the other end doesn't understand the structure of the plot and the underlying mechanics (be it of storytelling or system architecture).

mkovach today at 2:00 PM
This isn't new. It's been the same problem for decades, not what gets built, but what gets accepted.

Weak ownership, unclear direction, and "sure, I guess" reviews were survivable when output was slow. When changes came in one at a time, you could get away with not really deciding.

AI doesn't introduce a new failure mode. It puts pressure on the old one. The trickle becomes a firehose, and suddenly every gap is visible. Nobody quite owns the decision. Standards exist somewhere between tribal memory, wishful thinking, and coffee. And the question of whether something actually belongs gets deferred just long enough to merge it, forcing an answer nobody actually gave.

The teams doing well with agentic workflows aren't typically using magic models. They've just done the uncomfortable work of deciding what they're building, how decisions are made, and who has the authority to say no.

AI is fine, it just removed another excuse for not having our act together. While we certainly can side-eye AI because of it, we own the problems. Well, not me. The other guy who quit before I started.

alestainer today at 6:21 PM
I was in academia in the pre-GPT-3 era and I don't see a difference between the superficial pass-the-criteria understanding of things then and now. People already rely on a ton of sources, putting their faith in them; the recent replication crisis in the social sciences had nothing to do with any LLMs. The problem of academia lies in the first paragraph of this article: supervisors have to choose incremental, clearly feasible projects. Currently it's called science, but I like to call it knowledge engineering, because you're pretty much following a recipe and there is a clear bound on the returns to such activities.
theteapot today at 12:52 PM
I have a vaguely unrelated question re:

> You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year ...

Is that how PhD projects are supposed to work? The supervisor is a subject matter expert and comes up with a well-defined achievable project for the student?

matheusmoreira today at 3:46 PM
I dunno. Claude helped me implement a new memory allocator, compacting garbage collector and object heap for my programming language. I certainly understood what I was doing when I did this. The experience was extremely engaging for me. Claude taught me a lot.

I think the real danger is no longer caring about what you're doing. Yesterday I just pointed Claude at my static site generator and told it to clean it up. I wanted to care but... I didn't.

cbushko today at 4:42 PM
This article makes the assumption that Bob was doing absolutely nothing, maybe at the pub with his friends, while the AI did all his work.

How do we know that, while the AI was writing Python scripts, Bob wasn't reading more papers, gathering more data, and overall doing more than Alice?

Maybe Bob is terrible at debugging python scripts while Alice is a pro at it?

Maybe Bob used his time to develop different skills that Alice couldn't dream of?

Maybe Bob will discover new techniques or ideas because he didn't follow the traditional research path that the established Researchers insist you follow?

Maybe Bob used the AI to learn even more because he had a customized tutor at his disposal?

Or maybe Bob just spent more time at the Pub with his friends.

katzgrau today at 4:32 PM
When you’re deep in a thoughtful read and suddenly get the eerie feeling that you’re being catfished

> But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.

throwaway132448 today at 12:40 PM
The flip side I don’t see mentioned very often is that having a product where you know how the code works becomes its own competitive advantage. Better reliability, faster fixes and iteration, deeper and broader capabilities that allow you to be disruptive while everything else is being built towards the mean, etc etc. Maybe we’ve not been in this new age for long enough for that to be reflected in people’s purchasing criteria, but I’m quite looking forward to fending off AI-built competitors with this edge.
lxgr today at 2:15 PM
> for someone who doesn't yet have that intuition, the grunt work is the work

Very well said. I think people are about to realize how incredibly fortunate and exceptional it is to actually get paid, and in our industry very well, through a significant fraction of one's career while still "just" doing the grunt work, which arguably benefits the person doing it at least as much as the employer.

A stable paid demand for "first-year grad student level work" or the equivalent for a given industry is probably not the only possible way to maintain a steady supply of experts (there's always the option of immense amounts of student debt or public funding, after all), but it sure seems like a load-bearing one in so many industries and professions.

At the very least, such work being directly paid has the immense advantage of making artificially (often without any bad intentions!) created bullshit tasks that don't exercise actually relevant skillsets, or exercise the wrong ones, much easier to spot.

patcon today at 12:32 PM
The exciting and interesting part to me is that we'll probably need to apply "chaos engineering" principles and encode intentional fallibility into these agents, to keep us (and them) good collaborators, and specifically on our toes, so that all minds stay alert and plastic.

If that comes to pass, we'll be rediscovering the same principles that biological evolution stumbled upon: the benefits of the imperfect "branch" or "successive limited comparison" approach of agentic behaviour, which perhaps favours heuristics (that clearly sometimes fail), interaction between imperfect collaborators with non-overlapping biases, etc etc

https://contraptions.venkateshrao.com/p/massed-muddler-intel...

> Lindblom’s paper identifies two patterns of agentic behavior, “root” (or rational-comprehensive) and “branch” (or successive limited comparisons), and argues that in complicated messy circumstances requiring coordinated action at scale, the way actually effective humans operate is the branch method, which looks like “muddling through” but gradually gets there, where the root method fails entirely.

FrojoS today at 1:24 PM
Every PhD program I'm aware of has a final hurdle known as the defence. You have to present your thesis while standing in front of a committee, and often the local community and public. They will ask questions, and too many "I don't know"s or false answers will make you fail. So there is already a system in place that should stop Bob from graduating if he indeed learned much less than Alice. A similar argument can be made for conference publications: if Bob publishes his first-year project at a conference but doesn't actually understand "his own work", it will show.

The difficulty of passing the defence varies wildly between universities, departments and committees. Some are very serious affairs with a decent chance of failure, while others are more of a show event for friends and family. Mine was more of the latter, but I doubt I would have passed that day if I had spent the previous years prompting instead of doing the grunt work.

toniantunovi today at 5:00 PM
The coding-specific version of this is worth naming precisely. The drift does not happen because you stop writing code. It happens because you stop reading the output carefully. With AI-generated code, there is a particular failure mode: the code is plausible enough to pass a quick review and tests pass, so you ship it. The understanding degradation is cumulative and invisible until it is not. The partial fix is making automated checks independent of the developer's attention level: type checking, SAST, dependency analysis, and coverage gates that run regardless of how carefully you reviewed the diff. These are not a substitute for understanding, but they create a floor below which "comfortable drift" cannot silently carry you. The question worth asking of any AI coding workflow is whether that floor exists and where it is.
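A minimal sketch of what one piece of such a floor might look like (names and the threshold are hypothetical, not from the article): a hard coverage gate that fails the build mechanically, whether or not the reviewer was paying attention that day.

```python
# coverage_gate.py -- hypothetical sketch of an attention-independent check:
# fail the build when measured line coverage drops below a fixed floor,
# regardless of how carefully (or carelessly) the diff was reviewed.
import sys

FLOOR = 80.0  # minimum acceptable line coverage, in percent (arbitrary choice)

def passes_floor(coverage_percent: float, floor: float = FLOOR) -> bool:
    """Return True only if measured coverage meets the floor."""
    return coverage_percent >= floor

if __name__ == "__main__":
    # e.g. fed the percentage reported by a coverage tool in CI
    measured = float(sys.argv[1])
    if not passes_floor(measured):
        sys.exit(f"coverage {measured:.1f}% is below the {FLOOR:.0f}% floor")
```

The point isn't this specific check; it's that the gate runs and fails on its own, independent of anyone's attention level.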
steveBK123 today at 1:45 PM
I agree with the general premise - the risk is we don’t develop juniors (new Alices) anymore, and at some point people are just sloperators gluing together bits of LLM output they do not understand.

I have seen versions of this in the wild, where a firm has gone through hard times, systems have lost all their original authors and every subsequent generation of maintainers, and what’s left is people in awe of a machine that hasn’t been maintained in a decade.

I interviewed a guy once that genuinely was proud of himself, volunteering the information to me as he described resolving a segfault in a live trading system by putting kill -9 in a cronjob. Ghastly.

visarga today at 2:52 PM
> Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant.

I think this is a simplification; of course Bob relied on AI, but he also used his own brain to think about the problem. Bob is not reducible to "a competent prompt engineer": if you think that, just take any person who can prompt but knows nothing of physics and ask them to do Bob's work.

In fact Bob might have a chance to cover more mileage at the higher level of work while Alice does the same at the lower level. Which is better? It depends on how AI will evolve.

The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.

My objection comes from remembering how senior devs review PRs ... "LGTM" .. it's pure vibes. If you are to seriously review a PR you have to run it, test it, check its edge cases, eval its performance - more work than making the PR itself. The entire history of software is littered with bugs that sailed through review because review is performative most of the time.

Anyone remember the replication crisis in science?

omega3 today at 1:32 PM
I wonder what effect AI had on online education - course signups, new resources being added etc.

I’ve recently started csprimer, and whilst it’s mentally stimulating, I wonder if I’m completely wasting my time.

ahussain today at 2:26 PM
> When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's.

In my experience, doing these things with the right intentions can actually improve understanding faster than not using them. When studying physics I would sometimes get stuck on small details - e.g. what algebraic rule was used to get from Eq 2.1 to 2.2? what happens if this was d^2 instead of d^3 etc. Textbooks don't have space to answer all these small questions, but LLMs can, and help the student continue making progress.

Also, it seems hard to imagine that Alice and Bob's weekly updates would be indistinguishable if Bob didn't actually understand what he was working on.

bwfan123 today at 3:44 PM
> The problem isn't that we'll decide to stop thinking. The problem is that we'll barely notice when we do

Most of what we call thinking merely justifies beliefs that make us emotionally happy and is not creative per se. I am making a distinction between "thinking" as we know it and "creative thinking", which is rare and can see things in an unbiased manner, breaking out of known categories. Arguably, at the PhD level, there need to be new ideas instead of remixes of existing ones.

pbw today at 2:00 PM
There's certainly a risk that an individual will rely too much on AI, to the detriment of their ability to understand things. However, I think there are obvious counter-measures. For example, requiring that the student can explain every single intermediate step and every single figure in detail.

A two-hour thesis defense isn't enough to uncover this, but a 40-hour deep probing examination by an AI might be. And the thesis committee gets a "highlight reel" of all the places the student fell short.

The general pattern is: "Suppose we change nothing but add extensive use of AI, look how everything falls apart." When in reality, science and education are complex adaptive systems that will change as much as needed to absorb the impact of AI.

__MatrixMan__ today at 1:09 PM
But aren't you still going to have to convince other people to let you do it with their money/data/hardware/etc? The understanding necessary to make that argument well is pretty deep and is unaffected by AI.

I've been having a lot of fun vibe coding little interactive data visualizations, so when I present the feature to stakeholders they can fiddle with it and really understand how it relates to existing data. I saw the agent leave a comment regarding Cramer's rule, and yeah, it's a bit unsettling that I forgot what that is and haven't bothered to look it up, but I can tell from the graphs that it's doing the correct thing.

There's now a larger gap between me and the code, but the chasm between me and the stakeholders is getting smaller and so far that feels like an improvement.
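(For anyone else who blanked on it: Cramer's rule solves a square linear system as ratios of determinants. A purely illustrative 2x2 sketch:)

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ (x, y) = (b1, b2) by Cramer's rule:
    each unknown is det(matrix with its column replaced by b) / det(matrix)."""
    det = a11 * a22 - a12 * a21  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("singular system: Cramer's rule does not apply")
    x = (b1 * a22 - a12 * b2) / det  # column 1 replaced by b
    y = (a11 * b2 - b1 * a21) / det  # column 2 replaced by b
    return x, y

# e.g. 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
```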

sam_lowry_ today at 11:46 AM
See also Profession by Isaac Asimov [0] and his short story The Feeling of Power [1]. Both are social dramas about societies that went far down the path of ignorance.

[0] http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...

[1] https://s3.us-west-1.wasabisys.com/luminist/EB/A/Asimov%20-%...

inatreecrown2 today at 12:30 PM
Using AI to solve a task does not give you experience in solving the task, it gives you experience in using AI.
pwr1 today at 4:21 PM
I catch myself doing this more than I'd like to admit. Copy something from an LLM, it works, ship it, move on. Then a week later something breaks and I realize I have no idea what that code actually does! The speed is addicting, but you're slowly trading depth for velocity, and at some point that bill comes due.
sunir today at 2:17 PM
I think the mountain of things I don’t understand was already huge. It doesn’t stop me from getting a grip over the things I need to be responsible for and using tools to contain complexity irrelevant to me. Like many scientists have a stats person.

The risk is that civilization is over its skis because humans are lazy. Humans are always lazy. In science there’s a limit to bs because dependent works fail. In economics there’s a crash. In physics stuff breaks. Then there is a correction.

ChrisMarshallNY today at 2:50 PM
This is not wrong, but the "Bob and Alice" conundrum is not simple, either.

In academia, understanding is vital. The same for research.

But in production, results are what matters.

Alice would be a better researcher, but Bob would be a better producer. He knows how to wrangle the tools.

Each has its value. Many researchers develop marvelous ideas but struggle to commercialize them, while production-oriented engineers struggle to come up with the ideas.

You need both.

shellkr today at 1:47 PM
This is almost the same as going from making fire with a stick to using a lighter. Sure, it's simplified, but still not wrong. Humans doing the grunt work still make mistakes, as does the machine, but the machine will eventually discover its mistakes; the same cannot be said of the human, because the work needed to do so might be too much. In the end we might not learn as much, but it will not matter, and thus it's really not an issue.
grafelic today at 12:48 PM
"He shipped a product, but he didn't learn a trade." I think is the key quote from this article, and encapsulates the core problem with AI agents in any skill-based field.
tmountain today at 1:46 PM
Thankfully, I am nearing the end of my career with software after 25 years well spent. If I had been born in a different decade, I would be facing the brunt of the AI shift, and I don’t think I would want to continue in the industry. Obviously, this is a personal decision, but we are in a totally different domain now, where, at best, you’re managing an LLM to deliver your product.
acoye today at 3:50 PM
I recommend the manga BLAME!, which explores what happens to humanity if you push this to 11: https://fr.wikipedia.org/wiki/BLAME!
txrx0000 today at 3:30 PM
The threat is that if you replace your cognitive capabilities with AI, but you don't control the entire system your AI runs on (hardware, firmware, drivers, OS, weights, frontend), then that's equivalent to someone else owning a part of your brain.
lambdaone today at 1:08 PM
Very insightful. One key sentence sums it up: "He shipped a product, but he didn't learn a trade."

This is going to get worse, and eventually cause disastrous damage unless we do something about it, as we risk losing human institutional memory across just about every domain, and end up as child-like supplicants to the machines.

But as the article says, this is a people problem, not a machine problem.

Lerc today at 1:22 PM
The problem I see with this argument is that the ship sailed on understanding what you are doing years ago. It seems like it is abstraction layers all the way down.

If an AI is capable of producing an elegant solution with fewer levels of abstraction it could be possible that we end up drifting towards having a better understanding of what's going on.

somethingsome today at 1:45 PM
Personally, I wrote an essay to my students explaining exactly that the purpose is for them to think better and improve over time, they can use LLMs but, if they stop thinking, they are just failing themselves, not me.

It was a great success; now, when I propose that they use some model to do something, they tend to avoid it.

hgo today at 1:01 PM
I like this article and it reads well, but I have to say that, to me, it really reads as something written by an LLM. Probably under supervision by a human who knew what it should say.

I don't know if I mind.

Example: this paragraph, to me, has an eerily perfect rhythm. The ending sentence perfectly delivers the twist. Like, why would you write an argument piece in the science realm in perfect prose?

> Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

bambushu today at 1:43 PM
The letterpress analogy is good but misses something. With letterpress you lost a craft skill. With AI coding you risk losing the ability to evaluate the output. Those are different problems.

I use AI agents for coding every day. The agent handles boilerplate and scaffolding faster than I ever could. But when it produces a subtle architectural mistake, you need enough understanding to catch it. The agent won't tell you it made a bad choice.

What actually helps is building review into the workflow. I run automated code reviews on everything the agent produces before it ships. Not because the code is bad, usually it's fine. But the one time in ten that it isn't, you need someone who understands what the code should be doing.

bluedino today at 2:05 PM
Look at how bad the auto industry has gotten when it comes to quality and recalls.

A combination of bean counters running the show and the old, experienced engineers dying, retiring, and going through buyouts has left things in a pretty sad state.

MarcelinoGMX3C today at 4:32 PM
Frankly, the "AI as accelerant" argument, as fomoz puts it, holds true only when you have a solid understanding of the domain. In enterprise system builds, we aren't dealing with theoretical physics, where an error might mean a broken model rather than a broken system. Here, a faked coefficient from an LLM could mean a production outage.

It's why I push for a hybrid mentor-apprentice model. We need to actively cultivate the next generation of "Schwartzes" with hands-on, critical thinking before throwing them into LLM-driven environments. The current incentive structure, as conception points out, isn't set up for this, but it's crucial if we want to avoid building on sand.

dwa3592 today at 3:14 PM
What a wonderful read. Thank you!

The way I think about this is: we can't catch the hallucinations that we don't know are hallucinations.

patapong today at 1:09 PM
I think this is a very important debate, and I think the author here adds a lot to this discussion! I mostly agree with it, but wanted to point out a few areas where I do not fully agree.

> Take away the agent, and Bob is still a first-year student who hasn't started yet.

This may be true, but I can see almost no conceivable world in which the agent will be taken away. I think we should evaluate Bob's ability based on what he can do with an agent, not without, and here he seems to be doing quite well.

> I've been hearing "just wait" since 2023.

On almost any timeline, this is very short. Given that we have already arrived at models able to build nearly complete computer programs from a single prompt and solve frontier-level math problems, I think any framework that relies on humans continuing to have an edge over LLMs in the medium term may be built on shaky ground.

Two very interesting questions today in this vein for me are:

- Is the best way to teach complex topics to students today to have them carry out simple tasks?

The author acknowledges that the difference between Bob and Alice only materializes at a very high level, basically when Alice becomes a PI of her own. If we were solely focused on teaching thinking at this level (with access to LLMs), how would we frame the educational path? It may look exactly like it does now, but it could also look very different.

- Is there inherent value in humans learning specific skills?

If we get to a stage where LLMs can carry out most/all intellectual tasks better than humans, do we still want humans to learn these skills? My belief is yes, but I am frankly not sure how to motivate this answer.

djoldman today at 11:46 AM
These themes have been going around and around for a while.

One thing I've seen asserted:

> What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days... The equations seemed right... Then Schwartz read it, and it was wrong... It faked results. It invented coefficients...

The argument that AI output isn't good enough is somewhat in opposition to the idea that we need to worry about folks losing or never gaining skills/knowledge.

There are ways around this:

"It's only evident to experts and there won't be experts if students don't learn"

But at the end of the day, in the long run, the ideas and results that last are the ones that work. By work, I mean ones that strictly improve outcomes (all outputs are the same, with at least one better). This is because, with respect to technological progress, humans are pretty well modeled as a slightly-better-than-random search for optimal decisions, where we tend not to go backwards permanently.

All that to say that, at times, AI is one of the many things that we've come up with that is wrong. At times, it's right. If it helps on aggregate, we'll probably adopt it permanently, until we find something strictly better.

talkingtab today at 3:50 PM
This "drift" is not a drift at all, nor is it new. There are many names for it: cargo cults, think-by-numbers (like paint-by-numbers), ant mills. It is recipes. And many, many common recipes demonstrate a widespread lack of understanding.

This kind of follow-the-leader "thinking" is probably a requirement. The amount of expertise it would take to understand and decide about everything in our daily lives is overwhelming. Do you fix your own car, decide each day how to travel, get your own food, and understand how all of that works? No.

So what is the problem? The problem is following the leader when the leader has an agenda that differs from yours. Do you really think Jeff Bezos being a (the?) major investor in the Washington Post has anything to do with democracy? You know, as in the WaPo slogan "Democracy Dies in Darkness".

Does Jeff have an agenda that differs from yours? Yes. NYT? Yes. Hacker news? Yes. Google? Yes. We now live in a world so filled with propaganda that it makes no difference whether something is AI. We all "follow". Or not.

jerkstate today at 12:58 PM
Nobody actually understands what they're doing. When you're learning electronics, you first learn about the "lumped element model", which allows you to simplify Maxwell's equations. I think it is a mistake to think that solving problems with a programming language is "knowing how to do things" - at this point, we've already abstracted assembly language -> machine instructions -> logic gates and buses -> transistors and electronic storage -> lumped matter -> quantum mechanics -> ???? - so I simply don't buy the argument that things will suddenly fall apart by abstracting one level higher. The trick is to get this new level of abstraction to work predictably, which admittedly it doesn't yet, but look how far it's come in a short couple of years.

This article first says that you give juniors well-defined projects and let them take a long time because the process is the product. Then it goes on to lament the fact that they will no longer have to debug Python code, as if debugging Python code is the point of it all. The thing that LLMs can't yet do is pick a high-level direction for a novel problem and iterate until the correct solution is reached. They absolutely can and do iterate until a solution is reached, but it's not necessarily correct. Previously, guiding the direction was the job of the professor. Now, in a smaller sense, the grad student needs to be guiding the direction and validating the details, rather than implementing the details with the professor guiding the direction. This is an improvement - everybody levels up.

I also disagree with the premise that the primary product of astrophysics is scientists. Like any advanced science, it requires a lot of scientists to make the breakthroughs that trickle down into technology that improves everyday life, and those breakthroughs would be impossible otherwise. Gauss discovered the normal distribution while trying to understand the measurement error of his telescope. Without general relativity we would not have GPS or precision timekeeping. It uncovers the rules that will allow us to travel between planets. Understanding the composition and behavior of stars informs nuclear physics, reactor design, and solar panel design. The computation systems used by advanced science prototyped many commercial advances in computing (HPC, cluster computing, AI itself).

So not only are we developing the tools to improve our understanding of the universe faster, we're leveling everybody up. Students will take on the role of professors (badly, at first, but are professors good at first? Probably not; they need time to learn under the guidance of other faculty). Professors will take on the role of directors. Everybody's scope will widen because the tiny details will be handled by AI, but the big picture will still be in the domain of humans.

mikeaskew4 today at 12:55 PM
“The world still needs empirical thinkers, Danny.”

- Caddyshack

efields today at 12:05 PM
I literally don't know how compilers work. I've written code for apps that are still in production 10 years later.
BobBagwill today at 1:25 PM
Try giving this problem to different AI LLM chatbots:

If I could make a rocket that could accelerate at 3 Gs for 10 years, how long would it take to travel from Earth to Alpha Centauri by accelerating at 3 Gs for half the time, then decelerating at 3 Gs for half the time?

Hint: They don't all get it right. Some of them never got it right after hints, corrections, etc.
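
For reference, here is a minimal sketch of the standard special-relativistic answer, assuming Alpha Centauri is ~4.37 light-years away and working in units where c = 1 (so 1 g ≈ 1.032 ly/yr²; by symmetry, flipping at half the time is the same as flipping at half the distance):

```python
import math

# Constant-proper-acceleration kinematics, c = 1,
# distances in light-years, times in years.
G_LY_PER_YR2 = 1.032            # 1 g expressed in ly/yr^2 (approximate)
a = 3 * G_LY_PER_YR2            # 3 g proper acceleration
d = 4.37                        # Earth -> Alpha Centauri, light-years
x = d / 2                       # accelerate for half the trip, then flip

# Earth-frame time for one leg, from x = (1/a)(sqrt(1 + (a t)^2) - 1):
t_half = math.sqrt(x**2 + 2 * x / a)
# Ship (proper) time for one leg, from x = (1/a)(cosh(a * tau) - 1):
tau_half = math.acosh(1 + a * x) / a

print(f"Earth-frame trip time: {2 * t_half:.2f} yr")   # roughly 5 yr
print(f"Ship proper time:      {2 * tau_half:.2f} yr") # roughly 1.8 yr
```

So a chatbot that answers somewhere around five years Earth-frame (under two years ship time) is in the right ballpark; answers wildly off from that suggest it ignored relativity or botched the turnaround.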

tom-blk today at 12:28 PM
Strongly agree, we see this almost everywhere now.
ghc today at 12:01 PM
As straw men go, this is an attractive one, but...

When I was fresh out of undergrad, joining a new lab, I followed a similar arc. I made mistakes, I took the wrong lessons from grad student code that came before mine, I used the wrong plotting libraries, I hijacked Python's module import logic to embed a new language in its bytecode. These were all avoidable mistakes, and I didn't learn anything except that I should have asked for help. Others in my lab, who were less self-reliant, asked for and got help avoiding the kinds of mistakes I confidently made.

With 15 more years of experience, I can see in hindsight that I should have asked for help more frequently because I spent more time learning what not to do than learning the right things.

If I had Claude Code, would I have made the same mistakes? Absolutely not! Would I have asked it to summarize research papers for me and to essentially think for me? Absolutely not!

My mother, an English professor, levies similar accusations about the students of today, and how they let models think for them. It's genuinely concerning, of course, but I can't help but think that this phenomenon occurs because learning institutions have not adjusted to the new technology.

If the goal is to produce scientists, PIs are going to need to stop complaining and figure out how to produce scientists who learn the skills that I did even when LLMs are available. Frankly I don't see how LLMs are different from asking other lab members for help, except that LLMs have infinite patience and don't have their own research that needs doing.

squirrel today at 12:47 PM
The article is well-written and makes cogent points about why we need "centaurs", human/computer hybrids who combine silicon- and carbon-based reasoning.

Interestingly, the text has a number of AI-like writing artifacts, e.g. frequent use of the pattern "The problem isn't X. The problem is Y." Unlike much of the typical slop I see, I read it to the end and found it insightful.

I think that's because the author worked with an AI exactly as he advocates, providing the deep thinking and leaving some of the routine exposition to the bot.

robot-wrangler today at 12:44 PM
Another threat is that you can find tons of papers pointing out how neural AI still struggles to handle simple logical negation. Who cares, right? We use tools for symbolics, yada yada. Except what's really the plan? Are we going to attempt parallel formalized representations of every piece of input context just to flag the difference between please DON'T delete my files and please DO? This is all super boring though, and nothing bad happened lately, so back to perusing the latest AGI benchmarks..
zaikunzhang today at 3:03 PM
See also

D. W. Hogg, "Why do we do astrophysics?", https://arxiv.org/abs/2602.10181, February 2026.

fredgrott today at 3:36 PM
I know how we can fix this....

It's of course devious, exactly some of our style :)

Give AI to VCs to use for all their domain stuff....

They then make wrong investment decisions based on the AI's wrong info and get killed in the market....

Market ends up killing AI outright....problem solved temporarily

hnzionists today at 1:45 PM
Noobs love LLMs because they can finally write for loops and generate absolute trash web pages and UI.

These noobs go “Man this replaces devs!”

Only the experienced ones really see the LLM as the calculator it is.

maplethorpe today at 1:51 PM
I honestly don't know why this guy is hiring Alice and Bob in the first place, instead of just running two agents. He seemed to be saying it's to invest in them as people, but why? What is the end goal? If the end goal is to produce research, then just get the agents to do it.
lo_zamoyski today at 7:31 PM
Education lost the plot years ago. AI is a kind of final nail in that coffin. While we may lament the ravages of AI, I expect there is a kind of providential silver lining in that it may cleanse the rot plaguing education. Just as postmodernism - itself full of errors - is like an enema that is clearing out the disease of modernism and will flush itself out in the process, so, too, AI may be just the purgative we need to force us back to a norm more fittingly called “education”.

One of the marks of an educated person is the ability to dispassionately think from first principles. It is not a sufficient criterion, but it is a necessary one. In this case, the basic questions we must ask are: what is education, and what is education for?

An instrumentalist view of education, the one that has claimed the soul of the modern university and primary education, tells us that education is about preparing for a career - preparing to be an economic actor - and about the effect you can have. In short, it is about practical power and economic utility.

Now, the power to be able to do good things, to be practically able, is a good thing as such, and indeed one does acquire facility during one’s education. (And I would argue schooling today isn’t great at practicality either.) But the practical, unlike the theoretical, is always about something else. It is never for its own sake. What this means is that there must be a terminus. You cannot have an infinite regress of practical ends, because the justification for any practical end is not found in itself. And if the primary proximate end of education is the career, then what distinguishes education from training? Nothing. What’s more, if you then ask what the purpose of a career is, you find it is about consumption. You wish to be effective so you can be paid more so you can buy more crap. Pure nihilism.

True education is best captured by the classical liberal arts, which is to say the free arts. Human beings are intellectual and moral creatures. The purpose of education is to free a person to be more human, to free them to be able to reason effectively and competently for the sake of wisdom and for the sake of living wisely. In other words, it is about becoming what you ought to become as a human being in the most definitive sense.

What good does AI do you if you haven’t become a better version of yourself in the process? So AI writes a paper for you. So what? The purpose of the paper is not the paper, but the knowledge, understanding, and insight that results from writing it.

simianwords today at 12:29 PM
> Frank Herbert (yeah, I know I'm a nerd), in God Emperor of Dune, has a character observe: "What do such machines really do? They increase the number of things we can do without thinking. Things we do without thinking; there's the real danger." Herbert was writing science fiction. I'm writing about my office. The distance between those two things has gotten uncomfortably small.

The author is a bit naive here:

1. Society only progresses when people are specialised and can delegate their thinking

2. Specialisation has been happening for millennia. Agriculture allowed people to become specialised due to an abundance of food

3. We accept delegation of thinking in every part of life. A manager delegates thinking to their subordinates. I delegate some thinking to my accountant

4. People will eventually get the hang of using AI to do the optimum amount of delegation such that they still retain what is necessary and delegate what is not necessary. People who don't do this optimally will get outcompeted

The author just focuses on some local problems, like skill atrophy, but does not see the larger picture and how this specific pattern has repeated throughout humanity's history.

garn810 today at 10:17 AM
Academia has always been full of narcissists chasing status with flashy papers and half-baked "brilliant" ideas (70%? maybe). LLMs just made the whole game trivial: now literally anyone can slap together something that sounds deep without ever doing the actual grind. LLMs are just speeding up the process; it's just a matter of time before this exposes what the entire system has been all along.
itmitica today at 1:54 PM
Contrarian just for the sake of it. Get on board or stay behind. Whatever good or bad AI brings to the table, it's here to stay. The cat's out of the bag. Might as well enjoy it. Evolution will not wait for your whimsical made-up reality. It will run you over.