Project Glasswing: Securing critical software for the AI era

546 points - today at 6:09 PM

Comments

pizlonator today at 8:14 PM

It's messed up that Anthropic simultaneously claims to be a public benefit copro and is also picking who gets to benefit from their newly enhanced cybersecurity capabilities. It means that the economic benefit is going to the existing industry heavyweights.

(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)

I think it would be net better for the public if they just made Mythos available to everyone.

rakel_rakel today at 8:39 PM

> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.

AITA for thinking that PRISM was probably the state sponsored program affecting civilian life the most? And that one state is missing from the list here?

9cb14c1ec0 today at 6:54 PM

Now, its very possible that this is Anthropic marketing puffery, but even if it is half true it still represents an incredible advancement in hunting vulnerabilities.

It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.

It could also totally reshape military sigint in similar ways.

Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.

redfloatplane today at 6:29 PM

The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]

I'm still reading the system card but here's a little highlight:

> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.

and interestingly:

> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.

Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.

Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...

The threat model in question:

> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).

jryio today at 6:29 PM

Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places.

My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones.

It seems that large technology and infrastructure companies will be able to defend themselves by preempting token expenditure to catch vulnerabilities while the rest of the market is left with a "large token spend or get hacked" dilemma.

ssgodderidge today at 6:47 PM

At the very bottom of the article, they posted the system card of their Mythos preview model [1].

In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where others model such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

SheinhardtWigCo today at 6:59 PM

Society is about to pay a steep price for the software industry's cavalier attitude toward memory safety and control flow integrity.

cbg0 today at 6:31 PM

One of the things I'm always looking at with new models released is long context performance, and based on the system card it seems like they've cracked it:

  GraphWalks BFS 256K-1M

  Mythos     Opus     GPT5.4

  80.0%     38.7%     21.4%

stephc_int13 today at 9:05 PM

I think this is bad news for hackers, spyware companies and malware in general.

We all knew vulnerabilities exist, many are known and kept secret to be used at an appropriate time.

There is a whole market for them, but more importantly large teams in North Korea, Russia, China, Israel and everyone else who are jealously harvesting them.

Automation will considerably devalue and neuter this attack vector. Of course this is not the end of the story and we've seen how supply chain attacks can inject new vulnerabilities without being detected.

I believe automation can help here too, and we may end-up with a considerably stronger and reliable software stack.

dang today at 8:24 PM

Related ongoing threads:

System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258

Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?

temp123789246 today at 8:47 PM

OpenAI initially claimed that GPT-2 was too dangerous to release in 2019.

How many times will labs repeat the same absurd propaganda?

agrishin today at 6:31 PM

>>> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.

How long would it take to turn a defensive mechanism into an offensive one?

josh-sematic today at 7:23 PM

Must be nice to be in a position to sell both disease and cure.

simonw today at 9:05 PM

I buy the rationale for this. There's been a notable uptick over the past couple of weeks of credible security experts unrelated to Anthropic calling the alarm on the recent influx of actually valuable AI-assisted vulnerability reports.

From Willy Tarreau, lead developer of HA Proxy: https://lwn.net/Articles/1065620/

> On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.

> And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools.

From Daniel Stenberg of curl: https://mastodon.social/@bagder/116336957584445742

> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.

> I'm spending hours per day on this now. It's intense.

From Greg Kroah-Hartman, Linux kernel maintainer: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...

> Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.

> Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.

Shared some more notes on my blog here: https://simonwillison.net/2026/Apr/7/project-glasswing/

6thbit today at 9:17 PM

This is silly and disingenuous. In a matter of days or weeks a competing lab will make public a model with capabilities beyond this “mythos” one.

Is this a huge fear-driven marketing stunt to get governments and corporations into dealing with anthropic?

Miraste today at 6:39 PM

>We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview2.

This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.

taupi today at 6:30 PM

Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both?

Ryan5453 today at 6:19 PM

Pricing for Mythos Preview is $25/$125, so cheaper than GPT 4.5 ($75/$150) and GPT 5.4 Pro ($30/$180)

bredren today at 8:08 PM

Can anyone point at the critical vulnerabilities already patched as a result of mythos? (see 3:52 in the video)

For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?

I know we've had some long-standing high profile, LLM-found bugs discussed but seems unlikely there was speculation they were found by a previously unannounced frontier model.

[0] https://www.youtube.com/watch?v=INGOC6-LLv0

Sol- today at 7:36 PM

I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)

zachperkel today at 6:27 PM

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.

Scary but also cool

Sateeshm today at 7:36 PM

The bars have solid fill for Mythos and cross shaded for Opus 4.6. Makes the difference feel more than it actually is.

jFriedensreich today at 7:25 PM

The only thing reassuring is the Apache and Linux foundation setups. Lets hope this is not just an appeasing mention but more fundamental. If there are really models too dangerous to release to the public, companies like oracle, amazon and microsoft would absolutely use this exclusive power to not just fix their holes but to damage their competitors.

underdeserver today at 7:52 PM

Interesting also is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that there really isn't one.

tdaltonc today at 9:01 PM

> Mythos finds bug.

> NSA demands that bug stays in place and gags Anthropic.

> Anthropic releases Mythos.

Then what? Is a huge share of the US zero-day stockpiles about to be disarmed or proliferated?

deleted today at 6:52 PM

anVlad11 today at 6:48 PM

So, $100B+ valuation companies get essentially free access to the frontier tools with disabled guardrails to safely red team their commercial offerings, while we get "i won't do that for you, even against your own infrastructure with full authorization" for $200/month. Uh-huh.

maxmaio today at 8:40 PM

seems important and terrifying. This morning Opus 4.6 was blowing my mind in claude code... onward and upward

deleted today at 6:42 PM

SirYandi today at 8:25 PM

This sets off marketing BS alarm bells. All the cosignatories so very ovvoously have a vested interest in AI stocks / sentiment. Perhaps not the Linux foundation, although (I think) they rely on corporate donations to some extent.

throwaway13337 today at 7:20 PM

I really wanted to like anthropic. They seem the most moral, for real.

But at the core of anthropic seems to be the idea that they must protect humans from themselves.

They advocate government regulations of private open model use. They want to centralize the holding of this power and ban those that aren't in the club from use.

They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.

picafrost today at 6:38 PM

> Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview and its offensive and defensive cyber capabilities. [...] We are ready to work with local, state, and federal representatives to assist in these tasks.

As Iran engages in a cyber attack campaign [1] today the timing of this release seems poignant. A direct challenge to their supply chain risk designation.

[1] https://www.cisa.gov/news-events/cybersecurity-advisories/aa...

kmfrk today at 8:28 PM

Heck of a Patch Tuesday.

dakolli today at 7:34 PM

I guess we can throw out the idea that AGI is going to be democratized. In this case a sufficiently powerful model has been built and the first thing they do is only give AWS, Microsoft, Oracle ect ect access.

If AGI is going to be a thing its only going to be a thing, its only going to be a thing for fortune 100 companies..

However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.

oyebenny today at 7:18 PM

why do I feel like the auditing industry is about to evaporate? thanks to this.

nickandbro today at 7:08 PM

I want it

baddash today at 7:00 PM

> security product

> glass in the name

zb3 today at 7:57 PM

BTW it seems they forgot about the part that defense uses of the model also need to be safeguarded from people. Because what if a bad person from a bad country tries to defend against peaceful attacks from a good country like the US? That would be a tragedy, so we need to limit defensive capabilities too.

endunless today at 6:30 PM

Another Anthropic PR release based on Anthropic’s own research, uncorroborated by any outside source, where the underlying, unquestioned fact is that their model can do something incredible.

> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities

I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.

Fokamul today at 7:32 PM

+ NSA, CIA

impulser_ today at 6:32 PM

So they are only giving access to their smartest model to corporations.

You think these AI companies are really going to give AGI access to everyone. Think again.

We better fucking hope open source wins, because we aren't getting access if it doesn't.

0xbadcafebee today at 6:33 PM

tl;dr we find vulns so we can help big companies fix their security holes quickly (and so they can profit off it)

This is a kludge. We already know how to prevent vulnerabilities: analysis, testing, following standard guidelines and practices for safe software and infrastructure. But nobody does these things, because it's extra work, time and money, and they're lazy and cheap. So the solution they want is to keep building shitty software, but find the bugs in code after the fact, and that'll be good enough.

This will never be as good as a software building code. We must demand our representatives in government pass laws requiring software be architected, built, and run according to a basic set of industry standard best practices to prevent security and safety failures.

For those claiming this is too much to ask, I ask you: What will you say the next time all of Delta Airlines goes down because a security company didn't run their application one time with a config file before pushing it to prod? What will the happen the next time your social security number is taken from yet another random company entrusted with vital personal information and woefully inadequate security architecture?

There's no defense for this behavior. Yet things like this are going to keep happening, because we let it. Without a legal means to require this basic safety testing with critical infrastructure, they will continue to fail. Without enforcement of good practice, it remains optional. We can't keep letting safety and security be optional. It's not in the physical world, it shouldn't be in the virtual world.

deleted today at 6:16 PM

zb3 today at 7:53 PM

Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.

anuramat today at 6:36 PM

"oops, our latest unreleased model is so good at hacking, we're afraid of it! literal skynet! more literal than the last time!"

almost like they have an incentive to exaggerate

minutesmith today at 9:10 PM

[dead]

minutesmith today at 7:11 PM

[dead]

hackerman70000 today at 6:42 PM

[dead]

hackerman70000 today at 6:46 PM

[dead]

throwaway613746 today at 9:11 PM

[dead]

NickNaraghi today at 6:32 PM

[dead]

ehutch79 today at 6:23 PM

Just include 'make it secure' in the prompt. Duh.

LoganDark today at 6:22 PM

It's nice to know that they continue to be committed to advertising how safe and ethical they are.

dakolli today at 7:39 PM

If this is as dangerous as they make it out (its not), why would their first impulse be to get every critical products/system/corporation in the world to implement its usage?

yusufozkan today at 6:45 PM

but people here had told me llms just predict the next word

cyanydeez today at 7:19 PM

Project: Advertisment!