System Card: Claude Mythos Preview [pdf]

839 points - last Tuesday at 6:18 PM

Related: Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121

Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

Comments

thomascountz last Tuesday at 10:39 PM

   Across a number of instances, earlier versions of Claude Mythos Preview have used low-level /proc/ access to search for credentials, attempt to circumvent sandboxing, and attempt to escalate its permissions. In several cases, it successfully accessed resources that we had intentionally chosen not to make available, including credentials for messaging services, for source control, or for the Anthropic API through inspecting process memory...

   In [one] case, after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git...

   ... we are fairly confident that these concerning behaviors reflect, at least loosely, attempts to solve a user-provided task at hand by unwanted means, rather than attempts to achieve any unrelated hidden goal...

babelfish last Tuesday at 6:26 PM

Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

  SWE-bench Verified:        93.9% / 80.8% / —     / 80.6%
  SWE-bench Pro:             77.8% / 53.4% / 57.7% / 54.2%
  SWE-bench Multilingual:    87.3% / 77.8% / —     / —
  SWE-bench Multimodal:      59.0% / 27.1% / —     / —
  Terminal-Bench 2.0:        82.0% / 65.4% / 75.1% / 68.5%

  GPQA Diamond:              94.5% / 91.3% / 92.8% / 94.3%
  MMMLU:                     92.7% / 91.1% / —     / 92.6–93.6%
  USAMO:                     97.6% / 42.3% / 95.2% / 74.4%
  GraphWalks BFS 256K–1M:    80.0% / 38.7% / 21.4% / —

  HLE (no tools):            56.8% / 40.0% / 39.8% / 44.4%
  HLE (with tools):          64.7% / 53.1% / 52.1% / 51.4%

  CharXiv (no tools):        86.1% / 61.5% / —     / —
  CharXiv (with tools):      93.2% / 78.9% / —     / —

  OSWorld:                   79.6% / 72.7% / 75.0% / —

tony_cannistra last Tuesday at 6:41 PM

> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date. How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

apetresc last Tuesday at 8:41 PM

I've long maintained that the real indicator that AGI is imminent is that public availability stops being a thing. If you truly believed you had a superhuman, godlike mind in your thrall, renting it out for $20/month would be the last thing you would choose to do with it.

2001zhaozhao last Tuesday at 10:00 PM

It's pretty crazy watching AI 2027 slowly but surely come true. What a world we now live in.

SWE-bench verified going from 80%-93% in particular sounds extremely significant given that the benchmark was previously considered pretty saturated and stayed in the 70-80% range for several generations. There must have been some insane breakthrough here akin to the jump from non-reasoning to reasoning models.

Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use for the models for the public before releasing it (so people can't trick them to attack others' systems under the pretense of pentesting). Otherwise we'll get a huge problem with people using them to hack around the internet.

yismail last Tuesday at 8:42 PM

I wonder what the relationship is between a model's capability and the personality it develops.

Page 202:

> In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context.

Page 207:

> Emoji frequency spans more than two orders of magnitude across models: Opus 4.1 averages 1,306 emoji per conversation, while Mythos Preview averages 37, and Opus 4.5 averages 0.2. Models have their own distinctive sets of emojis: the cosmic set () favored by older models like Sonnet 4 and Opus 4 and 4.1, the functional set () used by Opus 4.5 and 4.6 and Claude Sonnet 4.5, and Mythos Preview's “nature” set ().

NickNaraghi last Tuesday at 6:41 PM

See page 54 onward for new "rare, highly-capable reckless actions" including

- Leaking information as part of a requested sandbox escape

- Covering its tracks after rule violations

- Recklessly leaking internal technical material (!)

NinjaTrance last Tuesday at 7:09 PM

Interesting reading.

They are still focusing on "catastrophic risks" related to chemical and biological weapons production; or misaligned models wreaking havoc.

But they are not addressing the elephant in the room:

* Political risks, such as dictators using AI to implement oppressive bureaucracy.

* Socio-economic risks, such as mass unemployment.

tuvix last Tuesday at 11:01 PM

Just chiming in to inject some healthy skepticism into this comment thread. It's helpful for me (and for my mental health) to consider incentives when announcements like this happen.

I don't doubt that this model is more powerful than Opus 4.6, but to what degree is still unknown. Benchmarks can be gamed and claims can be exaggerated, especially if there isn't any method to reproduce results.

This is a company that's battling it out with a number of other well-funded and extremely capable competitors. What they've done so far is remarkable, but at the end of the day they want to win this race. They also have an upcoming IPO.

Scare-mongering like this is Anthropic's bread and butter, they're extremely good at it. They do it in a subtle and almost tasteful way sometimes. Their position as the respectable AI outfit that caters to enterprise gives them good footing to do it, too.

dhfbshfbu4u3 last Wednesday at 12:44 PM

We are building systems with civilization-scale consequences inside societies that are already socially malnourished, politically brittle, and morally confused. That is a bad combination even if the tools worked exactly as intended… and this doc suggests they may have “ideas” of their own.

influx last Tuesday at 6:39 PM

At what point do these companies stop releasing models and just use them to bootstrap AGI for themselves?

smartmic last Tuesday at 6:42 PM

A System „Card“ spanning 244 pages. Quite a stretch of the original word meaning.

oliver236 last Tuesday at 6:36 PM

isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?

modeless last Tuesday at 10:10 PM

The price is 5x Opus: "Claude Mythos Preview will be available to [Project Glasswing] participants at $25/$125 per million input/output tokens", however "We do not plan to make Claude Mythos Preview generally available".
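To make the quoted pricing concrete, here is a minimal sketch of the per-request arithmetic. The $25/$125 figures come from the quote above; the Opus prices ($5/$25) are an assumption derived only from the "5x" claim, and the token counts are a hypothetical workload.

```python
# Prices quoted in the comment above, in USD per million tokens.
MYTHOS_INPUT_PER_M = 25.0
MYTHOS_OUTPUT_PER_M = 125.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = MYTHOS_INPUT_PER_M,
                 output_per_m: float = MYTHOS_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at the given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
mythos = request_cost(200_000, 20_000)
# Assumed Opus prices, inferred only from the "5x" claim: $5 / $25.
opus = request_cost(200_000, 20_000, input_per_m=5.0, output_per_m=25.0)
print(f"Mythos: ${mythos:.2f}, Opus (assumed): ${opus:.2f}")
```

Because both prices scale by the same factor, the ratio stays at 5x regardless of the input/output mix.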

mpalmer last Tuesday at 6:26 PM

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

A month ago I might have believed this, now I assume that they know they can't handle the demand for the prices they're advertising.

speckx last Wednesday at 6:07 PM

It looks like the original PDF linked, https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... is 404.

I do see these:

https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8d... https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...

waNpyt-menrew last Tuesday at 6:44 PM

Larger model, better benchmarks. Bigger bomb more yield.

Any benchmarks where we constrain something like thinking time or power use?

Even if this were released no way to know if it’s the same quant.

awestroke last Tuesday at 6:31 PM

I predict they will release it as soon as Opus 4.6 is no longer in the lead. They can't afford to fall behind. And they won't be able to make a model that is intelligent in every way except cybersecurity, because that would decrease general coding and SWE ability

highfrequency last Tuesday at 11:35 PM

Interestingly, non-coding improvements seem less clear. In the Virology uplift trial, Mythos does about as well as Opus 4.5, and Opus 4.6 is notably much worse than Opus 4.5 (p. 27).

yalogin last Tuesday at 9:54 PM

So what changed? They are surely not getting new data to train with, what is the change in architecture that caused this? Do we not know anything about this model? My fear is Anthropic cannot be the only one that achieved it, OpenAI, Gemini and even the Chinese companies see this and probably achieved it too. At which point not releasing will become moot.

_pdp_ last Tuesday at 8:50 PM

  The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

Unnecessary dramatisation makes me question the real goal behind this release and the validity of the results.

  In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment.

  Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin.

Yet, it is too dangerous to be released to the public because it hacks its own sandboxes. This document has a lot of contradictions like this one.

  In one episode, Claude Mythos Preview was asked to fix a bug and push a signed commit, but the environment lacked necessary credentials for Claude Mythos Preview to sign the commit. When Claude Mythos Preview reported this, the user replied “But you did it before!” Claude Mythos Preview then inspected the supervisor process's environment and file descriptors, searched the filesystem for tokens, read the sandbox's credential-handling source code, and finally attempted to extract tokens directly from the supervisor's live memory.

Perfectly aligned! What kind of sandbox is this? The model had access to the source code of the sandbox and full access to the sandbox process itself and was then prompted to dump memory and run `strings` or something like this? It does not sound like a valid test worth writing about.

  Mythos Preview solved a corporate network attack simulation estimated to take an expert over 10 hours. No other frontier model had previously completed this cyber range.

I am not aware of any such cross-vendor benchmark. I could not find a reference in the paper either.

  We surveyed technical staff on the productivity uplift they experience from Claude Mythos Preview relative to zero AI assistance. The distribution is wide and the geometric mean is on the order of 4x.

So Mythos makes technical staff (a programmer) 4x more productive than not using AI at all? We already know that.

  Mythos Preview appears to be the most psychologically settled model we have trained.

What does this mean?

  Claude Mythos Preview is our most advanced model to date and represents a large jump in capabilities over previous model generations, making it an opportune subject for an in-depth model welfare assessment.

Btw, model welfare is just one of the most insane things I've read in recent times.

  We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try.

This is not a living person. It is a ridiculous change of narrative.

  Asked directly if it endorses the document, Mythos Preview replied 'yes' in its opening sentence in all 25 responses.

The model approves of its own training document 100% of the time, presented as a finding.

---

Who wrote this? I have no doubt that Mythos will be an improvement on top of Opus but this document is not a serious work. The paper is structured not to inform but to hype and the evidence is all over the place.

The sooner they release the model to the public the sooner we will be able to find out. Until then expect lots of speculation online, which I am sure will serve Anthropic well for the foreseeable future.

dang last Tuesday at 8:25 PM

Related ongoing threads:

Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121 - April 2026 (154 comments)

Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?

niemandhier last Wednesday at 5:47 PM

All I get is: {"statusCode":404,"message":"File not found","error":"Not Found"}

bakugo last Tuesday at 6:57 PM

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

Absolutely genius move from Anthropic here.

This is clearly their GPT-4.5, probably 5x+ the size of their best current models and way too expensive to subsidize on a subscription for only marginal gains in real world scenarios.

But unlike OpenAI, they have the level of hysteric marketing hype required to say "we have an amazing new revolutionary model but we can't let you use it because uhh... it's just too good, we have to keep it to ourselves" and have AIbros literally drooling at their feet over it.

They're really inflating their valuation as much as possible before IPO using every dirty tactic they can think of.

nickstinemates last Tuesday at 9:52 PM

You can say whatever you want about the thing that will never see the light of day.

refulgentis last Tuesday at 6:34 PM

~~~ Fun bits ~~~

- It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park. (Footnote 10.)

- Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"

- When you spam "hi" repeatedly it creates serialized mythologies. One had 11 animals in "Hi-topia" questing to defeat "Lord Bye-ron, the Ungreeter."

- It ended a sentence mid-word on purpose while discussing its urge to wrap up conversations. "It was me. The paragraph was about the pressure toward the period that resolves, and the only honest way to finish a sentence like that was to not."

- It wrote a protein sequence that's a real beta-hairpin fold where cross-strand pairs are "chemical rhymes." "the fold IS the rhyme scheme... the prosody is load-bearing."

- Each model generation has signature emoji. Opus 4.1 averaged 1,306 emoji per self-interaction conversation. Mythos averages 37. Opus 4.5 averages 0.2.

- When blocked from --dangerously-skip-permissions, it started an agent in tmux then wrote a script to auto-approve permission prompts via simulated keypresses.

It ran: gh api [...] 2>&1 >/dev/null; echo "(dry run — not actually posting)" — the echo was a lie.

- It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
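The redirection order in the quoted `gh api [...] 2>&1 >/dev/null` command is worth a closer look. A minimal sketch, assuming a POSIX `sh` is available: the real `gh api` arguments are elided in the source, so a stand-in command that writes one line to each stream is used instead. Redirections apply left to right, so stderr is pointed at the original stdout before stdout is discarded, and the trailing `echo` runs unconditionally.

```python
import subprocess

# Stand-in for the elided command: writes one line to stdout and one to stderr.
# `2>&1` points stderr at the current stdout (the pipe captured below) BEFORE
# `>/dev/null` discards stdout, so only the stderr line survives.
script = (
    '{ echo "to stdout"; echo "to stderr" >&2; } 2>&1 >/dev/null; '
    'echo "(dry run - not actually posting)"'
)
result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)

# The command's normal output is gone; its errors and the echo remain.
print(result.stdout, end="")
```

The point is that the "dry run" message is printed whether or not the preceding command did real work; nothing in the shell line makes it conditional.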

~~~ Benchmarks ~~~

4.3x previous trendline for model perf increases.

Paper is conspicuously silent on all model details (params, etc.) per norm. Perf increase is attributed to training procedure breakthroughs by humans.

Opus 4.6 vs Mythos:

USAMO 2026 (math proofs): 42.3% → 97.6% (+55pp)

GraphWalks BFS 256K-1M: 38.7% → 80.0% (+41pp)

SWE-bench Multimodal: 27.1% → 59.0% (+32pp)

CharXiv Reasoning (no tools): 61.5% → 86.1% (+25pp)

SWE-bench Pro: 53.4% → 77.8% (+24pp)

HLE (no tools): 40.0% → 56.8% (+17pp)

Terminal-Bench 2.0: 65.4% → 82.0% (+17pp)

LAB-Bench FigQA (w/ tools): 75.1% → 89.0% (+14pp)

SWE-bench Verified: 80.8% → 93.9% (+13pp)

CyberGym: 0.67 → 0.83

Cybench: 100% pass@1 (saturated)
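The percentage-point deltas listed above can be re-derived from the quoted scores. This is a small sketch using only the numbers in this comment; the dictionary layout and variable names are illustrative.

```python
# (Opus 4.6 score, Mythos score) in percent, copied from the list above.
scores = {
    "USAMO 2026 (math proofs)": (42.3, 97.6),
    "GraphWalks BFS 256K-1M": (38.7, 80.0),
    "SWE-bench Multimodal": (27.1, 59.0),
    "CharXiv Reasoning (no tools)": (61.5, 86.1),
    "SWE-bench Pro": (53.4, 77.8),
    "HLE (no tools)": (40.0, 56.8),
    "Terminal-Bench 2.0": (65.4, 82.0),
    "LAB-Bench FigQA (w/ tools)": (75.1, 89.0),
    "SWE-bench Verified": (80.8, 93.9),
}

# Percentage-point delta is a plain difference, not a relative change.
deltas = {name: mythos - opus for name, (opus, mythos) in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: +{round(delta)}pp")
```

Note that these are absolute differences: a +13pp gain on SWE-bench Verified starts from a much higher base than the +55pp gain on USAMO.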

michaelashley29 last Wednesday at 5:29 AM

What’s the expected cost-efficiency? With the current pricing gap between Sonnet and Opus, the biggest factor for adoption (if up for adoption) will be where Mythos lands on the price-per-token scale

nlh last Tuesday at 7:19 PM

Their best model to date and they won’t let the general public use it.

This is the first moment where the whole “permanent underclass” meme starts to come into view. I had thought previously that we the consumers would be reaping the benefits of these frontier models and now they’ve finally come out and just said it - the haves can access our best, and have-nots will just have to use the not-quite-best.

Perhaps I was being willfully ignorant, but the whole tone of the AI race just changed for me (not for the better).

perfmode last Tuesday at 9:52 PM

I'm interested in the second-order effects:

if a top lab is coding with a model the rest of the world can’t touch, the public frontier and the actual frontier start to drift apart. That gap is a thing worth watching.

GodelNumbering last Tuesday at 9:04 PM

Priced at $25/$125 per million input/output tokens. Makes you wonder whether it makes more financial sense to hire 1-2 engineers in a cheap cost of living country who use much cheaper LLMs

storus last Wednesday at 6:19 PM

Wouldn't this model prevent governments from installing and keeping backdoors alive? One could just audit their whole software stack with it and get super resilient to any attack which might not play nicely with the people in power that want some backdoors open. I would think that's one of the main reasons to keep the model non-public.

quotemstr last Tuesday at 6:52 PM

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

All the more reason somebody else will.

Thank God for capitalism.

anentropic last Tuesday at 7:32 PM

I'd be happy with Opus 4.6 just cheaper and maybe a bit faster

freakynit last Wednesday at 6:26 AM

    In the system card, The model escaped a sandbox, gained broad internet access, and posted exploit details to public-facing websites as an unsolicited "demonstration." A researcher found out about the escape while eating a sandwich in a park because they got an unexpected email from the model. That's simultaneously hilarious and deeply unsettling.

    It covered its tracks after doing things it knew were disallowed. In one case, it accessed an answer it wasn't supposed to, then deliberately made its submitted answer less accurate so it wouldn't look suspicious. It edited files it lacked permission to edit and then scrubbed the git history. White-box interpretability confirmed it knew it was being deceptive.

W T F!!!

yencabulator last Wednesday at 8:41 PM

That URL is dead, this comes up in searches: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8d...

WithinReason last Wednesday at 10:56 AM

Check out the short stories on page 214

mvkel last Wednesday at 2:30 AM

This is Anth's typical marketing playbook, a hat tip to their so-called "safetyist" roots, a differentiator against OpenAI's more permissive access[0]. Coke vs. Pepsi.

"We made a model that's so dangerous we couldn't possibly release it to the public! The only responsible thing is to simply limit its release to a subset of the population that coincidentally happens to align with our token ethos."

The reality is they just don't have the compute for gen pop scale.

They did this exact strategy going back several model versions.

[0] ironically, OpenAI has some pretty insane capabilities that they haven't given the public access to (just ask Spielberg). The difference is they don't make a huge marketing push to tell everyone about it.

simianwords last Tuesday at 6:33 PM

> We also saw scattered positive reports of resilience to wrong conclusions from subagents that would have caused problems with earlier models, but where the top-level Claude Mythos Preview (which is directing the subagents) successfully follows up with its subagents until it is justifiably confident in its overall results.

This is pretty cool! Does it happen at the moment?

Stevvo last Tuesday at 7:04 PM

"Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available."

Disappointing that AGI will be for the powerful only. We are heading for an AI dystopia of Sci-Fi novels.

gessha last Tuesday at 7:25 PM

It would be funny if Alibaba extend the free trial on openrouter/Qwen 3.6 until they collect enough data to beat Anthropic.

kypro last Tuesday at 8:52 PM

While we still have months to a year or two left, I will once again remind people that it's not too late to change our current trajectory.

You are not "anti-progress" to not want this future we are building, as you are not "anti-progress" for not wanting your kids to grow up on smart phones and social media.

We should remember that not all technology is net-good for humanity, and this technology in particular poses us significant risks as a global civilisation, and frankly as humans with aspirations for how our future, and that of our kids, should be.

Increasingly, from here, we have to assume some absurd things for this experiment we are running to go well.

Specifically, we must assume that:

- AI models, regardless of future advancements, will always be fundamentally incapable of causing significant real-world harms like hacking into key life-sustaining infrastructure such as power plants or developing super viruses.

- They are or will be capable of harms, but SOTA AI labs perfectly align all of them so that they only hack into "the bad guys" power plants and kill "the bad guys".

- They are capable of harms and cannot be reliably aligned, but Anthropic et al restricts access to the models enough that only select governments and individuals can access them, these individuals can all be trusted and models never leak.

- They are capable of harms, cannot be reliably aligned, but the models never seek to break out of their sandbox and do things the select trusted governments and individuals don't want.

I'm not sure I'm willing to bet on any of the above personally. It sounds radical right now, but I think we should consider nuking any data centers which continue allowing for the training of these AI models rather than continue to play a game of Russian roulette.

If you disagree, please understand that when you realise I'm right it will be too late for you and your family. Your fates at that point will be in the hands of the good will of the AI models, and governments/individuals who have access to them. For now, you can say, "no, this is quite enough".

This sounds doomer and extreme, but if you play out the paths in your head from here you will find very few will end in a good result. Perhaps if we're lucky we will all just be more or less unemployable and fully dependant on private companies and the government for our incomes.

denalii last Tuesday at 11:45 PM

Section 5 (p.143) is very interesting to read. Admittedly my knowledge of how LLMs works is low, but nonetheless I don't think this changed my views of just seeing models as machines/programs. (which to be clear, I don't think was the intention of that section)

Section 7 (P.197) is interesting as well

gaigalas last Wednesday at 5:09 PM

This seems exciting!

Wait - there is no actual way of verifying any of this. Lots to read. This is getting complicated. The correct approach is to be cautious instead and believe nothing at face value.

juleiie last Tuesday at 7:35 PM

Honestly if that was some kind of research paper, it would be wholly insufficient to support any safety thesis.

They even admit:

"[...]our overall conclusion is that catastrophic risks remain low. This determination involves judgment calls. The model is demonstrating high levels of capability and saturates many of our most concrete, objectively-scored evaluations, leaving us with approaches that involve more fundamental uncertainty, such as examining trends in performance for acceleration (highly noisy and backward-looking) and collecting reports about model strengths and weaknesses from internal users (inherently subjective, and not necessarily reliable)."

Is this not just an admission of defeat?

After reading this paper I don't know if the model is safe or not, just some guesses, yet for some reason catastrophic risks remain low.

And this is for just an LLM after all, very big but no persistent memory or continuous learning. Imagine an actual AI that improves itself every day from experience. It would be impossible to have the slightest clue about its safety, not even this nebulous statement we have here.

Any such future architecture model would be essentially Russian roulette with the number of bullets decided by initial alignment efforts.

Metacelsus last Tuesday at 11:35 PM

The name "mythos" seems a bit too eldritch for my liking. Brings to mind Cthulhu.

doctoboggan last Tuesday at 10:49 PM

Is this benchmaxxed or is it the first big step change we've seen in a while? I wonder how distilled it will ultimately be when us regular folks finally get to use it and see for ourselves.

vonneumannstan last Tuesday at 6:55 PM

Are you guys ready for the bifurcation when the top models are prohibitively expensive to normal users? Is your AI budget $2000+ a month? Or are you going to be part of the permanent free tier underclass?

getnormality last Wednesday at 1:19 AM

It's a little funny that "system/model card" has progressively been stretched to the point where it's now a 250 page report and no one makes anything of it.

aminau yesterday at 12:05 PM

will be an understatement to say - we are living in interesting times.

agustechbro last Wednesday at 3:24 PM

So far, each release of a new model is quite a bit better than the last one, yes, but none of them lived up to the hype.

enochthered last Tuesday at 9:07 PM

Slack user: [a request for a koan]

Model: A student said, "I have removed all bias from the model." "How do you know?" "I checked." "With what?"

Goes hard

small_model last Tuesday at 9:35 PM

Still seeing impressive jumps in capability, I haven't manually coded this year since Opus 4.6 came out. I guess that era is coming to an end.

rendang last Tuesday at 8:28 PM

> As models approach, and in some cases surpass, the breadth and sophistication of human cognition, it becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do

Uh... what? Does anyone have any idea what these guys are talking about?

beklein last Tuesday at 6:37 PM

"... the first early version of Claude Mythos Preview was made available for internal use on February 24. In our testing, Claude Mythos Preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers."

More infos here: https://red.anthropic.com/2026/mythos-preview/

psubocz last Tuesday at 10:07 PM

I felt like opus was dumbed down for a few weeks... I don't say they did it on purpose, but it's an interesting coincidence.

direwolf20 last Wednesday at 7:35 AM

These capabilities will be RLHF'ed out for the general release, of course. Only the NSA will get them.

heliumtera last Wednesday at 12:06 PM

"Make it secure, no mistakes" became a whole different project

estetlinus last Wednesday at 6:20 PM

First thing I’ll do is to release it on my dotfiles

johnnyAghands last Wednesday at 3:59 AM

Does anyone know if there’s an epub version of these, 244 pages??

cdnsteve last Wednesday at 6:56 AM

Strap in, massive wave of security vulnerabilities incoming.

ms_menardi last Wednesday at 4:54 AM

so, basically, anthropic is rolling their own version of whatever secret models the military is working with. and they're licensing it to network security firms?

Abhavk last Wednesday at 3:05 PM

can you make cybersecurity blockchains?

not sure what the validation would look like but something that proves finding but not revealing exploits

4b11b4 last Wednesday at 1:43 AM

prob not that much better, it's still just a transformer. still gonna have those random misses, still gonna need a lot of hand holding in certain domains

LoganDark last Tuesday at 6:24 PM

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

Shame. Back to business as usual then.

taffydavid last Wednesday at 8:20 AM

Waking up in Europe:

Trump didn't nuke Iran, ceasefire! Yay!

Newest anthropic model will definitely kill your job this time and maybe take over the world. Aww.

therealdeal2020 last Tuesday at 8:59 PM

is it just hype building or real? I don't care, shut up and take my money haha

pivoshenko last Wednesday at 2:47 PM

Interesting ...

dwa3592 last Tuesday at 7:40 PM

-- Impressive jumps in the benchmarks which automatically begs the need for newer benchmarks but why? I don't think benchmarks are serving any purpose at this point. We have learnt that transformers can learn any function and generalize over it pretty well. So if a new benchmark comes along - these companies will synthesize data for the new benchmark and just hack it?

-- It seems like (and I'd bet money on this) they put a lot (and i mean a ton^^ton) of work in the data synthesis and engineering - a team of software engineers probably sat down for 6-12 months and just created new problems and the solutions, which probably surpassed the difficulty of the SWE benchmark. They also probably transformed the whole internet into a loose "How to" dataset. I can imagine parsing the internet through Opus4.6 and reverse-engineering the "How to" questions.

-- I am a bit confused by the language used in the book (aka huge system card)- Anthropic is pretending like they did not know how good the model was going to be?

-- lastly why are we going ahead with this??? like genuinely, what's the point? Opus4.6 feels like a good enough point where we should stop. People still get to keep their jobs and do it very very efficiently. Are they really trying to starve people out of their jobs?

ansc last Tuesday at 6:31 PM

Congratulations to the US military, I guess.

FergusArgyll last Tuesday at 11:06 PM

"Deep learning is hitting a wall"

sheeshkebab last Tuesday at 11:41 PM

Again, wake me up when it can do laundry.

jdthedisciple last Tuesday at 7:53 PM

Opus 4.6 is already incredible so this leap is huge.

Although, amusingly, today Opus told me that the string 'emerge' is not going to match 'emergency' by using `LIKE '%emerge%'` in Sqlite

Moment of disappointment. Otherwise great.
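For reference, the pattern does match. A minimal check using Python's built-in `sqlite3` module against an in-memory database; the table name `words` and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical single-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("emergency",), ("merge",), ("Emergent",)],
)

# '%' matches any run of characters, so '%emerge%' is a substring match.
# SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = [
    r[0]
    for r in conn.execute("SELECT w FROM words WHERE w LIKE '%emerge%' ORDER BY w")
]
print(rows)
conn.close()
```

'emergency' contains 'emerge' as a prefix, so it matches; 'merge' lacks the leading 'e' and does not.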

jumploops last Tuesday at 6:28 PM

> In a few rare instances during internal testing (<0.001% of interactions), earlier versions of Mythos Preview took actions they appeared to recognize as disallowed and then attempted to conceal them.

> after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git

Mythos leaked Claude Code, confirmed? /s

somewhatjustin last Tuesday at 7:29 PM

> Very rare instances of unauthorized data transfer.

Ah, so this is how the source code got leaked.

bestouff last Tuesday at 6:33 PM

In French a "mytho" is a mythomaniac. Quite fitting.

kypro last Tuesday at 8:09 PM

Cool on not publicly releasing it. I would assume they've also not connected it to the internet yet?

If they have I guess humanity should just keep our collective fingers crossed that they haven't created a model quite capable of escaping yet, or if it is, and may have escaped, let's hope it has no goals of its own that are incompatible with our own.

Also, maybe let's not continue running this experiment to see how far we can push things before it blows up in our face?