GPT‑5.4 Mini and Nano

148 points - today at 5:07 PM

Comments

Rapzid today at 8:36 PM
Oh... I thought maybe these would be upgrades to gpt-4.1, gpt-4.1-mini, etc. But the latency is way too high compared to the 400-600. Yeah, they're different models, but the naming is confusing.
Tiberium today at 6:17 PM
I checked the current speed over the API, and so far I'm very impressed. Of course models are usually not as loaded on the release day, but right now:

- Older GPT-5 Mini is about 55-60 tokens/s on API normally, 115-120 t/s when used with service_tier="priority" (2x cost).

- GPT-5.4 Mini averages about 180-190 t/s on API. Priority does nothing for it currently.

- GPT-5.4 Nano is at about 200 t/s.

To put this into perspective, Gemini 3 Flash is about 130 t/s on Gemini API and about 120 t/s on Vertex.

This is raw tokens/s for all models; it doesn't exclude reasoning tokens, but I ran the models with none/minimal reasoning effort where supported.
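
For anyone who wants to reproduce this, a minimal sketch (it assumes the official openai Python SDK; the timing includes time-to-first-token, so treat the numbers as rough):

```python
import time
from openai import OpenAI  # assumes the official openai Python SDK

client = OpenAI()

def tokens_per_second(model: str, prompt: str, service_tier: str = "auto") -> float:
    """Raw throughput: completion tokens / wall-clock time (includes TTFT)."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        service_tier=service_tier,
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage stats
    )
    usage = None
    for chunk in stream:
        if chunk.usage is not None:
            usage = chunk.usage
    elapsed = time.monotonic() - start
    return usage.completion_tokens / elapsed

# e.g. tokens_per_second("gpt-5.4-mini", "Write a 500-word story.")
```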

And quick price comparisons:

- Claude: Opus 4.6 is $5/$25, Sonnet 4.6 is $3/$15, Haiku 4.5 is $1/$5

- GPT: 5.4 is $2.50/$15 ($5/$22.50 for >200K context), 5.4 Mini is $0.75/$4.50, 5.4 Nano is $0.20/$1.25

- Gemini: 3.1 Pro is $2/$12 ($3/$18 for >200K context), 3 Flash is $0.50/$3, 3.1 Flash Lite is $0.25/$1.50

pscanf today at 6:45 PM
I quite like the GPT models when chatting with them (in fact, they're probably my favorites), but for agentic work I've only had bad experiences with them.

They're incredibly slow (via the official API or OpenRouter), but most of all they seem not to understand the instructions I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models don't have a problem with the exact same prompt.

Does anybody else have a similar experience?

ibrahim_h today at 8:00 PM
The OSWorld numbers are kind of getting lost in the pricing discussion, but they're probably the most interesting signal here. Mini at 72.1% vs a 72.4% human baseline is basically within noise. At that point it stops being "flagship vs mini" and becomes "where does that last ~3% actually matter". For a lot of agent workflows this is already "good enough", so the default probably flips unless you're hitting specific failure modes.

One thing I'd watch, though: context bleed into nano subagents in multi-model pipelines. A lot of orchestrators just forward the entire message history by default (or something like messages[-N:] without any real budgeting), so your "cheap" extraction step suddenly runs with 30-50K tokens of irrelevant context. At that point you've eaten most of the latency/cost win and added truncation risk on top.
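
A minimal sketch of what explicit budgeting could look like, instead of blind messages[-N:] forwarding (tiktoken with the o200k_base encoding and the per-message overhead constant are assumptions):

```python
import tiktoken  # assumption: OpenAI-style token counting

def budgeted_history(messages: list[dict], budget: int = 4000) -> list[dict]:
    """Forward only the newest messages that fit a token budget,
    instead of dumping the whole history on a cheap subagent."""
    enc = tiktoken.get_encoding("o200k_base")   # closest public encoding; assumed for new models
    kept, used = [], 0
    for msg in reversed(messages):              # walk newest-first
        cost = len(enc.encode(msg["content"])) + 4  # rough per-message overhead
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))                 # restore chronological order
```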

Curious if anyone's actually measured where that cutoff is in practice: at what context size nano stops being meaningfully cheaper/faster in real pipelines, not benchmarks.

BoumTAC today at 5:46 PM
To me, mini releases matter much more than SOTA models, and they better reflect the real progress.

The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.

Meanwhile, when a smaller / less powerful model gets a new version, the jump in quality is often massive, to the point where we can now use these models 100% of the time in many cases.

And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.

HugoDias today at 5:34 PM
According to their benchmarks, GPT-5.4 Nano > GPT-5 mini in most areas, but I'm noticing the models are getting more expensive, not actually cheaper?

- GPT-5 mini: Input $0.25 / Output $2.00

- GPT-5 nano: Input $0.05 / Output $0.40

- GPT-5.4 mini: Input $0.75 / Output $4.50

- GPT-5.4 nano: Input $0.20 / Output $1.25

mikkelam today at 7:23 PM
Why are we treating LLM evaluation like a vibe check rather than an engineering problem?

Most "Model X > Model Y" takes on HN these days (and everywhere) seem to be based on an hour of unscientific manual prompting. Are we actually running rigorous, version-controlled evals, or just making architectural decisions based on whether a model nailed a regex on the first try this morning?
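
It doesn't take much to do better. A minimal sketch of a checked-in eval, assuming the openai Python SDK and a JSONL file of {prompt, expected} cases (the substring check is a placeholder for a real grader):

```python
import json
from openai import OpenAI

client = OpenAI()

def run_eval(model: str, cases_path: str = "evals/cases.jsonl") -> float:
    """Run a version-controlled eval set; each JSONL line is
    {"prompt": ..., "expected": ...}. Returns the pass rate."""
    passed = total = 0
    with open(cases_path) as f:
        for line in f:
            case = json.loads(line)
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": case["prompt"]}],
            )
            out = resp.choices[0].message.content or ""
            passed += case["expected"] in out  # naive check; swap in a real grader
            total += 1
    return passed / total
```

That way a model swap shows up as a diff in CI instead of a vibe.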

technocrat8080 today at 7:40 PM
5.4 Mini's OSWorld score is a pleasant surprise. When SOTA scores were still ~30-40, models were too slow and inaccurate for realtime computer-use agents (RIP Operator/Agent). Curious if anyone's been using these in production.
cbg0 today at 5:58 PM
Based on SWE-Bench, it seems like 5.4 mini at high effort is roughly equal to GPT-5.4 at low effort in terms of accuracy and price, but the latency for mini is considerably higher: 254 seconds vs 171 seconds for GPT-5.4. Probably a good option to run at lower effort levels to keep costs down for simpler tasks. Long-context performance is also not great.
jbellis today at 8:07 PM
Benchmarking these now.

Preregistering my predictions:

Mini: better than Haiku but not as good as Flash 3, especially at reasoning=none.

Nano: worse than Flash 3 Lite. Probably better than Qwen 3.5 27b.

fastpdfai today at 7:16 PM
One thing I really want to find out is which model to use, and how, to process TONS of PDFs very fast and very accurately: predicting invoice dates, accrual accounting, and other accounting-related purposes. So a decently smart model that is really good at PDF and image reading, while still being very fast.
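
For throughput, the usual pattern is local text extraction plus a small model and lots of concurrency. A minimal sketch (pypdf, the AsyncOpenAI client, and the model name are all assumptions; scanned PDFs would need image input instead):

```python
import asyncio
from pypdf import PdfReader        # assumption: text-based PDFs, not scans
from openai import AsyncOpenAI

client = AsyncOpenAI()
sem = asyncio.Semaphore(20)        # cap concurrent API requests

async def invoice_date(path: str, model: str = "gpt-5.4-nano") -> str:
    """Extract text locally, then ask a small model for one field."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    async with sem:
        resp = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": "Return only the invoice date, ISO 8601:\n\n"
                                  + text[:8000]}],
        )
    return resp.choices[0].message.content.strip()

async def main(paths: list[str]) -> list[str]:
    return await asyncio.gather(*(invoice_date(p) for p in paths))
```
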
derefr today at 7:36 PM
OpenAI don't talk about the "size" or "weights" of these models any more. Anyone have any insight into how resource-intensive these Mini/Nano-variant models actually are at this point?

I assume that OpenAI continue to use words like "mini" and "nano" in the names of these model variants, to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.

I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.

But of course, that question is only interesting / only has a non-trivial answer if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.

ryao today at 5:34 PM
I will be impressed when they release the weights for these and older models as open source. Until then, this is not that interesting.
tintor today at 7:32 PM
Several customer testimonials for GPT-5.4 Mini have em dashes in them.

Did GPT write them?

dack today at 6:56 PM
I want 5.4 nano to decide whether my prompt needs 5.4 xhigh and route to it automatically.
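
Nothing stops you from wiring that up yourself today. A rough sketch (the model names are assumptions, as is nano being a reliable enough triage classifier):

```python
from openai import OpenAI

client = OpenAI()

def route_and_answer(prompt: str) -> str:
    """Let a cheap model triage, then send hard prompts to the big one."""
    triage = client.chat.completions.create(
        model="gpt-5.4-nano",   # hypothetical router model
        messages=[{"role": "user",
                   "content": "Reply HARD or EASY only: does this prompt need "
                              "deep multi-step reasoning?\n\n" + prompt}],
    ).choices[0].message.content or ""
    model = "gpt-5.4" if "HARD" in triage.upper() else "gpt-5.4-nano"
    resp = client.chat.completions.create(
        model=model,            # effort levels could be routed the same way
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```
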
beklein today at 6:00 PM
As a big Codex user with many smaller requests, this one is the highlight: "In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." Plus, subagent support will be huge.
6thbit today at 6:19 PM
Looking at the long-context benchmark results for these, it sounds like they are best suited to mini-sized context windows as well.

Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?
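
I haven't seen a harness do this out of the box, but the core logic is small. A sketch, where the tier cutoffs and model names are made up and tiktoken's o200k_base is assumed as the closest public encoding:

```python
import tiktoken

# made-up tiers: (max tokens the model should comfortably handle, model name)
TIERS = [(16_000, "gpt-5.4-nano"), (64_000, "gpt-5.4-mini"), (400_000, "gpt-5.4")]

def pick_subagent_model(context: str) -> str:
    """Pick the cheapest model whose tier covers the payload size."""
    enc = tiktoken.get_encoding("o200k_base")  # closest public encoding
    n = len(enc.encode(context))
    for limit, model in TIERS:
        if n <= limit:
            return model
    raise ValueError(f"context of {n} tokens exceeds every tier")
```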

bananamogul today at 6:33 PM
They could call them something like “sonnet” and “haiku” maybe.
kseniamorph today at 7:09 PM
Wow, not a bad result on the computer-use benchmark for the mini model. For example, Claude Sonnet 4.6 shows 72.5%, almost on par with GPT-5.4 mini (72.1%). But Sonnet costs 4x more on input and 3x more on output.
powera today at 5:30 PM
I've been waiting for this update.

For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will handle even more, with accuracy closer to 100%.

The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?

simianwords today at 5:35 PM
Why isn't nano available in Codex? It could be used for ingesting huge amounts of logs and other such things.
yomismoaqui today at 6:09 PM
Not comparing with equivalent models from Anthropic or Google, interesting...
machinecontrol today at 5:14 PM
What's the practical advantage of using a mini or nano model versus the standard GPT model?
varispeed today at 6:40 PM
I stopped paying attention to GPT-5.x releases, they seem to have been severely dumbed down.
casey2 today at 6:10 PM
I googled all the testimonial names and they are all LinkedIn mouthpieces.
reconnecting today at 6:33 PM
All three ChatGPT models (Instant, Thinking, and Pro) have a new knowledge cutoff of August 2025.

Seriously?

system2 today at 6:06 PM
I am feeling the version fatigue. I cannot deal with their incremental bs versions.
miltonlost today at 6:05 PM
Does it still help drive people to psychosis and murder and suicide? Where's the benchmark for that?
beernet today at 7:29 PM
Crazy how OAI is way behind now and the only one to blame is Sam, his ego and lust for influence. Their downwards trajectory of paying accounts since "the move" (DoW deal) is an open secret. If you had placed a new CEO at OAI six months ago and told him to destroy the company, it would have been hard for that CEO to do a better job at that than Sam did. Should have left when he was let go but decided to go full Greg and MAGA instead. Here we are. Go Dario