DeepSeek v4

1295 points - today at 3:01 AM

Comments

jari_mustonen today at 6:34 AM
As open source as it gets in this space, with top-notch developer documentation and insanely low prices, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!

Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, the Chinese ecosystem has delivered a complete AI stack. Like it or not, that's big news. But what's there not to like when monopolies break down?

hodgehog11 today at 7:11 AM
There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting.

I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running these problems through first (often with about 2-6 papers for context) and then requesting a rigorous proof as followup. Since the problems are pretty tough, there is no quantitative measure of performance here, I'm just judging based on how useful the output is toward outlining a solution that would hopefully become publishable.

Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight for some of the harder problems (insightful guesses on relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single followup prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insights in the first response as Gemini (closer to GPT-5), but it often gets much better in the followup, and the proofs can be _very_ impressive; nearly complete in several cases.

Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.

Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!

XCSme today at 11:46 AM
Something is odd with this model: their blog post shows REALLY good results, but in most third-party benchmarks people find it's not really SOTA, even below Kimi K2.6 and GLM-5/5.1.

In my tests too[0], it doesn't reach the top 10. One issue, which they also mentioned in their post, is that they can't really serve the model well at the moment, so V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it. This shouldn't be an issue in the long run, considering the model is open-source, but it makes it hard to test accurately right now.

[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...

throwa356262 today at 6:17 AM
Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good??

https://api-docs.deepseek.com/guides/thinking_mode

No BS, just a concise description of exactly what I need to write my own agent.

orbital-decay today at 6:35 AM
>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead

Pretty cool, I think they're the first to guarantee determinism with a fixed seed or at temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek shows its roots: it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.
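
For readers wondering why determinism needs special kernels at all, here is a minimal sketch of one commonly cited cause: floating-point addition is not associative, so a reduction whose grouping depends on batch shape can return slightly different bits for the same inputs. The `chunked_sum` helper and its chunk sizes are hypothetical stand-ins for kernel tiling; this is not DeepSeek's implementation.

```python
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def chunked_sum(values, chunk_size):
    """Sum in fixed-size chunks, then sum the partial results.

    Hypothetical stand-in for a reduction whose tiling depends on batch shape.
    """
    partials = [
        sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sum(partials)


rng = random.Random(0)
# Mixed magnitudes make rounding differences between groupings more likely.
values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10_000)]

# Different chunk sizes may give results that differ in the last bits.
results = {size: chunked_sum(values, size) for size in (1, 7, 64, 1024)}
for size, total in results.items():
    print(size, repr(total))

# Fixing the reduction order makes the result bitwise reproducible.
repeatable = chunked_sum(values, 64) == chunked_sum(values, 64)
print(repeatable)  # True
```

The usual remedy, as far as I understand it, is to make the reduction order independent of how requests are batched, at some cost in kernel flexibility.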

xingyi_dev today at 8:11 AM
Deepseek v4 is basically that quiet kid in the back of the class who never says a word but casually ruins the grading curve for everyone else on the final exam.
chenzhekl today at 8:12 AM
It's interesting that they mentioned in the release notes:

"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."

https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...

revolvingthrow today at 5:42 AM
> pricing "Pro" $3.48 / 1M output tokens vs $4.40

I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

edit: $1.74/M input $3.48/M output on OpenRouter

fblp today at 3:53 AM
There's something heartwarming about the developer docs being released before the flashy press release.
sho today at 6:58 AM
So, this is the version that's able to serve inference from Huawei chips, although it was still trained on Nvidia. So unless I'm very much mistaken, this is the biggest and best model yet served on (sort of) readily available Chinese-native tech. Performance and stability will be interesting to see; OpenRouter is currently showing about 1.12s latency and 30 tps, which isn't wonderful, but it's day one after all.

For reference, the Huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to Nvidia's H100 from 2022. In other words, things are hotting up in the GPU war!

gbnwl today at 3:49 AM
I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.
primaprashant today at 6:26 AM
While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%.

Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.

yanis_t today at 11:00 AM
Assuming it is almost as good as Opus 4.6 (which the benchmarks seem to support), and assuming we have a good enough harness (PI, OpenCode), it is now more than 5x cheaper.

I just want to remind you that this is happening at the same time as Anthropic A/B tests removal of Code from Pro Plan, and as OpenAI releases gpt-5.5 2x more expensive than gpt-5.4...

yanis_t today at 4:34 AM
Already on OpenRouter. The Pro version is $1.74/M input, $3.48/M output, while Flash is $0.14/M input, $0.28/M output.
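
To put those prices in per-request terms, here is a small sketch. The per-million prices are the ones quoted in this comment; the workload (200k input tokens, 20k output tokens) and the `request_cost` helper are made-up illustrations, and prompt-caching discounts are ignored.

```python
# USD per 1M tokens, as quoted in the comment (OpenRouter, launch day).
PRICES = {
    "pro": {"input": 1.74, "output": 3.48},
    "flash": {"input": 0.14, "output": 0.28},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical agent-style request: large context, modest output.
pro_cost = request_cost("pro", 200_000, 20_000)
flash_cost = request_cost("flash", 200_000, 20_000)

print(f"Pro:   ${pro_cost:.4f}")    # Pro:   $0.4176
print(f"Flash: ${flash_cost:.4f}")  # Flash: $0.0336
print(f"Pro / Flash: {pro_cost / flash_cost:.1f}x")  # Pro / Flash: 12.4x
```

Since both the input and output prices differ by the same factor, the Pro/Flash ratio stays at roughly 12.4x whatever the input/output mix.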
amunozo today at 6:52 AM
For those who rely on open source models but don't want to stop using frontier models, how do you manage it? Do you pay for any of the Chinese subscription plans? Do you pay for the API directly? After the GPT 5.5 release, however good it is, I am a bit tired of this price hiking and reduced quota every week. I am now unemployed and cannot afford more expensive plans for the moment.
sidcool today at 4:21 AM
Truly open source coming from China. This is heartwarming. I know of the potential ulterior motives.
mchusma today at 4:43 AM
For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.
dizhn today at 10:25 AM
I like deepseek. It works very well. I haven't tried v4 yet but on their web chat interface, just typing "Taiwan" causes it to give you a lecture about how Taiwan is part of China. :)
zargon today at 4:08 AM
The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.
nthypes today at 3:45 AM
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.

cmitsakis today at 12:29 PM
I just did some quick testing on my own benchmark that tests LLMs as customer support chatbots, and found out that deepseek-v4-flash (scored 90.2%) was better than qwen3.5-27b (89%) and qwen3.5-35b-a3b (89.1%) and roughly equal to gemini-3-flash-preview (90.5%), but deepseek-v4-flash had the lowest cost of all of them by far. Half the cost of gemini-3-flash and an order of magnitude less cost than the qwen models.

Has anyone else noticed deepseek-v4-pro performing worse than deepseek-v4-flash? It performed even worse than qwen3.5-27b. I found it surprising and I'm wondering if there is a bug in my software, because I had to implement sending the `reasoning_content`; otherwise the API failed with BadRequestError.

impossiblefork today at 12:51 PM
After testing this for understanding complex stories, text comprehension is definitely comparable to or better than Sonnet, and definitely better than Microsoft's free stuff. Opus is of course very impressive, especially with how Opus is set up with recursive calls that allow it to make rather complete things as if by magic, but the underlying model probably isn't incredibly much better than this.
lobo_tuerto today at 1:28 PM
Glad to see most of the comments here were kept on-topic and didn't deviate at all into geopolitical discussion.
sergiopreira today at 11:15 AM
DeepSeek is commoditizing frontier capability... Opus 4.6-level benchmarks at a fraction of the cost also changes who can access these tools.

Stuff that was prohibitive six months ago is now up for grabs. We keep working at the infra level now, switching models whenever we run out of credits or want a different result. The question is how we build context and architecture, and ensure the agent is effective and efficient... wouldn't it be good if we simply used less energy to make these AI calls?

sheeshkebab today at 12:52 PM
Ask it if there was a Tiananmen square massacre. Then decide if you really want to be part of this murderous propaganda.
zkmon today at 5:56 AM
They released 1.6 T pro base model on huggingface. First time I'm seeing a "T" model here.
vinhnx today at 9:19 AM
The king is back! I remember vividly being very amazed and having a deep appreciation reading DeepSeek's reasoning on Chat.DeepSeek.com, even before the DeepSeek moment in January later that year. I can't quite remember the date, but it's the most profound moment I have ever had. After OpenAI O1, no other model has “reasoning” capability yet. And DeepSeek opens the full trace for us. Seeing DeepSeek's “wait, aha…” moments is something hard to describe. I learned strategy and reasoning skills for myself also. I am always rooting for them.
jessepcc today at 4:01 AM
At this point 'frontier model release' is a monthly cadence: Kimi 2.6, Claude 4.6, GPT 5.5. The interesting question is which evals will still be meaningful in 6 months.
DennisP today at 1:02 PM
No CUDA, 1.6T parameters but with 49B active...does that mean you can run it efficiently on a 64GB macbook?
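
A back-of-the-envelope sketch for this question, under stated assumptions: all expert weights stay resident in memory (the usual setup for MoE inference, since routing can pick different experts for each token), every parameter is quantized to the same bit width, and KV cache and activations are ignored. The parameter counts are the ones quoted in the thread; the `weight_gb` helper is only for illustration.

```python
def weight_gb(total_params, bits_per_param):
    """Approximate weight storage in decimal GB for a uniform bit width."""
    return total_params * bits_per_param / 8 / 1e9


PRO_TOTAL = 1_600_000_000_000   # 1.6T total parameters (V4-Pro)
PRO_ACTIVE = 49_000_000_000     # 49B active per token
FLASH_TOTAL = 284_000_000_000   # 284B total parameters (V4-Flash)

for bits in (8, 4):
    print(f"Pro   @ {bits}-bit: {weight_gb(PRO_TOTAL, bits):7.1f} GB")
    print(f"Flash @ {bits}-bit: {weight_gb(FLASH_TOTAL, bits):7.1f} GB")

# The active parameters alone would be small, but they change per token,
# so they describe compute per token rather than the memory needed.
print(f"Pro active only @ 4-bit: {weight_gb(PRO_ACTIVE, 4):.1f} GB")
```

Under these assumptions Pro needs about 800 GB at 4 bits and Flash about 142 GB, so neither fits in 64 GB without streaming experts from disk, which tends to be slow. The 49B active figure mostly tells you how much compute and memory bandwidth each token costs.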
quadruple today at 8:36 AM
In their paper, section 5.2.5 talks about their sandboxing platform (DeepSeek Elastic Compute). It seems to have 4 different execution methods: function calls, container, microVM and fullVM.

This is a pretty interesting thing they've built in my opinion, and not something I'd expect to be buried in the model paper like this. Does anyone have any details about it? Google doesn't seem to find anything of note, and I'd love to dive a bit deeper into DSec.

sixhobbits today at 6:58 AM
I know people don't like Twitter links here but the main link just goes to their main docs site generic 'getting started' page.

The website now has a link to the announcement on Twitter here https://x.com/deepseek_ai/status/2047516922263285776

Copying text of that below

DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.

DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4

Imanari today at 6:02 AM
Just tested it via OpenRouter in the Pi coding agent and it regularly fails to use the read and write tools correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"?
simonw today at 4:35 AM
I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.

https://simonwillison.net/2026/Apr/24/deepseek-v4/

Both generated using OpenRouter.

For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/

And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/

And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/

coderssh today at 6:38 AM
Feels like the real story here is cost/performance tradeoff rather than raw capability. Benchmarks keep moving incrementally, but efficiency gains like this actually change who can afford to build on top.
Aliabid94 today at 3:54 AM
MMLU-Pro:

Gemini-3.1-Pro at 91.0

Opus-4.6 at 89.1

GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5

Pretty impressive

rohanm93 today at 5:37 AM
This is shockingly cheap for a near frontier model. This is insane.

For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.

I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.

Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.

gardnr today at 5:39 AM
865 GB: I am going to need a bigger GPU.
CJefferson today at 5:12 AM
What's the current best framework to have a 'claude code' like experience with Deepseek (or in general, an open-source model), if I wanted to play?
lifeisstillgood today at 7:48 AM
On a separate note, I am guessing that all the new models have been announced in the space of a few days because the time to train a model is the same for each AI company.

Which strikes me as odd - I would have assumed someone had an edge in terms of at least 10% extra GPUs.

storus today at 5:09 AM
Oh well, I should have bought 2x 512GB RAM MacStudios, not just one :(
xnx today at 5:49 AM
Such a different time now than early 2025, when people thought DeepSeek was going to kill the market for Nvidia.
luyu_wu today at 3:43 AM
For those who didn't check the page yet, it just links to the API docs being updated with the upcoming models, not the actual model release.
yanhangyhy today at 9:06 AM
Somehow I cannot open the link, but in the Chinese version of the release article there is a quote from Xunzi at the end (https://en.wikipedia.org/wiki/Xunzi_(philosopher)):

"Not seduced by praise, not terrified by slander; following the Way in one's conduct, and rectifying oneself with dignity." (不诱于誉,不恐于诽,率道而行,端然正己)

(It is mainly used to express the way a Confucian gentleman conducts himself in the world. It reminds me of an interview I once watched with an American politician, who said that, at its core, China is still governed through a Confucian meritocratic elite system. It seems some things have never really changed.

In some respects, Liang Wenfeng can be compared to Linux. The political parallel here is that the advantages of rational authoritarianism are often overlooked because of the constraints imposed by modern democratic systems. )

nba456_ today at 10:28 AM
Wow, never seen a post with so many comments posted overnight like this.
thefounder today at 9:05 AM
They still don’t support json schema or batch api. It’s like deepseek does not want to make money
bandrami today at 5:54 AM
I don't mind that High Flyer completely ripped off Anthropic to do this so much as I mind that they very obviously waited long enough for the GAB to add several dozen xz-level easter eggs to it.
jdeng today at 3:53 AM
Excited that the long awaited v4 is finally out. But feel sad that it's not multimodal native.
Grp1 today at 9:09 AM
DeepSeek’s docs say V4 has a 1M context length. Is that actually usable in practice, or just the model/API limit?

Codex shows ~258k for me and Claude Code often shows ~200k, so I’m curious how DeepSeek is exposing such a large window.

Oxlamarr today at 9:57 AM
The speed of progress here is wild. It feels like the hard part is shifting from having access to a strong model to actually building trustworthy systems around it.
aquir today at 6:41 AM
It is great! I asked the question I always ask new models ("what would Ian M Banks think about the current state of AI") and it gave me a brilliant answer! Funnily enough, the answer contained multiple criticisms of its own creators ("Chinese state entities", "Social Credit System").
jfxia today at 6:36 AM
Is V4 still not a multi-modal model?
yanis_t today at 6:58 AM
Is there a harness as good as Claude Code that can be used with open-weight models?
clark1013 today at 5:14 AM
Looking forward to DeepSeek Coding Plan
dannyw today at 8:21 AM
Are there better providers for inferencing this right now? I know it's launch day, but openrouter showing 30tps isn't looking great.
aliljet today at 4:35 AM
How can you reasonably try to get near frontier (even at all tps) on hardware you own? Maybe under 5k in cost?
namegulf today at 4:27 AM
Is there a Quantized version of this?
KaoruAoiShiho today at 3:57 AM
SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here
sibellavia today at 5:32 AM
A few hours after GPT5.5 is wild. Can’t wait to try it.
fbrncci today at 10:12 AM
Take that Anthropic and your shenanigans.
GuardCalf today at 8:14 AM
I like this. The more competitors there are, the more we the users benefit.
JonChesterfield today at 8:32 AM
Anyone worked out how much hardware one needs to self host this one?
apexalpha today at 6:17 AM
This Flash model might be affordable for OpenClaw. I run it on my Mac with 48 GB RAM now, but it's slowish.
reenorap today at 4:19 AM
Which version fits in a Mac Studio M3 Ultra 512 GB?
swrrt today at 4:04 AM
Any visualised benchmark/scoreboard for comparison between the latest models? DeepSeek v4 and GPT-5.5 seem to be groundbreaking.
ghstinda today at 11:56 AM
so many models not enough time
WhereIsTheTruth today at 6:21 AM
Interesting note:

"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."

So it's going to be even cheaper

cztomsik today at 6:54 AM
So is this the first AI lab using MUON for their frontier model?
mariopt today at 4:48 AM
Does DeepSeek have any coding plan?
raincole today at 4:07 AM
History doesn't always repeat itself.

But if it does, then in the following week we'll see DeepSeek 4 flood every AI-related online space. Thousands of posts swearing it's better than the latest models from OpenAI/Anthropic/Google while only costing pennies.

Then a few weeks later it'll be forgotten by most.

rvz today at 4:00 AM
The paper is here: [0]

I was expecting the release this month [1]. Everyone had forgotten about it and wasn't reading the papers they were releasing, and 7 days later here we have it.

One key point of this model is the optimization DeepSeek made to the residual design of the network architecture: manifold-constrained hyper-connections (mHC), from this paper [2]. That is what makes it possible to train efficiently, especially together with the hybrid attention mechanism designed for it.

There was not much discussion of it here some months ago [3], but the paper is again a recommended read.

I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.

Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

[1] https://news.ycombinator.com/item?id=47793880

[2] https://arxiv.org/abs/2512.24880

[3] https://news.ycombinator.com/item?id=46452172

tcbrah today at 6:03 AM
giving meta a run for its money, esp when it was supposed to be the poster child for OSS models. deepseek is really overshadowing them rn
cl08 today at 7:15 AM
Any way to connect this to claude code?
sergiotapia today at 5:40 AM
Using it with opencode sometimes it generates commands like:

    bash({"command":"gh pr create --title "Improve Calendar module docs and clean up idiomatic Elixir" --body "$(cat <<'EOF'
    Problem
    The Calendar modu...
like generating output, but not actually running the bash command so not creating the PR ultimately. I wonder if it's a model thing, or an opencode thing.
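
One possible reading of the snippet, offered as a guess rather than a diagnosis: the inner double quotes around the title are not escaped, so the arguments are not valid JSON, and a harness that cannot parse them may fall back to showing the text instead of running the tool. The sketch below only checks that claim about JSON; `parse_tool_arguments` is a hypothetical helper, the title is shortened, and nothing here reflects how opencode actually handles tool calls.

```python
import json


def parse_tool_arguments(raw):
    """Return the decoded arguments dict, or None if raw is not valid JSON."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return args if isinstance(args, dict) else None


# Inner quotes left unescaped, in the style of the snippet (shortened title).
broken = '{"command":"gh pr create --title "Improve docs""}'

# json.dumps escapes the inner quotes, so the string round-trips.
fixed = json.dumps({"command": 'gh pr create --title "Improve docs"'})

print(parse_tool_arguments(broken))  # None
print(parse_tool_arguments(fixed))   # {'command': 'gh pr create --title "Improve docs"'}
```

If that is what is happening, it would point at the model's output formatting rather than the harness, but logging the raw tool-call payload would be needed to confirm it.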
tariky today at 5:27 AM
Has anyone tried making a web UI with it? How good is it? For me, Opus is only worth it because of that.
zurfer today at 7:32 AM
lots of great stuff, but the plot in the paper is just chart crime. different shades of gray for references where sometimes you see 4 models and sometimes 3.
ls612 today at 4:07 AM
How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.
cubefox today at 8:49 AM
Abstract of the technical report [1]:

> We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; (3) and the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, DeepSeek-V4 series are highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.

1: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

augment_me today at 6:30 AM
Amaze amaze amaze
casey2 today at 8:21 AM
Already over a billion tokens on open router in under 5 hours
gigatexal today at 6:42 AM
Has anyone used it? How does it compare to gpt 5.5 or opus 4.7?
coolThingsFirst today at 6:35 AM
I got an API key without entering credit card details. I didn't know they had a free plan.
luew today at 5:35 AM
We will be hosting it soon at getlilac.com!
punkpeye today at 5:48 AM
Incredible model quality to price ratio
donbreo today at 7:06 AM
Aaaand it still can't name all the states in India, or say what happened in 1989.
hongbo_zhang today at 4:35 AM
congrats
dhruv3006 today at 4:58 AM
Ah now !
shafiemoji today at 3:57 AM
I hope the update is an improvement. Losing 3.2 would be a real loss, it's excellent.