I kind of lost interest in local models. Then Anthropic started saying I’m not allowed to use my Claude Code subscription with my preferred tools, and it reminded me why we need to support open tools and models. I’ve cancelled my CC subscription; I’m not paying to support anticompetitive behaviour.
I still haven't found a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.
It’s hard to overstate just how wild this model might be if it performs as claimed. The claim is that it can get close to Sonnet 4.5 for assisted coding (SWE-bench) while using only 3B active parameters. That is obscenely small for the claimed performance.
Using lmstudio-community/Qwen3-Coder-Next-GGUF:Q8_0 I'm getting up to 32 tokens/s on Strix Halo, with room for 128k of context (out of 256k that the model can manage).
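In case anyone wants to sanity-check throughput on their own setup, here's roughly how I measure it. A minimal sketch, assuming an OpenAI-compatible local server (LM Studio defaults to port 1234; llama-server to 8080) and whatever model id your server registered the GGUF under:

    # Rough tokens/s check against a local OpenAI-compatible endpoint.
    # URL and model name are assumptions -- adjust to your own setup.
    # Note: elapsed time includes prompt processing, so this slightly
    # underestimates pure generation speed.
    import time
    import requests

    URL = "http://localhost:1234/v1/chat/completions"
    payload = {
        "model": "qwen3-coder-next",  # placeholder: use your server's model id
        "messages": [{"role": "user", "content": "Write a binary search in Python."}],
        "max_tokens": 512,
    }

    start = time.time()
    resp = requests.post(URL, json=payload, timeout=600).json()
    elapsed = time.time() - start
    done = resp["usage"]["completion_tokens"]
    print(f"{done / elapsed:.1f} tokens/s over {done} completion tokens")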
From very limited testing, it seems to be slightly worse than MiniMax M2.1 Q6 (a model about twice its size). I'm impressed.
vessenes · today at 4:15 PM
3B active parameters, and slightly worse than GLM 4.7. On benchmarks. That's pretty amazing! With better orchestration tools being deployed, I've been wondering if faster, dumber coding agents paired with wise orchestrators might be overall faster than using, say, Opus 4.5 at the bottom for coding. At the very least we might want to route simple tasks to these guys.
zokier · today at 7:09 PM
For someone who is very out of the loop with these AI models: what can I actually run on my 3080 Ti (12GB)? Is this model in that range, or is it still too big? Is there anything remotely useful runnable on my GPU? I have 64GB of RAM if that helps.
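My own back-of-envelope while trying to answer this for myself; corrections welcome (the quantization width and the offloading story are rough assumptions, not measurements):

    # Naive back-of-envelope: does an 80B-total / 3B-active MoE fit
    # 12 GB VRAM + 64 GB system RAM? All numbers are rough assumptions.
    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    total = weight_gb(80, 4.5)   # a Q4_K-ish quant of all 80B weights
    active = weight_gb(3, 4.5)   # weights actually touched per token (3B active)
    print(f"~{total:.0f} GB of weights: too big for 12 GB VRAM, fits in 64 GB RAM")
    print(f"~{active:.1f} GB active per token, if experts are offloaded to RAM")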
gitpusher · today at 6:38 PM
Pretty cool that they are advertising OpenClaw compatibility. I've tried a few locally-hosted models with OpenClaw and did not get good results (that tool is a context monster... the models would get completely overwhelmed by erroneous / old instructions).
Granted, these 80B models are probably optimized for H100/H200, which I do not have. Here's hoping that OpenClaw compatibility survives quantization.
Alifatisk · today at 5:35 PM
As always, the Qwen team is pushing out fantastic content
Going to try this over Kimi k2.5 locally. It was nice but just a bit too slow and a resource hog.
Robdel12 · today at 5:52 PM
I really, really want local or self-hosted models to work. But my experience is they're not really even close to the closed paid models.
Does anyone have any experience with these, and is this release actually workable in practice?
storus · today at 5:32 PM
Does Qwen3 allow adjusting the context during an LLM call, or does that housekeeping need to happen before/after each call, i.e. not while a single LLM call with multiple tool calls is in progress?
zamadatix · today at 4:22 PM
Can anyone help me understand the "Number of Agent Turns" vs "SWE-Bench Pro (%)" figure? I.e., what does Qwen3-Coder-Next's spread from ~50 to ~280 agent turns represent at a fixed score of 44.3%? That it sometimes takes anywhere in that range of turns to achieve the same score?
fudged71 · today at 6:30 PM
I'm thrilled. Picked up a used M4 Pro with 64GB this morning. Excited to test this out.
alexellisuk · today at 4:39 PM
Is this going to need 1x or 2x of those RTX PRO 6000s to allow for a decent KV cache at an active context length of 64-100k?
It's one thing running the model without any context, but coding agents build it up close to the max and that slows down generation massively in my experience.
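For what it's worth, here's the rough math I use for a plain-transformer KV cache. Qwen3-Next's hybrid linear-attention layers should need far less, so treat this as an upper bound; the layer/head/dim numbers below are illustrative guesses, not the real config:

    # KV cache size for a standard transformer: K and V per layer, per token.
    # fp16 cache assumed (2 bytes/element); config numbers are placeholders.
    def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int,
                    bytes_per_elem: int = 2) -> float:
        return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

    for ctx in (64_000, 100_000):
        gb = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, ctx=ctx)
        print(f"{ctx:>7} tokens -> ~{gb:.1f} GB of KV cache")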
orliesaurus · today at 5:01 PM
how can anyone keep up with all these releases... what's next? Sonnet 5?
ionwake · today at 5:38 PM
Will this run on an Apple M4 Air with 32GB RAM?
I'm currently using Qwen 2.5 16B, and it works really well.
endymion-light · today at 4:22 PM
Looks great, I'll try to check it out on my gaming PC.
On a misc note: what's being used to create the screen recordings? They look so smooth!
throwaw12 · today at 4:33 PM
We are getting there. As a next step, please release something that outperforms Opus 4.5 and GPT 5.2 on coding tasks.
valcron1000 · today at 5:22 PM
Still nothing to compete with GPT-OSS-20B for local use with 16GB of VRAM.
syntaxing · today at 4:53 PM
Is the Qwen3-Next architecture ironed out in llama.cpp?
moron4hire · today at 6:59 PM
My IT department is convinced these "ChInEsE cCcP mOdElS" are going to exfiltrate our entire corporate network of its essential fluids and vita.. erh, I mean data. I've tried explaining to them that it's physically impossible for model weights to make network requests on their own. Also, what happened to their MitM-style, extremely intrusive network monitoring that they insisted we absolutely needed?
ossicones · today at 5:06 PM
What browser-use agent are they using here?
Soerensen · today at 4:40 PM
The agent orchestration point from vessenes is interesting: using faster, smaller models for routine tasks while reserving frontier models for complex reasoning.
In practice, I've found the economics work like this:
1. Code generation (boilerplate, tests, migrations) - smaller models are fine, and latency matters more than peak capability
2. Architecture decisions, debugging subtle issues - worth the cost of frontier models
3. Refactoring existing code - the model needs to "understand" before changing, so context and reasoning matter more
The 3B active parameters claim is the key unlock here. If this actually runs well on consumer hardware with reasonable context windows, it becomes the obvious choice for category 1 tasks. The question is whether the SWE-Bench numbers hold up for real-world "agent turn" scenarios where you're doing hundreds of small operations.
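A minimal sketch of the routing I have in mind, assuming the three categories above; the model names and the keyword heuristic are placeholders, not a real classifier:

    # Toy task router: cheap local model for category 1, frontier for 2 and 3.
    from dataclasses import dataclass

    @dataclass
    class Route:
        model: str
        reason: str

    CHEAP = "qwen3-coder-next (local)"    # placeholder model names
    FRONTIER = "frontier-model (hosted)"

    def classify(task: str) -> Route:
        t = task.lower()
        if any(k in t for k in ("boilerplate", "test", "migration", "scaffold")):
            return Route(CHEAP, "category 1: generation, latency-bound")
        if "refactor" in t:
            return Route(FRONTIER, "category 3: needs context before changing")
        return Route(FRONTIER, "category 2 default: architecture/debugging")

    print(classify("write unit tests for the parser"))
    print(classify("refactor the auth module"))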