ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math
23 points - today at 8:56 AM
Comments
I think this is important because models like this could eventually become viable replacements for coding models: most of the time, coding harnesses are just leveraging tool calls to gather context and then write a solution. Roughly the loop sketched below.
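A minimal sketch of that harness loop, for the skeptics. Everything here (the `call_model` stub, the `read_file` tool, the message shapes) is a hypothetical stand-in for whatever backend a real harness talks to, not any particular product's API:

```python
# Sketch of an agentic coding-harness loop: the model alternates between
# requesting tool calls (to gather context) and emitting a final answer.
import json
from pathlib import Path

def read_file(path: str) -> str:
    """Tool: return file contents so the model can gather context."""
    try:
        return Path(path).read_text()
    except OSError as e:
        return f"error: {e}"

TOOLS = {"read_file": read_file}

def call_model(messages: list[dict]) -> dict:
    """Hypothetical model call. A real harness would hit a local or
    hosted LLM endpoint; here we return canned replies: one tool call,
    then a final answer once tool output is in the context."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "main.py"}}
    return {"answer": "patch: ..."}

def run(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_model(messages)
        if "answer" in reply:  # model is done gathering context
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})

print(run("fix the bug in main.py"))
```

The point is that the loop itself is trivial; all the quality lives in the model's ability to pick the right tool calls, which is why a small-active-param MoE that's good at reasoning is interesting here.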
I am hopeful that one day we can replace Claude and OpenAI models with local SOTA LLMs.
LM Studio doesn't let me actually run this yet though: "Unsupported safetensors format: null"
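For what it's worth, a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, and some loaders key off a `format` entry in its `__metadata__` block. A quick inspection script, assuming (unconfirmed) that a missing key is what the `null` in LM Studio's error refers to:

```python
# Inspect a .safetensors header to see what metadata the loader sees.
# Usage: python inspect_header.py model.safetensors
import json
import struct
import sys

def read_header(path: str) -> dict:
    with open(path, "rb") as f:
        # safetensors spec: u64 little-endian header size, then JSON header
        (size,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(size))

header = read_header(sys.argv[1])
meta = header.get("__metadata__", {})
print("metadata:", meta)
print("format:", meta.get("format"))  # None here would match the error
```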
No, I am not saying this model is a drop-in Claude replacement. But I think in two years we might be really surprised by what can be done on a desktop with commodity hardware, no connection to the internet, and a few models that each cover a subset of tasks.
Really happy to see AMD throw their hat in the ring. It's a good day for AMD investors. I know a lot of AI bros will scoff at this, but having your first training run is a big deal for a new lab. AMD is on their way, despite Nvidia having years of runway.