Apertus – Open Foundation Model for Sovereign AI

504 points - yesterday at 9:29 PM

Comments

maxloh yesterday at 10:10 PM

Other fully open LLMs include Allen AI's OLMo 3.1 and MBZUAI's K2 Think V2, both of which have released their full training pipelines and datasets.

Nvidia Nemotron is also an open training source model, though a portion of its dataset remains proprietary.

Quoting lambda's comment:

> Note that the Nemotron models are generally stronger than Olmo and K2 Think V2 (according to Artificial Analysis benchmarks), and there is a lot of overlap in their datasets (lots of datasets are based on the same sources with different filtering, Olmo and K2 Think V2 both have used some Nemotron datasets).

> But yeah, Nemotron is a modern and fairly capable LLM, even the 122b is more capable than Deepseek R1 (a 671b model) on most benchmarks, and there's also the recently released 550b Ultra now.

https://news.ycombinator.com/item?id=48492439

SwellJoe yesterday at 10:22 PM

I like the idea, and it has become more pressing that everyone outside the US think about tech sovereignty because the US has become an unsafe place to keep your data, but the impression I get from Apertus is that it moves at the speed of a committee. I have no expectation they'll deliver a competitive model. At least, not competitive with current models. Maybe competitive with models a year ago (though they haven't even done that yet, right?).

mrshu yesterday at 11:06 PM

By far the most impactful product of the Apretus project are the people. To quote a memorable line from Dominique Paul (https://www.thisiscrispin.com/):

> What most people miss IMO is that this is not a team who is doing this for the fourth time like virtually any other LLM provider and who could learn from its own past experiences. I bet if the team would do another model training they could get way better results at one fourth of the costs.

pferde yesterday at 10:20 PM

For a model that claims to focus on many languages, it's quite unreliable when it comes to simple questions like "how to say X in language Y" or "how to conjugate verb X in language Y". It keeps hallucinating words that do not exist, and when corrected, it only hallucinates a new lie.

throwaw12 yesterday at 9:56 PM

Looks like their instruct models are Llama3.1 fine tune from last year. Is there any progress on new models?

My last hope for soverign AI is from Chinese open models

wg0 today at 4:48 AM

You might dismiss it as nothing but the Linux analogy does not work here either. It is more than that and direct threat to commercial AI labs and their business model. These labs are milking bunch of foundational papers for years now and the end is near.

Going forward would be such open source, open data and open recipe models possibly someday even with the training being crowd sourced if not inference like the BitTorrent model.

Lastly, even Chinese models (GLM, Deepseek, MiMax) work really really good and any user would testify that they do not miss OpenAI/Anthropic/Gemini at all if they're using those Chinese models which is argument enough that with such models, no one is going to miss Chinese models as well.

zitterbewegung today at 1:30 AM

Sort of interesting license not sure if anyone will do it long term.

The training data and the Apertus LLM may contain or generate information that directly or indirectly refers to an identifiable individual (Personal Data). You process Personal Data as independent controller in accordance with applicable data protection law. SNAI will regularly provide a file with hash values for download which you can apply as an output filter to your use of our Apertus LLM. The file reflects data protection deletion requests which have been addressed to SNAI as the developer of the Apertus LLM. It allows you to remove Personal Data contained in the model output. We strongly advise downloading and applying this output filter from SNAI every six months following the release of the model.

reconnecting yesterday at 11:13 PM

A chat interface where you can try Apertus:

https://chat.publicai.co

Bobaso today at 12:15 PM

Apertus V1 performance were sub-par. The Team is working on v2 ATM. Looking forward to testing it.

yreg yesterday at 9:43 PM

previous thread: https://news.ycombinator.com/item?id=45108401

jawns yesterday at 11:23 PM

I am curious about how opt-outs and PII removal work.

Who confirms those requests are legit?

naklitechie today at 5:23 AM

What's the community's take on Sovereign AI being funded by states around the world?

Why the emphasis on sovereign? Open is good enough. No?

trvz yesterday at 9:49 PM

The previous version of this model has been pretty bad, but claimed to adhere to copyright laws. However, based on my testing, that's not true either. So in my view this is completely useless.

neom today at 12:06 AM

I'm curious to know what stuff like this means for cohere? Their whole value prop is Sovereign AI. It seems they spent a lot of money developing models but own none of their own infra, what is the point of a country spending a lot of money on coheres solutions when stuff like this is becoming increasingly available and usable? Feels like I must be missing something here??

uberex today at 5:47 AM

Being childish I https://oss.zuericitygpt.ch/?q=hello+talk+like+a+pirate

atemerev yesterday at 9:57 PM

I use it extensively. It is not ready for agentic use, but as a generic driving model for RAG use cases, it is pretty competent. You can build useful software with it.

dTal yesterday at 10:53 PM

It's good that there is a movement for open LLMs, but it's not where the battleground is right now. The battleground is local vs service LLMs, and we are losing that battle badly despite all the software being here now and viable, entirely because UX sucks.

How many normal people do you know who use "ChatGPT"? A lot, probably.

How many even know what "Gemma" is, let alone have downloaded llama.cpp, a GGUF file from Hugginface, and run "llama-server" from a text console with all the correct command arguments? How many are thinking about this use case when speccing out their next computer? Where is the breathless marketing copy boasting x tok/s?

We are sleepwalking into slavery.

JSR_FDED today at 2:29 AM

From a sovereign AI perspective, how does this compare to Mistral?

holistio today at 12:29 AM

Knowledge cutoff is March 2024. Incredible.

pizlonator today at 3:14 AM

> compliant at scale

The jokes write themselves.

_pdp_ yesterday at 10:13 PM

I want to believe.

david_shi today at 1:06 AM

These models don't seem very competitive, who's their target audience?

dangoodmanUT today at 12:16 AM

How are they going to be competitive with top models at 70B size?

nisten today at 12:39 AM

As an opesource AI researcher with a lot of models and datasets on huggingface I am very appreciative of these types of project but we are ignoring the elephant in the room here ( or lack of )

the swiss have no gpus

markab21 today at 12:15 AM

I'm mildly surprised that more people aren't using Nemo models for this reason. We've moved most of our processing to a combination of Nemo Ultra and Super, with some support for multi-model-specific tasks on Omni. The setup is working REALLY well for us, and I'm comfortable with the more measured pace of improvements. We work with many long-context problems, and the ecosystem is great.

There were a number of use cases where we needed to use Gemini (audio modality), and Ultra has been a VERY cost-effective alternative once we got through the nuances.

firstrowraver today at 8:04 AM

apertvs.ai? seriously?

andrewshadura today at 5:30 AM

Not to be confused with Apertium and Apertis.

flixspiek today at 4:16 PM

[flagged]

runnig today at 7:38 AM

[dead]

jocelyner today at 7:47 AM

[flagged]

yashthakker today at 4:18 AM

[flagged]

Ainaguade yesterday at 11:56 PM

[dead]

focusgroup0 today at 12:18 AM

[dead]

iamyemeth today at 9:34 AM

> Conclusion There are 2 r's in the word "strawberry".

Not looking good so far

maxloh yesterday at 10:23 PM

Great to see more fully open LLMs.

I think a problem with open-weight models is that while you can improve them, you are not going to create the next generation of LLMs by fine-tuning. We are at the mercy of frontier labs for access to SOTA LLMs. For example, Anthropic recently started requiring identity verification for Claude [0], same for OpenAI [1].

If one day China's distillation labs stop releasing their LLMs as open-weight, I doubt American labs will continue to release free LLM weights without that competition.

That's where fully open pipelines shine: they enable the community to create the next generation of SOTA LLMs. That is the only way LLMs truly become sovereign.

[0]: https://news.ycombinator.com/item?id=48618455

[1]: https://news.ycombinator.com/item?id=48618606