Gemma 4 on iPhone

829 points - yesterday at 6:45 PM

Comments

karimf yesterday at 8:51 PM

This app is cool and it showcases some use cases, but it still undersells what the E2B model can do.

I just made a real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B. I posted it on /r/LocalLLaMA a few hours ago and it's gaining some traction [0]. Here's the repo [1]

I'm running it on a Macbook instead of an iPhone, but based on the benchmark here [2], you should be able to run the same thing on an iPhone 17 Pro.

[0] https://www.reddit.com/r/LocalLLaMA/comments/1sda3r6/realtim...

[1] https://github.com/fikrikarim/parlor

[2] https://huggingface.co/litert-community/gemma-4-E2B-it-liter...

PullJosh yesterday at 7:41 PM

This is awesome!

1) I am able to run the model on my iPhone and get good results. Not as good as Gemini in the cloud, but good.

2) I love the “mobile actions” tool calls that allow the LLM to turn on the flashlight, open maps, etc. It would be fun if they added Siri Shortcuts support. I want the personal automation that Apple promised but never delivered.

3) I am so excited for local models to be normalized. I build little apps for teachers and there are stringent privacy laws involved that mean I strongly prefer writing code that runs fully client-side when possible. When I develop apps and websites, I want easy API access to on-device models for free. I know it sort of exists on iOS and Chrome right now, but as far as I’m aware it’s not particularly good yet.

janandonly yesterday at 8:37 PM

OP here. It is my firm belief that the only realistic use of AI in the future is either locally on-device for almost free, or in the cloud but way more expensive than it is today.

The latter option will only be used for tasks where humans are more expensive or much slower.

This Gemma 4 model gives me hope for a future Siri or other with iPhone and macOS integration, “Her” (as in the movie) style.

pmarreck yesterday at 7:38 PM

Impressive model, for sure. I've been running it on my Mac, now I get to have it locally in my iPhone? I need to test this. Wait, it does agent skills and mobile actions, all local to the phone? Whaaaat? (Have to check out later! Anyone have any tips yet?)

I don't normally do the whole "abliterated" thing (dealignment) but after discovering https://github.com/p-e-w/heretic , I was too tempted to try it with this model a couple days ago (made a repo to make it easier, actually) https://github.com/pmarreck/gemma4-heretical and... Wow. It worked. And... Not having a built-in nanny is fun!

It's also possible to make an MLX version of it, which runs a little faster on Macs, but won't work through Ollama unfortunately. (LM Studio maybe.)

Runs great on my M4 Macbook Pro w/128GB and likely also runs fine under 64GB... smaller memories might require lower quantizations.
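
To make the memory point concrete, here is a rough sketch of the usual weight-only estimate (parameters times bits per weight, divided by 8 to get bytes). The 31B figure is the size mentioned elsewhere in this thread and is only an illustration, not necessarily the variant run here; the bit widths are assumed examples, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate: parameters * bits per weight / 8 bytes.
# Ignores KV cache, activations, and runtime overhead, so real usage is higher.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 31B is an illustrative size taken from the thread; bit widths are assumptions.
    for bits in (16, 8, 4):
        print(f"31B at {bits}-bit: ~{weight_memory_gb(31, bits):.1f} GB")
```

By this estimate, halving the bits per weight halves the weight footprint, which is why lower quantizations are the usual lever on machines with less memory.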

I specifically like dealigned local models because if I have to get my thoughts policed when playing in someone else's playground, like hell am I going to be judged while messing around in my own local open-source one too. And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, at a level that was not possible until now.

Note: I tried to hook this one up to OpenClaw and ran into issues

To answer the obvious question- Yes, this sort of thing enables bad actors more (as do many other tools). Fortunately, there are far more good actors out there, and bad actors don't listen to rules that good actors subject themselves to, anyway.

jeroenhd yesterday at 7:51 PM

English version of the page: https://apps.apple.com/us/app/google-ai-edge-gallery/id67496...

Also on Android: https://play.google.com/store/apps/details?id=com.google.ai....

It's a demo app for Google's Edge project: https://ai.google.dev/edge

amai today at 3:29 PM

The cooperation of Apple and Google is going to crush the competition: https://blog.google/company-news/inside-google/company-annou...

The combination of Apple's hardware and Google's software is unbeatable.

rock_artist today at 6:41 AM

I really believe in the future of local models.

As both an app developer and a user, my main concern for now is bloated devices. Until we have something like Apple’s foundation model, where multiple apps can share the same model, we are stuck with something as horrible as Electron: every app ships a full-blown model (the browser, in the Electron story) instead of reusing a shared one.

On desktops we had DLL hell for years. But with sandboxed apps on mobile devices it becomes a bigger issue, one that I guess will/should be addressed by the OS.

For my app I’ve been trying to add some logic based on a large model, but bloating a simple Swift app with 2-3GB of model, or even a few hundred MBs, feels wrong and conflicts with code-reuse principles.

al_borland today at 2:52 AM

I find it odd they are using the term “edge” to brand this, if its target is the general public.

I’ve been to a few tech conferences and saw the term used there for the first time. It took me a little bit to see the pattern and understand what it meant. I have never heard the term used outside of those circles. It seems like “local” would be the term average users would be familiar with. Normal people don’t call their stuff “edge devices”.

lemonish97 today at 7:14 PM

I hope they add a web search tool to the agent skills too. Most of my LLM usage on my phone is just quick lookups and search summarizations. Would love to do these with a local model rather than Google AI mode or any other cloud-based inference tool.

dhbradshaw yesterday at 10:08 PM

My son just started using 2B on his Android. I mentioned that it was an impressively compact model and next thing I knew he had figured out how to use it on his inexpensive 2024 Motorola and was using it to practice reading and writing in foreign languages.

orf today at 10:15 AM

I’d recommend locally.ai[1] - it’s really good and has a wide range of models. Also has shortcuts support.

1. https://apps.apple.com/gb/app/locally-ai-local-ai-chat/id674...

allpratik yesterday at 9:30 PM

Nice! Tried on iPhone 16 pro with 30 TPS from Gemma-4-E2B-it model.

The phone got considerably hot during inference, though. It’s quite impressive performance, and I cannot wait to try it in one of my personal apps.
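
For a sense of what 30 TPS means in practice, a small sketch of the arithmetic: generation time is roughly output tokens divided by tokens per second. The response lengths below are assumed examples, and the estimate ignores prompt processing and any thermal throttling.

```python
# Rough generation-time estimate: output tokens / decode speed.
# Ignores prompt processing (prefill) and thermal throttling.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Return approximate seconds needed to generate num_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 30 TPS is the figure reported above; token counts are hypothetical.
    for tokens in (60, 300, 900):
        print(f"{tokens} tokens at 30 TPS: ~{generation_seconds(tokens, 30):.0f} s")
```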

TGower yesterday at 8:09 PM

These new models are very impressive. There should be a massive speedup coming as well: AI Edge Gallery is running on the GPU, but NPUs in recent high-end processors should be much faster. The A16 chip, for example (Macbook Neo and iPhone 16 series), has 35 TOPS of Neural Engine vs a 7 TFLOPS GPU. Similar story for Qualcomm.

areys today at 6:41 PM

The use cases that open up when inference stays on-device are genuinely different. Health apps, journaling, anything where users are (justifiably) paranoid about their data leaving the phone — that's a big surface area that cloud APIs can't really touch. Surprised this is happening at the speed it is on consumer hardware.

two_handfuls today at 2:03 AM

The description says it's private, but the legalese it makes you agree to makes no promise. Rather, the opposite:

> We collect information about your activity in our services

Source: https://policies.google.com/privacy#infocollect

deckar01 yesterday at 8:47 PM

It doesn’t render Markdown or LaTeX. The scrolling is unusable during generation. E4B failed to correctly account for convection and conduction when reasoning about the effects of thermal radiation (31b was very good). After 3 questions in a session (with thinking) E4B went off the rails and started emitting nonsense fragments before the stated token limit was hit (unless it isn’t actually checking).

_nagu_ today at 9:21 AM

If this works smoothly on iPhone, it could change how we think about mobile apps. Less backend dependency, more on-device intelligence.

haizhung today at 1:46 PM

I encourage everybody to try this, if they have an iPhone. If you’re like me and don’t have the time to tinker with the latest and greatest all the time; this app lowers the barrier to entry significantly and provides a glimpse into what’s possible locally, on device.

Honestly, I was extremely impressed by the speed and quality of the answers considering this thing runs on a phone. It honestly makes me want to sit down and spin up my own homegrown AI setup to go fully independent. Crazy.

hadrien01 yesterday at 7:31 PM

Is it me or does the App Store website look... fake? The text in the header ("Productiviteit", "Alleen voor iPhone") looks pixelated, like it was edited on Paint, the header background is flickering, the app icon and screenshots are very low quality, the title of the website is incomplete ("App Store voor iPho...")

burnto yesterday at 8:41 PM

My iPhone 13 can’t run most of these models. A decent local LLM is one of the few reasons I can imagine actually upgrading earlier than typically necessary.

sshrajesh today at 4:50 PM

> Note: I tried to hook this one up to OpenClaw and ran into issues

Anyone worked on hooking up OpenClaw to gemma4 running locally?

carbocation yesterday at 8:00 PM

It would be very helpful if the chat logs could (optionally) be retained.

davecahill today at 1:38 AM

I really like Enclave for on-device models - looks like they're about to add Gemma 4 too: https://enclaveai.app/blog/2026/04/02/gemma-4-release-on-dev...

rudedogg today at 1:55 AM

This is fun, FYI you don’t have to sign in/up with a Google account. I hesitated downloading it for that reason.

satvikpendem today at 1:34 AM

This is also on Android and has an option to use AICore with the NPU which can run much faster than even the GPU models.

dwa3592 yesterday at 8:38 PM

I think with this Google starts a new race: the best local model that runs on phones.

danielrmay today at 3:20 AM

I spent some time getting Gemma4-e4b working via llamacpp on iPhone and I'm really impressed so far! I posted a short video of an example application on LinkedIn here https://www.linkedin.com/feed/update/urn:li:activity:7446746... (or x: https://x.com/danielrmay/status/2040971117419192553)

thot_experiment yesterday at 10:47 PM

Gemma 4 E4B is an incredible model for doing all the home assistant stuff I normally used Qwen3.5 35BA4B + Whisper for, while leaving me with wayy more empty vram for other bullshit. It works as a drop-in replacement for all of my "turn the lights off" or "when's the next train" type queries and does a good job of tool use. This is really the first time vramlets get a model that's reliably day-to-day useful locally.

I'm curious/worried about the audio capability, I'm still using Whisper as the audio support hasn't landed in llama.cpp, and I'm not excited enough to temporarily rewire my stuff to use vLLM or whatever their reference impl is. The vision capabilities of Gemma are notably (thus far, could be impl specific issues?) much much worse than Qwen (even the big moe and dense gemma are much worse), hopefully the audio is at least on par with medium whisper.

derwiki today at 12:01 PM

I asked it about the “Altamont Free Concert” (exact name of Wikipedia article), and it’s been a while since I’ve seen a hallucination this rich. Doesn’t give me confidence to use it.

totetsu today at 9:25 AM

I have been looking at ARGmax https://www.argmaxinc.com/#SDK for running on Apple devices, but I'm not sure yet what's involved in porting a model to work with their SDK

MysticOracle today at 3:58 AM

Crashes for me on a couple of different iDevices (2 generations behind) after only 2-3 chats. Probably not enough RAM.

Saw this one on X the other day updated with Gemma 4 and they have the built-in Apple Foundation model, Qwen3.5, and other models:

Locally AI - https://locallyai.app/

inzlab today at 8:22 PM

Impressive

neurostimulant yesterday at 10:54 PM

I'm able to sweet-talk the gemma-4-e2b-it model on an iPhone 15 into solving an hCaptcha screenshot. This small model is surprisingly capable!

XCSme yesterday at 9:58 PM

Gemma 4 is great: https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...

I assume it is the 26B A4B one, if it runs locally?

rcarmo today at 7:44 AM

This is fun. I just wish I could add more skills. The UX is too dumbed down, but knowing there is a run_js tool, there is a lot that can be done here.

rotexo yesterday at 11:40 PM

E4B is pretty good for extracting tables of items from receipt scans and inferring categories, wish this could be called from within a shortcut to just select a photo and add the extracted table to the clipboard

nickvec today at 2:40 AM

Extremely impressed by how fast responses are on iPhone 17 Pro Max. Can’t wait for this to be used for Siri’s brain one of these days (hopefully!)

gdzie-jest-sol today at 10:04 AM

I also need a normal server mode on the local network, so I can run the chat on another device while the 'computing' happens on the iPhone.

A second idea is audio input in other languages, like Czech, Polish, or French.

modeless today at 6:56 AM

It's so ridiculous that Google made a custom SoC for their phones, touting its AI performance, even calling it Tensor, and Apple is still faster at running Google's own model.

Google really ought to shut down their phone chip team. Literally every chip from them has been a disappointment. As much as I hate to say it, sticking with Qualcomm would have been the right choice.

Sharmaji000 today at 2:39 AM

They still didn't release the training recipe, data, methodology, etc., unlike DeepSeek. Mostly released to build a developer ecosystem around their Android built-in AI. Still good and interesting, but not exactly philanthropic toward open-source progress.

mc7alazoun yesterday at 11:19 PM

Would it work locally on a Mac Pro M4 24gb? If so I'd really appreciate a step-by-step guide.

MagicMoonlight today at 11:32 AM

It seems really capable. A few more iterations of this and you won’t even need a subscription.

All it needs is web search so that it can get up to date information.

jdthedisciple today at 7:27 AM

it's Google, so is it really private?

remember, megacorps are dying for infinite amounts of analytics data

rickdg yesterday at 8:51 PM

How do these compare to Apple's Foundation Models, btw?

Waterluvian yesterday at 11:52 PM

I see a phenomenal opportunity for old phone re-use by arraying them in some dock and making them be my "home AI."

garff yesterday at 9:40 PM

How new of an iPhone model is needed?

tithos today at 12:17 AM

Most of the models are not available. I’m guessing they will become available soon enough… At least I hope.

beeflet yesterday at 8:54 PM

Isn't this already possible in a much more open-ended way with PocketPal?

https://github.com/a-ghorbani/pocketpal-ai

https://apps.apple.com/us/app/pocketpal-ai/id6502579498

https://play.google.com/store/apps/details?id=com.pocketpala...

imadselka today at 10:37 AM

good model!

dzhiurgis yesterday at 9:08 PM

I recently got to a first practical use of it. I was on a plane, filling landing card (what a silly thing these are). I looked up my hotel address using qwen model on my iPhone 16 Pro. It was accurate. I was quite impressed.

After some back and forth the chat app started to crash tho, so YMMV.

lol8675309 yesterday at 10:52 PM

It’s gotta be free!?!? Right!?!? Oh oh wait

__natty__ yesterday at 8:49 PM

That's a great project! I just wondered whether Google would have a problem with you using their trademark

yalogin today at 12:56 AM

Are these models open source? If so this is Google’s attempt to collect user data from their models.