Launch HN: Freestyle – Sandboxes for Coding Agents

137 points - today at 4:32 PM

We’re Ben and Jacob, cofounders of Freestyle (https://freestyle.sh). We’re building a cloud for Coding Agents.

For the first generation of agents it looked like workflows with minimal tools. 2 years ago we published a package to let AI work in SQL, at that time GPT-4 could write simple scripts. Soon after the first AI App Builders started using AI to make whole websites; we supported that with a serverless deploy system.

But the current generation is going much further, instead of minimal tools and basic serverless apps AI can utilize the full power of a computer (“sandbox”). We’re building sandboxes that are interchangeable with EC2s from your agents perspective, with bonus features:

1. We’ve figured out how to fork a sandbox horizontally without more than a 400ms pause in it. That's not forking the filesystem, we mean forking the whole memory of it. If you’re half way down a browser page with animations running, they’ll be in the same place in all the forks. If you’re running a minecraft server every block and player will be in the same place on the forks. If you’re running a local environment and an error comes up in process that error will be there in all the forks. This works for snapshotting as well, you can save your place and come back weeks later.

2. Our sandboxes start in ~500ms.

Demo: https://www.loom.com/share/8b3d294d515442f296aecde1f42f5524

Compared with other sandboxes, our goal is to be the most powerful. We support full Linux + hardware-virtualization, eBPF, Fuse, etc. We run full Debian with multiple users and we use a systemd init instead of runc. Whatever your AI expects to work on debian should work on these vms, and if it doesn’t send a bug report.

In order to make this possible, we’ve moved to our own bare metal racks. Early in our testing we realized that moving VMs across cloud nodes would not have acceptable performance properties. We asked Google Cloud and AWS for a quote on their bare metal nodes and found that the monthly cost was equivalent to the total cost of the hardware so we did that.

Our goal is to build the necessary infrastructure to replicate the human devloop on the massively multi-tenant scale of AI, so these VMs should be as powerful as the ones you’re used to, while also being available to provision in seconds.

Source

Comments

shubhamintech today at 7:42 PM

I think one of the very few who actually support ebpf & xdp, which you do need when you're building low level stuff. + the bare metal setup is like out of the world lol.

messh today at 8:53 PM

Checkout shellbox.dev, you can do pretty much the same, automating it all bia ssh

nyellin today at 8:22 PM

Is it possible to run a Kubernetes cluster inside one? (E.g. via KIND.)

If so, we'd very much like to test this. We make extensive use of Claude Code web but it can't effectively test our product inside the sandbox without running a K8s cluster

fawabc today at 8:59 PM

how does this differ from daytona or e2b?

stingraycharles today at 6:01 PM

I’m super interested since it seems like you have given everything a lot of thought and effort but I am not sure I understand it.

When I’m thinking of sandboxes, I’m thinking of isolated execution environments.

What does forking sandboxes bring me? What do your sandboxes in general bring me?

Please take this in the best possible way: I’m missing a use case example that’s not abstract and/or small. What’s the end goal here(

MarcelinoGMX3C today at 5:44 PM

The technical challenges in getting memory forking to deliver those sub-second start and fork times are significant. I've seen the pain of trying to achieve that level of state transfer and rapid provisioning. While "EC2-like" gets the point across for many, going bare metal reveals the practical limits of cloud virtualization for high-performance, complex workloads like these. It shows a real understanding of where cloud abstraction helps and where it just adds overhead.

The cost argument for owning the hardware for this specific use case also makes sense, considering the scale these agent environments will demand. Also worth noting, sandboxes are effectively an open attack surface; architecting them not to be in your main VPC is a sound security decision from the start.

TheTaytay today at 7:40 PM

Wow, forking memory along with disk space this quickly is fascinating! That's something that I haven't seen from your competitors.

If the machine can fork itself, it could allow for some really neat auto-forking workflows where you fuzz the UI testing of a website by forking at every decision point. I forget the name of the recent model that used only video as its latent space to control computers and cars, but they had an impressive demo where they fuzzed a bank interface by doing this, and it ended up with an impressive number of permutations of reachable UI states.

qainsights today at 8:04 PM

Is this similar to https://instavm.io/?

jFriedensreich today at 7:43 PM

Non open source and non local SAAS sandboxes are offensive to even try to launch. No one needs this and the only customers will be vibe coders who just don't know any better. There are teams building actual sandboxes like smolmachines, podman, colima and mre. At least be honest and put the virtualisation tech you are using as well as that its closed source SAAS on the landing page to safe people time.

_jayhack_ today at 5:10 PM

Would love to understand how you compare to other providers like Modal, Daytona, Blaxel, E2B and Vercel. I think most other agent builders will have the same question. Can you provide a feature/performance comparison matrix to make this easier?

vimota today at 6:30 PM

This is awesome - the snapshotting especially is critical for long running agents. Since we run agents in a durable execution harness (similar to Temporal / DBOS) we needed a sandboxing approach that would snapshot the state after every execution in order to be able to restore and replay on any failure.

We ended up creating localsandbox [0] with that in mind by using AgentFS for filesystem snapshotting, but our solution is meant for a different use case than Freestyle - simpler FS + code execution for agents all done locally. Since we're not running a full OS it's much less capable but also simpler for lots of use cases where we want the agent execution to happen locally.

The ability to fork is really interesting - the main use case I could imagine is for conversations that the user forks or parallel sub-agents. Have you seen other use cases?

[0] https://github.com/coplane/localsandbox

stocktech today at 6:06 PM

I built something like this at work using plain Docker images. Can you help me understand your value prop a little better?

The memory forking seems like a cool technical achievement, but I don't understand how it benefits me as a user. If I'm delegating the whole thing to the AI anyway, I care more about deterministic builds so that the AI can tackle the problem.

umarcyber today at 8:29 PM

Your UI design is really nice.

n2d4 today at 4:52 PM

Cool! I've been using your API for running sandboxed JS. Nice to see you also support VMs now.

    > we mean forking the whole memory of it

How does this work? Are you copying the entire snapshot, or is this something fancy like copy-on-write memory? If it's the former, doesn't the fork time depend on the size of the machine?

skybrian today at 7:21 PM

Any ideas for locking down remote access from an untrusted VM? Cloudflare has object-based capabilities and some similar thing might be useful to let a VM make remote requests without giving it API keys. (Keys could be exfiltrated via prompt injection.)

jbethune today at 8:16 PM

Congratulations on the launch! Will definitely test this out.

skybrian today at 5:45 PM

It doesn't seem very easy to calculate how much it would cost per month to keep a mostly-idle VM running (for example, with a personal web app). The $20/month plan from exe.dev seems more hobbyist-friendly for that. Maybe that's not the intended use, though?

jnstrdm05 today at 6:12 PM

how many seconds to provision are we talking about here? 1 sec vs 60 is a dealbreaker for me, some clarity on that would be nice.

schopra909 today at 7:02 PM

Honestly never considered the forking use case; but it makes a ton of sense when explained

Congrats on the launch. This is cool tech

maxmaio today at 6:25 PM

Congrats Ben and Jacob!

rasengan today at 5:56 PM

Interesting!

We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks, and boots, in < 20ms and is live, serving customers! We have a lot of great tools available, via MIT, on our github repo [2] as well!

[1] https://unixshells.com

[2] https://github.com/unixshells

benatkin today at 7:12 PM

It's hard to tell what this is or how it compares to other things that are out there, but what I latched onto is this:

> Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management. On top of that, Freestyle VMs are full Linux virtual machines with nested virtualization, systemd, and a complete networking stack, not containers.

It makes me think of the git automation around rigs in Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

Edit: I realize the Loom is a way to look at it. Loom interrupted me twice and I almost skipped it. However it gave me a better idea of what it does, it "invents" snapshotting and restoring of VMs in a way that appears faster. That actually makes sense and I know it isn't that hard to do with how VMs work and that it greatly benefits from having only part of the VM writable and having little memory used (maybe it has read-only memory too?).

Fraaaank today at 5:17 PM

Your pricing page is broken

dominotw today at 6:31 PM

dumb question. none of these protect your from prompt injection. yes?

siva7 today at 5:47 PM

I have so many interesting problems on Ai, sandboxing isn't one of them. It's a pointless excercise yet disproportionately so many people love to to do this. Probably because sandboxing doesn't feel as magic as Agents itself and more like the old times of "traditional" software development.

tuhgdetzhh today at 8:09 PM

[dead]

borakostem today at 5:44 PM

[flagged]

aplomb1026 today at 5:30 PM

[dead]

n1tro_lab today at 6:13 PM

[dead]

johnwhitman today at 6:00 PM

[dead]