Go hard on agents, not on your filesystem

544 points - today at 12:39 AM

Comments

AnotherGoodName today at 1:44 AM
Add this to .claude/settings.json:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."],
        "denyWrite": ["/"]
      }
    }
  }

You can change the read part if you're ok with it reading outside. This feature was only added 10 days ago, fwiw, but it's great and does pretty much this.
puttycat today at 4:30 AM
I am still amazed that people so easily accepted installing these agents on private machines.

We've been securing our systems in all ways possible for decades and then one day just said: oh hello unpredictable, unreliable, Turing-complete software that can exfiltrate and corrupt data in infinite unknown ways -- here's the keys, go wild.

boutell today at 10:59 AM
Plain old Unix permissions can get it done. One account for you, one account for AI. A shared folder belonging to a group that both are in. umask and setgid to get the story right for new files. https://apostrophecms.com/blog/how-to-be-more-productive-wit...
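
A minimal sketch of the setgid-plus-umask part of that setup, written in Python for illustration. The names `make_shared_dir`, `create_group_writable_file`, and `shared_dir` are hypothetical, and creating the two accounts and the common group is assumed to have been done separately.

```python
import os
import stat
import tempfile


def make_shared_dir(path: str) -> None:
    """Create a directory whose new files inherit the directory's group."""
    os.makedirs(path, exist_ok=True)
    # 0o2775 = setgid bit + rwx for owner and group, r-x for others.
    # chmod is explicit because the mode passed to mkdir is filtered by the umask.
    os.chmod(path, 0o2775)


def create_group_writable_file(path: str) -> None:
    """Create a file that stays group-writable by using umask 002."""
    old_umask = os.umask(0o002)  # strip only the "other write" bit
    try:
        with open(path, "w") as fh:
            fh.write("shared\n")
    finally:
        os.umask(old_umask)  # restore the caller's umask


def demo() -> tuple[bool, str]:
    """Build a throwaway shared dir and report the resulting permission bits."""
    base = tempfile.mkdtemp()
    shared = os.path.join(base, "shared_dir")  # hypothetical folder name
    make_shared_dir(shared)
    target = os.path.join(shared, "notes.txt")
    create_group_writable_file(target)
    has_setgid = bool(os.stat(shared).st_mode & stat.S_ISGID)
    file_mode = oct(stat.S_IMODE(os.stat(target).st_mode))
    return has_setgid, file_mode
```

With the setgid bit on the directory, files created by either account end up owned by the shared group, and the 002 umask keeps them writable by that group.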
jimmar today at 5:37 PM
From the home page:

> Stop trusting blindly

> One-line installer scripts,

Here are the manual install instructions from the "Install / Build" page:

> curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz | tar xzf -

> cd jai

> makepkg -i

So, trust their jai tool, but not _other_ installer scripts?

otterley today at 5:57 PM
"jai is free software, brought to you by the Stanford Secure Computer Systems research group and the Future of Digital Currency Initiative"

I guess the "Future of Digital Currency Initiative" had to pivot to a more useful purpose than studying how Bitcoin is going to change the world.

andai today at 4:35 AM
This looks great and seems very well thought out.

It looks both more convenient and slightly more secure than my solution, which is that I just give them a separate user.

Agents can nuke the "agent" homedir but cannot read or write mine.

I did put my own user in the agent group, so that I can read and write the agent homedir.

It's a little fiddly though (sometimes the wrong permissions get set, so I have a script that fixes it), and keeping track of which user a terminal is running as is a bit annoying and error prone.

---

But the best solution I found is "just give it a laptop." Completely forget OS and software solutions, and just get a separate machine!

That's more convenient than switching users, and also "physically on another machine" is hard to beat in terms of security :)

It's analogous to the mac mini thing, except that old ThinkPads are pretty cheap. (I got this one for $50!)

ray_v today at 2:59 AM
I'm wondering if the obvious (and stated) fact that the site was vibe-coded - detracts from the fact that this tool was hand written.

> jai itself was hand implemented by a Stanford computer science professor with decades of C++ and Unix/linux experience. (https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-...)

rsyring today at 3:01 AM
I've been reviewing Agent sandboxing solutions recently and it occurred to me there is a gaping vector for persistent exploits for tools that let the agent write to the project directory. Like this one does.

I had originally thought this would be ok as we could review everything in the git diff. But it later occurred to me that there are all kinds of files that the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file, for instance, files in .venv, and .git hook files.

ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.

My conclusion from that is the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. the file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before transfer outside the sandbox.

I'd really like to see people talking more about this. The solution isn't that hard, keep CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git and maybe some that are but are known to be potentially dangerous (bin files). Obviously, there would need to be some kind of configuration option here.

1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
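
The filtering proxy described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the deny lists and the function names are hypothetical, and `tracked` is assumed to be the set of paths reported by something like `git ls-files` on the trusted side.

```python
from pathlib import PurePosixPath

# Hypothetical deny rules: places a developer might later execute unreviewed.
DENIED_DIRS = {".git", ".venv", "node_modules", "__pycache__"}
DENIED_SUFFIXES = {".pyc", ".so", ".exe"}


def is_transferable(path: str, tracked: set[str]) -> bool:
    """Allow a file out of the sandbox only if git tracks it and it is not risky."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # never accept paths that escape the project root
    if path not in tracked:
        return False  # untracked files never show up in a reviewed diff
    if any(part in DENIED_DIRS for part in p.parts):
        return False
    return p.suffix not in DENIED_SUFFIXES


def filter_changes(changed: list[str], tracked: set[str]) -> list[str]:
    """Return the subset of changed files that may be copied to the host."""
    return [path for path in changed if is_transferable(path, tracked)]
```

The check is deliberately conservative: anything that is untracked, binary-looking, or inside a tool-managed directory stays in the sandbox until someone opts it in.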

gpm today at 5:43 AM
This is a cool solution... I have a simpler one, though likely inferior for many purposes..

Run <ai tool of your choice> under its own user account via ssh. Bind mount project directories into its home directory when you want it to be able to read them. Mount command looks like

    sudo mkdir /home/<ai-user>/<dir-name>
    sudo mount --bind <dir to mount> --map-groups $(id -g <user>):$(id -g <ai-user>):1 --map-users $(id -u <user>):$(id -u <ai-user>):1 /home/<ai-user>/<dir-name>
I particularly use this with vscode's ssh remotes.
georaa today at 7:07 PM
Everyone talks about sandboxing the filesystem but nobody talks about what happens when the agent's work outlives the container. Reset happens, state is gone, you start over. I've lost more agent work to session timeouts than to any security issue. Isolation without persistence just means you lose progress safely.
gurachek today at 2:03 AM
The examples in the article are all big scary wipes, but I think the more common damage is way smaller and harder to notice.

I've been using claude code daily for months and the worst thing that happened wasn't a wipe (yet). It needed to save an svg file so it created a /public/blog/ folder. Which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd and I spent like an hour debugging before I figured it out. Nothing got deleted and it's not a permission problem, the agent just put a file in a place that made sense to it.

jai would help with the rm -rf cases for sure but this kind of thing is harder to catch because it's not a permissions problem, the agent just doesn't know what a web server is.

BoppreH today at 1:26 AM
Excellent project, unfortunate title. I almost didn't click on it.

I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.

game_the0ry today at 6:39 PM
I may be paranoid, but I run my ai cli tools only in a vps. I have them installed locally but never use them. In a vps I go full yolo mode bc I do not care about it. It is a slightly more cumbersome workflow, but if you have dev + staging envs, then you never have to develop and run stuff locally, which brings the local hardware requirements and costs down too (bc you can develop with a base macbook neo).
maxbeech today at 4:51 PM
the safety concerns compound significantly when you move from interactive to unattended execution. in interactive mode you can catch a bad command before it completes. run the same agent on a schedule at 3am with no one watching and there's no fallback.

i built something that schedules claude code jobs to run in the background (openhelm.ai). the layered approach we use: separate OS user account with only project directory write access, claude's native seatbelt/bubblewrap sandboxing, and a mandatory plan review step before any job's first run. you can't approve every individual action at runtime, but you can approve the shape of the plan upfront - which catches most of the scary stuff.

the paper's point about clean agent-specific filesystem abstractions resonates. the scope definition problem (what exactly should this agent be able to touch?) is actually the hard part - enforcement is relatively mechanical once you've answered that. and for scheduled workloads, answering that question explicitly at job creation time forces the kind of thinking that prevents the 3am disasters.
mehdibl today at 5:54 PM
Docker is hard to set up. The author made a nice solution, but I'm not sure he knows about devcontainers and what they can do. You do the setup once and you roll in most dev tools. I'm still surprised that the effort people put into such solutions ignores the dev's core requirements, like sharing the env they use in a simple way. You use it to have a custom env and isolate the agent. You want to persist your credentials? Mount the target folder from home or sl into a sub folder. Might be knowledge. But for Linux or even Windows/Mac, as long as you don't fully need a desktop, devcontainers are simple: a standard that works, and it's very mature.
mixedbit today at 9:12 AM
I work on a sandboxing tool similarly based on an idea to point the user home dir to a separate location (https://github.com/wrr/drop). While I experimented with using overlayfs to isolate changes to the filesystem and it worked well as a proof-of-concept, overlayfs specification is quite restrictive regarding how it can be mounted to prevent undefined behaviors.

I wonder if and how jai managed to address these limitations of overlayfs. Basically, the same dir should not be mounted as an overlayfs upper layer by different overlayfs mounts. If you run 'jai bash' twice in different terminals, do the two instances get two different writable home dir overlays, or the same one? In the second case, is the second 'jai bash' command joining the mount namespace of the first one, or creating a new one with the same shared upper dir?

This limitation of overlays is described here: https://docs.kernel.org/filesystems/overlayfs.html :

'Using an upper layer path and/or a workdir path that are already used by another overlay mount is not allowed and may fail with EBUSY. Using partially overlapping paths is not allowed and may fail with EBUSY. If files are accessed from two overlayfs mounts which share or overlap the upper layer and/or workdir path, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.'
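
One way to stay within that rule is to give every session its own upper and work directories. The sketch below illustrates that idea only; it says nothing about how jai itself handles the case. The function names and the `state_root` layout are hypothetical, and paths containing commas or colons would need escaping that is not handled here.

```python
import os
import tempfile


def allocate_overlay_dirs(state_root: str) -> tuple[str, str]:
    """Create a fresh upperdir/workdir pair that no other session shares."""
    os.makedirs(state_root, exist_ok=True)
    # mkdtemp picks a unique name, so two concurrent sessions never collide.
    session = tempfile.mkdtemp(prefix="session-", dir=state_root)
    upper = os.path.join(session, "upper")
    work = os.path.join(session, "work")
    os.mkdir(upper)
    os.mkdir(work)  # workdir must be empty and on the same filesystem as upper
    return upper, work


def overlay_mount_options(lower: str, upper: str, work: str) -> str:
    """Build the -o option string for `mount -t overlay`."""
    return f"lowerdir={lower},upperdir={upper},workdir={work}"
```

The trade-off is that two sessions no longer see each other's writes, which is exactly the question raised above about running the tool twice.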

hiq today at 1:25 PM
Is there already some more established setup to do "secure" development with agents, as in, realistically no chance it would compromise the host machine?

E.g. if I have a VM to which I grant only access to a folder with some code (let's say open-source, and I don't care if it leaks) and to the Internet, if I do my agent-assistant coding within it, it will only have my agent credentials it can leak. Then I can do git operations with my credentials outside of the VM.

Is there a more convenient setup than this, which gives me similar security guarantees? Does it come with the paid offerings of the top providers? Or is this still something I'd have to set up separately?

lemontheme today at 8:01 AM
And for the macos users, I can’t recommend nono enough. (Paying it forward, since it was here on HN that I learned about it.)

Good DX, straightforward permissions system, starts up instantly. Just remember to disable CC’s auto-updater if that’s what you’re using. My sandbox ranking: nono > lima > containers.

gck1 today at 5:41 AM
It's full VM or nothing.

I want AI to have full and unrestricted access to the OS. I don't want to babysit it and approve every command. Everything that is on that VM is fair game and the VM image is backed up regularly from outside.

This is the only way.

pkulak today at 4:16 PM
Installation is a bit... unsupported unless you're on Arch. Here's a Nix setup I (and Claude!) came up with:

https://github.com/pkulak/nix/tree/main/common/jai

Arg, annoying that it puts its config right in my home folder...

EDIT: Actually, I'm having a heck of a time packaging this properly. Disregard for now!

EDIT2: It was a bit more complicated than a single derivation. Had to wrap it in a security wrapper, and patch out some stuff that doesn't work on the 25.11 kernel.

neilwilson today at 6:29 AM
It's always struck me that agents should be operated via `systemd-run` as a transient scope unit with the necessary security properties set

So couldn't this be done with an appropriate shell alias - at least under linux.

Bender today at 1:17 PM
I would have to be very inebriated to give a bot/agent access to my files and all security clearance should be revoked but should I do that it would have to be under mandatory access controls that my unprivileged user has no influence over, not even with sudo or doas. The LSM enforced rules (SELinux, AppArmor, TOMOYO, other newer or simpler LSM's) would restrict all by default and give explicit read, write, execute permissions to specific files or directories.

The bot should also be instructed that it gets 3 strikes before being removed, meaning it should generate a report of what it believes it wants access to and get verbal approval or denial. That should not be so difficult with today's bots. If it wants to act like a human then it gets simple rules like a human. Ask the human operator for permission. If the bot starts "doing its own thing, aka going rogue" then it gets punished. Perhaps another bot needs to act as a dominatrix to be a watcher over the assistant bot.

micimize today at 3:08 PM
This is very cool - I try to have a container-centric setup but sometimes YOLOcal clauding is too tempting.

My biggest question skimming over the docs is what a workflow for reviewing and applying overlay changes to the out-of-cwd dirs would be.

Also, a bit tangential, but if anyone has slightly more in-depth resources for grasping the security trade-offs between these kinds of Linux-leveraging sandboxes, containers, and remote VMs I'd appreciate it. The author here implies containers are still more secure in principle, and my intuition is that there are simply fewer unknowns from my perspective, but I don't have a firm understanding.

Anyhow, kudos to the author again, looks useful.

triilman today at 1:37 AM
What would Jonathan Blow think about this.
Ciantic today at 8:33 AM
I've been using podman, and for me it is good enough. The way I use it, I mount the current working directory, /usr/bin, /bin, /usr/lib, /usr/lib64, /usr/share, then a few specific ones: ~/.aspnet, ~/.dotnet, ~/.npm-global etc. I use the same image as my operating system (Fedora 43).

It works pretty well, agent which I choose to run can only write and see the current working directory (and subdirectories) as well as those pnpm/npm etc software development files. It cannot access other than the mounted directories in my home directory.

Now some evil command could in theory write to those shared ~/.npm-global directories some commands, that I then inadvertently run without the container but that is pretty unlikely.

stavros today at 3:11 AM
I'd really like to try this, but building it is impossible. C++ is such a pain to build with the "`make`; hunt for the dependency that failed; `apt-get install whatever-dev`; goto make" loop...

Please release binaries if you're making a utility :(

mark_l_watson today at 2:27 PM
Looks good, but only Linux is supported. I like spinning up VPS’s and then discarding them when I am done. On macOS, something I haven't tried yet but plan to: create a separate user account.
Game_Ender today at 11:36 AM
Where is the network isolation? I want to be able to limit what external resources the agent can access and also inject secrets at request time so the agent doesn't have access to them.
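
As a rough illustration of that idea, an egress proxy outside the sandbox could decide per request whether a host is allowed and attach the credential itself. Everything named here is hypothetical (the `ALLOWED_HOSTS` policy, the `EXAMPLE_TOKEN` secret name, the `prepare_request` helper); it sketches the header-handling step only, not a working proxy.

```python
from urllib.parse import urlsplit

# Hypothetical policy: which hosts the agent may reach, and which secret
# (held only by the proxy process) is attached to requests for each host.
ALLOWED_HOSTS = {"api.example.com": "EXAMPLE_TOKEN", "pypi.org": None}


def prepare_request(url: str, headers: dict[str, str],
                    secrets: dict[str, str]) -> dict[str, str]:
    """Return outbound headers for an allowed URL, or raise PermissionError."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or host not in ALLOWED_HOSTS:
        raise PermissionError(f"blocked request to {url!r}")
    # Drop any Authorization header the agent supplied itself.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    secret_name = ALLOWED_HOSTS[host]
    if secret_name is not None:
        out["Authorization"] = f"Bearer {secrets[secret_name]}"
    return out
```

Because the token is added after the request leaves the sandbox, the agent can use the API without ever being able to read or exfiltrate the credential.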

File system isolation is easy now, it’s not worth HN front page space for the n’th version. It’s a solved problem (and now included in Claude Code).

RodMiller today at 2:51 PM
Sandboxing and verification are two different things. Sandboxing answers what can this agent touch. Verification answers what does it actually do with what it touches. Even inside a perfect jail, the agent can still hallucinate, exfiltrate data over the network, or fold the second you push back on its answer.

I've been building an independent benchmarking platform for AI agents. The two approaches are complementary. Sandbox the environment, verify the agent.

torarnv today at 9:18 AM
I’m using https://github.com/torarnv/claude-remote-shell for this, which runs Claude’s Bash tool on a remote machine but leaves Claude running locally otherwise.

I’ve found it to be a good balance for letting Claude loose in a VM running the commands it wants while having all my local MCPs and tools still available.

driverdan today at 1:07 PM
Are there any similar ways of isolating environment variables, secrets, and credentials? Everyone is thinking about the file system but I haven't seen as much discussion about exposing secrets and account access.
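
For the environment-variable part of that question, one common approach is to start the agent with an explicit allowlist rather than the whole inherited environment. A minimal sketch, assuming a hypothetical `SAFE_VARS` list and helper names; it does not cover credential files under the home directory, which need filesystem isolation instead.

```python
import os
import subprocess
from typing import Optional

# Hypothetical allowlist: variables the agent needs to function at all.
SAFE_VARS = ("PATH", "HOME", "LANG", "TERM")


def agent_env(parent: dict[str, str],
              extra: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Build a minimal environment instead of inheriting the whole shell's."""
    env = {name: parent[name] for name in SAFE_VARS if name in parent}
    env.update(extra or {})  # e.g. a scoped, short-lived token for this run only
    return env


def run_agent(argv: list[str],
              extra: Optional[dict[str, str]] = None) -> subprocess.CompletedProcess:
    """Start the agent process with the scrubbed environment."""
    return subprocess.run(argv, env=agent_env(dict(os.environ), extra),
                          capture_output=True, text=True, check=False)
```

Anything not on the list, such as cloud keys or tokens exported in a shell profile, never reaches the agent process.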
cozzyd today at 1:57 AM
Should be named Jia

More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...

thedelanyo today at 11:19 AM
Most of what we're doing with AI today, we've been doing just fine without any confusion.

I've been struggling to find what AI has intrinsically solved that is new and gives us the chance to completely change workflows, other than these weird things occurring.

e1g today at 2:57 AM
For jailing local agents on a Mac, I made Agent Safehouse - it works for any agent and has many sane defaults for developers https://agent-safehouse.dev
vijucat today at 1:54 PM
Well, I'm on Windows (+ Cygwin) and wrote a Dockerfile. It wasn't that hard. git branch + worktree + a docker container per project and I can work with copilot in --yolo mode (or claude --dangerously-skip-permissions, whichever). vscode is pretty smooth at installing the VS Code Server on first connection to a docker container, too, and I just open up the workspace in a minute.
waterfisher today at 3:27 AM
There's nothing wrong with an AI-designed website, but I wish when describing their own projects that HN contributors wrote their own copy. As HN posters are wont to say, writing is thinking...
wafflemaker today at 9:25 AM
Sorry if this question is stupid, (I'm not even using Claude*), but why can't people run Claude/other coding agent in a container and only mount the project directory to the container?

*I played with codex a few months ago, but I don't even work in IT.

bob1029 today at 8:21 AM
I've been running GPT5.x fully unconstrained with effective local admin shell for over $500 worth of API tokens. Not once has it done something I'd consider "naughty".

It has left my project in a complete mess, but never my entire computer.

  git reset --hard && git clean -fd 
That's all it takes.

I think this is turning into a good example of security theatrics. If the agent was actually as nefarious as the marketing here suggests, the solution proposed is not adequate. No solution is. Not even a separate physical computer. We need to be honest about the size of this problem.

Alternatively, maybe Claude is unusually violent to the local file system? I've not used it at all, so perhaps I am missing something here.

jqbd today at 10:19 AM
Would like to see something more comprehensive built on zfs and freebsd jails. Namely snapshot/checkpoint before each prompt, quick undo for changes made by agent, auto delete old snapshots etc
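
The snapshot-per-prompt workflow could be driven by a small wrapper that builds the relevant `zfs` commands. This is a sketch under assumptions: the dataset name `tank/work`, the `agent-` snapshot prefix, and the helper names are made up for illustration, and the functions only construct command lines rather than run them.

```python
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime) -> str:
    """Name a checkpoint after the time it was taken (`when` should be UTC)."""
    return f"{dataset}@agent-{when.strftime('%Y%m%dT%H%M%SZ')}"


def snapshot_command(dataset: str, when: datetime) -> list[str]:
    """Command to checkpoint the dataset before a prompt runs."""
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def rollback_command(snapshot: str) -> list[str]:
    """Command to undo everything after `snapshot`; -r also destroys newer snapshots."""
    return ["zfs", "rollback", "-r", snapshot]


def prune_commands(snapshots: list[str], keep: int) -> list[list[str]]:
    """Commands to destroy all but the `keep` newest agent snapshots."""
    ordered = sorted(snapshots)  # timestamped names sort chronologically
    stale = ordered[:-keep] if keep > 0 else ordered
    return [["zfs", "destroy", name] for name in stale]
```

A wrapper would run `snapshot_command("tank/work", datetime.now(timezone.utc))` before handing each prompt to the agent and call `prune_commands` periodically to cap disk usage.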
youknownothing today at 3:41 PM
This is a great time for Apple to relaunch their Time Machine devices, have a history of everything in your file system because sooner or later some AI is going to delete it...
r0l1 today at 8:29 AM
Just use DevContainers. Can't understand people letting AI go wild on their systems...
mbreese today at 2:02 AM
This still is running in an isolated container, right?

Ignoring the confidentiality arguments posed here, I can’t help but think about snapshotting filesystems in this context. Wouldn’t something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn’t protect against all the issues the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.

holtwick today at 12:27 PM
Inspired by this tool I wrote something that fits macOS better. It uses the native sandbox-exec from Apple and can wrap other apps as well, like VSCode in which you usually run AI stuff. https://github.com/holtwick/bx-mac
jbverschoor today at 3:20 AM
Interesting take on the same problem

I created https://github.com/jrz/container-shell which basically launches a persistent interactive shell using docker, chrooted to the CWD

CWD is bind mounted so the rest is simply not visible and you can still install anything you want.

georaa today at 8:27 AM
Filesystem containment solves one half of the blast radius problem. The other half is external state - agent hits a payment API, writes to a database, sends an email. Copy-on-write overlays can't roll that back. I've seen agents make 40 duplicate API calls because they crashed mid-task and retried from scratch with no deduplication. The filesystem was fine. The downstream systems were not. The hard version of this problem is making agent operations idempotent across external calls, not just safe locally.
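
A minimal sketch of the deduplication idea: derive a stable idempotency key per operation and record the result, so a retry after a crash replays the recorded outcome instead of calling the external system again. The `IdempotentRunner` class and its in-memory journal are hypothetical; a real journal would have to be persisted somewhere that survives the agent restarting.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentRunner:
    """Run each external operation at most once per (task, step, payload)."""

    def __init__(self) -> None:
        # In-memory journal for illustration; persist it outside the sandbox
        # so it survives the agent crashing and retrying from scratch.
        self._journal: dict[str, Any] = {}

    @staticmethod
    def key(task_id: str, step: str, payload: dict) -> str:
        """Derive a stable idempotency key from the operation's identity."""
        canonical = json.dumps([task_id, step, payload], sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, task_id: str, step: str, payload: dict,
            call: Callable[[str, dict], Any]) -> Any:
        """Invoke `call` once; replay the recorded result on any retry."""
        k = self.key(task_id, step, payload)
        if k not in self._journal:
            # Passing the key along lets a server that supports idempotency
            # keys deduplicate too, covering a crash between call and record.
            self._journal[k] = call(k, payload)
        return self._journal[k]
```

This only helps when the key is derived from the intent of the operation rather than from anything that changes between retries, such as a timestamp.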
adi_kurian today at 1:44 AM
Claude's stock unprompted / uninspired UI code creates carbon clone components. That "jai is not a promise of perfect safety" callout box is like the em dash of FE code. The contrast, or lack thereof, makes some of the text particularly invisible.

I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.

ta-run today at 7:08 AM
Idk, just feels so counter sometimes to build and refine these (seemingly non-deterministic) tools to build deterministic workflows & get the most productivity out of them.
simonw today at 1:44 AM
Suggestion for the FAQ page: does this work on a Mac?
Waterluvian today at 3:43 AM
Are mass file deletions as result of some plausible “I see why it would have done that” or will it just completely randomly execute commands that really have nothing to do with the immediate goal?
ma2kx today at 5:41 PM
It's a bit annoying that there are so many solutions to run agents and sandbox them but no established best practice. It would be nice to have some high level orchestration tools like docker / podman where you can configure how e.g. claude code, opencode, codex, openclaw run in open Shell, OCI container, jai etc.

Especially because everybody can ask chatgpt/claude how to run some agents without any further knowledge I feel we should handle it more like we are handling encryption where the advice is to use established libraries and don't implement those algorithms by yourself.

Myzel394 today at 3:55 PM
What's the difference between this and agent-safehouse?
ozim today at 8:15 AM
I saw it just 5 mins ago: Claude misspelled a directory path. For me it just created a new folder, but I can imagine that if I hadn't stopped it, it could have started removing stuff just because it thinks it needs to start from scratch or something.
Jach today at 2:51 AM
I've done some experimenting with running a local model with ollama and claude code connecting to it and having both in a firejail: https://firejail.wordpress.com/ What they get access to is very limited, and mostly whitelisted.
sanskritical today at 6:12 AM
How long until agents begin routinely abusing local privilege escalation bugs to break out of containers? I bet if you tell them explicitly not to do so it increases the likelihood that they do.
hoppp today at 2:10 PM
Something like freeBSD jails would be perfect for agents.
mazieres today at 12:39 AM
What would it take for people to stop recklessly running unconstrained AI agents on machines they actually care about? A Stanford researcher thinks the answer is a new lightweight Linux container system that you don't have to configure or think about.
ontouchstart today at 10:31 AM
AI safety is just like any technology safety, you can’t bubble wrap everything. Thinking about early stage of electricity, it was deadly (and still is), but we have proper insulation and industry standards and regulations, plus common sense and human learning. We are safe (most of the time).

This also applies to the first technology human beings developed: fire .

Aldipower today at 10:27 AM
$ lxc exec claude bash

Easy :-) lxd/lxc containers are much much underrated. Works only with Linux though.

imranstrive7 today at 12:33 PM
I tried something similar while building my tool site — biggest issue was SEO indexing. Fixed it by improving internal linking instead of relying on sitemap.
yalogin today at 4:48 AM
What if Claude needs me to install some software and hoses my distro. Jai cannot protect there as I am running the script myself
justinde today at 2:14 AM
.claude/settings.json:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."]
      }
    }
  }

Use it! :) https://code.claude.com/docs/en/sandboxing

cozzyd today at 2:03 AM
Should definitely block .ssh reading too...
love2read today at 12:10 PM
Is there an equivalent for macOS?
faangguyindia today at 3:10 AM
i just use seatbelt (mac native) in my custom coding agent: supercode
messh today at 1:38 AM
How is this different than say bubblewrap and others?
gonzalohm today at 2:23 AM
Not sure I understand the problem. Are people just letting AI do anything? I use Claude Code and it asks for permission to run commands, edit files, etc. No need for sandbox
docmars today at 12:47 PM
Jai is the name of a programming language, no?
MagicMoonlight today at 12:31 PM
This site was definitely slopcoded with Claude. They have a real distinctive look.
mbravorus today at 10:32 AM
or you can just run nanoclaw for isolation by default?

https://nanoclaw.dev

0xbadcafebee today at 7:17 AM
If it has a big splash page with no technical information, it's trying to trick you into using it. That doesn't mean it isn't useful, but it does mean it's disingenuous.

This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.

Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.

And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out to the internet. Or malware being downloaded into copy-on-write storage space, to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.

The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!

Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)

te_chris today at 10:09 AM
This looks nice, but on mac you can virtualise really easily into microvms now with https://github.com/apple/container.

I've built my own cli that runs the agent + docker compose (for the app stack) inside container for dev and it's working great. I love --dangerously-skip-permissions. There's 0 benefit to us whitelisting the agent while it's in flight.

Anthropic's new auto mode looks like an untrustworthy solution in search of a problem - as an aside. Not sure who thought security == ml classification layer but such is 2026.

If you're on linux and have kvm, there's Lima and Colima too.

samchon today at 4:41 AM
I just allow YOLO mode, and sometimes roll back.
albert_e today at 5:07 AM
Can we have a hardware level implementation of git (the idea of files/data having history preserved. Not necessarily all bells and whistles.) ...in a future where storage is cheap.
KennyBlanken today at 4:44 AM
This is not some magical new problem. Back your shit up.

You have no excuse for "it deleted 15 years of photos, gone, forever."

kristofferR today at 2:33 AM
samlinnfer today at 5:17 AM
Now we just need one for every python package.
charcircuit today at 2:23 AM
I want agents to modify the file system. I want them to be able to manage my computer if it thinks it's a good idea. If a build fails due to running out of disk space I want it to be able to find appropriate stuff to delete to free up space.
GistNoesis today at 9:45 AM
TLDR: It's easy : LLM outputs are untrusted. Agents by virtue of running untrusted inputs are malware. Handle them like the malware they are.

>>> "While this web site was obviously made by an LLM" So I am expecting to trust the LLM written security model https://jai.scs.stanford.edu/security.html

These guys are experts from a prestigious academic institution. Leading "Secure Computer Systems", whose logo is a 7-branch red star, which looks like a devil head, with white palm trees in the background. They are also chilling for some Blockchain research, and future digital currency initiative, taking funding from DARPA.

The website also points towards external social networks for reference to freely spread Fear Uncertainty Doubt.

So these guys are saying, go on run malware on your computer but do so with our casual sandbox at your own risk.

Remember until yesterday Anthropic aka Claude was officially a supply chain risk.

If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear I recommend you don't but if you must) by writing the tools the LLM is allowed to use, yourself, and by determining at each step whether or not you broke the security model.

Remember that everything which comes from a LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install some external dependencies, which you must decide if you trust them or not and review them.

Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single-page HTML file. Serve this static page from a webserver on an external server, so that you rely on the Same Origin Policy to prevent the page from accessing your files and network (GitHub Pages under a new handle works if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.

If you are afraid of writing tools you can start by copy-pasting, and reading everything produced.

Once you write tools, you'll want to have them run autonomously in a runaway loop taking user feedback or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.

Here is one such vibe-coded experiment I did a few days ago. A simple 2d physics water molecules simulation for educational purposes. It is not physically accurate, and still has some bugs, and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746

avazhi today at 3:07 AM
The irony is they used an LLM to write the entire (horribly written) text of that webpage.

When is HN gonna get a rule against AI/generated slop? Can’t come soon enough.

rdevsrex today at 3:28 AM
This won't cause any confusion with the jai language :)
schaefer today at 4:55 AM
Ugh.

The name jai is very taken[1]... names matter.

[1]: https://en.wikipedia.org/wiki/Jai_(programming_language)