I'm glad this article includes the only credible fix for the HTTP leak problems: CSP.
A useful thing I learned recently is that, while CSP headers are usually set using HTTP headers, you can also reliably set them directly in HTML - for example for HTML generated directly on a page where HTTP headers don't come into play:
It feels like this shouldn't work, because JavaScript in the untrusted content could use the DOM to delete or alter that meta tag... but it turns out all modern browsers specifically lock that down, treating those CSP rules as permanent as soon as that meta tag has loaded before any malicious code has the chance to subvert them.
> Stacking more and more complexity into sanitization is clearly a doomed approach. We are more than 5 major revisions deep and yet there are still known holes. People are actively sharing projects on the Scratch website bypassing SVG sanitization. And the moment browsers decide to implement the latest CSS specs, even more holes will open up.
This kind of stuff will
get way worse with LLMs. They like just stacking more and more code on top of workarounds.
andybakyesterday at 3:55 PM
My first thought is "support a tiny subset of svg that probably still covers 90% of real-world use cases".
I do feel that's there's two distinct types of svg - "bunch of paths with fills" and "clever dangerous stuff" where most real SVGs are of the former type.
Fully expect this to be shot down by someone that's thought about this problem for longer than the 120 seconds I just spent. :)
nmiloyesterday at 10:41 PM
I'm sorry because I love the scratch project but this has to be said: they found XSS in SVGs in a surface with attacker-controlled access to Node and their fix was sanitizing it using regex??? And this was discovered by a user on scratch?
Even worse, OP's latest post "Every version of Scratch is vulnerable to arbitrary code execution" just tells you how exactly to exploit something similar today in the current version with no mention of responsible disclosure except a plug to say, "hey, check out my project, this one doesn't have RCE!" This is so irresponsible it borders on malicious.
evilpieyesterday at 5:22 PM
The HTML Sanitizer API has a subset of SVG that is allowed by the default configuration. It won't help you with sanitizing CSS at all however, style is simply not allowed by default.
It'd be nice if there was a sandbox attribute you could add to inline <svg> tags, like the <iframe sandbox> attribute that'd let you opt out of all the potentially "dynamic" stuff inside of an SVG like scripts and event handlers, or even just literally sandbox the entire thing from accessing the "parent" HTML page's context/cookies/etc just like an iframe.
I'm sure it'd just open up a whole other can of worms though... not to mention having to wait for browsers to actually support it.
The real solution here is definitely CSP + basic sanitisation though.
ikkunyesterday at 4:27 PM
I do wish tinyVG or similar would take off, but I don't see that ever actually happening. the only thing I think it's missing is animation support, which is pretty niche but not as niche as <script> tags.
It seems the reason they're inlined in the page at all is to measure things briefly like bounding boxes (not sure the full extent as it didn't cover that), before subsequent removal. I'm not familiar with Scratch and its use of user-submitted SVGs but I'd be curious to read more about what they're doing that required it be inlined specifically.
(This isn't a comment on the challenges in proper sanitization fwiw, as I've needed to do various of the same things myself)
codedokodetoday at 4:25 AM
I don't like that SVG uses things like CSS and JS and requires pulling in the whole browser to display. Instead of being a simple vector image format, it became just an extension of HTML. Maybe we need a new format, and if someone decides to do it, please add ability to embed fonts, wrap text and decent animations.
bawolffyesterday at 7:05 PM
These aren't really SVG specific issues. They are all pretty standard XSS that apply to html and are very well known vectors.
Like this post didn't even mention presentational attributes, like how cursor attribute can contain a url that gets loaded. Or any of the other tricky parts of svg sanitization, like using dtd to bypass things.
Liftyeeyesterday at 9:01 PM
I'm not familiar with the details of real software development, so I don't know why it's not possible to just "not give the SVG part of the code internet access" or "perform sanitization on post-decoding (url, hex, etc) data".
Is it because the SVG parser/renderer being used is an entire library, and it would be prohibitive to write your own SVG parser/renderer or insert your own code into the existing one?
wingitoday at 9:26 AM
thank you for this post.
kevinmgrangeryesterday at 5:22 PM
> This was fixed by using a regular expression to remove script tags.
The infamous you can't parse (X)HTML with regex¹ meme is from 2009, yet this fix was done in 2019. I guess the SO answer never mentioned SVG.
For the "<script>" stuff: regardless of how the thing is spelled or otherwise obscured, the HTML5 parser eventually knows when it's gotten hold of a script tag. Oops, we got one in a NOSCRIPTTAG context. Let's poop out.
Tag names, attributes, attribute values, event callback default-cancelers... so many ways to declare that this node and its children shouldn't parse/evaluate scripts.
As Jay-Z said: "I've got 99 solutions, fixing a problem ain't one"
etchalonyesterday at 4:41 PM
I don't understand why it wasn't immediately understood that SVG is as dangerous as HTML.
It is not, and never was, an image format. It's a markup language.
NooneAtAll3yesterday at 5:51 PM
wait... scratch is just a browser?
Theodoresyesterday at 6:22 PM
Maybe we need a dumbed down version 3 of SVG where the browser knows it is not to do anything that requires fetching a URL, to make the image as harmless as a JPG.
This version 3 could have the version number changed to 2 in order to do cool SVG things, so full-fat SVG as version 2 is now. But you could just flip to 2 to a 3 on upload, so any embedded URLs are harmless.
This could be useful for the creator too, as it is helpful to have layers of source images in bitmap format to work with, and you can easily export such things accidentally.
Devastayesterday at 6:13 PM
> In 2019, a few months after the initial release of Scratch 3, Scratch discovered that SVGs can contain <script> tags that Scratch would cause to be executed when the SVG loads. This is known as an XSS.