Mounting tar archives as a filesystem in WebAssembly
39 points - today at 10:13 AM
It uses IndexedDB for the filesystem.
Rather dumbly, it loads the files from a tar archive that is encoded into a PNG, because tar files are one of the forbidden file formats.
It lets you mount .tar files as a read-only filesystem.
It's cool because you basically get random access to the tarball without paying any unpacking costs. (It builds an index recording exactly where each file's data sits in the archive.)
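To make the index idea concrete, here is a minimal sketch in Python (not the author's WebAssembly code). It assumes an uncompressed tar held in memory as `bytes`; the function names `build_tar_index` and `read_member` are made up for illustration. It relies on the standard library's `tarfile`, which exposes each member's data offset as `TarInfo.offset_data`.

```python
import io
import tarfile


def build_tar_index(blob: bytes) -> dict[str, tuple[int, int]]:
    """Map each regular file's name to (data offset, size) inside the tar blob."""
    index = {}
    # "r:" means plain tar, no compression, so offsets refer to the blob itself.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(blob: bytes, index: dict[str, tuple[int, int]], name: str) -> memoryview:
    """Return a zero-copy view of one file's bytes, using only the index."""
    offset, size = index[name]
    return memoryview(blob)[offset:offset + size]
```

Building the index walks the headers once; after that, every read is a slice of the original buffer, which is roughly what "mounting" the archive amounts to.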
The gzip random-access problem is a lot more difficult because gzip has internal state. But in any case, solutions exist! Apparently the internal state is only 32 kB, so if you save it at every 1 MB of offset, the amount of data you need to decompress for one file access is bounded by a constant. https://github.com/mxmlnkn/ratarmount does this, apparently using https://github.com/pauldmccarthy/indexed_gzip internally. zlib even has an example of this method in its own source tree: https://github.com/gcc-mirror/gcc/blob/master/zlib/examples/...
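A rough sketch of the checkpoint idea in Python, with a loud caveat: this is not how ratarmount, indexed_gzip, or the zlib example are implemented. Those serialize the 32 kB window so the index can be stored; this stand-in keeps in-memory snapshots made with `Decompress.copy()` from the standard `zlib` module instead. The class name `GzipCheckpoints` and the `span` and `chunk` parameters are invented for the example.

```python
import bisect
import zlib


class GzipCheckpoints:
    """Snapshot the decompressor roughly every `span` bytes of output."""

    def __init__(self, compressed: bytes, span: int = 1 << 20, chunk: int = 4096):
        self.compressed = compressed
        self.chunk = chunk
        self.out_offsets = []  # uncompressed offset of each checkpoint
        self.points = []       # (compressed offset, decompressor snapshot)
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # expect a gzip header
        out_pos, last = 0, -span
        for in_pos in range(0, len(compressed), chunk):
            if d.eof:
                break
            if out_pos - last >= span:
                # State here: input consumed up to in_pos, output emitted up to out_pos.
                self.out_offsets.append(out_pos)
                self.points.append((in_pos, d.copy()))
                last = out_pos
            out_pos += len(d.decompress(compressed[in_pos:in_pos + chunk]))
        self.size = out_pos  # total uncompressed size

    def read(self, offset: int, length: int) -> bytes:
        """Decompress only from the nearest checkpoint at or before `offset`."""
        i = bisect.bisect_right(self.out_offsets, offset) - 1
        in_pos, snapshot = self.points[i]
        d = snapshot.copy()  # keep the stored checkpoint reusable
        skip = offset - self.out_offsets[i]
        buf = bytearray()
        while len(buf) < skip + length and in_pos < len(self.compressed) and not d.eof:
            buf += d.decompress(self.compressed[in_pos:in_pos + self.chunk])
            in_pos += self.chunk
        return bytes(buf[skip:skip + length])
```

With a 1 MB `span`, a read never decompresses much more than 1 MB plus the requested length, regardless of where in the archive the file sits. Each snapshot carries roughly the decompressor's window, which is the 32 kB per checkpoint mentioned above.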
It all depends on the use case, of course. The author here seems to have a pretty specific one, though I still don't see what the advantage of this is vs. extracting in JS and adding all files individually to memfs. "Without any copying" doesn't really make sense, because the only difference is copying ONE 1 MB tar blob into a Uint8Array vs. 1000 1 kB file blobs.
One very valid constraint the author sets is not being able to touch the source file. If you can touch it, there are of course a thousand better solutions to all this, like using zip, which compresses each file individually and always has a central index at the end.
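For comparison, a small sketch of what the zip layout buys you, using Python's standard `zipfile` module. The helper names `make_zip`, `central_index`, and `read_one` are made up for illustration. The central directory already records where each member starts and how large it is, so reading one file only decompresses that member.

```python
import io
import zipfile


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory zip where every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def central_index(blob: bytes) -> dict[str, tuple[int, int, int]]:
    """Read the central directory: name -> (header offset, compressed size, size)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {
            info.filename: (info.header_offset, info.compress_size, info.file_size)
            for info in zf.infolist()
        }


def read_one(blob: bytes, name: str) -> bytes:
    """Decompress a single member without touching the others."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)
```

No separate index pass or checkpointing is needed here, which is the trade-off being pointed out: you only get this for free if you control the archive format.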