Today I discovered πfs! "πfs is a revolutionary new file system that, instead of wasting space storing your data on your hard drive, stores your data in π!" This mostly-joking tool is (probably) intended as a 'because we can' proof of concept and isn't meant to be taken seriously. Sadly, my brain wouldn't drop it.
A few months ago I listened to a podcast that went through the history of an old, storied algorithm. I would call it a compression algorithm (as the show often does), or an encoding algorithm (as some claim), or a data-sharing technique (as Wikipedia says), but sadly the algorithm is lost to time and nobody knows if it was even real. The 'idea' generally seems to be to encode data in a way that lets it be reconstructed from a shared look-up table. Although that was a pretty novel concept at the time, it's hardly unique anymore, and forms of delta encoding are widely used for many things today (Git, Chrome & Android updates, Dropbox, Debian updates, etc.). Everybody seems to understand that when a client already has a file containing most of what you're sending them, it's better to just send the diff.
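If you want the flavour of that diff-only idea in a few lines, here's a rough Python sketch. Real delta tools (xdelta, bsdiff, Git's packfiles) use much smarter binary formats; `make_delta` and `apply_delta` are just names I made up for the illustration.

```python
# A rough sketch of delta encoding: the delta is a list of "copy" ops (ranges the
# receiver already has in its base file) and "insert" ops (the only literal bytes
# that actually need to travel).
import difflib

def make_delta(base: bytes, new: bytes):
    ops = []
    matcher = difflib.SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))    # reference data the receiver already has
        else:
            ops.append(("insert", new[j1:j2]))   # literal bytes that must be sent
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

base = b"The quick brown fox jumps over the lazy dog. " * 50
new = base.replace(b"lazy", b"sleepy") + b"Plus one brand new sentence."

delta = make_delta(base, new)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"new file: {len(new)} bytes, literal bytes in the delta: {literal}")
assert apply_delta(base, delta) == new
```

The point is the shape of the thing: mostly cheap 'copy' references into data the receiver already holds, plus a handful of literal bytes.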
While that message seems to have been heard, getting everyone on board with sending the initial data in such an encoded fashion has seen far less uptake. Google's SDCH focused only on compressing HTTP headers (metadata), and, of course, you still needed to download the lookup table first. That meant the power of this incredible 'compression' wasn't being applied to the actual content (data), and the (sometimes much bigger) lookup table had to be downloaded before it could be used... to avoid downloading so much. This contradiction may explain why the concept hasn't been readily adopted for everything (yet).
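The shared-dictionary idea itself is easy to play with even without SDCH (which, if memory serves, actually shipped its payloads in a VCDIFF-based format). The standard library's zlib supports a preset dictionary, so here's a toy where both 'ends' agree on a made-up chunk of HTML boilerplate up front and the payload only has to reference it.

```python
# A toy of the shared-dictionary idea using only the standard library: both ends
# hold the same preset dictionary (the contents here are invented HTML
# boilerplate), so the compressed payload can reference it instead of carrying
# that text itself.
import zlib

shared_dict = (b'<!DOCTYPE html><html><head><meta charset="utf-8">'
               b'<title></title></head><body><div class="content">'
               b'</div></body></html>')

page = (b'<!DOCTYPE html><html><head><meta charset="utf-8">'
        b'<title>Hello</title></head><body><div class="content">'
        b'Hello, world!</div></body></html>')

plain = zlib.compress(page, 9)                        # no dictionary

co = zlib.compressobj(level=9, zdict=shared_dict)     # sender primed with the dictionary
with_dict = co.compress(page) + co.flush()

do = zlib.decompressobj(zdict=shared_dict)            # receiver must hold the same dictionary
assert do.decompress(with_dict) == page

print(f"original: {len(page)} B, deflate: {len(plain)} B, deflate+dict: {len(with_dict)} B")
```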
And thus my brain finds πfs, reads the documentation, and thinks, "Gosh, this is exactly what's missing from that compression idea!" To me, this solves one of the fundamental problems of a shared lookup table: sharing it. The smaller the lookup table, the worse the 'compression'; the larger the lookup table, the more time, bandwidth, and space get wasted. It's a fine balancing act that discourages adoption. With π, however, there's no need to waste bandwidth transferring the table, because it can always be (pre)computed by the client! Further, because it's π and not some vendor-specific blob, you don't need to (re-)download it, ever! Then there's storage and time. Assuming we could get enough folks on board with the idea, the table would be pre-computed on boot/launch and stored in RAM (or on disk if you really wanted). Again, if popular enough, it would be shared by many programs, so no particular program would pay an additional time/storage cost. Presumably it would go the way of the neural network and start getting its own chips/hardware optimizations to shave those costs down even further.
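For the curious, the πfs scheme is basically "don't store the byte, store where it occurs in π". As I understand it, the real thing indexes π's hexadecimal digits via the BBP formula; the toy below is my own decimal-digit variant (`pi_digits`, `store`, and `load` are invented names), and it also shows the punchline: the offsets usually cost more bits than the bytes they replace.

```python
# A decimal-digit toy of the πfs idea: don't store the byte, store *where* its
# digits occur in π.
def arctan_inv(x: int, prec: int) -> int:
    # Fixed-point arctan(1/x), scaled by `prec`.
    total = term = prec // x
    x2, k = x * x, 3
    while term:
        term //= x2
        if k % 4 == 3:
            total -= term // k
        else:
            total += term // k
        k += 2
    return total

def pi_digits(n: int) -> str:
    # First n decimal digits of pi ("31415...") via Machin's formula.
    prec = 10 ** (n + 10)                                 # 10 guard digits
    pi = 16 * arctan_inv(5, prec) - 4 * arctan_inv(239, prec)
    return str(pi)[:n]

DIGITS = pi_digits(20_000)                                # the client "pre-computes the table"

def store(data: bytes):
    # Each byte is replaced by the offset of its 3-digit decimal form in π.
    return [DIGITS.index(f"{b:03d}") for b in data]

def load(offsets) -> bytes:
    return bytes(int(DIGITS[i:i + 3]) for i in offsets)

offsets = store(b"Hi!")
print(offsets)   # each offset usually costs more bits than the byte it encodes
assert load(offsets) == b"Hi!"
```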
What do you think? Is π the next big thing? Is π going to end our bandwidth woes and bring about 24k Netflix in a few years? Ya, almost certainly that's a no. Google itself moved on from SDCH to make Brotli. Aside from generally being a better compression algorithm, Brotli also ships with a 120 KiB lookup table of "common words, phrases and other substrings". Although it's not an infinite corpus, it gives Brotli a ~20% 'compression' advantage over gzip on HTML pages. That's huge, and a big win for 'compression' in general. I'll take it, for now.
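If you want to poke at that yourself, here's a quick size comparison on a toy HTML snippet. It needs the third-party Brotli package, and a tiny snippet won't reproduce Google's ~20% figure (which came from benchmarks over real pages), but Brotli's built-in dictionary tends to help most on small HTML where gzip has no history to lean on.

```python
# Quick size comparison on a toy HTML snippet. Needs the third-party Brotli
# package (pip install Brotli); gzip is in the standard library.
import gzip
import brotli

html = (b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
        b'<title>Shared dictionaries</title></head><body>'
        b'<article><h1>Shared dictionaries</h1>'
        b'<p>Both ends agree on a corpus of common substrings up front, '
        b'so the payload only has to reference it.</p></article>'
        b'</body></html>')

gz = gzip.compress(html, compresslevel=9)
br = brotli.compress(html, quality=11)

print(f"original: {len(html)} B, gzip: {len(gz)} B, brotli: {len(br)} B")
assert brotli.decompress(br) == html
```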
Happy Hacking!
- Chris