[WIP] "packing" object storage design documentation
This diff adds a design considerations document to the objstorage documentation.
It outlines the problem our object storage is trying to solve, the solutions we've come up with so far, and a draft of a new design for a more disk-efficient "packed" object storage, based on experiments and some literature review around Ceph.
And yes, this is the description of a (somewhat crude) filesystem, which tries to balance two goals: cramming tiny objects together to avoid wasting space, and storing files (way) larger than RADOS supports efficiently.
There are a few TODO points that need to be cleared before this can be implemented:
- How to efficiently handle index blocks. There is some literature on B-trees backed by RADOS/Ceph that might be worth investigating: https://ceph.com/wp-content/uploads/2017/01/CawthonKeyValueStore.pdf. The only issue I can see is that erasure-coded pools don't support OMAP metadata, which would force the index to be written to a separate, replicated pool.
- How to select which data block to write a small object to. This is easy to solve for a single writer (just keep track of the last block written to for each object size), but harder to do properly with several distributed writers.
- How to handle object restores (i.e. overwriting data on an index node) and deletions. Erasure-coded data pools only support create and append; overwriting objects requires explicitly enabling it on the pool (the allow_ec_overwrites flag).
- Add some more links to the documents that inspired the design.
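To make the single-writer case from the block selection TODO more concrete, here is a minimal sketch of "remember the last block written to for each object size". Everything in it is an assumption made for illustration and not part of the design document: the SingleWriterAllocator name, the power-of-two size classes, the 4 MiB block size and the integer block ids. It does not talk to Ceph at all and does nothing for the distributed-writer case, which remains open.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical data block size, not from the design


def size_class(length: int) -> int:
    """Round a length up to the next power of two (assumed size classes)."""
    return 1 if length <= 1 else 1 << (length - 1).bit_length()


@dataclass
class SingleWriterAllocator:
    """Hypothetical single-writer allocator: one open block per size class."""

    block_size: int = BLOCK_SIZE
    next_block_id: int = 0
    # size class -> (block id, next free offset in that block)
    open_blocks: dict = field(default_factory=dict)

    def allocate(self, length: int):
        """Return the (block_id, offset) where an object of `length` bytes goes."""
        slot = size_class(length)
        if slot > self.block_size:
            raise ValueError("object too large for a packed data block")
        # With no open block yet, pretend the current one is already full.
        block_id, offset = self.open_blocks.get(slot, (None, self.block_size))
        if offset + slot > self.block_size:
            # Open block for this size class is full (or missing): start a new one.
            block_id, offset = self.next_block_id, 0
            self.next_block_id += 1
        self.open_blocks[slot] = (block_id, offset + slot)
        return block_id, offset
```

With several writers, this in-memory table would have to be shared or partitioned somehow, which is exactly the part that still needs to be designed.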
Test Plan
cd docs; make html
Migrated from D398 (view on Phabricator)