Write keys to the shard to make it self-contained
The key associated with each object is currently only used to build the perfect-hash function and, later, to query it. This means there is no easy way to learn which keys are present in a given shard. As long as the keys are hashes of the objects, we could recover them by hashing every object, but given the target shard size this would be a fairly expensive operation.
To make the shard self-contained, we should write the key (32 bytes) of each object. If shards are 100 GiB and objects are 3 KiB each, this amounts to about 1 GiB of extra space (1%).
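The overhead estimate can be checked with a few lines of arithmetic, using the shard size, object size and key length given above:

```python
# Space taken by storing one 32-byte key per object in a shard.
GIB = 1024 ** 3
KIB = 1024

shard_size = 100 * GIB   # target shard size
object_size = 3 * KIB    # size of each object
key_size = 32            # key length in bytes

object_count = shard_size // object_size
key_overhead = object_count * key_size

print(f"objects: {object_count:,}")
print(f"key overhead: {key_overhead / GIB:.2f} GiB ({key_overhead / shard_size:.2%})")
```

That is roughly 35 million objects and about 1.04 GiB of keys, i.e. slightly over 1% of the shard.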
I see two places where the keys can be written:
Before each object's content
Currently, for each object, we write the size of the data, followed by the data itself. We could write the key before the size.
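A minimal sketch of an object entry laid out as key, then size, then data. The size encoding is an assumption here (an unsigned 64-bit big-endian integer), and `write_entry` / `read_entry` are hypothetical helpers, not existing functions:

```python
import io
import struct

KEY_SIZE = 32
# Assumption: the size prefix is an unsigned 64-bit big-endian integer.
SIZE_FORMAT = ">Q"
SIZE_LEN = struct.calcsize(SIZE_FORMAT)


def write_entry(out, key: bytes, data: bytes) -> int:
    """Append one entry (key, size, data) and return the offset it starts at."""
    if len(key) != KEY_SIZE:
        raise ValueError(f"key must be {KEY_SIZE} bytes")
    offset = out.tell()
    out.write(key)
    out.write(struct.pack(SIZE_FORMAT, len(data)))
    out.write(data)
    return offset


def read_entry(inp, offset: int) -> tuple[bytes, bytes]:
    """Read back the (key, data) pair of the entry starting at `offset`."""
    inp.seek(offset)
    key = inp.read(KEY_SIZE)
    (size,) = struct.unpack(SIZE_FORMAT, inp.read(SIZE_LEN))
    return key, inp.read(size)


shard = io.BytesIO()
offset = write_entry(shard, bytes(32), b"hello")
print(read_entry(shard, offset))
```

Each entry grows by exactly `KEY_SIZE` bytes compared to the current size-then-data layout.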
Pros:
- We can more easily salvage objects if the image is corrupted and split in half.
- We could implement recovery from an incomplete packing operation.
- We still have the key even if the index is missing or corrupted.
Cons:
- It’s harder to get a list of all keys: every object entry in the file has to be visited.
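With the key stored in front of each entry, the data section can be walked sequentially without the index. The sketch below assumes the same hypothetical layout (32-byte key, 64-bit big-endian size, data) with entries packed back to back from a given start offset; `scan_entries` is illustrative only. It stops at the first truncated entry, which is what salvaging a damaged image would look like, and it also shows the cost noted above: listing keys means one seek per object.

```python
import io
import struct
from typing import Iterator

KEY_SIZE = 32
SIZE_FORMAT = ">Q"  # assumption: 64-bit big-endian size prefix
HEADER_LEN = KEY_SIZE + struct.calcsize(SIZE_FORMAT)


def scan_entries(inp, start: int = 0) -> Iterator[tuple[int, bytes, int]]:
    """Yield (offset, key, size) for each complete entry, in file order."""
    end = inp.seek(0, io.SEEK_END)
    offset = start
    while offset + HEADER_LEN <= end:
        inp.seek(offset)
        header = inp.read(HEADER_LEN)
        key = header[:KEY_SIZE]
        (size,) = struct.unpack(SIZE_FORMAT, header[KEY_SIZE:])
        next_offset = offset + HEADER_LEN + size
        if next_offset > end:
            break  # truncated entry: everything before it is still usable
        yield offset, key, size
        offset = next_offset  # skip the data without reading it


entries = [(b"\xaa" * 32, b"first"), (b"\xbb" * 32, b"second object")]
raw = b"".join(k + struct.pack(SIZE_FORMAT, len(d)) + d for k, d in entries)
damaged = io.BytesIO(raw[:-3])  # simulate a file cut short
print([key[:1].hex() for _, key, _ in scan_entries(damaged)])  # ['aa']
```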
In the index
The index currently holds only the position of each object entry in the file. We could write the key in the index, before the position.
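A minimal sketch of an index record holding the key followed by the position, assuming fixed-size records (a 32-byte key plus a 64-bit big-endian offset) stored one per perfect-hash slot. The record format and the `pack_index` / `read_slot` helpers are assumptions for illustration:

```python
import struct

KEY_SIZE = 32
# Assumption: 32-byte key followed by a 64-bit big-endian offset into the shard.
RECORD_FORMAT = ">32sQ"
RECORD_LEN = struct.calcsize(RECORD_FORMAT)  # 40 bytes, no padding with ">"


def pack_index(records: list[tuple[bytes, int]]) -> bytes:
    """Serialize (key, position) pairs, one fixed-size record per slot."""
    for key, _ in records:
        if len(key) != KEY_SIZE:
            raise ValueError(f"key must be {KEY_SIZE} bytes")
    return b"".join(struct.pack(RECORD_FORMAT, key, pos) for key, pos in records)


def read_slot(index: bytes, slot: int) -> tuple[bytes, int]:
    """Return the (key, position) record stored in a given slot."""
    return struct.unpack_from(RECORD_FORMAT, index, slot * RECORD_LEN)


index = pack_index([(b"\x01" * 32, 0), (b"\x02" * 32, 45)])
print(read_slot(index, 1)[1])  # 45
```

Each index record grows from the size of the position alone to `RECORD_LEN` bytes. As a side effect, a lookup could compare the stored key with the requested one before returning the position.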
Pros:
- We can scan the index to get all keys.
- We can do a linear scan to retrieve an object if the perfect-hash function is broken in any way.
Cons:
- If the index is missing, we lose all keys.
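Both index-side pros can be sketched with the same assumed record format (32-byte key, 64-bit big-endian position): iterating over the records lists every key, and comparing keys during that iteration gives a linear-scan fallback that does not touch the perfect-hash function at all. `iter_index` and `find_position` are hypothetical names:

```python
import struct
from typing import Iterator, Optional

RECORD_FORMAT = ">32sQ"  # assumption: 32-byte key + 64-bit big-endian position


def iter_index(index: bytes) -> Iterator[tuple[bytes, int]]:
    """Yield every (key, position) record in the index, in slot order."""
    yield from struct.iter_unpack(RECORD_FORMAT, index)


def find_position(index: bytes, key: bytes) -> Optional[int]:
    """Linear-scan fallback: locate a key without the perfect-hash function."""
    for stored_key, position in iter_index(index):
        if stored_key == key:
            return position
    return None


index = b"".join(
    struct.pack(RECORD_FORMAT, k, p) for k, p in [(b"\x01" * 32, 0), (b"\x02" * 32, 45)]
)
all_keys = [key for key, _ in iter_index(index)]
print(find_position(index, b"\x02" * 32))  # 45
```

The scan is O(n) in the number of records, so it is meant as a recovery path rather than a replacement for the perfect-hash lookup.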