Add support for removing RawExtrinsicMetadata and ExtID

We could already back up, restore and remove most object types, but support was missing for two other object types present in our storage: RawExtrinsicMetadata and ExtID.

Both are primarily accessed not through their own identifier but through the identifier of their target: the object they are associated with. When we remove an object, we want to remove all RawExtrinsicMetadata and ExtID objects associated with it.

While they share this common property, they are handled slightly differently:

RawExtrinsicMetadata objects can themselves be addressed using Extended SWHIDs. swh.alter.inventory.get_raw_extrinsic_metadata() takes a list of targets and returns a list of SWHIDs for the associated RawExtrinsicMetadata. It makes sure to recursively add any RawExtrinsicMetadata referencing a relevant RawExtrinsicMetadata.
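The recursive lookup described above can be illustrated with a minimal sketch. It does not use the swh-alter or swh-storage APIs: the `collect_metadata_ids` function and the `metadata_by_target` dictionary are hypothetical stand-ins for the storage queries, and the identifiers are shortened placeholders rather than valid SWHIDs.

```python
from collections import deque


def collect_metadata_ids(targets, metadata_by_target):
    """Return ids of metadata attached, directly or transitively, to targets.

    `metadata_by_target` is a hypothetical in-memory index mapping a target
    identifier to the identifiers of the metadata objects pointing at it.
    """
    found = []
    seen = set()
    queue = deque(targets)
    while queue:
        target = queue.popleft()
        for emd_id in metadata_by_target.get(target, []):
            if emd_id in seen:
                continue  # already collected; also guards against cycles
            seen.add(emd_id)
            found.append(emd_id)
            # Metadata can itself be the target of further metadata.
            queue.append(emd_id)
    return found


# Toy index with placeholder identifiers (not valid SWHIDs).
index = {
    "swh:1:dir:aaa": ["swh:1:emd:001"],
    "swh:1:emd:001": ["swh:1:emd:002"],
    "swh:1:rev:bbb": ["swh:1:emd:003"],
}
result = collect_metadata_ids(["swh:1:dir:aaa"], index)
```

Here `result` contains the metadata attached to the directory and the metadata attached to that metadata, but not the entry attached to the unrelated revision.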

This list of RawExtrinsicMetadata SWHIDs can thus be added to the list of objects to be removed in Remover.get_removable(). The ObjectDeletionInterface.object_delete() method of swh-storage will then take care of removing RawExtrinsicMetadata objects.

ExtID objects are not directly addressable. We thus handle them a bit like OriginVisit and OriginVisitStatus objects: we find them and add them to the recovery bundle while adding their target. They are deleted using ObjectDeletionInterface.extid_delete_for_target() after we have deleted the targets.
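The ordering described above (save the ExtID objects alongside their target, delete the targets, then delete the ExtID objects by target) can be sketched with a toy in-memory storage. Everything here is hypothetical and for illustration only: `ToyStorage`, its methods and the plain dictionary used as a "bundle" do not reflect the actual swh-storage or swh-alter interfaces.

```python
class ToyStorage:
    """Hypothetical in-memory stand-in for a storage holding ExtID rows."""

    def __init__(self, objects, extids):
        self.objects = set(objects)
        self.extids = list(extids)  # list of (extid, target) pairs

    def extids_for(self, target):
        return [row for row in self.extids if row[1] == target]

    def delete_objects(self, targets):
        self.objects -= set(targets)

    def delete_extids_for(self, targets):
        targets = set(targets)
        self.extids = [row for row in self.extids if row[1] not in targets]


def remove_with_extids(storage, targets):
    """Back up targets and their ExtIDs, then delete both in order."""
    bundle = {"objects": [], "extids": []}
    for target in targets:
        bundle["objects"].append(target)
        # ExtIDs are looked up through their target and saved with it.
        bundle["extids"].extend(storage.extids_for(target))
    storage.delete_objects(targets)
    # ExtIDs are only deleted once their targets are gone.
    storage.delete_extids_for(targets)
    return bundle


storage = ToyStorage(
    objects=["rev1", "rev2"],
    extids=[("hg-1", "rev1"), ("hg-2", "rev2")],
)
bundle = remove_with_extids(storage, ["rev1"])
```

After the call, the bundle holds `rev1` together with its ExtID, while `rev2` and its ExtID remain in the toy storage.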

In both cases, we have to accept that there is the possibility that new RawExtrinsicMetadata or ExtID objects are added in between the listing and their deletion. In the case of RawExtrinsicMetadata, these objects would still be present in the storage, while hard to reach. A scrubber job could look for these and remove them. For ExtID however, they will be entirely lost: some information will be missing from the archive in case of a recovery.

Adding RawExtrinsicMetadata and ExtID objects to the recovery bundle requires bumping the format version. The tests are updated to make sure that we can still restore older bundles.
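As a rough illustration of what staying compatible with older bundles involves, here is a minimal sketch of a version-aware loader. The manifest layout, the field names and the version numbers are all assumptions made for this example; they are not the actual recovery bundle format.

```python
# Hypothetical version numbers, chosen only for this example.
SUPPORTED_VERSIONS = (1, 2)


def load_manifest(manifest):
    """Normalize a manifest dict, accepting both old and new versions.

    Assumption: version 1 manifests predate ExtID support and have no
    "extids" field, so it defaults to an empty list when restoring.
    """
    version = manifest.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported bundle version: {version!r}")
    return {
        "version": version,
        "swhids": list(manifest["swhids"]),
        "extids": list(manifest.get("extids", [])),
    }


old = load_manifest({"version": 1, "swhids": ["a"]})
new = load_manifest({"version": 2, "swhids": ["a"], "extids": ["x"]})
```

An older manifest loads with an empty ExtID list, and an unknown version is rejected instead of being silently misread.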

Removing and restoring RawExtrinsicMetadata objects raises the question of how to handle MetadataAuthority and MetadataFetcher objects. Currently these are not removed by swh-alter. This means we can assume that the required objects will be present in the storage when adding RawExtrinsicMetadata objects from a recovery bundle. It also means we might leave dangling MetadataAuthority and MetadataFetcher objects behind. The issue is tracked as #21.

Closes #11 (closed)

Edited by Jérémy Bobbio (Lunar)
