Replace the Nixguix loader with a lister
Currently, Nix and Guix are each loaded as a single origin with a huge snapshot in which every branch name is a URL, which is wrong. We need to replace the Nixguix loader with a lister that creates one origin per artifact referenced by the Nix and Guix public manifests. This would be closer to what we do with Debian/Ubuntu.
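To make the lister idea concrete, here is a minimal Python sketch of turning manifest entries into origins. It assumes a manifest shaped like the public `sources.json` files (a `sources` list whose entries carry a `type` such as `url` or `git`, plus `urls`/`git_url` and an optional `integrity` field). The `Origin` type, `classify_url`, `list_origins`, the visit-type labels and the extension lists are hypothetical illustrations, not the actual swh-lister code. The sketch also mirrors a few plan items below: excluding irrelevant artifacts (`.iso`, `.bin`), telling tarballs from plain files, and randomizing the origin order.

```python
import random
from typing import Any, Dict, List, NamedTuple, Optional
from urllib.parse import urlparse

# Hypothetical extension lists, for illustration only.
TARBALL_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz2",
                    ".tar.xz", ".txz", ".zip")
EXCLUDED_SUFFIXES = (".iso", ".bin")


class Origin(NamedTuple):
    """One origin per artifact referenced by a manifest (hypothetical type)."""
    url: str
    visit_type: str           # "directory", "content" or "git" (illustrative labels)
    integrity: Optional[str]  # SRI string such as "sha256-...", if any


def classify_url(url: str) -> Optional[str]:
    """Guess the visit type of a "url" source from its path extension."""
    path = urlparse(url).path.lower()  # ignores query strings and case
    if path.endswith(EXCLUDED_SUFFIXES):
        return None  # irrelevant artifact: skip it
    if path.endswith(TARBALL_SUFFIXES):
        return "directory"  # archive to unpack and load as a directory
    return "content"  # anything else is loaded as a single file


def list_origins(manifest: Dict[str, Any], seed: Optional[int] = None) -> List[Origin]:
    """Build one origin per usable manifest entry, in randomized order."""
    origins: List[Origin] = []
    for source in manifest.get("sources", []):
        if source.get("type") == "git" and source.get("git_url"):
            origins.append(Origin(source["git_url"], "git", None))
        elif source.get("type") == "url" and source.get("urls"):
            url = source["urls"][0]  # first mirror only, for simplicity
            visit_type = classify_url(url)
            if visit_type is not None:
                origins.append(Origin(url, visit_type, source.get("integrity")))
    # Shuffle so that ingestion does not always start with the same origins.
    random.Random(seed).shuffle(origins)
    return origins
```

For instance, a manifest listing a `foo-1.0.tar.gz` URL, a `disk.iso` URL and a git repository yields a directory origin and a git origin, with the `.iso` dropped. Real manifests need more care than this sketch, as the lister items in the plan show (pseudo-URLs without a scheme, versions instead of extensions, `Content-Disposition`, misplaced git URLs).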
Define the following (see the HedgeDoc pad [1], which details a proposal):
- a target structure sketch of the data in the archive
- origin URLs
- what kind of extrinsic metadata and/or ExtIDs we store
- what kind of snapshots we generate
Plan:
- swh/devel/swh-lister!427 (closed): Implement lister
- [ ] swh/devel/swh-loader-core!446 (closed), ...: Adapt archive loader (package loader) to accept tarballs from nixguix manifests (cannot work [2])
- swh/devel/swh-loader-core!447 (closed): Implement ContentLoader (possibly as a package[2] core loader) to deal with content files with intrinsic metadata (out of nixguix manifests)
- swh/devel/swh-loader-core!436 (closed): Implement DirectoryLoader (~~possibly as a package[2] core loader~~) to deal with tarballs with intrinsic metadata (out of nixguix manifests)
- swh/devel/swh-loader-core!437 (closed): Update the implementations above to deal with unsupported integrity hashes (sha512)
- #3781 (closed): Run the lister through docker
- swh/devel/swh-loader-core!438 (closed), #3781 (closed): Run the loaders through docker (directories OK; contents OK too, but they create mismatches due to faulty integrity references in the manifests)
- swh/devel/swh-lister!428 (closed): lister: Randomize the order of origins to ingest
- swh/devel/swh-lister!429 (closed): lister: Deal with mistyped origins
- swh/devel/swh-lister!430 (closed): lister: Fix expired SSL certificate
- swh/devel/swh-lister!432 (closed): lister: Fix connection error
- swh/devel/swh-lister!433 (closed): lister: Deal with pseudo-URLs with a missing scheme
- swh/devel/swh-lister!435 (closed): lister: Deal with exotic URLs so tarballs are recognized
- swh/devel/swh-lister!436 (closed): lister: Deal with misplaced git URLs
- swh/devel/swh-lister!437 (closed): nixguix: Improve content type detection (those with a charset were off)
- swh/devel/swh-core!332 (closed): swh.core.tarball: Add missing mimetype application/x-gzip
- swh/devel/swh-lister!438 (closed): lister: Refactor to simplify some computations
- swh/infra/ci-cd/swh-jenkins-dockerfiles!49 (closed): Make jenkins build with nix-store inside so future builds that need it run correctly
- #3781 (closed): Fix mismatched computations for nixpkgs manifests -> NAR hash support (impacts both lister and loader)
- swh/devel/swh-lister!434 (closed): lister: Adapt to provide the correct information to the loaders
- swh/devel/swh-loader-core!440 (closed): {Content|Directory}Loader: Adapt to be able to check the above
- swh/devel/swh-loader-core!441 (closed): Adapt the standard/NAR hash mismatch computation behavior (so that mismatches make the loading fail)
- swh/devel/swh-loader-core!439 (closed): Content "nar" checksum computation: files with a "recursive" outputHashMode exist
- [ ] #3781 (closed): $1477: $1478: hash mismatch edge cases (so far) we cannot do anything about (yet?!), see next point
- #4608 (closed): swh/devel/swh-lister!448 (closed): lister: Exclude faulty origins
- #4608 (closed): Notify the upstream nixpkgs community about the missing information on "faulty" origins
- #4609 (closed): Notify the upstream nixpkgs community about "git" repositories misqualified as URLs
- $1470: ContentLoader run in docker
- $1471: DirectoryLoader run in docker
- swh/devel/swh-environment!248 (closed), swh/devel/swh-environment!216 (closed): Deploy in docker
- $1474: Fix misqualified repositories detected as files (see pastes)
- $1475: Contents
- $1476: Directories
- swh/devel/swh-lister!449 (closed): Add support for more tarball/zip extensions
- swh/devel/swh-core!333 (closed): swh.core: Wire war support (and check that other tarballs are already supported)
- swh/devel/swh-lister!450 (closed): Harden the tarball support test dataset
- swh/devel/swh-lister!451 (closed): lister: Add another diff to filter out irrelevant origins (.iso, .bin, ...)
- #3781 (closed): Status -> further fixes (/me sighs)
- swh/devel/swh-lister!441 (closed): nixguix: Deal with edge-case URLs with a version instead of an extension
- swh/devel/swh-lister!442 (closed): Use content-disposition
- infra/sysadm-environment#4655: Deploy in staging
- #4979 (closed): Store NAR hashes in the ExtID mapping while loading
- Call for public review
- swh/infra/sysadm-environment#5223 (closed): Deploy in production once the above is OK
- swh/devel/swh-loader-core!518 (merged), swh/devel/docker!8 (merged): Drop the no-longer-relevant nixguix loader
- swh/devel/swh-loader-core#4749 (closed): Document nixguix lister & loader
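Several plan items (sha512 integrity hashes, NAR hash support, "recursive" `outputHashMode`, NAR hashes in the ExtID mapping) revolve around checking a downloaded artifact against the integrity string from a manifest. The following is a minimal Python sketch of that check, assuming SRI-style strings (`<algo>-<base64 digest>`) and following the NAR serialization format used by Nix: in "flat" mode the raw file bytes are hashed, in "recursive" mode the NAR serialization of the file or directory tree is hashed. The function names `nar_hash`, `flat_hash` and `check_integrity` are hypothetical; this is not the swh-loader-core implementation.

```python
import base64
import hashlib
import os
import stat
from typing import Callable, Union

Writer = Callable[[bytes], None]


def _nar_str(write: Writer, data: bytes) -> None:
    """Write a NAR string: 64-bit little-endian length, bytes, zero padding to 8."""
    write(len(data).to_bytes(8, "little"))
    write(data)
    write(b"\0" * (-len(data) % 8))


def _nar_node(write: Writer, path: Union[str, bytes]) -> None:
    """Serialize one file-system object (regular file, symlink or directory)."""
    st = os.lstat(path)
    _nar_str(write, b"(")
    _nar_str(write, b"type")
    if stat.S_ISLNK(st.st_mode):
        _nar_str(write, b"symlink")
        _nar_str(write, b"target")
        _nar_str(write, os.fsencode(os.readlink(path)))
    elif stat.S_ISDIR(st.st_mode):
        _nar_str(write, b"directory")
        bpath = os.fsencode(path)
        # Entries are sorted by name (byte order) so the output is deterministic.
        for name in sorted(os.listdir(bpath)):
            _nar_str(write, b"entry")
            _nar_str(write, b"(")
            _nar_str(write, b"name")
            _nar_str(write, name)
            _nar_str(write, b"node")
            _nar_node(write, os.path.join(bpath, name))
            _nar_str(write, b")")
    else:
        _nar_str(write, b"regular")
        if st.st_mode & stat.S_IXUSR:  # only the owner-executable bit matters
            _nar_str(write, b"executable")
            _nar_str(write, b"")
        _nar_str(write, b"contents")
        with open(path, "rb") as f:
            _nar_str(write, f.read())  # whole file in memory: fine for a sketch
    _nar_str(write, b")")


def nar_hash(path: str, algo: str = "sha256") -> bytes:
    """Digest of the NAR serialization of `path` ("recursive" mode)."""
    h = hashlib.new(algo)
    _nar_str(h.update, b"nix-archive-1")
    _nar_node(h.update, path)
    return h.digest()


def flat_hash(path: str, algo: str = "sha256") -> bytes:
    """Digest of the raw file bytes ("flat" mode)."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        h.update(f.read())
    return h.digest()


def check_integrity(path: str, integrity: str, mode: str = "flat") -> bool:
    """Compare `path` against an SRI string such as "sha256-<base64>" (hypothetical helper)."""
    algo, _, b64 = integrity.partition("-")
    if algo not in ("sha256", "sha512"):
        raise ValueError(f"unsupported integrity algorithm: {algo}")
    hasher = nar_hash if mode == "recursive" else flat_hash
    return hasher(path, algo) == base64.b64decode(b64)
```

In this sketch, a file whose manifest entry declares the "recursive" mode fails a flat comparison even when its bytes are correct, which illustrates why the loaders need to know which hash mode a manifest entry uses.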
[1] Draft pad: https://hedgedoc.softwareheritage.org/2AQFbVB0S-OrOtkJV2yNJw
[2] It cannot: we may not receive any version, and the package loader currently relies on that data for its main ingestion algorithm.
Migrated from T3781
Edited by Antoine R. Dumont