git: Ingest ordered tags then ordered branches references

Antoine R. Dumont requested to merge improve-loader-git-sort-refs into master

Note: This is yet another round of rebases and adaptations to the latest changes. It was created from my fork this time, so I had to close !136 (closed) and !137 (closed).


This introduces the means to configure the packfile fetching policy. The default, as before, is to fetch one packfile and ingest everything unknown out of it. When fetch_multiple_packfiles is True (and the ingestion goes through the 'smart' protocol), the ingestion uses several packfiles (each with a given number_of_heads_per_packfile). After each packfile is loaded, a 'partial' (because incomplete) and 'incremental' (as in gathering the refs seen so far) snapshot is created.
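To illustrate the policy described above, here is a minimal sketch, not the loader's actual code. It assumes the remote refs are available as a plain mapping from ref name to target id; the helper names `batch_heads` and `incremental_snapshots` are hypothetical, and only `number_of_heads_per_packfile` comes from this merge request.

```python
from typing import Dict, Iterator


def batch_heads(
    refs: Dict[bytes, bytes], number_of_heads_per_packfile: int
) -> Iterator[Dict[bytes, bytes]]:
    """Split refs into consecutive batches, one batch per packfile (hypothetical helper)."""
    if number_of_heads_per_packfile < 1:
        raise ValueError("number_of_heads_per_packfile must be >= 1")
    names = list(refs)
    for start in range(0, len(names), number_of_heads_per_packfile):
        chunk = names[start : start + number_of_heads_per_packfile]
        yield {name: refs[name] for name in chunk}


def incremental_snapshots(
    refs: Dict[bytes, bytes], number_of_heads_per_packfile: int
) -> Iterator[Dict[bytes, bytes]]:
    """Yield one snapshot per batch, each holding every ref seen so far."""
    seen: Dict[bytes, bytes] = {}
    for batch in batch_heads(refs, number_of_heads_per_packfile):
        # In the real loader, a packfile covering this batch would be
        # fetched and stored here before the snapshot is recorded.
        seen.update(batch)
        yield dict(seen)
```

Every snapshot except the last one is partial (some refs are still missing), and each one is incremental because it contains all the refs of the previous snapshots.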

Even if the new fetching policy were activated, this should not impact how small to medium repositories are ingested.

The end goal is to reduce the risk of failure while loading large repositories (with large packfiles) and to let the next load pick up where the previous one failed.

It's not perfect yet because it also depends on the connectivity of the repository's git graph. For example, if the first 200 references happen to be fully connected, then we will retrieve everything in one round anyway.

Implementation-wise, this adapts the current graph walker (the one that resolves the missing local references from the remote references) so it won't walk over already fetched references when multiple iterations are needed.
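The idea of not revisiting already fetched references across iterations can be sketched as follows. This is a simplified stand-in written in plain Python, not the graph walker used by the loader: `next_wants` and `fetch_in_rounds` are hypothetical names, and "fetching" is simulated by recording the wanted ids.

```python
from typing import Dict, List, Set


def next_wants(
    remote_refs: Dict[bytes, bytes], fetched: Set[bytes], limit: int
) -> List[bytes]:
    """Pick up to `limit` target ids that no earlier round has fetched."""
    wants: List[bytes] = []
    for target in remote_refs.values():
        if target in fetched or target in wants:
            continue  # already retrieved, or already queued in this round
        wants.append(target)
        if len(wants) >= limit:
            break
    return wants


def fetch_in_rounds(remote_refs: Dict[bytes, bytes], limit: int) -> List[List[bytes]]:
    """Simulate successive packfile fetches until nothing is left to want."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    fetched: Set[bytes] = set()
    rounds: List[List[bytes]] = []
    while True:
        wants = next_wants(remote_refs, fetched, limit)
        if not wants:
            break
        rounds.append(wants)
        # Stand-in for loading the packfile: remember what was retrieved
        # so that the next round skips it.
        fetched.update(wants)
    return rounds
```

Two refs pointing at the same target are only requested once, and a later round never asks again for something an earlier round retrieved.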

This also makes the git loader explicitly create partial visits when fetching packfiles. That is, the loader now creates a partial visit with a snapshot after each packfile is consumed. The end goal is to decrease the work the loader would have to redo if the initial visit does not complete for some reason.

This should allow the git history to be ingested gradually. If we don't deal with the {master/main/HEAD} references first, it sounds highly plausible that we won't have a mostly fully connected graph at first (except for repositories with no tags). Ingestion instead starts from the tags, in order: v0.1.0, v0.2.0, etc.
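One possible way to order references as "tags first, then branches" is sketched below. It is an illustration under assumptions, not the ordering implemented in this merge request: the function names are hypothetical, the natural (numeric-aware) comparison of tag names is an assumption, and so is placing every other reference (HEAD included) last.

```python
import re
from typing import Iterable, List, Tuple

_DIGITS = re.compile(rb"(\d+)")


def _natural_key(name: bytes) -> List[Tuple[int, int, bytes]]:
    """Key under which numeric runs compare as integers (v0.2.0 before v0.10.0)."""
    return [
        (0, int(part), b"") if part.isdigit() else (1, 0, part)
        for part in _DIGITS.split(name)
    ]


def ref_sort_key(name: bytes) -> Tuple[int, List[Tuple[int, int, bytes]]]:
    """Rank tags first, branches second, anything else (e.g. HEAD) last."""
    if name.startswith(b"refs/tags/"):
        group = 0
    elif name.startswith(b"refs/heads/"):
        group = 1
    else:
        group = 2  # assumption: remaining refs go last
    return (group, _natural_key(name))


def order_refs(names: Iterable[bytes]) -> List[bytes]:
    """Return ref names ordered as tags, then branches, then the rest."""
    return sorted(names, key=ref_sort_key)
```

With this ordering, the first batches cover old tags such as v0.1.0 and v0.2.0, so the branch heads that reach most of the history are only requested in later rounds.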

Coupled with the option to ingest a repository through multiple packfiles, this should allow large repositories to be ingested more incrementally (as it creates partial snapshots at each fetch/store loop).

Related to #3625 (closed) (among other things)

Edited by Antoine R. Dumont
