cookers/git_bare: Speed up repository cooking with multi-threading
Previously, when cooking a git bare repository, content bytes were
fetched sequentially, which could take a long time for an origin with
a large revision history.
To speed up the cooking process, retrieve the content bytes in parallel
using the concurrent.futures module from the Python standard library,
which is well suited to running loops of I/O-bound tasks concurrently
and to submitting tasks asynchronously.
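As an illustration only, the pattern looks roughly like the sketch below. It is not the cooker's actual code: fetch_content, fetch_contents and the max_workers value are hypothetical names and settings, and the sleep merely stands in for a network round trip to the object storage.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_content(obj_id: str) -> bytes:
    """Hypothetical stand-in for an I/O-bound fetch of one content object."""
    time.sleep(0.01)  # simulates network latency; not a real storage call
    return obj_id.encode()


def fetch_contents(obj_ids, max_workers=8):
    """Fetch every content concurrently and return {obj_id: bytes}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per object and remember which future maps to which id.
        future_to_id = {executor.submit(fetch_content, i): i for i in obj_ids}
        # Collect results as they finish; result() re-raises any worker exception.
        for future in as_completed(future_to_id):
            results[future_to_id[future]] = future.result()
    return results


contents = fetch_contents(["a", "b", "c"])
```

Threads fit here because the workers spend most of their time waiting on I/O, during which the GIL is released.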
Below are the timings for cooking a git bare repository for the currently archived tip revision of swh-model.
- Without multi-threading:
$ time swh -l DEBUG vault cook -C /tmp/vault.yml --bundle-type git_bare swh:1:rev:51b5aa94f13c4bd7358475d78fb7d5684cfb6fd1 /tmp/git_repo.tar
real 16m43,282s
user 0m13,142s
sys 0m1,462s
- With multi-threading:
$ time swh -l DEBUG vault cook -C /tmp/vault.yml --bundle-type git_bare swh:1:rev:51b5aa94f13c4bd7358475d78fb7d5684cfb6fd1 /tmp/git_repo.tar
real 2m23,676s
user 0m13,520s
sys 0m1,310s
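The wall-clock gain can be derived from the two "real" values above, while user and sys times stay essentially unchanged, as expected for I/O-bound work. The helper below is a hypothetical snippet written for this message, not part of the change; it assumes the comma decimal separator printed by the shell above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time` duration such as '16m43,282s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", value)
    if match is None:
        raise ValueError(f"unexpected duration format: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


sequential = parse_time("16m43,282s")  # real time without multi-threading
threaded = parse_time("2m23,676s")     # real time with multi-threading
speedup = sequential / threaded
print(f"speedup: {speedup:.2f}x")
```

For this revision the ratio is close to 7x.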
The code that retrieves directory data in parallel has also been ported
to concurrent.futures in another commit.