crates: Fixes and improvements
I tested the current state of the crates lister in the docker environment and found several issues to fix and improvements to make.
Below is the commit log of the changes.
commit 6dd62b35f8b690559f639c8a375d93855aaced94 (HEAD -> crates-lister-fixes, anlambert/crates-lister-fixes)
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date: Fri Aug 23 11:46:24 2024 +0200
crates: Remove crates metadata as loader argument
This extrinsic metadata can be fetched directly by the loader
through the crates Web API, which also exposes more metadata fields.
commit 0af8a332a5bc0d8fc3cfc49df8009eb524153cbd
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date: Thu Aug 22 16:38:44 2024 +0200
crates: Speed up listing by processing crates in batch
Instead of having a single crate and its versions info per page,
prefer to have up to 1000 crates per page to significantly speed up
the listing process.
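As an illustration of the batching idea only (not the lister's actual code), here is a minimal sketch that groups an iterable of crates into pages of at most 1000 entries. The helper name `batched` and the sample data are assumptions made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Page size taken from the commit message above.
CRATES_PER_PAGE = 1000


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


# Hypothetical input: 2500 crate names produce three pages.
crate_names = [f"crate-{i}" for i in range(2500)]
pages = list(batched(crate_names, CRATES_PER_PAGE))
page_sizes = [len(page) for page in pages]
```

With one crate per page, per-page overhead is paid once per crate; grouping crates pays it once per batch instead.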
commit 7ca067e82e4a19e5381ccc45579c019ad6963ec2
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date: Wed Aug 21 16:12:37 2024 +0200
crates: Record lister state only if all crates were processed
Previously, the lister state was recorded even when errors occurred
while listing crates, because the finalize method is called whether
or not an exception was raised during listing.
As a consequence, some crates could be missed, as the incremental listing
restarts from the dump date of the last processed crates database.
So ensure all crates have been processed by the lister before recording
its state.
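The failure mode and the fix can be sketched as follows. This is a simplified stand-in, not the real lister: the class `SketchLister`, its attributes, the state key `last_dump_date`, and the sample dates are all hypothetical names chosen for the example.

```python
class SketchLister:
    """Toy lister whose finalize() always runs, even when listing fails."""

    def __init__(self, crates, dump_date):
        self.crates = crates
        self.dump_date = dump_date
        self.state = {"last_dump_date": None}  # hypothetical state layout
        self.all_crates_processed = False

    def get_pages(self):
        for crate in self.crates:
            if crate is None:
                # Simulate an error raised while processing a crate.
                raise ValueError("cannot process crate")
            yield crate
        # Only reached when every crate was listed without error.
        self.all_crates_processed = True

    def finalize(self):
        # Record the state only if the listing ran to completion.
        if self.all_crates_processed:
            self.state["last_dump_date"] = self.dump_date

    def run(self):
        try:
            for _ in self.get_pages():
                pass
        finally:
            # Called whether or not an exception was raised.
            self.finalize()


failing = SketchLister(["a", None, "b"], "2024-08-21")
try:
    failing.run()
except ValueError:
    pass

succeeding = SketchLister(["a", "b"], "2024-08-21")
succeeding.run()
```

In this sketch the failing run leaves the state untouched, so the next incremental listing would start again from the previously recorded dump date rather than skipping the crates that were never processed.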
commit e1f9ec540c66e4e05627f54f7354c785c33806fd
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date: Wed Aug 21 15:17:28 2024 +0200
crates: Use looseversion.LooseVersion2 to parse crate versions
packaging.version.parse is dedicated to parsing Python package version
numbers, but crate versions do not necessarily follow Python version
number conventions, and thus some crate versions cannot be parsed.
Prefer looseversion.LooseVersion2 instead, which is a drop-in
replacement for the deprecated distutils.version.LooseVersion and can
parse all kinds of version numbers.
commit 6c16aeea7ed4dadc7ac309fe4b3ce33b46e9d36c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date: Wed Aug 21 13:46:03 2024 +0200
crates: Bump csv field size limit
A size limit of 1000000 was not enough to properly process
all CSV crates data so bump to a higher value.
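For context, the standard library exposes this limit through `csv.field_size_limit`. The sketch below shows a field longer than 1000000 characters failing under the old limit and parsing once the limit is raised. The new value used here is a placeholder, since the commit message does not state it, and the helper `read_rows` and the sample row are assumptions for the example.

```python
import csv
import io

# Placeholder value for illustration; the actual new limit is not stated above.
NEW_FIELD_SIZE_LIMIT = 10_000_000


def read_rows(text, field_size_limit):
    """Parse CSV text under the given field size limit (hypothetical helper).

    The previous limit is restored afterwards so the change stays local.
    """
    previous = csv.field_size_limit(field_size_limit)
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        csv.field_size_limit(previous)


# A made-up row whose second field exceeds 1000000 characters.
big_row = "some-crate," + "x" * 1_500_000 + "\n"

try:
    read_rows(big_row, 1_000_000)
    parsed_with_old_limit = True
except csv.Error:
    # The csv module rejects fields larger than the configured limit.
    parsed_with_old_limit = False

rows = read_rows(big_row, NEW_FIELD_SIZE_LIMIT)
```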