lister.gnu: Standardize arguments to pass to the loader tar
This:
- Moves version parsing logic from loader-gnu to lister
- Parses the timestamps provided by the gnu tree into isoformat dates (to pass to the loader)
- (Aligns the timestamps stored in the lister model so they are also dates in the right format, which makes reading from the model clearer)
This makes a generic "tar" loader possible. The loader can expect its required arguments to already have the right type, so it no longer needs origin-specific computations on the archives.
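As an illustration of the timestamp conversion described above, here is a minimal sketch. It assumes the gnu tree exposes modification times as Unix epoch seconds in string form; the helper name `to_iso_date`, the dict keys and all values are hypothetical and not taken from this merge request.

```python
from datetime import datetime, timezone


def to_iso_date(timestamp: str) -> str:
    """Convert an epoch-seconds string into an ISO 8601 UTC date string."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat()


# Hypothetical shape of one archive entry handed over to a generic tar loader:
# every value already has the type the loader expects.
artifact = {
    "url": "https://example.org/gnu/foo/foo-1.0.tar.gz",  # illustrative URL
    "time": to_iso_date("944729610"),
    "version": "1.0",
}
```

With the conversion done on the lister side, the loader only has to parse an isoformat string, whatever the origin of the archive.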
Related !385 (closed)
Note: This actually implies a schema migration (timestamp model migration). As the gnu loader never ran, though, we might as well drop the existing data and trigger a new run when the time comes, so it's not a blocker.
Test Plan
tox
Migrated from D2147 (view on Phabricator)
Activity
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/442/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/442/console
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/443/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/444/ for more details.
mentioned in commit swh-loader-core@4c364648
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/446/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/448/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/448/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/449/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/449/console
Build has FAILED
Yes, it now depends on swh-scheduler!112 (closed)
I don't think this regexp is the right solution, it's quite fragile in case a package name uses one of the keywords.
Yes, it's fragile and we don't see any better solution right now. That's how we did it initially; we improved the current version to be a tad better though. It's also tested, even if the tests are limited.
A possible solution is for get_version to only accept URLs, and split the path to find the package name.
I'm not getting the difference with the current implem. filename is extracted from the uri in any case...
Do you have an example in mind? That'd help, I think.
TIA
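To make the two approaches discussed above concrete, here is a minimal sketch. It is not the lister's actual get_version: the keyword list, the regular expressions and both helper names are assumptions chosen for illustration. The first helper looks for the version with a keyword-aware regexp on the filename; the second takes the package name from the parent folder of the URL path.

```python
import re
from os import path
from urllib.parse import urlparse

# Hypothetical keywords that may legitimately start a version string.
KEYWORDS = ("alpha", "beta", "stable", "linux", "win32")
EXTENSIONS = re.compile(r"\.(?:tar\.(?:gz|bz2|xz|lz)|tgz|zip)$")
# The version starts after the first separator followed by a digit or a keyword.
VERSION_START = re.compile(
    r"[-_.](?=\d|(?:" + "|".join(KEYWORDS) + r")(?:[-_.]|$))"
)


def version_from_filename(url: str) -> str:
    """Guess the version from the filename alone, using the regexp."""
    stem = EXTENSIONS.sub("", path.basename(urlparse(url).path))
    match = VERSION_START.search(stem)
    return stem[match.end():] if match else ""


def version_from_folder(url: str) -> str:
    """Guess the version by stripping the parent folder name from the filename."""
    parts = urlparse(url).path.split("/")
    package, filename = parts[-2], parts[-1]
    stem = EXTENSIONS.sub("", filename)
    if package and stem.startswith(package):
        return stem[len(package):].lstrip("-_.")
    return ""
```

Under these assumptions, a file named my-linux-tools-1.0.tar.gz trips the keyword regexp, which reports linux-tools-1.0 instead of 1.0. The folder-based helper gets 1.0 only when the archive sits in a folder named after the package, and returns an empty string otherwise, so it depends entirely on the layout of the mirror.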
I meant splitting on / so you get the package name in the folder name. But I just realized it's not a reliable method, so nvm
Build was aborted
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/450/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/450/console