Implement a lister for the Bulk On-demand Archival feature
This is a sub-task of implementing an MVP for the Bulk On-demand Archival feature.
The lister will be triggered once a list of origins and their visit types has been successfully submitted through the new /save/bulk/ Web API endpoint (swh-web#4802).
This new lister takes a list of origin URLs and their visit types as input. To avoid polluting the ListedOrigin table with invalid origins, it performs additional checks to filter out bogus origins:
- Does the origin URL return a 404?
- Is the origin URL public?
- Is the origin visit type valid? VCS validity could be checked with the git, hg or svn client, for instance.
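The three checks above could be sketched roughly as follows. This is a minimal sketch, not the lister's actual code: it assumes an HTTP HEAD request is enough to detect a 404 (some servers reject HEAD), treats 401/403 as "not public", and probes the repository with the `git ls-remote`, `hg identify` and `svn info` client commands. The names `VCS_PROBES`, `http_status`, `vcs_probe` and `check_origin` are hypothetical. The checker functions are injectable so the logic can be exercised without network access.

```python
import os
import subprocess
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from visit type to a client command that exits with a
# non-zero status when the URL is not a readable repository.
VCS_PROBES: Dict[str, List[str]] = {
    "git": ["git", "ls-remote"],
    "hg": ["hg", "identify"],
    "svn": ["svn", "info", "--non-interactive"],
}


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the status code of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses land here
        return exc.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return None


def vcs_probe(url: str, visit_type: str, timeout: float = 60.0) -> bool:
    """Return True if the VCS client can read the repository anonymously."""
    command = VCS_PROBES.get(visit_type)
    if command is None:  # unknown visit type: reject
        return False
    # Disable git credential prompts so private repositories fail fast.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    try:
        result = subprocess.run(
            command + [url],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=timeout,
            env=env,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_origin(
    url: str,
    visit_type: str,
    status_fn: Callable[[str], Optional[int]] = http_status,
    probe_fn: Callable[[str, str], bool] = vcs_probe,
) -> Optional[str]:
    """Return None if the origin looks valid, else a rejection reason."""
    status = status_fn(url)
    if status is None:
        return "unreachable"
    if status == 404:
        return "not found (404)"
    if status in (401, 403):
        return "not public"
    if not probe_fn(url, visit_type):
        return f"not a valid {visit_type} repository"
    return None
```

Running the cheap HTTP check first avoids spawning a VCS client for URLs that are already known to be missing or private.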
If an origin is identified as valid, a ListedOrigin object is added to (or upserted into) the scheduler database.
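The "added or upserted" behaviour can be illustrated with an in-memory table. This sketch assumes that a listed origin is identified by the triple (lister_id, url, visit_type), so resubmitting the same origin replaces the existing row instead of duplicating it. `ListedOriginRecord` and `upsert` are hypothetical, simplified stand-ins; the real swh-scheduler model and storage API are not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple
from uuid import UUID


@dataclass(frozen=True)
class ListedOriginRecord:
    """Hypothetical, simplified stand-in for a scheduler ListedOrigin row."""

    lister_id: UUID
    url: str
    visit_type: str
    last_seen: datetime


# Assumed unique key of a listed origin.
Key = Tuple[UUID, str, str]


def upsert(table: Dict[Key, ListedOriginRecord], record: ListedOriginRecord) -> bool:
    """Insert the record, or replace the existing one; return True if it was new."""
    key = (record.lister_id, record.url, record.visit_type)
    is_new = key not in table
    table[key] = record
    return is_new
```

In a real database the same effect would typically come from an `INSERT ... ON CONFLICT ... DO UPDATE` on that key.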
Information about rejected origins is stored in the lister state at the end of the process.
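One possible shape for that state is a list of rejected origins with the reason each one failed, serialized to a plain JSON-compatible dict. The layout, and the names `RejectedOrigin` and `BulkListerState`, are assumptions. The `state_to_dict`/`state_from_dict` function names echo the pair of hooks used by swh-lister's `Lister` base class, but are written here as free functions for a self-contained sketch.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class RejectedOrigin:
    """Hypothetical record of an origin that failed one of the checks."""

    origin_url: str
    visit_type: str
    reason: str


@dataclass
class BulkListerState:
    """Hypothetical lister state holding the rejected origins."""

    rejected_origins: List[RejectedOrigin] = field(default_factory=list)


def state_to_dict(state: BulkListerState) -> Dict[str, Any]:
    # asdict() also converts the nested RejectedOrigin dataclasses.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> BulkListerState:
    return BulkListerState(
        rejected_origins=[RejectedOrigin(**o) for o in d.get("rejected_origins", [])]
    )
```

Keeping the reason next to each URL lets the caller report why an origin was dropped without re-running the checks.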
Related to swh/meta#5091.