Implement new Web API endpoint /save/bulk/
This is a sub-task of implementing an MVP of the Bulk On-demand Archival feature.
To do so, a new Web API endpoint /save/bulk/ must be added.
It must be protected by authentication and a special user permission.
A user can submit a list of origins and their visit types (in CSV or JSON format) through a POST request:
- the webapp first performs basic checks on the URLs (is their format valid?) and on their visit types (are they supported by SWH?) => return 400 if an issue is detected
- the webapp then creates a scheduler task for the new save-bulk lister with parameters:
  - instance = user identifier
  - origins = a list of origins and their visit types (list of tuples or list of dicts)

      from swh.scheduler.utils import create_oneshot_task

      scheduler.create_tasks(
          [
              create_oneshot_task(
                  "list-save-bulk",
                  priority="high",
                  instance=user_id,
                  origins=origins,
              )
          ]
      )

  => return 200 if the task was successfully created
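The basic checks performed before scheduling could look like the sketch below. It is only an illustration under stated assumptions: the function names `parse_origins` and `validate_origins`, the CSV layout (one `url,visit_type` pair per row, no header), the JSON layout (a list of objects with `url` and `visit_type` keys), the accepted URL schemes, and the content of `SUPPORTED_VISIT_TYPES` are all hypothetical and not taken from the webapp code.

```python
import csv
import io
import json
from urllib.parse import urlparse

# Hypothetical subset; the real list of supported visit types would come from SWH.
SUPPORTED_VISIT_TYPES = {"git", "hg", "svn"}
# Assumed set of acceptable URL schemes for this sketch.
ALLOWED_SCHEMES = {"http", "https", "git", "svn"}


def parse_origins(body: str, content_type: str) -> list:
    """Parse a CSV or JSON request body into a list of origin dicts."""
    if content_type == "application/json":
        return [
            {"url": o["url"], "visit_type": o["visit_type"]}
            for o in json.loads(body)
        ]
    origins = []
    for row in csv.reader(io.StringIO(body)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        origins.append({"url": row[0].strip(), "visit_type": row[1].strip()})
    return origins


def validate_origins(origins: list) -> list:
    """Return a list of error messages; an empty list means no issue was found."""
    errors = []
    for origin in origins:
        parsed = urlparse(origin["url"])
        if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
            errors.append(f"invalid origin URL: {origin['url']}")
        if origin["visit_type"] not in SUPPORTED_VISIT_TYPES:
            errors.append(f"unsupported visit type: {origin['visit_type']}")
    return errors
```

The view would return 400 with the error list when `validate_origins` returns a non-empty result (or when parsing raises), and only create the scheduler task otherwise.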
The user can track the statuses of the origins they submitted for archival through a GET request:
- after the save-bulk lister has been scheduled and executed, the list of origins validated by it and their loading statuses can be retrieved using the scheduler API

      from swh.core.api.classes import stream_results

      lister = scheduler.get_lister(name="save-bulk", instance_name=user_id)
      listed_origins = list(
          stream_results(scheduler.get_listed_origins, lister_id=lister.id)
      )
      origin_visit_stats = scheduler.origin_visit_stats_get(
          ids=[(o.url, o.visit_type) for o in listed_origins]
      )

- info about origins rejected by the lister (404, invalid visit type, ...) can be found in the lister state (acceptable for the MVP; ideally such info should be stored in a database table, not as a JSON blob in the lister table)

      lister = scheduler.get_lister(name="save-bulk", instance_name=user_id)
      rejected_origins = lister.current_state["rejected_origins"]

- the Web API should return a JSON response with some loading statistics about the submitted origins, but also detailed info about processed and rejected origins (maybe on demand, using a query parameter?)
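One possible shape for the JSON response is sketched below. Everything in it is an assumption made for illustration: the `VisitStats` dataclass is a simplified stand-in for the scheduler's visit stats objects, the response keys, the `"pending"` label for origins without a recorded visit, and the `detailed` flag (which would map to a hypothetical query parameter) are not defined anywhere in this issue. Entries of `rejected_origins` are passed through untouched since their format is not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisitStats:
    """Simplified, hypothetical stand-in for a scheduler visit stats record."""

    url: str
    visit_type: str
    last_visit_status: Optional[str]  # None while no visit has completed


def build_response(listed_origins, visit_stats, rejected_origins, detailed=False):
    """Summarize loading statuses; include per-origin details only on demand.

    listed_origins is a list of (url, visit_type) tuples accepted by the lister.
    """
    stats_by_key = {(s.url, s.visit_type): s for s in visit_stats}

    def status_of(key):
        stats = stats_by_key.get(key)
        # Origins without stats, or without a finished visit, count as pending.
        if stats is None or stats.last_visit_status is None:
            return "pending"
        return stats.last_visit_status

    statuses = {key: status_of(key) for key in listed_origins}
    response = {
        "nb_origins_accepted": len(listed_origins),
        "nb_origins_rejected": len(rejected_origins),
        "statuses": dict(Counter(statuses.values())),
    }
    if detailed:
        response["origins"] = [
            {"url": url, "visit_type": visit_type, "status": status}
            for (url, visit_type), status in statuses.items()
        ]
        response["rejected_origins"] = rejected_origins
    return response
```

Keeping the detailed lists behind a flag keeps the default response small when a user has submitted many origins.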
Related to swh/meta#5091.