Automate subdataset creation
Currently running on Maxxi with this command:
TMPDIR=/srv/softwareheritage/tmp/ AWS_PROFILE=prod swh graph luigi \
    --base-directory /srv/softwareheritage/ssd/data/vlorentz/datasets/ \
    --dataset-name 2024-08-23_popular-10-lua \
    --parent-dataset-name 2024-08-23 \
    --grpc-api localhost:50091 \
    --s3-prefix s3://softwareheritage/graph/ \
    --s3-athena-output-location s3://softwareheritage/tmp/athena/ \
    --athena-prefix swh \
    CreateSubdatasetOnAthena \
    -- \
    --scheduler-url http://localhost:50092/ \
    --SelectTopGithubOrigins-num-origins 10 \
    --SelectTopGithubOrigins-query "language:lua"
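For other languages or origin counts, the same invocation could be wrapped in a small helper. This is only a sketch: the function name `subdataset_cmd`, the `DRY_RUN` switch, and the `<parent>_popular-<N>-<language>` naming pattern are assumptions inferred from the dataset name above, not part of swh-graph. All flags and paths are copied from the command as-is.

```shell
# Hypothetical helper (not part of swh-graph): builds the command above from
# a parent dataset name, a number of origins, and a language.
# By default it only prints the arguments, one per line; set DRY_RUN=0 to run.
subdataset_cmd() (
    parent=$1
    num_origins=$2
    language=$3
    # Assumed naming pattern, inferred from "2024-08-23_popular-10-lua".
    name="${parent}_popular-${num_origins}-${language}"

    # Store the full argument list in the positional parameters.
    set -- swh graph luigi \
        --base-directory /srv/softwareheritage/ssd/data/vlorentz/datasets/ \
        --dataset-name "$name" \
        --parent-dataset-name "$parent" \
        --grpc-api localhost:50091 \
        --s3-prefix s3://softwareheritage/graph/ \
        --s3-athena-output-location s3://softwareheritage/tmp/athena/ \
        --athena-prefix swh \
        CreateSubdatasetOnAthena \
        -- \
        --scheduler-url http://localhost:50092/ \
        --SelectTopGithubOrigins-num-origins "$num_origins" \
        --SelectTopGithubOrigins-query "language:$language"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: show what would be executed.
        printf '%s\n' "$@"
    else
        # Same environment as the original command.
        TMPDIR=/srv/softwareheritage/tmp/ AWS_PROFILE=prod "$@"
    fi
)
```

For example, `subdataset_cmd 2024-08-23 10 lua` prints the arguments of the command above, and `DRY_RUN=0 subdataset_cmd 2024-08-23 10 lua` would run it.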
cc @ardumont
Edited by vlorentz