Quantify and monitor the archival lag in real time, especially for major platforms

  • improve the current Grafana dashboard

  • how to quantify the lag, i.e. which metrics to use?

    • already available:

      • number of origins never visited
      • number of origins with known changes
      • sum of the two previous numbers: number of origins in the loading queue
    • simple to add with the current way we generate the metrics (swh.scheduler.update_metrics):

      • earliest origin that we know of and have not loaded yet, a coarse-grained lag estimator: min(first_seen) where last_visit is null
      • last listing date (are the listers working properly?): max(last_seen)
    • could be added in the scheduler journal client:

      • histogram of first-listing-to-first-archival duration measurements: how much time did origins spend in the archival queue?
    • needs adaptations in the swh.scheduler model, and an analysis of whether the forges provide the information at all:

      • histogram of creation-to-first-archival duration measurements: how old (date of creation in the forge) are the origins we're archiving now?
        • needs a new field "creation time" (!= first listing time) in the lister table
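
For illustration, here is a minimal Python sketch of three of the estimators listed above: min(first_seen) where last_visit is null, max(last_seen), and the first-listing-to-first-archival histogram. It is not how swh.scheduler computes its metrics. The flattened `Origin` record, the `first_archived` field, the bucket bounds and the sample data are all assumptions made up for the example; only the names first_seen, last_seen and last_visit come from the notes above.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence


@dataclass
class Origin:
    # Hypothetical flattened record, not the actual swh.scheduler model.
    url: str
    first_seen: datetime                       # first time a lister saw the origin
    last_seen: datetime                        # most recent listing of the origin
    last_visit: Optional[datetime] = None      # None: never loaded
    first_archived: Optional[datetime] = None  # assumed field: first successful load


def earliest_unvisited(origins: Sequence[Origin]) -> Optional[datetime]:
    """Coarse lag estimator: min(first_seen) where last_visit is null."""
    pending = [o.first_seen for o in origins if o.last_visit is None]
    return min(pending) if pending else None


def last_listing(origins: Sequence[Origin]) -> Optional[datetime]:
    """Lister health check: max(last_seen)."""
    return max((o.last_seen for o in origins), default=None)


def queue_time_histogram(
    origins: Sequence[Origin], bounds: Sequence[timedelta]
) -> List[int]:
    """Count archived origins per bucket of (first_archived - first_seen).

    `bounds` must be sorted in ascending order. Bucket i holds durations up to
    and including bounds[i]; the final bucket holds everything above the last
    bound. Origins that were never archived are skipped.
    """
    counts = [0] * (len(bounds) + 1)
    for o in origins:
        if o.first_archived is None:
            continue
        waited = o.first_archived - o.first_seen
        counts[bisect_left(bounds, waited)] += 1
    return counts


def _utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Made-up sample data, for illustration only.
SAMPLE = [
    Origin("https://forge.example/a", _utc(2022, 5, 1), _utc(2022, 6, 1),
           last_visit=_utc(2022, 5, 1, 2), first_archived=_utc(2022, 5, 1, 2)),
    Origin("https://forge.example/b", _utc(2022, 4, 10), _utc(2022, 5, 30)),
    Origin("https://forge.example/c", _utc(2022, 5, 20), _utc(2022, 5, 31)),
    Origin("https://forge.example/d", _utc(2022, 5, 1), _utc(2022, 5, 31),
           last_visit=_utc(2022, 5, 11), first_archived=_utc(2022, 5, 11)),
]
# Arbitrary bucket bounds: up to 1 hour, up to 1 day, up to 7 days, more.
BOUNDS = [timedelta(hours=1), timedelta(days=1), timedelta(days=7)]

if __name__ == "__main__":
    print("earliest unvisited:", earliest_unvisited(SAMPLE))
    print("last listing:", last_listing(SAMPLE))
    print("queue-time histogram:", queue_time_histogram(SAMPLE, BOUNDS))
```

In a real deployment the first two values would more likely be SQL aggregates computed next to the existing counters, and the histogram would be fed incrementally as visit statuses arrive. The in-memory version only pins down what each number means.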

Migrated from T4130 (view on Phabricator)
