Implement the scheduling policy for the recurrent visit scheduler
When both the lister API and the recent visit cache have been seeded, we should be able to implement the actual scheduling policy for the new scheduler.

- generate the list of the "next" origin urls to load from the scheduler tables (according to the scheduling policy);
- take a list of urls and generate "legacy" one-shot tasks;
- write a "visit simulator" which updates the scheduler database according to a simulated loading time for each origin, and allows us to monitor the behavior of the full simulated scheduling/loading infrastructure;
- get a model of the current loading time distribution;
- determine which metrics we want the scheduler policy to optimize:
  - check for runaway edge cases, e.g. origins that never get loaded even if the "average" behavior is okay
  - reduce the "number of useless visits"
  - reduce the lag between an actual commit and the next visit
  - ...
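For the first item (generating the "next" origin urls), a minimal sketch follows. The policy itself is still to be decided, so this assumes a purely illustrative one: never-visited origins first, then the least recently visited ones. `OriginVisitStats` and `next_origins_to_visit` are hypothetical names standing in for rows of the scheduler tables, not existing swh-scheduler APIs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class OriginVisitStats:
    """Hypothetical stand-in for one row of the scheduler tables."""
    url: str
    last_visit: Optional[datetime] = None  # None means never visited


def next_origins_to_visit(stats: List[OriginVisitStats], num: int) -> List[str]:
    """Return up to `num` origin urls under an assumed, illustrative policy:
    never-visited origins first, then the least recently visited ones."""
    def sort_key(s: OriginVisitStats):
        # False sorts before True, so never-visited origins come first;
        # the url is a final tie-breaker to keep the order deterministic.
        return (s.last_visit is not None, s.last_visit or datetime.min, s.url)

    return [s.url for s in sorted(stats, key=sort_key)[:num]]
```

In a real deployment this selection would presumably be a query on the scheduler database rather than an in-memory sort; the sketch only shows the ordering logic.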
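For the second item (turning a list of urls into "legacy" one-shot tasks), one possible shape is sketched below. The task dict layout (`type`, `arguments` with `args`/`kwargs`, `next_run`, `policy`) is an assumption about what a one-shot task row looks like, and `urls_to_oneshot_tasks` and the `"load-git"` task type are hypothetical; check the actual task schema before relying on it.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def urls_to_oneshot_tasks(
    urls: Iterable[str],
    task_type: str,
    next_run: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Build one one-shot task dict per origin url.

    The dict layout is an assumed, generic task shape, not a confirmed schema."""
    if next_run is None:
        next_run = datetime.now(tz=timezone.utc)
    return [
        {
            "type": task_type,  # e.g. a hypothetical "load-git"
            "arguments": {"args": [], "kwargs": {"url": url}},
            "next_run": next_run,
            "policy": "oneshot",  # run once, never rescheduled
        }
        for url in urls
    ]
```

Keeping this step separate from origin selection means the new policy can feed the existing task machinery unchanged while it is being evaluated.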
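For modelling the current loading time distribution, one common starting point (an assumption here, not something the task prescribes) is a log-normal fit, since loading times are positive and typically heavy-tailed. The sketch fits `mu`/`sigma` from observed durations and draws simulated ones; `fit_lognormal` and `sample_durations` are hypothetical helpers.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def fit_lognormal(durations: Sequence[float]) -> Tuple[float, float]:
    """Fit an (assumed) log-normal model to observed loading times.

    Returns (mu, sigma): mean and population standard deviation of
    log(duration). Non-positive samples are ignored."""
    logs = [math.log(d) for d in durations if d > 0]
    if not logs:
        raise ValueError("need at least one positive duration")
    return statistics.fmean(logs), statistics.pstdev(logs)


def sample_durations(mu: float, sigma: float, n: int, seed: int = 0) -> List[float]:
    """Draw n simulated loading times from the fitted model (reproducible via seed)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```

Whether a single log-normal fits well (or whether it should be fitted per origin type or size) is exactly what the model-building step would need to check against real data.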
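The "visit simulator" item can be prototyped as a small discrete-event simulation. This sketch assumes a fixed pool of workers, an in-memory dict in place of the scheduler database, and the same illustrative "least recently visited first" policy; `simulate_visits` and its parameters are hypothetical. The loading time is supplied by the caller as a function of the url (a constant, or a log-normal draw, for instance).

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple


def simulate_visits(
    urls: List[str],
    num_workers: int,
    horizon: float,
    sample_duration: Callable[[str], float],
) -> Dict[str, List[float]]:
    """Discrete-event sketch of the scheduling/loading loop.

    Each worker asks the simulated scheduler for the idle origin whose last
    visit ended longest ago (never-visited first), "loads" it for
    sample_duration(url) time units, then records the end time as that
    origin's last visit. Returns, per url, the times its visits completed."""
    last_visit: Dict[str, Optional[float]] = {u: None for u in urls}
    completed: Dict[str, List[float]] = {u: [] for u in urls}
    in_progress: Set[str] = set()
    # Min-heap of (finish_time, worker_id, url); worker_id breaks ties.
    events: List[Tuple[float, int, str]] = []

    def pick_next() -> Optional[str]:
        # Illustrative policy only: least recently visited idle origin.
        idle = [u for u in urls if u not in in_progress]
        if not idle:
            return None
        return min(idle, key=lambda u: (last_visit[u] is not None, last_visit[u] or 0.0, u))

    def start(worker: int, now: float) -> None:
        url = pick_next()
        if url is None:
            return  # every origin is already being loaded; worker stays idle
        duration = sample_duration(url)
        if duration <= 0:
            raise ValueError("simulated loading times must be positive")
        in_progress.add(url)
        heapq.heappush(events, (now + duration, worker, url))

    for worker in range(num_workers):
        start(worker, 0.0)

    while events:
        now, worker, url = heapq.heappop(events)
        if now > horizon:
            break  # stop the simulation at the time horizon
        in_progress.discard(url)
        last_visit[url] = now  # stands in for the scheduler database update
        completed[url].append(now)
        start(worker, now)

    return completed
```

The returned visit times are what the monitoring side would consume: swapping `pick_next` for a candidate policy and comparing the outputs is the kind of experiment the simulator is meant to enable.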
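The candidate metrics listed above can each be computed from simple per-visit records. The definitions below are assumptions chosen for illustration: a visit is "useless" if it found nothing new (not eventful), the commit-to-visit lag is the delay until the first visit at or after the commit, and "runaway" origins are those with no visit at all. All function names are hypothetical; times are plain floats.

```python
import bisect
from typing import Dict, List, Optional, Sequence


def useless_visit_ratio(eventful: Sequence[bool]) -> float:
    """Fraction of visits that found nothing new (0.0 if there were no visits)."""
    if not eventful:
        return 0.0
    return sum(1 for e in eventful if not e) / len(eventful)


def commit_to_visit_lags(
    commits: Sequence[float], visits: Sequence[float]
) -> List[Optional[float]]:
    """For each commit time, the delay until the first visit at or after it.
    None means no later visit has happened yet (an unbounded lag)."""
    ordered = sorted(visits)
    lags: List[Optional[float]] = []
    for c in commits:
        i = bisect.bisect_left(ordered, c)  # first visit time >= c
        lags.append(ordered[i] - c if i < len(ordered) else None)
    return lags


def never_visited(all_urls: Sequence[str], visits: Dict[str, Sequence[float]]) -> List[str]:
    """Runaway edge case: origins that have no recorded visit at all."""
    return [u for u in all_urls if not visits.get(u)]
```

Reporting the `None` lags and the never-visited list separately, rather than folding them into an average, keeps the runaway cases visible even when the "average" behavior looks fine.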
Migrated from T2444