Investigate scheduler journal client discrepancies
swh/infra/sysadm-environment#2993 (closed) has been deployed and its first run has completed.
However, we are seeing discrepancies:
- The cache table references 98M [1] origins, but it should reference around 131M [2]. The ~33M difference is currently unexplained and large.
- The "linux" origin, for example, shows that the computed cache data is off as well [3].
- [1]
softwareheritage-scheduler=> select now(), count(*) from origin_visit_stats;
now | count
-------------------------------+----------
2021-01-28 08:34:40.152554+00 | 98231002
- [2]
softwareheritage=> select count(distinct origin) from origin_visit_status where status in ('full', 'partial');
- [3]
softwareheritage-scheduler=> select * from origin_visit_stats where url='https://github.com/torvalds/linux';
url | visit_type | last_eventful | last_uneventful | last_failed | last_notfound | last_snapshot | last_scheduled
-----------------------------------+------------+-------------------------------+-----------------+-------------------------------+---------------+--------------------------------------------+----------------
https://github.com/torvalds/linux | git | 2017-09-07 18:43:13.021746+00 | | 2018-08-23 11:53:06.553328+00 | | \x3e3045be901bacc7594176e79ba13fe030f601e2 |
(1 row)
softwareheritage=> select * from origin_visit_status where origin=2 order by date desc limit 10;
origin | visit | date | status | metadata | snapshot | type
--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
2 | 67 | 2020-09-21 21:55:01.586191+00 | full | | \xc7beb2432b7e93c4cf6ab09cd194c7c1998df2f9 |
2 | 67 | 2020-09-21 19:15:24.238712+00 | created | | |
2 | 66 | 2020-09-21 17:12:11.930011+00 | partial | | |
2 | 66 | 2020-09-21 17:07:41.94459+00 | created | | |
2 | 65 | 2020-08-24 11:51:54.472736+00 | full | | \xb16664848afbd3e867e8fce516ef15c1772679b2 |
2 | 65 | 2020-08-24 09:22:41.181224+00 | created | | |
2 | 64 | 2020-03-19 23:29:59.614232+00 | full | | \x89eed60d46be8b8963a1a2268762aee5bbb41038 |
2 | 63 | 2020-01-20 19:50:46.750039+00 | full | | \xcabcc7d7bf639bbe1cc3b41989e1806618dd5764 |
2 | 62 | 2019-12-16 13:44:56.685885+00 | ongoing | | |
2 | 61 | 2019-08-25 14:04:07.603463+00 | full | | \xeb8087624d47f6e8ee89692df041b2f568fb0e5f |
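To show what "off" means for the "linux" origin, here is a minimal sketch that replays the ten `origin_visit_status` rows from [3] and derives the stats row one would expect. The classification rules are assumptions for illustration and are not taken from the swh-scheduler journal client: `created`/`ongoing` statuses are ignored, `not_found` sets `last_notfound`, `failed` or a missing snapshot sets `last_failed`, a new snapshot sets `last_eventful`, and an unchanged one sets `last_uneventful`. Only the ten most recent rows are known, so state before visit 61 is treated as empty. `replay`, `ROWS`, and `CACHED` are hypothetical names.

```python
# Hypothetical replay of the rows listed in [3] for origin 2
# (https://github.com/torvalds/linux). The classification rules below are
# ASSUMPTIONS, not the actual swh-scheduler journal client logic.

# (visit, date, status, snapshot) as printed by the query: ordered by date desc.
ROWS = [
    (67, "2020-09-21 21:55:01.586191+00", "full", "c7beb2432b7e93c4cf6ab09cd194c7c1998df2f9"),
    (67, "2020-09-21 19:15:24.238712+00", "created", None),
    (66, "2020-09-21 17:12:11.930011+00", "partial", None),
    (66, "2020-09-21 17:07:41.94459+00", "created", None),
    (65, "2020-08-24 11:51:54.472736+00", "full", "b16664848afbd3e867e8fce516ef15c1772679b2"),
    (65, "2020-08-24 09:22:41.181224+00", "created", None),
    (64, "2020-03-19 23:29:59.614232+00", "full", "89eed60d46be8b8963a1a2268762aee5bbb41038"),
    (63, "2020-01-20 19:50:46.750039+00", "full", "cabcc7d7bf639bbe1cc3b41989e1806618dd5764"),
    (62, "2019-12-16 13:44:56.685885+00", "ongoing", None),
    (61, "2019-08-25 14:04:07.603463+00", "full", "eb8087624d47f6e8ee89692df041b2f568fb0e5f"),
]

# Values currently stored in origin_visit_stats for this origin (from [3]).
CACHED = {
    "last_eventful": "2017-09-07 18:43:13.021746+00",
    "last_uneventful": None,
    "last_failed": "2018-08-23 11:53:06.553328+00",
    "last_notfound": None,
    "last_snapshot": "3e3045be901bacc7594176e79ba13fe030f601e2",
}


def replay(rows, last_snapshot=None):
    """Fold visit statuses (oldest first) into one stats row, under assumed rules."""
    stats = {
        "last_eventful": None,
        "last_uneventful": None,
        "last_failed": None,
        "last_notfound": None,
        "last_snapshot": last_snapshot,
    }
    # The query output is newest first, so walk it in reverse.
    for _visit, date, status, snapshot in reversed(rows):
        if status in ("created", "ongoing"):
            continue  # assumption: intermediate statuses do not update the stats
        if status == "not_found":
            stats["last_notfound"] = date
        elif status == "failed" or snapshot is None:
            stats["last_failed"] = date  # assumption: no snapshot counts as a failure
        elif snapshot != stats["last_snapshot"]:
            stats["last_eventful"] = date
            stats["last_snapshot"] = snapshot
        else:
            stats["last_uneventful"] = date
    return stats


expected = replay(ROWS)

# Report every field where the cache disagrees with the replayed value.
for key, cached_value in CACHED.items():
    if expected[key] != cached_value:
        print(f"{key}: cached={cached_value} expected={expected[key]}")
```

Under these assumptions, the cache row would point at visit 67 (eventful at 2020-09-21 21:55, snapshot `c7beb243...`) and at the partial visit 66 as the last failure. The stored 2017/2018 values therefore suggest that the cache did not take the more recent visit statuses into account.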
Related to swh/infra/sysadm-environment#2993 (closed)
Migrated from T3000 (Phabricator)