Migrate deposit SWHIDs (data) to the new specification

Migrate both "recent" and "old" format deposits [1] to the new specification.

That is, the deposit swh_id and swh_id_context fields will be set to:

  • swh_id: directory SWHID (no context)
  • swh_id_context: directory SWHID (with context, origin, visit, anchor path)

Optionally, the following two fields will be kept (for now) and set where they were missing ("old" deposits):

  • swh_anchor_id: revision SWHID (no context)
  • swh_anchor_id_context: revision SWHID (context with only origin)
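A minimal sketch of how the four target values could be assembled. It assumes the SWHID qualifier syntax (`;key=value`, qualifiers emitted in the order origin, visit, anchor, path, with `%` and `;` percent-escaped in values). The helpers `swhid` and `deposit_swhids` are hypothetical and are not the swh.deposit API. The path qualifier is left optional because the exact path used by the migration is not stated here, and the anchor revision id in the usage line is a placeholder.

```python
from typing import Dict, Optional

# Qualifier order assumed from the SWHID specification.
QUALIFIER_ORDER = ("origin", "visit", "anchor", "path")


def _escape(value: str) -> str:
    # ';' separates qualifiers, so percent-escape it ('%' first, to stay reversible).
    return value.replace("%", "%25").replace(";", "%3B")


def swhid(object_type: str, hex_id: str, **qualifiers: Optional[str]) -> str:
    """Return a core SWHID followed by its non-empty qualifiers, in spec order."""
    parts = [f"swh:1:{object_type}:{hex_id}"]
    for key in QUALIFIER_ORDER:
        value = qualifiers.get(key)
        if value:
            parts.append(f"{key}={_escape(value)}")
    return ";".join(parts)


def deposit_swhids(
    dir_id: str, rev_id: str, snp_id: str, origin: str, path: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper computing the four deposit fields of the new specification."""
    anchor = swhid("rev", rev_id)
    return {
        # directory SWHID, no context
        "swh_id": swhid("dir", dir_id),
        # directory SWHID with origin, visit, anchor (and optional path) qualifiers
        "swh_id_context": swhid(
            "dir", dir_id, origin=origin, visit=swhid("snp", snp_id), anchor=anchor, path=path
        ),
        # revision SWHID, no context
        "swh_anchor_id": anchor,
        # revision SWHID with only the origin qualifier
        "swh_anchor_id_context": swhid("rev", rev_id, origin=origin),
    }


fields = deposit_swhids(
    dir_id="c3d06da1a556900e295b64aea1cc5a413b374ae9",
    rev_id="0" * 40,  # placeholder: the actual anchor revision is not shown here
    snp_id="e5e82d064a9c3df7464223042e0c55d72ccff7f0",
    origin="https://hal.archives-ouvertes.fr/hal-02560320",
)
print(fields["swh_id_context"])
```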

Some very "old" deposits are expected not to be migrated, because their values cannot be resolved yet. They will be rescheduled once that becomes possible (after [2] is deployed).
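The "migrate if resolvable, otherwise leave untouched" rule could look like the sketch below. `resolve` is a hypothetical callable mapping an old deposit's revision id to its (directory id, snapshot id), returning `None` when that cannot be determined yet. The origin is passed in explicitly, and qualifier escaping and the path qualifier are omitted for brevity. This illustrates the intent only; it is not the actual migration code.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Row = Dict[str, Any]
# Hypothetical resolver: revision hex id -> (directory hex id, snapshot hex id), or None.
Resolver = Callable[[str], Optional[Tuple[str, str]]]


def migrate_old_deposit(row: Row, origin: str, resolve: Resolver) -> Row:
    """Return a migrated copy of an "old" row, or the row unchanged if unresolvable."""
    swh_id = row.get("swh_id") or ""
    if not swh_id.startswith("swh:1:rev:"):
        return row  # not an "old" deposit, nothing to do here
    rev_id = swh_id.split(":")[3]
    resolved = resolve(rev_id)
    if resolved is None:
        return row  # cannot resolve yet: left as-is, to be rescheduled later
    dir_id, snp_id = resolved
    rev_swhid = f"swh:1:rev:{rev_id}"
    return {
        **row,
        "swh_id": f"swh:1:dir:{dir_id}",
        "swh_id_context": (
            f"swh:1:dir:{dir_id};origin={origin}"
            f";visit=swh:1:snp:{snp_id};anchor={rev_swhid}"
        ),
        "swh_anchor_id": rev_swhid,
        "swh_anchor_id_context": f"{rev_swhid};origin={origin}",
    }


row_155 = {
    "id": 155,
    "swh_id": "swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa",
    "swh_id_context": None,
}
# With a resolver that knows nothing, the row comes back untouched.
unchanged = migrate_old_deposit(
    row_155, "https://example.org/hypothetical-origin", lambda rev_id: None
)
print(unchanged is row_155)  # True
```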

  • [1] "recent" format means all swh_id fields are set:
      • swh_id: directory SWHID (no context)
      • swh_id_context: directory SWHID (context with only origin)
      • swh_anchor_id: revision SWHID (no context)
      • swh_anchor_id_context: revision SWHID (context with only origin)

    "old" format:
      • swh_id: revision SWHID (no context)
      • swh_id_context: not set
      • swh_anchor_id: not set
      • swh_anchor_id_context: not set

  • [2] Related to !70 (closed)
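To tell the formats of [1] apart when inspecting rows, a small classifier such as the following could be used. It is a hypothetical helper: "new" is a label introduced here for rows already in the target shape (a directory context carrying at least origin, visit and anchor qualifiers), and anything else is reported as "unknown".

```python
from typing import Mapping, Optional, Set


def _qualifier_keys(value: str) -> Set[str]:
    # Everything after the first ';' is a list of key=value qualifiers.
    return {part.split("=", 1)[0] for part in value.split(";")[1:]}


def deposit_format(row: Mapping[str, Optional[str]]) -> str:
    """Classify a deposit row as "old", "recent", "new" or "unknown" (hypothetical helper)."""
    swh_id = row.get("swh_id") or ""
    context = row.get("swh_id_context")
    anchor = row.get("swh_anchor_id")
    anchor_context = row.get("swh_anchor_id_context")
    if swh_id.startswith("swh:1:rev:") and not (context or anchor or anchor_context):
        return "old"  # only a revision SWHID, nothing else set
    if swh_id.startswith("swh:1:dir:") and context and anchor and anchor_context:
        keys = _qualifier_keys(context)
        if keys == {"origin"}:
            return "recent"  # directory context with only the origin
        if {"origin", "visit", "anchor"} <= keys:
            return "new"  # target shape of this migration
    return "unknown"


print(deposit_format({"swh_id": "swh:1:rev:698771f9ca7ce7605fdcabf27b5851f322ea692c"}))  # old
```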

Related to #2398 (closed)

Test Plan

A dump of the production db was restored into the staging db, then the migration scripts were run:

$ SWH_CONFIG_FILENAME=/etc/softwareheritage/deposit/server.yml django-admin migrate --settings=swh.deposit.settings.production --verbosity 3

"Recent" deposits

From

 id  | status |                       swh_id                       |                                                           swh_id_context
-----+--------+----------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------
 608 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320
 607 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320
 606 | done   | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895 | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895;origin=https://doi.org/10.5201/ipol.2018.236
 605 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-14T08:28:05.683282
 603 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-09T14:09:50.098364
 602 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236
 601 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T16:05:49.106202
 600 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T14:09:14.062873
 599 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T12:52:53.361776
 598 | done   | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e;origin=https://hal.archives-ouvertes.fr/hal-02546057
 597 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236
...

to

 id  | status |                       swh_id                       |                                                                                                                        swh_id_context                                    $
-----+--------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
 608 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320;visit=swh:1:snp:e5e82d064a9c3df7464223042e0c55d72ccff7f0;anchor=s$
 607 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320;visit=swh:1:snp:3e95ef6e04c381a34cc2f314576bc5644f2c797f;anchor=s$
 606 | done   | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895 | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:07c80b96ab64e714fb69ed725f6b18caf87763ba;anchor=swh:1:rev$
 605 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-14T08:28:05.683282;visit=swh:1:snp:4577ab1375d35bab6e316$
 603 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-09T14:09:50.098364;visit=swh:1:snp:7e09ab0433291e2c5ea14$
 602 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:994f6ca7c49b1012768c4a5a6470f17f28d0e294;anchor=swh:1:rev$
 601 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T16:05:49.106202;visit=swh:1:snp:7c6ad0d82051bce0d5ebd$
 600 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T14:09:14.062873;visit=swh:1:snp:8f2341e340bd883300885$
 599 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T12:52:53.361776;visit=swh:1:snp:ce3d7eb9b08b839171c01$
 598 | done   | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e;origin=https://hal.archives-ouvertes.fr/hal-02546057;visit=swh:1:snp:526c43a6e4459f2c72c67031adf931ed6d3bdca7;anchor=s$
 597 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:f7decde6a26a4fa5f0886d71c010ceae827bae92;anchor=swh:1:rev$
 ...
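One way to spot-check a migrated "recent" row is to parse the new swh_id_context and compare it with the previous value: the core SWHID should still equal swh_id, the origin qualifier should be unchanged, and a visit qualifier pointing at a snapshot should now be present. The sketch below is hypothetical (helper names and the `%3B`/`%25` unescaping are assumptions). The anchor is cut off in the psql output above, so the example uses only the visible part of deposit 608's new value and does not check the anchor.

```python
from typing import Dict, Tuple


def parse_swhid(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a qualified SWHID into (core SWHID, {qualifier: value})."""
    core, *rest = value.split(";")
    qualifiers: Dict[str, str] = {}
    for part in rest:
        key, _, raw = part.partition("=")
        # Undo the '%3B' / '%25' escaping assumed for qualifier values.
        qualifiers[key] = raw.replace("%3B", ";").replace("%25", "%")
    return core, qualifiers


def check_migrated(old_context: str, new_swh_id: str, new_context: str) -> bool:
    """Spot-check a migrated "recent" row against its previous swh_id_context."""
    old_core, old_qualifiers = parse_swhid(old_context)
    new_core, new_qualifiers = parse_swhid(new_context)
    return (
        new_core == new_swh_id == old_core  # same directory before and after
        and new_swh_id.startswith("swh:1:dir:")
        and new_qualifiers.get("origin") == old_qualifiers.get("origin")
        and new_qualifiers.get("visit", "").startswith("swh:1:snp:")
    )


# Deposit 608, using only the part of the new value visible in the table above.
old_ctx = (
    "swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9"
    ";origin=https://hal.archives-ouvertes.fr/hal-02560320"
)
new_id = "swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9"
new_ctx = (
    new_id
    + ";origin=https://hal.archives-ouvertes.fr/hal-02560320"
    + ";visit=swh:1:snp:e5e82d064a9c3df7464223042e0c55d72ccff7f0"
)
print(check_migrated(old_ctx, new_id, new_ctx))  # True
```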

"Old" deposits:

From

 id  | status |                       swh_id                       | swh_id_context
-----+--------+----------------------------------------------------+----------------
 156 | done   | swh:1:rev:698771f9ca7ce7605fdcabf27b5851f322ea692c |
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 154 | done   | swh:1:rev:8127063816bd4f75e00c2986c0a95fd95d78d876 |
 153 | done   | swh:1:rev:2176d2be0d7e13e89a90447d7d0853af5cbab973 |
 152 | done   | swh:1:rev:e2655c5b28552465a7be15c06f31aa066f64535a |
 151 | done   | swh:1:rev:504a90c58872a8a594886fcf75fc5bfebe151e68 |
 150 | done   | swh:1:rev:c648730299c2a4f4df3c1fe6e527ef3681f9527e |
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 147 | done   | swh:1:rev:c8fca417ee9eefe25683042192da67470147be07 |
 146 | done   | swh:1:rev:cccf789c12617208fe188ad3dbc2746d4c884ab7 |

to

 id  | status |                       swh_id                       |                                                                                                          swh_id_context                                                  $
-----+--------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
 156 | done   | swh:1:dir:2c01e745c6d89e0eeb9a6ec9590f7ef0750b7002 | swh:1:dir:2c01e745c6d89e0eeb9a6ec9590f7ef0750b7002;origin=https://hal.archives-ouvertes.fr/hal-01831369;visit=swh:1:snp:42f0897956e700a23f5b8aafce43360b8699c0f1;anchor=s$
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 154 | done   | swh:1:dir:3cb45c908fdad87542c5090e9464fc7f504e1509 | swh:1:dir:3cb45c908fdad87542c5090e9464fc7f504e1509;origin=https://hal.archives-ouvertes.fr/hal-01836266;visit=swh:1:snp:1fbb294bc458809e043bba9073f9d7a8b0b40fc9;anchor=s$
 153 | done   | swh:1:dir:95486800004625900d8365ee968683c7608a3b9d | swh:1:dir:95486800004625900d8365ee968683c7608a3b9d;origin=https://hal.archives-ouvertes.fr/hal-01837101;visit=swh:1:snp:2c2c2e4dcd61753b61739a45669ffbb89104d17a;anchor=s$
 152 | done   | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e;origin=https://hal.archives-ouvertes.fr/hal-01831364;visit=swh:1:snp:f34ffc4d2fb57ba19a8586b88091fe99714a970a;anchor=s$
 151 | done   | swh:1:dir:f5cba66f896192d98641cf2d801de11dfca9f2a7 | swh:1:dir:f5cba66f896192d98641cf2d801de11dfca9f2a7;origin=https://hal.archives-ouvertes.fr/hal-01836189;visit=swh:1:snp:0e0f73db37ae7d26bf4b29d5599da2bfced30d63;anchor=s$
 150 | done   | swh:1:dir:accc6076ec6104d2125567e4a0c7685fb91f71e7 | swh:1:dir:accc6076ec6104d2125567e4a0c7685fb91f71e7;origin=https://hal.archives-ouvertes.fr/hal-01836169;visit=swh:1:snp:e3640bbfa187762803f29012b02693dd48e0ac88;anchor=s$
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 147 | done   | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e;origin=https://hal.archives-ouvertes.fr/hal-01831364;visit=swh:1:snp:2cce797c46e9d06eb424e2f806a8d7d1fab6bf38;anchor=s$
 146 | done   | swh:1:dir:8a9521f0228d4f79a20d8d20f28523d557f9d2f8 | swh:1:dir:8a9521f0228d4f79a20d8d20f28523d557f9d2f8;origin=https://hal.archives-ouvertes.fr/hal-01831369;visit=swh:1:snp:a0f733bb6f16d6fe65c95194ad76c471fe739e75;anchor=s$
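The before/after comparison of the "old" rows can also be expressed programmatically: a row counts as migrated when its swh_id moved from a revision SWHID to a directory SWHID. The helper name and the id-to-swh_id dict input are assumptions made for this sketch; the usage takes four rows from the tables above.

```python
from typing import Dict, List


def migration_report(before: Dict[int, str], after: Dict[int, str]) -> Dict[str, List[int]]:
    """Split deposit ids by whether swh_id moved from a revision to a directory SWHID."""
    report: Dict[str, List[int]] = {"migrated": [], "untouched": []}
    for deposit_id in sorted(before, reverse=True):  # same order as the psql output
        old_value = before[deposit_id]
        new_value = after.get(deposit_id, old_value)
        if old_value.startswith("swh:1:rev:") and new_value.startswith("swh:1:dir:"):
            report["migrated"].append(deposit_id)
        else:
            report["untouched"].append(deposit_id)
    return report


before = {
    156: "swh:1:rev:698771f9ca7ce7605fdcabf27b5851f322ea692c",
    155: "swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa",
    154: "swh:1:rev:8127063816bd4f75e00c2986c0a95fd95d78d876",
    149: "swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79",
}
after = {
    156: "swh:1:dir:2c01e745c6d89e0eeb9a6ec9590f7ef0750b7002",
    155: "swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa",
    154: "swh:1:dir:3cb45c908fdad87542c5090e9464fc7f504e1509",
    149: "swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79",
}
print(migration_report(before, after))
# {'migrated': [156, 154], 'untouched': [155, 149]}
```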

As expected, some deposits were not migrated (e.g. 155 and 149 above; see description).

Leftovers to reschedule

swh-deposit=> select id, status, swh_id, swh_id_context from deposit where status='done' and swh_id_context is null order by id desc;
 id  | status |                       swh_id                       | swh_id_context
-----+--------+----------------------------------------------------+----------------
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 127 | done   | swh:1:rev:d76cf5c02ce421f157d3fa624ad134a2efd18193 |
 126 | done   | swh:1:rev:84567c10d3c2383a878a9d8ab6773c1665e08419 |
 125 | done   | swh:1:rev:35ff14e6e4514adae3f950825a4b8b9b9f22767f |
 124 | done   | swh:1:rev:279a8ea930ddd6ef54f10f2f0784ea14a2205215 |
 123 | done   | swh:1:rev:e2a3373925db0f9f4307699e913b9fea9516cf6b |
 116 | done   | swh:1:rev:e2cdf2d3ce49f933ac6d23054183f92eacc4faef |
 114 | done   | swh:1:rev:a5e8b3d276e3a05989d00628e6e611ec7c51252a |
 112 | done   | swh:1:rev:b167902daf3a8a163d947adb62ad4269df471597 |
 110 | done   | swh:1:rev:b260ac6c02987fdf66e7dd1d2e647134cc3bed72 |
 108 | done   | swh:1:rev:d3f9947006289c67be6fd2a5081e466d61a80996 |
  93 | done   | swh:1:rev:734786ca12ca626b3a82a9d2a6fb5f6b968e7bd6 |
  92 | done   | swh:1:rev:4eb1d36683af77b946cdcb5875798d03bd6b775a |
  86 | done   | swh:1:rev:a0b9fc8f8a8bd7e1d29a18b9ac1a7d6e402d31cd |
  85 | done   | swh:1:rev:c29acbad74bb6cc01f9b7d61dd4f01ac747d771d |
  84 | done   | swh:1:rev:afb67a44c5de98891f4f21d04c449cc200b7e739 |
  83 | done   | swh:1:rev:bc3a12c0a288d74eafeb564ba03d8466f5fdb0f2 |
  82 | done   | swh:1:rev:31578998456025e4ebdb396b08dda0a63777b80e |
  81 | done   | swh:1:rev:85a127f023c84b2326c72fa669f0e3ad73a4fb68 |
  80 | done   | swh:1:rev:2a97f21995bab29548d7b41ec75fdd5639dbd325 |
  79 | done   | swh:1:rev:03987f056eaf4596cd20d7b2ee01c9b84ceddfa8 |
  78 | done   | swh:1:rev:7b844a98f54466cb189d27dbc1eede17f39e1c52 |
  77 | done   | swh:1:rev:4cf243a0645d5cd10c689eafd22ab38d685ad2d4 |
(24 rows)

Migrated from D3153 on Phabricator.
