Use cassandra as main backend in production
The migration plan is defined in a HedgeDoc document [1].
Plan/Status:
1. Duplicate rpc services
   - objstorage [1] [2]
   - indexer-storage [2]
   - storage
     - postgresql-azure-readonly [3]
     - winery [4] (duplicated & disabled)
   - scheduler [5]
   - graphql
     - archive [6]
     - webapp1 [7] (do not duplicate)
   - vault [8]
   - search [9]
   - counters [10]
   - deposit [11]
   - provenance [12]
   - web
     - webhook [13]
     - archive [14]
     - app1 (do not duplicate)
2. Migrate read-only workload to use the rpc services in swh-cassandra
   - indexer & listers [15] (the latter's configuration is a subset of the former's)
   - vault cookers [16]
   - counters [17]
   - search [18]
   - scheduler [20]
   - scrubber [21] (later)
3. Enable rpc support services in swh-cassandra, then disable the swh ones
   - scheduler support services (runner, runner-priority, listener, journal-client, ...)
   - counters (journal client)
   - search (journal client)
4. Switch worker definition from swh ns to swh-cassandra ns (once we know the previous instance in swh is still functional) and deactivate the swh one
   - indexers
     - duplicate worker definition in ns swh-cassandra, enable worker, then check worker is functional
     - disable in ns swh
   - vault cookers
   - counter journal client
   - search journal client
   - listers
   - scrubber journal checker
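The ordering in step 4 (enable the new worker, verify it, and only then disable the old one) can be sketched as follows. This is a minimal illustration, not the actual deployment tooling: `switch_worker` and the three callables passed to it are hypothetical names standing in for whatever mechanism enables, checks, and disables a worker in each namespace.

```python
from typing import Callable


def switch_worker(
    name: str,
    enable_new: Callable[[str], None],
    is_functional: Callable[[str], bool],
    disable_old: Callable[[str], None],
) -> bool:
    """Enable the worker in the new namespace, then disable the old one.

    The old worker is only disabled once the new one is confirmed
    functional; otherwise it is left running and False is returned.
    All three callables are hypothetical hooks supplied by the caller.
    """
    enable_new(name)
    if not is_functional(name):
        # Keep the old worker untouched so the service stays available.
        return False
    disable_old(name)
    return True
```

Passing the operations as callables keeps the ordering rule separate from how workers are actually enabled or disabled.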
5. Update toolbox configuration in swh-cassandra ns
6. Migrate frontends in swh-cassandra (to use swh-cassandra rpc services)
   - web frontend in swh ns should still use the storage postgresql instance from the swh ns (web-archive should now have ingress webapp-postgresql.internal.s.o)
   - web-archive (with ingress archive.s.o)
   - Rename swh.web-archive to swh.web-postgresql
   - deposit (with ingress deposit.s.o)
   - migrate checkerDeposit worker service to ns swh-cassandra, then deactivate the swh one
7. Disable the no longer needed ones (emulate what we have defined in staging)
   - Deactivate no longer used rpc services
     - indexer-storage
     - objstorage
     - vault
     - search
     - counters
     - deposit
     - provenance
8. Prepare migration of writing workload
   - Configure the webhooks in swh-cassandra
     - Create a webhook webapp and the ingress
     - Migrate the journal client in swh-cassandra ns
     - Disable the journal client in swh ns
   - Migrate alter: we do not migrate it, as we need record-reference to be implemented in cassandra
9. Migrate writing workload (to complete):
   - Open status.io maintenance window to notify that the services are in maintenance
   - Stop loaders
   - Checks
     - Check cassandra-replayers have no lag (since the loaders stopped writing, there are no more messages in the topics)
     - Check no cassandra-replayers are running either
   - Stop storage rw services in swh ns (ro storage stays)
   - Activate kafka -> postgresql replayers in swh ns (offset: latest)
   - Stop kafka -> cassandra replayers in swh-cassandra ns
   - Migrate loaders incrementally
     - Activate oneshot loader and schedule origins for ingestion
     - Checks
       - Check the visits are ok
       - Check the replayers did their job
     - save-code-now
     - deposit
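The "no lag" check on the replayers in step 9 amounts to comparing, for each partition, the topic's end offset with the offset the consumer group has committed. The sketch below shows only that arithmetic. It assumes both offset maps have already been collected by some other means; the function names and the dict-based inputs are illustrative assumptions, not part of any swh tool.

```python
from typing import Dict


def partition_lag(
    end_offsets: Dict[int, int], committed: Dict[int, int]
) -> Dict[int, int]:
    """Return the number of unconsumed messages per partition.

    `end_offsets` maps partition id -> latest offset in the topic,
    `committed` maps partition id -> offset committed by the replayer.
    A partition with no committed offset is treated as fully unconsumed.
    """
    lag = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition, 0)
        lag[partition] = max(end - done, 0)
    return lag


def replayers_caught_up(
    end_offsets: Dict[int, int], committed: Dict[int, int]
) -> bool:
    """True when every partition has zero lag."""
    return all(v == 0 for v in partition_lag(end_offsets, committed).values())
```

With the loaders stopped, the end offsets no longer move, so a single `True` result from `replayers_caught_up` is enough to proceed to stopping the storage rw services.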
- Refactoring/Clean up swh namespace
  - Drop disabled swh service worker definitions
  - Align swh ns in production with the ns swh in staging (e.g. objstorage-replayer, ...)
  - Clean up no longer used swh ns service (web-app1, web-webhooks)
  - Make another round of checks and activate any missed deployments (e.g. loader-metadata, loader-pubdev)
- swh/devel/swh-docs!446 (merged): docs: Update if any changes in ingress (https://docs.s.o/.../service-urls)
  - Add webapp-postgresql.internal.softwareheritage.org with vpn access
  - Delete webapp-cassandra.internal.softwareheritage.org
  - Drop internal kubernetes service

[1] https://hedgedoc.softwareheritage.org/G9W-HZfARIekuSzAb6ldyw#