Register kafka1.staging in the staging kafka cluster
Once the installation is done [1]:
- Stop puppet on storage1.staging
- (blocking) #4840 (closed): unstuck resource collection naming
- Unstick the puppet run on kafka1.staging (the manifest's kafka version can no longer be downloaded) [4]
- Configure firewall ( https://glyptotek.internal.softwareheritage.org )
  - Declare vip address 128.93.166.41
  - Install new broker aliases
  - Update the staging broker alias to reference the new broker
  - Add a new NAT port forward from vlan1300 on this vip for the kafka port [5]
  - Add firewall rule to allow traffic (if needed, not sure)
- swh/infra/puppet/puppet-swh-site!615 (merged): Register node in kafka cluster staging
- Run puppet on pergamon, then on the kafka node ^, until it no longer complains
- Ask DSI to open vlan 1300 routing (or some such) for the new node [6]
- Re-enable puppet on storage1.staging
- Unstick problems
  - Add missing CNAME
  - Add broker2.journal.staging.swh.network alias
- Doc: How to reallocate partitions/topics? [2] [3]
- Stop puppet on kafka1.staging
- Stop kafka on kafka1.staging
- Destroy the zfs pool/mountpoint built on striped disks
- Create a new one with mirrored disks
- Restart kafka with the new zfs mountpoint
- [blocking] #4849 (closed): Fix kafka1.staging dns record issue
- Reallocate partitions/topics to both storage1 & kafka1 nodes ("set replication factor to 2")
- (ongoing) Reallocate partitions to only the kafka1 node
- Internally change cname/vip from storage1 to kafka1
- Set replication factor to 1
- Drop broker profile from storage1 (also drops the syncoid backup on that node)
- Clean up the db1 syncoid backup from storage1.staging ^
- swh-docs: Update service urls page [7]
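
Sketch for the two reallocation steps (both nodes with replication factor 2, then kafka1 only with replication factor 1): a small Python helper that builds the reassignment JSON read by kafka-reassign-partitions.sh. This is only an illustration, not what was actually run. The broker ids and the topic name are hypothetical placeholders; the real ids come from each node's broker.id and the real topic list from the cluster.

```python
import json

# Hypothetical broker ids; the real ones come from each node's broker.id setting.
STORAGE1_ID = 1
KAFKA1_ID = 2


def build_reassignment_plan(partition_counts, broker_ids):
    """Build a plan that places every partition on all of broker_ids.

    partition_counts maps a topic name to its number of partitions.
    The replica list is rotated per partition so that the preferred
    leader (the first replica) alternates between the brokers.
    """
    brokers = list(broker_ids)
    entries = []
    for topic in sorted(partition_counts):
        for partition in range(partition_counts[topic]):
            shift = partition % len(brokers)
            replicas = brokers[shift:] + brokers[:shift]
            entries.append(
                {"topic": topic, "partition": partition, "replicas": replicas}
            )
    return {"version": 1, "partitions": entries}


if __name__ == "__main__":
    topics = {"example.topic": 2}  # hypothetical topic with 2 partitions

    # Step 1: replicate everything on both nodes (replication factor 2).
    both_nodes = build_reassignment_plan(topics, [STORAGE1_ID, KAFKA1_ID])
    # Step 2: keep only kafka1 as replica (replication factor 1).
    kafka1_only = build_reassignment_plan(topics, [KAFKA1_ID])

    print(json.dumps(both_nodes, indent=2))
    print(json.dumps(kafka1_only, indent=2))
```

Each plan would be saved to a file and passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute, then checked with --verify before moving on to the next step (see [3]).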
[1] #4758 (closed)
[2] See issue #3630 (closed) for hint
[3] https://kafka.apache.org/documentation/#basic_ops_cluster_expansion
[4]
Error: Execution of '/usr/bin/curl https://dlcdn.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz -o /tmp/kafka_2.13-3.3.1.tgz_20230407-2441831-158gmfi -fsSLg --max-redirs 5' returned 22: curl: (22) The requested URL returned error: 404
Error: /Stage[main]/Kafka/Archive[/var/tmp/kafka/kafka_2.13-3.3.1.tgz]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/curl https://dlcdn.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz -o /tmp/kafka_2.13-3.3.1.tgz_20230407-2441831-158gmfi -fsSLg --max-redirs 5' returned 22: curl: (22) The requested URL returned error: 404
[5] check storage1.staging configuration and do the same ;)
[6] already done, as per the mailing list exchange https://sympa.inria.fr/sympa/arc/swh-sysadm/2023-03/msg00000.html
[7] https://docs.softwareheritage.org/sysadm/network-architecture/service-urls.html
Note to self: click the "Apply" button after each action, then go to "System > High availability > Status" and hit the "cloud" button on the first synchronize line.