Review and harmonize redis deployments across clusters (admin, staging, prod, ...)
We have several issues with our redis deployments:
- discrepancies in how redis is deployed
- different redis/redis-operator versions across clusters
- logs flooded by failing calls to the redis-operator conversion webhook [1]
There are at least two deployment methods:
- k8s-cluster-config (somewhat manual, with duplicated files):
  - redis-operator.yaml, one per cluster [0], which declares the versioned redis operator to install
  - redis-$name.yaml: one file per redis instance to install (relies on the redis operator and associated CRDs above to install the redis instance)
- swh-charts (helm charts):
  - cluster-configuration chart: declares, per cluster, the versioned redis operator to install
  - cluster-components chart: declares the redis instances to run per cluster (relies on the redis operator and associated CRDs to install the redis instances)
We need to align the deployment method so that a single tool is responsible for it (the swh-charts cluster-configuration/cluster-components charts). We also need to realign the redis/redis-operator versions across all clusters.
Finally, it would be great if we could stop the log flooding.
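To measure the version drift before realigning, a minimal sketch (not an existing tool; `fetch_deployments`, `operator_tags` and `find_drift` are hypothetical names) could read the redis-operator image tag from each cluster's Deployments via `kubectl --context <ctx> get deployments -A -o json`. It assumes the operator runs as a Deployment whose image name contains `redis-operator`, and that the kube contexts are available locally (`cluster-admin-rke2` and `archive-staging-rke2` appear in the logs below; other context names would have to be filled in).

```python
# Hypothetical helper: report redis-operator image tags per cluster and flag drift.
# Assumes the operator runs as a Deployment whose image name contains "redis-operator".
import json
import subprocess
from collections import Counter


def fetch_deployments(context: str) -> dict:
    """Return `kubectl get deployments -A -o json` output for one kube context."""
    out = subprocess.run(
        ["kubectl", "--context", context, "get", "deployments", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def operator_tags(deployments: dict) -> set:
    """Collect the image tags of every redis-operator container in a Deployment list."""
    tags = set()
    for item in deployments.get("items", []):
        for container in item["spec"]["template"]["spec"].get("containers", []):
            image = container.get("image", "")
            name, sep, tag = image.rpartition(":")
            if not sep or "/" in tag:  # no explicit tag (a registry port is not a tag)
                name, tag = image, "latest"
            if "redis-operator" in name:
                tags.add(tag)
    return tags


def find_drift(tags_by_cluster: dict) -> dict:
    """Return the clusters whose tags differ from the most common tag overall."""
    counts = Counter(tag for tags in tags_by_cluster.values() for tag in tags)
    if not counts:
        return {}
    reference = counts.most_common(1)[0][0]
    return {cluster: tags for cluster, tags in tags_by_cluster.items() if tags != {reference}}
```

Usage would be along the lines of `find_drift({ctx: operator_tags(fetch_deployments(ctx)) for ctx in ["cluster-admin-rke2", "archive-staging-rke2"]})`. Using the most common tag as the reference is only a convenience; once a target version is chosen, comparing against it directly is simpler.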
Redis instances:
- admin (through k8s-cluster-config/cluster-components)
  - sccache
- production (through k8s-cluster-config all the way)
  - svix-server
  - winery-replay-application
  - cassandra-replayers
- staging (through cluster-config/cluster-components)
  - svix-server
  - counters
  - postgresql-replayer
  - redis
- next-version (through staging's cluster-config/cluster-components)
  - svix-server
- test-staging (no redis instance)
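To check this inventory against what is actually running, a rough sketch (hypothetical helpers, matching by resource name only) could list the `Redis` custom resources of a cluster and diff them with the list above. It assumes each instance is a standalone `Redis` resource whose name equals the instance name; the resource name `redis.redis.redis.opstreelabs.in` is taken from the apiserver logs below, and on clusters where the conversion webhook is broken the list call itself may fail with the same error.

```python
# Hypothetical check: compare the Redis instances listed in this issue with the
# `Redis` custom resources actually present in a cluster (matched by name only).
import json
import subprocess

# Transcribed from the list above; assumes resource names equal instance names.
EXPECTED = {
    "admin": {"sccache"},
    "production": {"svix-server", "winery-replay-application", "cassandra-replayers"},
    "staging": {"svix-server", "counters", "postgresql-replayer", "redis"},
    "next-version": {"svix-server"},
    "test-staging": set(),
}


def fetch_redis(context: str) -> dict:
    """List Redis resources in all namespaces (may fail if the conversion webhook is down)."""
    out = subprocess.run(
        ["kubectl", "--context", context, "get", "redis.redis.redis.opstreelabs.in",
         "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def redis_names(redis_list: dict) -> set:
    """Names of the Redis resources in a `kubectl get ... -o json` list."""
    return {item["metadata"]["name"] for item in redis_list.get("items", [])}


def compare(expected: set, actual: set) -> dict:
    """Instances missing from the cluster, and unexpected ones found in it."""
    return {"missing": expected - actual, "unexpected": actual - expected}
```

Since next-version runs on the staging cluster, its set would have to be merged with staging's when checking `archive-staging-rke2`; matching on names alone cannot tell the two svix-server instances apart.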
[0] Remaining for the production and test-staging-rke2 clusters; the admin cluster was recently added.
[1] Redis log issues
## admin
ᐅ stern --context cluster-admin-rke2 -n kube-system . --since 1s --no-follow --only-log-lines
kube-apiserver-rancher-node-admin-rke2-mgmt2 kube-apiserver W1010 16:32:41.484579 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway
kube-apiserver-rancher-node-admin-rke2-mgmt2 kube-apiserver E1010 16:32:41.484604 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway; reinitializing...
kube-apiserver-rancher-node-admin-rke2-mgmt3 kube-apiserver W1010 16:32:41.648209 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway
kube-apiserver-rancher-node-admin-rke2-mgmt3 kube-apiserver E1010 16:32:41.648231 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway; reinitializing...
kube-apiserver-rancher-node-admin-rke2-mgmt1 kube-apiserver W1010 16:32:41.957132 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway
kube-apiserver-rancher-node-admin-rke2-mgmt1 kube-apiserver E1010 16:32:41.957167 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis.svc:443/convert?timeout=30s": proxy error from 127.0.0.1:9345 while dialing 10.42.209.146:9443, code 502: 502 Bad Gateway; reinitializing...
root@rancher-node-admin-rke2-mgmt1:~# journalctl -u rke2-server.service --since '2s ago'
-- Journal begins at Mon 2024-09-23 07:11:17 UTC, ends at Thu 2024-10-10 16:37:33 UTC. --
Oct 10 16:37:32 rancher-node-admin-rke2-mgmt1 rke2[2103469]: time="2024-10-10T16:37:32Z" level=error msg="Sending HTTP 502 response to 127.0.0.1:52850: dial tcp 10.42.209.146:9443: connect: connection refused"
Oct 10 16:37:33 rancher-node-admin-rke2-mgmt1 rke2[2103469]: time="2024-10-10T16:37:33Z" level=error msg="Sending HTTP 502 response to 127.0.0.1:52864: dial tcp 10.42.209.146:9443: connect: connection refused"
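The `Post "https://webhook-service.<namespace>.svc:443/convert..."` target in these errors comes from the conversion webhook configured on the Redis CRD itself. A minimal diagnostic sketch, assuming `kubectl` access with the contexts used here (`conversion_target` and `check` are hypothetical names), reads that configuration and checks whether the target Service exists. On admin the request does get as far as dialing 10.42.209.146:9443 and is refused, which suggests the Service resolves to an endpoint where nothing listens, so the operator pod behind it is the likelier culprit there.

```python
# Hypothetical diagnostic: find which Service the Redis CRD's conversion webhook
# targets, then check whether that Service exists in the cluster.
import json
import subprocess

CRD = "redis.redis.redis.opstreelabs.in"  # name as it appears in the cacher errors


def conversion_target(crd: dict):
    """Return (namespace, name, port) of the webhook Service, or None if there is none."""
    conversion = crd.get("spec", {}).get("conversion", {})
    if conversion.get("strategy") != "Webhook":
        return None
    service = conversion.get("webhook", {}).get("clientConfig", {}).get("service")
    if not service:
        return None  # webhook configured with a plain URL instead of a Service
    return service["namespace"], service["name"], service.get("port", 443)  # 443 is the API default


def check(context: str) -> str:
    """Describe the conversion webhook target of CRD in the given kube context."""
    crd = json.loads(subprocess.run(
        ["kubectl", "--context", context, "get", "crd", CRD, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout)
    target = conversion_target(crd)
    if target is None:
        return f"{context}: no Service-backed conversion webhook"
    namespace, name, port = target
    exists = subprocess.run(
        ["kubectl", "--context", context, "-n", namespace, "get", "service", name],
        capture_output=True, text=True,
    ).returncode == 0
    state = "exists" if exists else "MISSING"
    return f"{context}: {name}.{namespace}.svc:{port} ({state})"
```

For instance `print(check("archive-staging-rke2"))`; given the staging errors below (`service "webhook-service" not found`), one would expect it to report the Service in `redis-operator` as missing there.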
## staging
~/_swh_src/puppet-environment (master ✔) ᐅ stern --context archive-staging-rke2 -n kube-system . --since 1s --no-follow --only-log-lines
kube-apiserver-rancher-node-staging-rke2-mgmt1 kube-apiserver W1010 16:33:43.236184 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found
kube-apiserver-rancher-node-staging-rke2-mgmt1 kube-apiserver E1010 16:33:43.236237 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found; reinitializing...
kube-apiserver-rancher-node-staging-rke2-mgmt2 kube-apiserver W1010 16:33:43.774190 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found
kube-apiserver-rancher-node-staging-rke2-mgmt2 kube-apiserver E1010 16:33:43.774215 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found; reinitializing...
kube-apiserver-rancher-node-staging-rke2-mgmt3 kube-apiserver W1010 16:33:44.196515 1 reflector.go:535] storage/cacher.go:/redis.redis.opstreelabs.in/redis: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found
kube-apiserver-rancher-node-staging-rke2-mgmt3 kube-apiserver E1010 16:33:44.196546 1 cacher.go:479] cacher (redis.redis.redis.opstreelabs.in): unexpected ListAndWatch error: failed to list redis.redis.opstreelabs.in/v1beta1, Kind=Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": service "webhook-service" not found; reinitializing...
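Since each apiserver repeats the same pair of warning/error lines, a small sketch (hypothetical `summarise` helper, using only the standard library) can collapse the flood into counts per webhook Service, namespace and failure reason. It assumes the lines look like the ones above, with the host in the `<service>.<namespace>.svc` form; it makes the admin/staging discrepancy (`redis` vs `redis-operator` namespace) visible at a glance.

```python
# Hypothetical helper: collapse the flooding kube-apiserver lines into counts per
# (webhook service, namespace, failure reason).
import re
from collections import Counter

# Matches the `Post "https://<host>...": <reason>` part of the errors above,
# dropping the optional trailing "; reinitializing..." of the cacher lines.
POST_RE = re.compile(
    r'Post "https://(?P<host>[^/:"]+)[^"]*": (?P<reason>.*?)(?:; reinitializing\.\.\.)?$'
)


def summarise(lines) -> Counter:
    """Count conversion webhook failures by (service, namespace, reason)."""
    counts = Counter()
    for line in lines:
        match = POST_RE.search(line)
        if match is None:
            continue
        parts = match["host"].split(".")  # expected form: <service>.<namespace>.svc
        if len(parts) < 2:
            continue
        counts[(parts[0], parts[1], match["reason"])] += 1
    return counts
```

It accepts any iterable of lines, e.g. a file object holding saved `stern` output.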
Edited by Antoine R. Dumont