swh/staging: limit the maximum number of connections from svix
Ensure svix cannot consume all the workers if something goes wrong in the infrastructure.
There could also be a cascading effect: if the webapp has some downtime, svix will try to send the backlog as fast as it can, preventing the webapp from recovering.
The (not so) guesstimate is ~10 connections per webhook journal client instance, or 1 connection per ~15 origin_visit_status/s.
This is true only for the scn endpoint; it will need to be adjusted if another endpoint is deployed or the current behavior is modified.
The estimate was obtained by looking for the limit with a big backlog of messages and 5 webhook journal clients.
Related to swh/infra/sysadm-environment#5339 (closed)
Another MR will follow to make it possible to configure alerting when a response time threshold is reached or a given percentage of requests fail.
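For context, here is a minimal sketch of how the limit appears on a rendered Ingress, assuming the ingress-nginx controller. The name and namespace are taken from the staging diff; the rest of the object is omitted, so this is an abridged illustration and not the full manifest generated by the chart. As far as the ingress-nginx documentation describes it, `limit-connections` caps the number of concurrent connections from a single client IP, and requests over the limit receive a 503.

```yaml
# Abridged sketch: only the fields relevant to the connection limit are shown.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-cassandra-ingress-webhooks
  namespace: swh-cassandra
  annotations:
    # Kubernetes annotation values must be strings, so the number is quoted.
    # An unquoted 3 would be parsed as an integer and rejected by the API server.
    nginx.ingress.kubernetes.io/limit-connections: "3"
# spec (rules, backends, TLS) intentionally omitted from this sketch
```

The quoting is the detail to check when reading the diff output, since some diff tools display the value without its quotes.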
Helm diff
Using the good old diff because dyff hides the important '"' on the annotation value
[swh] Comparing changes between branches production and svix-rate-limiting (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in svix-rate-limiting branch for environment staging...
[swh] Generate config in svix-rate-limiting branch for environment staging...
[swh] Generate config in svix-rate-limiting branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in svix-rate-limiting branch for environment production...
[swh] Generate config in svix-rate-limiting branch for environment production...
[swh] Generate config in svix-rate-limiting branch for environment production...
------------- diff for environment staging namespace swh -------------
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh.before, 113 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh.after, 113 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/
------------- diff for environment staging namespace swh-cassandra -------------
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra.before, 400 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra.after, 401 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned three differences
        |___/
(file level)
    ---
    # Source: swh/templates/webhooks/autoscaling.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: webhooks-origin-visit-status-scaledobject
      namespace: swh-cassandra
    spec:
      scaleTargetRef:
        name: webhooks-origin-visit-status
      pollingInterval: 120
      minReplicaCount: 1
      maxReplicaCount: 2
      idleReplicaCount: 0
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: journal2.internal.staging.swh.network
          consumerGroup: swh-archive-stg-webhooks
          lagThreshold: 1000
          offsetResetPolicy: earliest
spec (apps/v1/Deployment/swh-cassandra/webhooks-origin-visit-status)
  - one map entry removed:
    replicas: 1
metadata.annotations (networking.k8s.io/v1/Ingress/swh-cassandra/web-cassandra-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 3
------------- diff for environment staging namespace swh-cassandra-next-version -------------
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra-next-version.before, 168 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra-next-version.after, 168 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/
metadata.annotations (networking.k8s.io/v1/Ingress/swh-cassandra-next-version/web-cassandra-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 3
------------- diff for environment production namespace swh -------------
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/production-swh.before, 426 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/production-swh.after, 427 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned three differences
        |___/
(file level)
    ---
    # Source: swh/templates/webhooks/autoscaling.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: webhooks-origin-visit-status-scaledobject
      namespace: swh
    spec:
      scaleTargetRef:
        name: webhooks-origin-visit-status
      pollingInterval: 120
      minReplicaCount: 1
      maxReplicaCount: 5
      idleReplicaCount: 0
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: "kafka1.internal.softwareheritage.org,kafka2.internal.softwareheritage.org,kafka3.internal.softwareheritage.org,kafka4.internal.softwareheritage.org"
          consumerGroup: swh-archive-prod-webhooks
          lagThreshold: 5000
          offsetResetPolicy: earliest
spec (apps/v1/Deployment/swh/webhooks-origin-visit-status)
  - one map entry removed:
    replicas: 1
metadata.annotations (networking.k8s.io/v1/Ingress/swh/web-archive-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 10
------------- diff for environment production namespace swh-cassandra -------------
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/production-swh-cassandra.before, 96 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/production-swh-cassandra.after, 96 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/