Staging instance, all changes can be removed at any time

Skip to content

components/alerting: Create an alert for unreachable cassandra nodes

Guillaume Samson requested to merge cassandra_svc_alerting into production

Related to swh/infra/sysadm-environment#5026 (closed)

This template will deploy a PrometheusRule in staging and production to trigger an alert if a cassandra cluster node is unreachable.

ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | grep -A 12 groups 
  groups:
  - name: critical-cassandra-service.rules
    rules:
    - alert: Cassandra_Service_Degraded_In_Production
      annotations:
        description: "The {{ $labels.instance }} node is unreachable for more than 5 minutes. This node seems down."
        summary: "The {{ $labels.service }} is degraded. Please check the {{ $labels.instance }} status."
      expr: up{service="cassandra-servers-svc"} == 0
      for: 5m
      labels:
        severity: critical
        namespace: cattle-monitoring-system
ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | \
grep -A 12 groups | promtool check rules
Checking standard input
  SUCCESS: 1 rules found

Merge request reports

Loading