Make container-based services push their logs to the swh log infrastructure
Static systemd services are currently configured with Environment=SWH_LOG_TARGET=journal. When logs are routed to the journald handler, it adds some custom swh-prefixed metadata [1].
New container-based services do not use systemd. We need a solution that lets them behave similarly.
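To make the goal concrete, here is a minimal sketch of what "behaving similarly" could look like without journald: one JSON line per record on stdout, with extra metadata keys prefixed by swh_. The field names, the `swh` key passed through `extra`, and the helper names are all hypothetical; the real metadata is described in [1] and is not reproduced here.

```python
import json
import logging
import sys


class PrefixedJsonFormatter(logging.Formatter):
    """Render each record as one JSON line, prefixing extra metadata keys."""

    def __init__(self, prefix="swh_"):
        super().__init__()
        self.prefix = prefix

    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        # Hypothetical convention: metadata is passed as extra={"swh": {...}},
        # which the logging module exposes as the attribute record.swh.
        for key, value in getattr(record, "swh", {}).items():
            payload[f"{self.prefix}{key}"] = value
        return json.dumps(payload, sort_keys=True)


def make_logger(stream, name="swh.example"):
    """Build a logger that writes prefixed JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(PrefixedJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger(sys.stdout)
    # "task_name" is an illustrative key, not taken from the snippet in [1].
    log.info("loaded %s", "origin", extra={"swh": {"task_name": "load-git"}})
```

A log collector running next to the container could then parse these lines and index the swh_ fields, which is the role journald metadata plays for the static services.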
Current work:
- Adapt swh tools to allow more log customization through configuration (see previous merge requests)
- Solution 1: investigate the opentelemetry solution, as it unifies the management of logs, traces and (prometheus) metrics
  - Try the solution locally (minikube)
  - Try an equivalent solution in the staging cluster
  - Install a staging elasticsearch instance to avoid polluting the main elasticsearch instance (+ no persistence)
  - Make some workers push their logs to that elasticsearch instance through opentelemetry
  - Conclusion: no dynamic index; only a single index is possible, which is limiting (our current setup uses dynamic daily indices)
- Solution 2: try opentelemetry, fluentbit and elasticsearch together
  - opentelemetry -> fluentbit: ok
  - blocked: fluentbit -> elasticsearch: not yet working
- Fall back to solution 1 and let it run to determine whether it is sufficient
  - swh service: indexed data can be seen in kibana [2]
  - system service: indexed data can be searched in kibana [3]
  - Iterate over the log patterns to parse
- Determine how to deploy this
  - in each cluster (enable a chart per cluster, which allows index parameters to be set)
  - or in the admin cluster (unclear to me how; vince knows more)
- Use opentelemetry-helm-chart [5]
  - Make it work
  - Adapt it to avoid pods being killed for exceeding their memory limit
  - Activate metrics and let it run to determine whether it is worth continuing to use it (it currently uses a lot of memory)
  - Analyze metrics
- swh/infra/ci-cd/swh-charts!49 (merged): Templatize the iteration work as a cluster-configuration chart (deployment per cluster)
- Deploy to production cluster
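For context on the "dynamic daily indices" mentioned in the conclusion on solution 1: the idea is that each record lands in an index whose name is derived from the record's date, so old days can be dropped or archived as a unit. A small sketch follows, assuming the common `<prefix>-YYYY.MM.DD` naming (the Logstash-style convention); the exact pattern used by our current setup is not stated in this task, so treat the format as an assumption.

```python
from collections import defaultdict
from datetime import date


def daily_index(prefix, day):
    """Return the per-day index name, e.g. 'staging-logs-2023.04.21'.

    The '<prefix>-YYYY.MM.DD' pattern is an assumption for illustration.
    """
    return f"{prefix}-{day:%Y.%m.%d}"


def group_by_index(prefix, records):
    """Group (day, message) pairs by the daily index they would be written to."""
    grouped = defaultdict(list)
    for day, message in records:
        grouped[daily_index(prefix, day)].append(message)
    return dict(grouped)


records = [
    (date(2023, 4, 21), "worker started"),
    (date(2023, 4, 21), "task done"),
    (date(2023, 4, 22), "worker stopped"),
]
by_index = group_by_index("staging-logs", records)
```

With a single static index, as in the conclusion above, all three records would end up in the same place and the index would keep growing.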
[1] https://gitlab.softwareheritage.org/swh/meta/-/snippets/1436
[2] (idx: staging-logs) http://kibana0.internal.softwareheritage.org:5601/goto/655500e1078b6b56e4f1be24968780ae
[3] (idx: staging-system-logs) http://kibana0.internal.softwareheritage.org:5601/goto/3067b92e9bbeec627fa2e3b906dc4ea3
[4]
$ date; curl -s ${ES_SERVER}/_cat/indices | grep staging
Fri 21 Apr 2023 03:55:56 PM CEST
green open staging-system-logs xLvGyopTSaqjuwh5C0iFoQ 1 1 11957486 0 9.7gb 4.4gb
green open staging-logs AYiBFvYmQtCN8UdnA5KexQ 1 1 115878 0 154.7mb 59.6mb
$ date; curl -s ${ES_SERVER}/_cat/indices | grep staging
Fri 12 May 2023 11:49:43 AM CEST
green open staging-system-logs xLvGyopTSaqjuwh5C0iFoQ 1 1 50884569 0 36.1gb 16.7gb # system logs
green open staging-swh-logs GAAlhiCFR5OU1LDTlUc4mw 1 1 259133 0 164.8mb 84mb # hit and miss setup
green open staging-logs AYiBFvYmQtCN8UdnA5KexQ 1 1 12110295 0 6.9gb 3.5gb # swh logs
[5] https://github.com/open-telemetry/opentelemetry-helm-charts
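To help judge whether the single-index setup is enough, the two listings in [4] can be compared to estimate how fast each index grows. This is only a rough sketch: it assumes the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, ...) and ignores the time of day of each listing. The helper names are made up for this example.

```python
from datetime import date

APRIL_21 = """\
green open staging-system-logs xLvGyopTSaqjuwh5C0iFoQ 1 1 11957486 0 9.7gb 4.4gb
green open staging-logs AYiBFvYmQtCN8UdnA5KexQ 1 1 115878 0 154.7mb 59.6mb
"""

MAY_12 = """\
green open staging-system-logs xLvGyopTSaqjuwh5C0iFoQ 1 1 50884569 0 36.1gb 16.7gb # system logs
green open staging-swh-logs GAAlhiCFR5OU1LDTlUc4mw 1 1 259133 0 164.8mb 84mb # hit and miss setup
green open staging-logs AYiBFvYmQtCN8UdnA5KexQ 1 1 12110295 0 6.9gb 3.5gb # swh logs
"""


def parse_doc_counts(text):
    """Map index name -> docs.count from `_cat/indices` output lines."""
    counts = {}
    for line in text.splitlines():
        # Drop the trailing "# ..." annotations added by hand in [4].
        fields = line.split("#", 1)[0].split()
        if len(fields) < 10:
            continue
        counts[fields[2]] = int(fields[6])
    return counts


def doc_deltas(before, after):
    """Documents added per index, for indices present in both listings."""
    return {name: after[name] - before[name] for name in before if name in after}


days = (date(2023, 5, 12) - date(2023, 4, 21)).days
deltas = doc_deltas(parse_doc_counts(APRIL_21), parse_doc_counts(MAY_12))

for name, added in sorted(deltas.items()):
    print(f"{name}: {added} docs in {days} days (~{added // days} per day)")
```

The staging-swh-logs index only appears in the second listing, so it is left out of the comparison.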
Migrated from T4524 (view on Phabricator)