[mixin/alerts]: Enable configuring job prefix for alerts to prevent clashes with metrics from Loki/Tempo (grafana#9659)

Co-authored-by: Nick Pillitteri <[email protected]>
mtweten and 56quarters authored Oct 25, 2024
1 parent 90b7f97 commit 97a7589
Showing 7 changed files with 12 additions and 7 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -61,8 +61,10 @@

* [CHANGE] Remove backwards compatibility for `thanos_memcached_` prefixed metrics in dashboards and alerts removed in 2.12. #9674
* [ENHANCEMENT] Unify ingester autoscaling panels on 'Mimir / Writes' dashboard to work for both ingest-storage and non-ingest-storage autoscaling. #9617
+* [ENHANCEMENT] Alerts: Enable configuring job prefix for alerts to prevent clashes with metrics from Loki/Tempo. #9659
* [ENHANCEMENT] Dashboards: visualize the age of source blocks in the "Mimir / Compactor" dashboard. #9697
* [ENHANCEMENT] Dashboards: Include block compaction level on queried blocks in 'Mimir / Queries' dashboard. #9706
+
* [BUGFIX] Dashboards: Fix autoscaling metrics joins when series churn. #9412 #9450 #9432
* [BUGFIX] Alerts: Fix autoscaling metrics joins in `MimirAutoscalerNotActive` when series churn. #9412
* [BUGFIX] Alerts: Exclude failed cache "add" operations from alerting since failures are expected in normal operation. #9658
@@ -533,7 +533,7 @@ spec:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
-(sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+(sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
2 changes: 1 addition & 1 deletion operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -511,7 +511,7 @@ groups:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
-(sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+(sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
2 changes: 1 addition & 1 deletion operations/mimir-mixin-compiled/alerts.yaml
@@ -521,7 +521,7 @@ groups:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
-(sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+(sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
4 changes: 2 additions & 2 deletions operations/mimir-mixin/alerts/alerts-utils.libsonnet
@@ -10,10 +10,10 @@
$._config.product + name,

jobMatcher(job)::
-'job=~".*/%s"' % formatJobForQuery(job),
+'%s=~"%s%s"' % [$._config.per_job_label, $._config.alert_job_prefix, formatJobForQuery(job)],

jobNotMatcher(job)::
-'job!~".*/%s"' % formatJobForQuery(job),
+'%s!~"%s%s"' % [$._config.per_job_label, $._config.alert_job_prefix, formatJobForQuery(job)],

local formatJobForQuery(job) =
if std.isArray(job) then '(%s)' % std.join('|', job)
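
The selector strings produced by the updated helpers can be reproduced outside Jsonnet. The following is a minimal Python sketch, not the mixin's code: the function names and keyword arguments are hypothetical, the defaults (`job` and `.*/`) are taken from the compiled alerts and the new config default shown in this commit, and the pass-through branch for a single job name is an assumption because the hunk is cut off before the `else`.

```python
def format_job_for_query(job):
    """Join a list of job names into a regex alternation.

    Assumption: a single (non-list) job name is returned unchanged; the
    diff hunk ends before the corresponding Jsonnet else branch.
    """
    if isinstance(job, list):
        return "(%s)" % "|".join(job)
    return job


def job_matcher(job, per_job_label="job", alert_job_prefix=".*/"):
    """Build a positive PromQL label matcher, mirroring the new jobMatcher."""
    return '%s=~"%s%s"' % (per_job_label, alert_job_prefix, format_job_for_query(job))


def job_not_matcher(job, per_job_label="job", alert_job_prefix=".*/"):
    """Build a negative PromQL label matcher, mirroring the new jobNotMatcher."""
    return '%s!~"%s%s"' % (per_job_label, alert_job_prefix, format_job_for_query(job))


if __name__ == "__main__":
    # Default prefix: same selector shape as before the change.
    print(job_matcher(["ingester.*", "querier.*"]))
    # Hypothetical namespace-specific prefix.
    print(job_matcher("compactor.*", alert_job_prefix="mimir/"))
```

With the default prefix the output keeps the previous `job=~".*/…"` shape, so existing deployments should see no change unless they override `alert_job_prefix`.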
4 changes: 2 additions & 2 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -775,8 +775,8 @@ local utils = import 'mixin-utils/utils.libsonnet';
|||
max by (%s) (memberlist_client_cluster_members_count)
>
-(sum by (%s) (up{%s=~".+/%s"}) + 10)
-||| % [$._config.alert_aggregation_labels, $._config.alert_aggregation_labels, $._config.per_job_label, simpleRegexpOpt($._config.job_names.ring_members)],
+(sum by (%s) (up{%s}) + 10)
+||| % [$._config.alert_aggregation_labels, $._config.alert_aggregation_labels, $.jobMatcher($._config.job_names.ring_members)],
'for': '20m',
labels: {
severity: 'warning',
3 changes: 3 additions & 0 deletions operations/mimir-mixin/config.libsonnet
@@ -207,6 +207,9 @@
// Used to add extra annotations to all alerts, Careful: takes precedence over default annotations.
alert_extra_annotations: {},

+// Used as the job prefix in alerts that select on job label (e.g. GossipMembersTooHigh, RingMembersMismatch). This can be set to a known namespace to prevent those alerts from firing incorrectly due to selecting similar metrics from Loki/Tempo.
+alert_job_prefix: '.*/',
+
// Whether alerts for experimental ingest storage are enabled.
ingest_storage_enabled: true,
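
To illustrate why a narrower `alert_job_prefix` avoids clashes, the sketch below checks a few job label values against the default prefix and against a namespace-specific one. It is an illustration under stated assumptions: the job values and the `mimir/` prefix are hypothetical, the component list is shortened, and Python's `re.fullmatch` stands in for Prometheus's fully anchored regex matchers (Prometheus uses RE2 syntax, which should behave the same for patterns this simple).

```python
import re


def matching_jobs(prefix, component_regex, jobs):
    """Return the job values a `job=~"<prefix><component_regex>"` matcher would select.

    Prometheus anchors label regexes at both ends, so fullmatch is used.
    """
    pattern = re.compile(prefix + component_regex)
    return [job for job in jobs if pattern.fullmatch(job)]


# Hypothetical job label values from a cluster that runs Mimir, Loki and Tempo.
JOBS = ["mimir/ingester", "loki/ingester", "tempo/distributor", "mimir/distributor"]
# Shortened, hypothetical subset of the ring member components.
COMPONENTS = "(ingester.*|distributor.*)"

if __name__ == "__main__":
    # Default prefix: Loki and Tempo jobs are selected as well.
    print(matching_jobs(".*/", COMPONENTS, JOBS))
    # Namespace-specific prefix: only the Mimir jobs remain.
    print(matching_jobs("mimir/", COMPONENTS, JOBS))
```

Under these assumptions, the default `.*/` counts similarly named components from other namespaces, which is the situation the new option is meant to avoid.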
