Alerting and Rule Management - thanos-io/thanos

Thanos, an open-source CNCF Incubating project, extends Prometheus into a global-scale, highly available monitoring system. Rule evaluation and alerting are handled by the Ruler component (thanos rule). This document describes the available options and gives an example for each.

Alerting Rules

Thanos alerting rules use the same syntax as Prometheus alerting rules:

alert: <string>
expr: <string>
for: <duration>
labels:
  [ <labelname>: <tmpl_string> ]
annotations:
  [ <labelname>: <tmpl_string> ]

Examples of alerting rules can be found in the examples/alerts/alerts.yaml and examples/alerts/rules.yaml files.
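
As an illustration of the alerting rule syntax, a single rule might look like the following minimal sketch. The alert name, the job label value, the durations, and the annotation text are all hypothetical and are not taken from the Thanos example files.

```yaml
# Hypothetical rule: fires when a scrape target has been down for 5 minutes.
alert: TargetDown
expr: up{job="my-service"} == 0   # "my-service" is an assumed job name
for: 5m                           # condition must hold this long before firing
labels:
  severity: critical
annotations:
  # Annotation values are templates; $labels holds the labels of the alerting series.
  summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```

The for clause keeps a short scrape failure from paging anyone: the alert stays pending until the expression has been true for the whole duration.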

To make sure alerting itself works, monitor the Ruler from a separate Scraper (Prometheus + sidecar) that sits in the same cluster, and alert from there. The most important metric to alert on is:

  • thanos_alert_sender_alerts_dropped_total. A value greater than 0 means that alerts triggered by the Ruler are not being sent to Alertmanager, which might indicate connection, incompatibility, or misconfiguration problems.
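
A rule that watches this metric could be sketched as follows. It is meant to be loaded by the separate Scraper, not by the Ruler being monitored. The group name, alert name, 5m windows, and severity label are assumptions. Because the metric is a counter, this sketch alerts on its rate rather than on the raw value, so the alert clears once drops stop.

```yaml
groups:
  - name: thanos-rule-meta                 # hypothetical group name
    rules:
      - alert: ThanosRuleAlertsDropped     # hypothetical alert name
        # A non-zero rate means alerts were dropped within the last 5 minutes.
        expr: sum by (job, instance) (rate(thanos_alert_sender_alerts_dropped_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Thanos Rule {{ $labels.instance }} is failing to send alerts to Alertmanager"
```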

Rule Manager

The rule manager is responsible for rule execution. It is configured with the --rule-file flag, which can be repeated and accepts glob patterns. Rules are evaluated periodically, at the default interval set by the --eval-interval flag.

The following is an example of a rule manager configuration in the thanos rule command:

thanos rule \
  --data-dir="data/" \
  --rule-file="rules/*.yaml" \
  --resend-delay=1m \
  --eval-interval=30s \
  --tsdb.block-duration=2h \
  --tsdb.retention=48h \
  --tsdb.wal-compression \
  --alertmanagers.url=ALERTMANAGERS.URL \
  --query=QUERY.URL
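
A rule file passed through --rule-file uses the Prometheus rule-group format. The following is a minimal sketch of such a file. The file name, group names, metric name, and threshold are hypothetical.

```yaml
# rules/example.yaml (hypothetical file name)
groups:
  - name: example-recording
    interval: 1m   # optional; overrides the --eval-interval default for this group
    rules:
      # Recording rule: precompute the per-job request rate.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))   # assumed metric name
  - name: example-alerting
    rules:
      # Alerting rule built on the recorded series above.
      - alert: HighRequestRate
        expr: job:http_requests:rate5m > 1000   # hypothetical threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.job }} is receiving more than 1000 requests per second"
```

Groups without an interval field fall back to the value given by --eval-interval.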

Rule Components

The Ruler is the only component that evaluates rules, but it works together with several other Thanos components:

  • Ruler: Responsible for rule and alert evaluation on top of a given Thanos Querier.
  • Sidecar: Runs alongside a Prometheus instance, exposes its data to the Querier, and can upload its blocks to object storage.
  • Querier: Answers queries by fanning out to Sidecars, Store Gateways, and other Store API endpoints, which enables querying across multiple Prometheus instances.
  • Store Gateway: Serves historical blocks from object storage to the Querier, giving access to long-term data.

Alert Rule Types

The Thanos Ruler evaluates only Prometheus-style alerting and recording rules written in PromQL. The following alert rule types belong to Grafana Alerting rather than to Thanos:

  • Grafana-managed alert rules
  • Loki/Mimir-managed alert rules

In Grafana, these alert rule types can be based on data from supported data sources such as Prometheus, Grafana Mimir, and Grafana Loki. Thanos exposes a Prometheus-compatible query API, so Grafana treats it as a Prometheus data source when creating alert rules.

Alert Rule Evaluation and Delivery

The Thanos Ruler evaluates alert rules itself and delivers firing alerts to an external Alertmanager, set with the --alertmanagers.url flag. When Grafana Alerting runs on top of Thanos, alert rule evaluation and delivery happen within Grafana, through an external Alertmanager, or both.
