Cross-Cluster Federation with Thanos

Thanos is an open-source project for building a highly available Prometheus setup with long-term storage. It provides a global query view across multiple Prometheus servers, which makes it well suited to monitoring several Kubernetes clusters from one place. Thanos is a Cloud Native Computing Foundation (CNCF) project and is used by many organizations, including Alibaba Cloud, LastPass, and Medallia.

Thanos components

Thanos is a clustered system of components with distinct, decoupled purposes. The components fall into three groups:

  • Metric sources: The Prometheus servers running in each cluster. Prometheus scrapes the metrics itself; a Thanos Sidecar runs alongside each server, exposes its data through the gRPC StoreAPI, and can upload its TSDB blocks to object storage.
  • Stores: The components that hold historical metrics. Blocks are kept in object storage, such as S3 or GCS, and the Store Gateway serves them through the same StoreAPI.
  • Queriers: The components that answer user queries. A Querier fetches series from sidecars and stores over the gRPC StoreAPI and aggregates the results.

Cross-cluster communication

Thanos supports cross-cluster communication over TLS. This lets users securely reach resources in other clusters, such as the StoreAPIs of remote sidecars. To configure cross-cluster TLS communication, perform the following steps:

  1. Configure a service for the Envoy sidecar.
  2. Point the querier at that service and the correct port.
  3. Make sure the remote cluster has TLS set up and an ingress that supports HTTP/2.
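
Step 2 can be sketched as a small helper that assembles the `thanos query` command line. This is a minimal sketch, assuming the Envoy sidecar exposes one local listener per remote cluster on ports 10001 and 10002 (hypothetical values) and that the Thanos release in use accepts `--endpoint`; older releases used `--store` for the same purpose. The helper name `build_querier_args` is illustrative only.

```python
from typing import List, Optional


def build_querier_args(endpoints: List[str], replica_label: Optional[str] = None) -> List[str]:
    """Return an argument list for `thanos query`, one --endpoint per StoreAPI address."""
    # 10902 (HTTP) and 10901 (gRPC) are the ports Thanos uses by default.
    args = ["query", "--http-address=0.0.0.0:10902", "--grpc-address=0.0.0.0:10901"]
    for address in endpoints:
        if ":" not in address:
            raise ValueError(f"endpoint {address!r} must be host:port")
        args.append(f"--endpoint={address}")
    if replica_label:
        # Label used to deduplicate series coming from replicated Prometheus servers.
        args.append(f"--query.replica-label={replica_label}")
    return args


# Assumption: Envoy listens locally on these ports, one per remote cluster.
querier_args = build_querier_args(["127.0.0.1:10001", "127.0.0.1:10002"], replica_label="replica")
print("thanos " + " ".join(querier_args))
```

Because the querier only ever talks to local addresses here, the TLS details stay entirely in the Envoy configuration.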

Envoy can run as a sidecar container in the Thanos Querier pod on the observer cluster. It performs TLS origination: it opens encrypted connections to the secured remote sidecars and relays the traffic to the Thanos Querier unencrypted over the local pod network.
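
Before wiring Envoy to a remote cluster, it can help to confirm that the remote ingress really negotiates HTTP/2 over TLS, since gRPC depends on it. The following is a rough sketch using only the Python standard library; the function names are illustrative, and the host in the usage note is a placeholder rather than a real endpoint.

```python
import socket
import ssl
from typing import Optional


def make_h2_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a certificate-verifying TLS client context that offers HTTP/2 via ALPN."""
    context = ssl.create_default_context(cafile=ca_file)
    context.set_alpn_protocols(["h2"])
    return context


def negotiated_protocol(host: str, port: int = 443,
                        ca_file: Optional[str] = None,
                        timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port over TLS and return the ALPN protocol the server selected."""
    context = make_h2_client_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol("thanos.remote.example.com")` (a placeholder host) should return `"h2"` when the ingress supports HTTP/2; any other value suggests the gRPC traffic from the querier side would not get through.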

Multicluster setup

Thanos can be used to monitor multiple Kubernetes clusters. To set up Thanos for multicluster monitoring, users need to deploy Thanos components in each cluster and configure cross-cluster communication.

The Thanos Querier gives users a single query endpoint across clusters. It fans each query out to the existing Prometheus servers in every cluster, so each of those servers needs a Thanos Sidecar deployed alongside it to answer these queries.
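
The fan-out behaviour can be illustrated with a toy model. This is not how Thanos is implemented (the real Querier speaks the gRPC StoreAPI); it is a minimal sketch in which each cluster's sidecar is replaced by a hypothetical in-memory function, and the `cluster` label stands in for whatever external labels identify each Prometheus server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, float]                    # (timestamp in ms, value)
Series = Tuple[Dict[str, str], List[Sample]]  # (labels, samples)
Store = Callable[[str], List[Series]]         # hypothetical stand-in for a StoreAPI endpoint


def fan_out(stores: List[Store], metric: str) -> List[Series]:
    """Query every store concurrently and merge the returned series into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(stores))) as pool:
        per_store = list(pool.map(lambda store: store(metric), stores))
    merged = [series for result in per_store for series in result]
    # Sort by label set so the merged output is stable regardless of response order.
    return sorted(merged, key=lambda series: sorted(series[0].items()))


def make_store(cluster: str, value: float) -> Store:
    """Create a fake store that returns one sample for any metric, tagged with its cluster."""
    def store(metric: str) -> List[Series]:
        return [({"__name__": metric, "cluster": cluster}, [(1000, value)])]
    return store


result = fan_out([make_store("us-1", 2.0), make_store("eu-1", 1.0)], "up")
for labels, samples in result:
    print(labels["cluster"], samples)
```

The caller sees one merged result and never needs to know how many clusters contributed to it, which is the property the Querier provides.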

Thanos provides several benefits over traditional Prometheus setups:

  • Scalability: Thanos allows users to scale Prometheus horizontally by adding more metric sources and stores.
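  • High availability: Thanos supports running replicated Prometheus servers that scrape the same targets; the Querier deduplicates their overlapping series at query time, so losing one replica does not leave gaps.
  • Long-term storage: Thanos keeps metric blocks in object storage, so data can be retained well beyond the local retention of a single Prometheus server.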
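
One common way to get high availability is to run replica Prometheus servers and let the Querier deduplicate on a replica label (set with `--query.replica-label`). The sketch below shows the idea in simplified form, assuming a label named `replica`; the actual Thanos algorithm is more involved than this "first replica seen wins" rule, and the function name is illustrative.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]                    # (timestamp in ms, value)
Series = Tuple[Dict[str, str], List[Sample]]  # (labels, samples)
LabelKey = Tuple[Tuple[str, str], ...]


def deduplicate(series_list: List[Series], replica_label: str = "replica") -> List[Series]:
    """Merge series that differ only in the replica label into a single series."""
    merged: Dict[LabelKey, Dict[int, float]] = {}
    for labels, samples in series_list:
        # Series identity is the label set with the replica label removed.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        by_timestamp = merged.setdefault(key, {})
        for timestamp, value in samples:
            # Keep the first value seen for a timestamp; later replicas only fill gaps.
            by_timestamp.setdefault(timestamp, value)
    return [
        (dict(key), sorted(by_timestamp.items()))
        for key, by_timestamp in sorted(merged.items(), key=lambda item: item[0])
    ]


# Replica "a" missed the scrape at t=2000 and replica "b" missed the one at t=3000.
replica_a = ({"__name__": "up", "cluster": "eu-1", "replica": "a"}, [(1000, 1.0), (3000, 1.0)])
replica_b = ({"__name__": "up", "cluster": "eu-1", "replica": "b"}, [(1000, 1.0), (2000, 1.0)])
deduped = deduplicate([replica_a, replica_b])
print(deduped)
```

The two replica series collapse into one series without the `replica` label, and each replica fills in the scrape the other one missed.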

Conclusion

Thanos is a powerful tool for monitoring multiple Kubernetes clusters. It provides a unified way to query over multiple clusters and allows users to scale Prometheus horizontally. Thanos also provides high availability and long-term storage capabilities.
