This document is an in-depth guide to scaling Thanos in production environments. It is aimed at experienced developers and operators who want to optimize their Thanos deployment for better performance and reliability.

Overview of Production Scaling

Scaling Thanos horizontally involves distributing workloads across multiple instances of its components to handle increased data ingestion rates and query loads. This section covers fundamental principles and specific configurations necessary for effective scaling.

Basic Considerations

Before proceeding with scaling, consider the following:

  • Horizontal vs. Vertical Scaling: Favor horizontal scaling of Prometheus instances to handle increased ingestion rather than adding more resources to a single instance; Thanos is designed to work well with many smaller Prometheus servers.

  • Identical Architecture: Deploying an identical architecture across data centers simplifies management and keeps performance consistent.

  • Global Metrics View: Ensure all Thanos components are configured to provide a global view of metrics, enabling accurate monitoring across all instances (see the sketch after this list).
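
To make the global view work in practice, each horizontally scaled Prometheus instance is typically given unique external labels so that Thanos Query can tell replicas apart and deduplicate their series. Below is a minimal prometheus.yml sketch; the label names and values are illustrative assumptions:

    global:
      external_labels:
        cluster: eu-west-1   # which cluster or data center this Prometheus serves (example value)
        replica: A           # distinguishes HA replicas so Thanos Query can deduplicate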

Step-by-Step Scaling Guide

  1. Containerization with Docker

    Use Docker to run multiple instances of Thanos components. Below is an example Dockerfile for building a Thanos image:

    # Pin the minimal busybox base image by digest for reproducible builds.
    ARG BASE_DOCKER_SHA="14d68ca3d69fceaa6224250c83d81d935c053fb13594c811038c461194599973"
    FROM quay.io/prometheus/busybox@sha256:${BASE_DOCKER_SHA}
    LABEL maintainer="The Thanos Authors"
    
    # Copy the pre-built Thanos binary into the image.
    COPY /thanos_tmp_for_docker /bin/thanos
    
    # Create an unprivileged user (no password, no home directory) and run Thanos as that user.
    RUN adduser \
        -D \
        -H \
        -u 1001 \
        thanos && \
        chown thanos /bin/thanos
    USER 1001
    ENTRYPOINT [ "/bin/thanos" ]
    

    This Dockerfile sets up a minimal environment for running Thanos.
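
    To run several component instances from this image locally, one option is Docker Compose. The sketch below starts two Queriers from an image assumed to be built with the Dockerfile above and tagged thanos:local; the tag and ports are illustrative assumptions:

    services:
      thanos-query-1:
        image: thanos:local              # assumed tag for the image built above
        command:
          - query
          - --http-address=0.0.0.0:10902
          - --grpc-address=0.0.0.0:10901
        ports:
          - "10902:10902"                # expose the HTTP/UI port on the host
      thanos-query-2:
        image: thanos:local
        command:
          - query
          - --http-address=0.0.0.0:10902
          - --grpc-address=0.0.0.0:10901
        ports:
          - "10903:10902"                # second instance mapped to a different host port

    Running docker compose up then brings both Queriers up side by side.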

  2. Configure Limits for Ingestion and Querying

    Configuring ingestion and query limits is crucial for handling traffic efficiently. The excerpt below, from a limits.go-style configuration struct, illustrates the kinds of limits that can be set:

    type Limits struct {
        IngestionRate            float64 `yaml:"ingestion_rate" json:"ingestion_rate"`
        IngestionBurstSize       int     `yaml:"ingestion_burst_size" json:"ingestion_burst_size"`
        MaxSeriesPerQuery        int     `yaml:"max_series_per_query" json:"max_series_per_query"`
        MaxSamplesPerQuery       int     `yaml:"max_samples_per_query" json:"max_samples_per_query"`
        MaxFetchedSeriesPerQuery int     `yaml:"max_fetched_series_per_query" json:"max_fetched_series_per_query"`
    }
    

    Adjust these values based on your expected load and resource availability.
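
    Because the struct fields carry yaml tags, these limits are typically supplied through a YAML configuration file. The sketch below shows what such a section might look like, assuming a top-level limits block; the values are placeholders to tune for your own load, not recommendations:

    limits:
      ingestion_rate: 50000              # allowed ingestion rate (example value)
      ingestion_burst_size: 100000       # short burst allowance above the sustained rate (example value)
      max_series_per_query: 100000       # reject queries touching more series than this (example value)
      max_samples_per_query: 50000000    # cap on samples processed by a single query (example value)
      max_fetched_series_per_query: 150000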

  3. Use Sharding with Tenants

    Sharding distributes the load among multiple instances. When shuffle sharding is enabled, the ingestion-tenant-shard-size parameter controls how many instances handle a given tenant's ingestion:

    f.IntVar(&l.IngestionTenantShardSize, "distributor.ingestion-tenant-shard-size", 0, "Default tenant's shard size when shuffle-sharding is used.")
    

    This parameter improves scalability in multi-tenant setups by controlling how many instances share a tenant's traffic.
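
    As a sketch of how this might be wired up, the flag can simply be appended to the ingesting component's container arguments; the shard size of 3 below is an example value, not a recommendation:

    args:
    - -distributor.ingestion-tenant-shard-size=3   # spread each tenant's ingestion over 3 instances (example value)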

  4. Distributed Components and Configuration

    Deploy the major components — such as the ingestion layer (Receive, which plays the distributor/ingester role), the Querier, and the Store Gateway — as separate deployments so each can be scaled and tuned independently. For example, in your Helm chart or deployment YAML, you can run multiple replicas:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: thanos-query
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: thanos-query
      template:
        metadata:
          labels:
            app: thanos-query
        spec:
          containers:
          - name: thanos
            image: thanos:latest
            args:
            - query
            - --http-address=0.0.0.0:9090
            - --grpc-address=0.0.0.0:9091
            - --store=<STORE-GATEWAY-ADDRESS>
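
    To give users and other components a stable address in front of these replicas, a Service can select the same pods. A minimal sketch, assuming the pods carry the app: thanos-query label used by the Deployment above:

    apiVersion: v1
    kind: Service
    metadata:
      name: thanos-query
    spec:
      selector:
        app: thanos-query
      ports:
      - name: http
        port: 9090
        targetPort: 9090
      - name: grpc
        port: 9091
        targetPort: 9091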
    
  5. Ensure Data Consistency and High Availability

    High availability must be in place so that data can be ingested continuously even if individual instances fail. It also helps to bound per-tenant alerting configuration with validation limits, such as those registered on the Limits struct:

    func (l *Limits) RegisterFlags(f *flag.FlagSet) {
        f.IntVar(&l.AlertmanagerMaxAlertsCount, "alertmanager.max-alerts-count", 0, "Maximum number of alerts that a single user can have.")
    }
    

    Limiting per-tenant alert counts helps maintain performance even under heavy load.
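
    For instance, a non-zero cap can be passed to the Alertmanager-facing component through its arguments; the value of 200 below is an illustrative assumption:

    args:
    - -alertmanager.max-alerts-count=200   # example per-tenant cap on configured alerts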

  6. Monitoring and Metrics

    Continuously monitor the performance metrics of your deployed instances. Thanos components expose Prometheus metrics out of the box; make sure to collect the relevant metrics for components such as the Querier, Store Gateway, and Compactor.
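
    If the instances are not already scraped, a plain Prometheus scrape configuration pointed at each component's HTTP endpoint is enough to collect these metrics. A minimal sketch; the host names and ports are illustrative assumptions (the Querier port matches the Deployment example above):

    scrape_configs:
      - job_name: thanos-components
        static_configs:
          - targets:
              - thanos-query:9090     # Querier HTTP endpoint (matches --http-address above)
              - thanos-store:10902    # Store Gateway HTTP endpoint (assumed port)
              - thanos-compact:10902  # Compactor HTTP endpoint (assumed port)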

  7. Testing Before Full Deployment

    Utilize a staging environment to test scaling before deploying to production. The Makefile includes helpful commands for local testing:

    test:
        go test ./...
    

    Run the tests to verify that your configuration changes do not introduce instability.

  8. Performance Evaluation

    Conduct performance evaluations using load testing tools to simulate high loads and assess how your Thanos setup responds. Focus on monitoring for bottlenecks that may arise during high traffic.

Final Considerations

Scaling Thanos effectively requires methodical planning and an understanding of its components. Employing horizontal scaling, configuring appropriate limits, and ensuring high availability and monitoring can help build a robust Thanos architecture capable of handling production workloads.

References

  • Documentation and examples can often be found within the Thanos GitHub repository.
  • For more in-depth configurations and practical usage, refer to the relevant source files and linked proposals within the project.

This guide aims to help you realize the full capacity of Thanos in production environments while maintaining performance and data integrity.