Monitoring Kubernetes Clusters with Thanos

Scenario: A developer named Alex wants to monitor Kubernetes clusters with Thanos to get better insight into cluster metrics and performance. Alex has already installed Thanos and is familiar with its basic functionality.

Solution: Thanos extends Prometheus with a global query view and long-term storage, so metrics that Prometheus scrapes from a Kubernetes cluster can be queried through Thanos. This example walks through the steps to configure and use that setup.

Configure Thanos to Monitor Kubernetes Metrics:

First, Alex needs to define what gets scraped. Thanos does not scrape targets itself; it reads the data that a Prometheus server collects. Alex therefore starts with a Prometheus configuration file that lists the Kubernetes exporters as scrape targets.

Create a new file named k8s.yml in the thanos/testdata/ directory with the following content:

scrape_configs:
  - job_name: 'kubernetes-nodes'
    static_configs:
      - targets: ['<NODE_IP>:<NODE_PORT>']

  - job_name: 'kubernetes-services'
    static_configs:
      - targets: ['<SERVICE_IP>:<SERVICE_PORT>']

Replace <NODE_IP> and <NODE_PORT> with the IP address and port of the Kubernetes node exporter, and <SERVICE_IP> and <SERVICE_PORT> with the IP address and port of the Kubernetes service exporter.
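
Before relying on these targets, it can help to confirm that an exporter really exposes the metric Alex cares about. The following is a minimal sketch in Go, assuming the exporter serves the Prometheus text exposition format. The helper name hasMetric and the sample payload are illustrative only and are not part of Thanos or Prometheus.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether the text exposition payload contains at least
// one sample for the given metric name. Comment lines (# HELP, # TYPE) and
// blank lines are ignored.
func hasMetric(exposition, name string) bool {
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		if line[:end] == name {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical payload, shaped like the body of an exporter's /metrics page.
	sample := `# HELP kubernetes_container_cpu_usage_total Example counter.
# TYPE kubernetes_container_cpu_usage_total counter
kubernetes_container_cpu_usage_total{container_name="my-container"} 42
`
	fmt.Println(hasMetric(sample, "kubernetes_container_cpu_usage_total"))
	fmt.Println(hasMetric(sample, "kubernetes_container_memory_usage_bytes"))
}
```

In practice the payload would come from an HTTP GET against <NODE_IP>:<NODE_PORT>/metrics rather than from a string literal.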

Create a Thanos Configuration File:

Next, Alex needs to tell Thanos how to store the data. Thanos has no single configuration file that lists scrape configs or metric filters: each component is configured with command-line flags, and Prometheus is pointed at k8s.yml separately through its own --config.file flag. The one YAML file this setup needs describes the object storage where Thanos keeps its blocks. To store them on the local filesystem, create a new file named thanos.yml in the thanos/config/ directory with the following content:

type: FILESYSTEM
config:
  directory: /thanos/data

Individual metrics such as kubernetes_container_cpu_usage_total or kubernetes_container_memory_usage_bytes are not selected in this file. They are selected at query time with PromQL.

Run Thanos:

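Thanos is a single binary with one subcommand per component, so there is no bare command that starts everything. With a Prometheus server already running against k8s.yml, Alex can start the Thanos sidecar next to it by running the following command in the thanos/ directory:

go run ./cmd/thanos sidecar --tsdb.path <PROMETHEUS_DATA_DIR> --prometheus.url http://<PROMETHEUS_HOST>:<PROMETHEUS_PORT> \
    --objstore.config-file config/thanos.yml --grpc-address 0.0.0.0:19090 --http-address 0.0.0.0:19191

Prometheus does the scraping. The sidecar exposes the scraped data to Thanos queries over gRPC and uploads completed blocks to the storage described in thanos.yml. The sidecar also expects the Prometheus server to have at least one external label configured.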
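
Because Thanos takes a moment to start, a script that drives this setup may want to wait until the process reports ready before continuing. Thanos components serve a readiness probe at /-/ready on their HTTP address, and the sketch below polls it from Go. The function names, the retry count, and the localhost:19191 address are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

var errNotReady = errors.New("not ready")

// retry calls check up to attempts times, sleeping delay between failed
// calls, and returns nil as soon as check succeeds.
func retry(attempts int, delay time.Duration, check func() error) error {
	var err error = errNotReady
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// probeReady returns a check that succeeds when GET baseURL/-/ready answers 200.
func probeReady(client *http.Client, baseURL string) func() error {
	return func() error {
		resp, err := client.Get(baseURL + "/-/ready")
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return errNotReady
		}
		return nil
	}
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	// Assumed address: whatever was passed as the component's HTTP address.
	err := retry(30, time.Second, probeReady(client, "http://localhost:19191"))
	if err != nil {
		fmt.Println("Thanos is not ready:", err)
		return
	}
	fmt.Println("Thanos is ready")
}
```

Keeping the retry loop separate from the HTTP probe means the same loop can wrap other checks, such as a first successful query.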

Querying Metrics:

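To query the metrics, Alex needs a Thanos querier, optionally with the query frontend in front of it for caching and query splitting. Both are subcommands of the same thanos binary, so first Alex needs to build it:

go build -o thanos ./cmd/thanos

Next, Alex can start the querier, pointing it at the gRPC address of the component that holds the data, and then the query frontend, pointing it at the querier. Each command keeps running, so use a separate terminal for each (older Thanos releases use --store instead of --endpoint):

./thanos query --http-address 0.0.0.0:19192 --endpoint <SIDECAR_HOST>:<SIDECAR_GRPC_PORT>
./thanos query-frontend --http-address 0.0.0.0:19193 --query-frontend.downstream-url http://localhost:19192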

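Now, Alex can query the metrics with PromQL. Thanos does not provide a GraphQL API; the querier and the query frontend both serve the Prometheus HTTP API, so the same request works against either. For example, to get the CPU usage of a container named my-container running on a node named my-node over a one-hour window, Alex can send the following range query:

curl -G 'http://<QUERY_HOST>:<QUERY_PORT>/api/v1/query_range' \
    --data-urlencode 'query=kubernetes_container_cpu_usage_total{container_name="my-container", node_name="my-node"}' \
    --data-urlencode 'start=2023-03-01T00:00:00Z' \
    --data-urlencode 'end=2023-03-01T01:00:00Z' \
    --data-urlencode 'step=60s'

Replace <QUERY_HOST> and <QUERY_PORT> with the HTTP address of the querier or query frontend. The response is JSON: data.result holds one entry per matching series, each with its labels and a list of [timestamp, value] pairs.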
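
When the query has to be issued from a program instead of by hand, the same selection can be expressed as a PromQL range query against the Prometheus-compatible HTTP API that the Thanos querier serves, which takes range queries at /api/v1/query_range. The sketch below assembles the request URL with the Go standard library. The function name rangeQueryURL, the base address, and the 60s step are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rangeQueryURL builds a query_range request URL for a Prometheus-compatible
// API. start and end are RFC 3339 timestamps; step is a duration such as "60s".
func rangeQueryURL(base, expr, start, end, step string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/v1/query_range"

	q := url.Values{}
	q.Set("query", expr)
	q.Set("start", start)
	q.Set("end", end)
	q.Set("step", step)
	u.RawQuery = q.Encode() // percent-encodes braces, quotes, and colons

	return u.String(), nil
}

func main() {
	expr := `kubernetes_container_cpu_usage_total{container_name="my-container", node_name="my-node"}`
	// Assumed querier address; use the HTTP address the querier listens on.
	u, err := rangeQueryURL("http://localhost:19192", expr,
		"2023-03-01T00:00:00Z", "2023-03-01T01:00:00Z", "60s")
	if err != nil {
		fmt.Println("bad base URL:", err)
		return
	}
	fmt.Println(u)
}
```

Building the query string with url.Values avoids hand-escaping the braces and quotes in the label matchers.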

Testing:

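To verify the setup, Alex can write a test that checks whether the metrics can be queried through Thanos. The test below is a sketch that uses only the Go standard library and talks to the HTTP API of a running querier. The THANOS_QUERY_URL environment variable and the k8stest directory name are assumptions of this example, not something Thanos defines. Create a new file named query_test.go (Go only picks up test files whose names end in _test.go) in a new thanos/k8stest/ directory with the following content:

package k8stest

import (
	"encoding/json"
	"math"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"testing"
)

// apiResponse mirrors the parts of the Prometheus HTTP API response used here.
type apiResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Value [2]interface{} `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

func TestQuery(t *testing.T) {
	base := os.Getenv("THANOS_QUERY_URL") // for example http://localhost:19192
	if base == "" {
		t.Skip("THANOS_QUERY_URL is not set")
	}

	expr := `sum(rate(kubernetes_container_cpu_usage_total{container_name="my-container", node_name="my-node"}[1m]))`
	resp, err := http.Get(base + "/api/v1/query?query=" + url.QueryEscape(expr))
	if err != nil {
		t.Fatalf("Failed to query: %v", err)
	}
	defer resp.Body.Close()

	var out apiResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		t.Fatalf("Failed to decode response: %v", err)
	}
	if out.Status != "success" || len(out.Data.Result) == 0 {
		t.Fatalf("Query returned no data (status %q)", out.Status)
	}

	raw, _ := out.Data.Result[0].Value[1].(string) // sample values are encoded as strings
	got, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		t.Fatalf("Failed to parse value %q: %v", raw, err)
	}

	expected := float64(100) // Replace this with the expected value
	if math.Abs(got-expected) > 1e-9 {
		t.Fatalf("Unexpected result: got %v, want %v", got, expected)
	}
}

Replace the expected value with the value Alex expects for the cluster. With the querier running, run the test from the thanos/ directory with the following command, adjusting the URL to wherever the querier listens:

THANOS_QUERY_URL=http://localhost:19192 go test ./k8stest/

If the test passes, the metric scraped from the Kubernetes cluster is reaching Thanos and can be queried. A skipped test only means that THANOS_QUERY_URL was not set.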