Logging - thanos-io/thanos

Thanos Logging

Thanos is an open-source CNCF Incubating project that extends Prometheus with multi-tenancy, high availability, and a global query view. It reuses Prometheus components and adds the features needed to run them at global scale. Thanos's logging system is designed to give insight into the system's behavior and to help with debugging and monitoring.

Logging Options

Thanos provides different logging options to cater to various use cases. These options are discussed in the proposal Query Logging for Thanos. The logging options include:

  1. Audit Logging: Logs every internal API request. It helps developers understand the system's state and debug issues.
  2. Active Query Logging: Logs all the queries that are currently active in a component. It helps in debugging queries that led to the component instance being killed and in tracking queries that take too long.
  3. Adaptive Logging: Logs only the queries that satisfy a specific policy or condition. Example filters include invalid requests and latency that crosses a given threshold.
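To make the idea behind Active Query Logging concrete, here is a minimal Python sketch of a tracker that remembers which queries are in flight. The class `ActiveQueryTracker` and its methods are hypothetical names chosen for illustration and are not part of Thanos.

```python
import time
from contextlib import contextmanager


class ActiveQueryTracker:
    """Hypothetical tracker: remembers queries that have started but not finished."""

    def __init__(self):
        self._active = {}  # query id -> (query text, start timestamp)
        self._next_id = 0

    @contextmanager
    def track(self, query):
        query_id = self._next_id
        self._next_id += 1
        self._active[query_id] = (query, time.time())
        try:
            yield
        finally:
            # Runs on normal completion and on exceptions, so an entry
            # is only left behind if the process itself is killed.
            del self._active[query_id]

    def active_queries(self):
        return [query for query, _ in self._active.values()]


tracker = ActiveQueryTracker()
with tracker.track("up"):
    in_flight = tracker.active_queries()  # the query is listed while it runs
finished = tracker.active_queries()       # and is gone once it has completed
```

This sketch keeps its state in memory only. To report the queries that were running when a component was killed, a tracker would also have to persist its entries somewhere that survives the process, for example a file.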

Logging Implementation

Thanos uses middlewares to intercept and filter queries for logging. Adaptive Logging is implemented at the StoreAPI level: because the StoreAPI connects the other components, logging there gives a global overview of requests. A custom decider chooses which methods to log, based on the configured logging policy.
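The middleware-plus-decider pattern described here can be sketched in a few lines of Python. This is an illustration of the pattern only, not Thanos's Go implementation; `adaptive_decider`, `logging_middleware`, the `Series` method name, and the one-second threshold are all assumptions made for the example.

```python
import time


def adaptive_decider(method, error, duration_s, *, slow_after_s=1.0):
    """Hypothetical policy: log failed requests and requests slower than the threshold."""
    return error is not None or duration_s > slow_after_s


def logging_middleware(handler, decider, log):
    """Wrap handler so every call is timed, and logged only if the decider approves."""
    def wrapped(method, request):
        start = time.monotonic()
        error = None
        try:
            return handler(method, request)
        except Exception as exc:
            error = exc
            raise
        finally:
            duration_s = time.monotonic() - start
            if decider(method, error, duration_s):
                log.append({"method": method, "error": repr(error), "duration_s": duration_s})
    return wrapped


def handler(method, request):
    """Stand-in for a real request handler."""
    if request is None:
        raise ValueError("invalid request")
    return "ok"


entries = []
call = logging_middleware(handler, adaptive_decider, entries)
call("Series", {"matchers": []})  # fast and successful: not logged
try:
    call("Series", None)           # fails: logged
except ValueError:
    pass
```

Keeping the decider separate from the middleware means the policy can change (for example, to log every call for auditing) without touching the code that intercepts requests.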

Configuration

Thanos's request logging is configured with the --request.logging-config flag, which takes the YAML configuration inline, or with the --request.logging-config-file flag, which takes the path to a YAML file. The configuration specifies the logging policy, including the log level, duration fields, and error codes.

Here's an example of a logging configuration:

decision: LogStartAndFinishCall
duration_to_fields:
  - name: request_duration
    unit: ms
    log_level: info
error_to_code: DefaultErrorToCode

In this example, decision is set to LogStartAndFinishCall, so both the start and the finish of each request are logged. The duration_to_fields section specifies that the request duration is logged in milliseconds at the info log level. The error_to_code field is set to DefaultErrorToCode, which applies the default mapping from an error to the status code recorded in the log.
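The effect of these settings can be pictured with a small Python sketch that turns the example configuration into the entries a single request would produce. It is an illustration only: the decision names other than LogStartAndFinishCall, the "finished call" message, and the function `log_events` are assumptions, not the behavior of a particular Thanos version.

```python
# Hypothetical in-memory form of the example configuration.
config = {
    "decision": "LogStartAndFinishCall",
    "duration_to_fields": [
        {"name": "request_duration", "unit": "ms", "log_level": "info"},
    ],
}

# Assumed decision names; only LogStartAndFinishCall appears in the example.
EVENTS_BY_DECISION = {
    "NoLogCall": [],
    "LogFinishCall": ["finish"],
    "LogStartAndFinishCall": ["start", "finish"],
}

UNIT_FACTOR = {"s": 1, "ms": 1_000}  # multiplier from seconds to the target unit


def log_events(config, duration_s):
    """Return the entries one request would produce under this configuration."""
    entries = []
    events = EVENTS_BY_DECISION[config["decision"]]
    if "start" in events:
        entries.append({"msg": "started call"})
    if "finish" in events:
        entry = {"msg": "finished call"}
        for spec in config["duration_to_fields"]:
            entry["level"] = spec["log_level"]
            entry[spec["name"]] = round(duration_s * UNIT_FACTOR[spec["unit"]], 3)
        entries.append(entry)
    return entries


request_entries = log_events(config, 0.25)  # a request that took a quarter of a second
```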

Logging Format

Thanos supports two logging formats: logfmt and json. The log format can be set using the --log.format flag.

Logfmt

Logfmt is a lightweight structured format that writes each entry as a single line of key=value pairs, which makes it easy to read and to parse. Here's an example of a logfmt log entry:

http.start_time=2022-09-12T15:04:05.123Z http.method=GET http.request_id=123456 tenant_ids="[user1 user2]" msg="started call"
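As a rough illustration of how easily logfmt can be consumed, the following Python sketch splits a line into a dictionary. It relies on `shlex` from the standard library, which only approximates logfmt quoting rules, and the helper name `parse_logfmt` and the simplified sample line are assumptions for the example.

```python
import shlex


def parse_logfmt(line):
    """Split a logfmt line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        # Split at the first "=" only, so values may themselves contain "=".
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


line = 'level=info http.method=GET http.request_id=123456 msg="started call"'
fields = parse_logfmt(line)
```

Every value comes back as a string, so numeric fields such as durations need to be converted by the caller.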

JSON

JSON is a widely-used, structured log format that supports complex data structures. Here's an example of a JSON log entry:

{
  "http": {
    "start_time": "2022-09-12T15:04:05.123Z",
    "method": "GET",
    "request_id": "123456"
  },
  "tenant_ids": ["user1", "user2"],
  "msg": "started call"
}
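JSON entries can be read with any JSON library. The Python sketch below parses the entry above and flattens the nested object into dotted keys, which lines up with the field names used in the logfmt example. The helper `flatten` is a hypothetical name introduced for the illustration.

```python
import json

raw = (
    '{"http": {"start_time": "2022-09-12T15:04:05.123Z", "method": "GET", '
    '"request_id": "123456"}, "tenant_ids": ["user1", "user2"], "msg": "started call"}'
)


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted keys, e.g. {"http": {"method": ...}} -> "http.method"."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # lists and scalars are kept as they are
    return flat


entry = json.loads(raw)
flat = flatten(entry)
```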

Logging Examples

Here are some examples of Thanos log entries for different logging options:

Audit Logging

level=info ts=2022-09-12T15:04:05.123Z caller=main.go:329 msg="Starting Thanos" version="(version=v0.25.0, branch=main, revision=abc123)"
level=info ts=2022-09-12T15:04:05.123Z caller=main.go:330 build_context="(go=go1.17.5, user=root@7a9dbdbe0cc7, date=20220822-13:53:16)"
level=info ts=2022-09-12T15:04:05.123Z caller=main.go:331 host_details="(Linux 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 mari (none))"
level=info ts=2022-09-12T15:04:05.123Z caller=main.go:332 fd_limits="(soft=1000000, hard=1000000)"
level=info ts=2022-09-12T15:04:05.123Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"

Active Query Logging

level=info ts=2022-09-12T15:04:05.123Z caller=query_logger.go:74 component=activeQueryTracker msg="These queries didn't finish in Thanos' last run:" queries="[{\"query\":\"changes(changes(thanos_http_request_duration_seconds_bucket[1h:1s])[1h:1s])\",\"timestamp_sec\":1663002604}]"
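The queries field in an entry like this one is a JSON array embedded in a quoted logfmt value, so it takes two decoding steps to get at the individual queries. A minimal Python sketch, assuming the entry shape shown above and using a shortened sample line with a placeholder query:

```python
import json
import shlex

# Shortened sample entry; the query text "up" is a placeholder.
line = r'''level=info component=activeQueryTracker msg="These queries didn't finish in Thanos' last run:" queries="[{\"query\":\"up\",\"timestamp_sec\":1663002604}]"'''

# Step 1: split the logfmt line; shlex removes the outer quotes and the \" escapes.
fields = dict(token.partition("=")[::2] for token in shlex.split(line))

# Step 2: decode the JSON array carried in the queries field.
unfinished = json.loads(fields["queries"])
query_texts = [q["query"] for q in unfinished]
```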

Adaptive Logging

level=error ts=2022-09-12T15:04:05.123Z caller=main.go:654 msg="Error while processing request" err="invalid request format"

Logging Best Practices

  1. Use a structured logging format: Structured logging formats, such as logfmt and JSON, make it easier to parse and analyze logs.
  2. Set appropriate log levels: Use the correct log level for each log entry. This helps in filtering and prioritizing log entries during analysis.
  3. Include relevant metadata: Include metadata, such as tenant IDs, request IDs, and user IDs, to help with debugging and monitoring.
  4. Rotate logs: Regularly rotate logs to prevent them from consuming too much disk space.
  5. Centralize logs: Centralize logs to a single location for easier analysis and monitoring.

Logging Tools

Thanos logs can be collected by log aggregation tools such as Loki and then explored in Grafana, alongside the metrics that Thanos serves to Prometheus-compatible tools. Together, these tools can be used to visualize, analyze, and monitor logs.

Conclusion

Thanos's logging system is designed to provide insight into the system's behavior and to help with debugging and monitoring. Audit Logging, Active Query Logging, and Adaptive Logging cover different needs, from recording every request to recording only the ones that match a policy. The request logging configuration controls the log level, duration fields, and error codes, and the --log.format flag selects logfmt or JSON output. Structured logs in either format can then be collected and analyzed with tools such as Loki and Grafana.