Thanos Data Storage and Retrieval

This document covers how Thanos stores and retrieves data: its storage format, its index, and how the index is used to answer queries. Thanos is an open-source system that gives Prometheus users efficient long-term storage and retrieval of time-series data.

What is Data Storage and Retrieval?

Data Storage and Retrieval in Thanos refers to storing time-series data and reading it back efficiently through the index and chunk layout described below. Thanos reads and writes data in the native Prometheus TSDB block format.

Why is Data Storage and Retrieval important?

Thanos is built to hold large volumes of time-series data for long periods, so both writes and reads must stay efficient as the data grows. The index and chunk layout keep queries fast and make economical use of object storage.

Insights

Thanos uses the native Prometheus TSDB block format for data storage. Prometheus uses the same format to persist data on local disk. Thanos supports index file versions v1 and v2.

Index File Format

The index file stores the index that allows efficient lookup of series and their samples. All entries are sorted lexicographically unless stated otherwise. The index file is terminated by a table of contents (TOC), which serves as the entry point into the index.

Symbol Table

The symbol table holds a sorted list of deduplicated strings that occur in the label pairs of the stored series. Because every label name and value is stored only once, it significantly reduces the total index size. The section contains a sequence of string entries, each prefixed with the string’s length in raw bytes. All strings are UTF-8 encoded and sorted in lexicographically ascending order. Other sections refer to a string by its sequential position in the table.
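The layout described above can be sketched in a few lines. This is a minimal illustration, not the Prometheus or Thanos implementation: the function names are hypothetical, the length prefix is assumed to be an unsigned varint, and the section-level framing (overall length, symbol count, checksum) is left out.

```python
def encode_uvarint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uvarint(buf, pos):
    """Decode an unsigned varint at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7


def build_symbols(series_labels):
    """Collect every label name and value once, sorted by their UTF-8 bytes."""
    symbols = set()
    for labels in series_labels:
        for name, value in labels.items():
            symbols.add(name)
            symbols.add(value)
    return sorted(symbols, key=lambda s: s.encode("utf-8"))


def encode_symbols(symbols):
    """Write each string as <length><utf-8 bytes> (assumed varint length)."""
    out = bytearray()
    for s in symbols:
        raw = s.encode("utf-8")
        out += encode_uvarint(len(raw)) + raw
    return bytes(out)


def decode_symbols(buf):
    """Read the entries back; a symbol reference is the list position."""
    symbols, pos = [], 0
    while pos < len(buf):
        n, pos = decode_uvarint(buf, pos)
        symbols.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    return symbols
```

Two series that share the metric name and the label name `job` contribute those strings only once, which is where the size reduction comes from.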

Series

The series section contains a sequence of series entries, each holding the label set of a series and the list of its chunks within the block. The series are sorted lexicographically by their label sets. Each series entry first holds its number of labels, followed by tuples of symbol table references for the label name and value. The label pairs are lexicographically sorted. After the labels, the number of indexed chunks is encoded, followed by a sequence of metadata entries, each containing the chunk’s minimum (mint) and maximum (maxt) timestamp and a reference to its position in the chunk file.
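To make the structure of a series entry concrete, the sketch below models it with plain Python values instead of bytes. The function names and the dictionary layout are hypothetical and chosen for readability; the actual on-disk encoding is more compact and is not reproduced here.

```python
def encode_series(labels, chunks, symbols):
    """Build a series entry as plain integers (illustrative, not on-disk bytes).

    labels:  dict mapping label name -> label value
    chunks:  list of (mint, maxt, chunk_ref) tuples
    symbols: sorted symbol list; a reference is the symbol's position
    """
    ref = {s: i for i, s in enumerate(symbols)}
    # Label pairs are sorted by name, then replaced by symbol references.
    pairs = [(ref[name], ref[value]) for name, value in sorted(labels.items())]
    return {
        "num_labels": len(pairs),
        "labels": pairs,
        "num_chunks": len(chunks),
        "chunks": sorted(chunks),  # ordered by mint
    }


def decode_labels(entry, symbols):
    """Resolve the symbol references of an entry back into strings."""
    return {symbols[n]: symbols[v] for n, v in entry["labels"]}
```

For example, with the symbols `["__name__", "api", "job", "up"]`, the label set `{__name__="up", job="api"}` is stored as the reference pairs `(0, 3)` and `(2, 1)`.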

Label Index

A label index section indexes the existing (combined) values for one or more label names. The body holds #entries / #names tuples of symbol table references, each tuple being of #names length. The value tuples are sorted in lexicographically increasing order. This section is no longer used.

Postings

Each postings section is associated with one label pair and stores a monotonically increasing list of references to the series that contain that pair.
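Because each list is monotonically increasing, a selector with several label matchers can be answered by intersecting lists in a single pass. The sketch below shows that idea only; the `postings` dictionary and its series references are made-up example data, and `intersect` is a hypothetical helper rather than a Thanos function.

```python
def intersect(a, b):
    """Intersect two ascending lists of series references in one pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical postings lists, keyed by (label name, label value).
postings = {
    ("job", "api"): [16, 48, 80, 112],
    ("env", "prod"): [48, 64, 112, 160],
}

# Series matching both job="api" and env="prod".
matching = intersect(postings[("job", "api")], postings[("env", "prod")])
```

The merge never needs to go back in either list, so the cost is linear in the combined length of the two lists.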

Label Offset Table

A label offset table stores a sequence of label offset entries that point to the beginning of each label index section for a given label name. This table is no longer used.

Postings Offset Table

A postings offset table stores a sequence of postings offset entries, sorted by label name and value. Every entry holds a label name/value pair and the offset of that pair’s series list in the postings section, so the table is what locates the individual postings sections. The entries are only partially read into memory when an index file is loaded.
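One way to work with a sorted table that is only partially held in memory is to keep every n-th key and scan the short run between two sampled keys on lookup. The sketch below illustrates that approach under stated assumptions: the table contents, the sampling interval, and the names `sample` and `lookup` are hypothetical, and the list `TABLE` stands in for the section on disk.

```python
import bisect

# Hypothetical table: ((label name, label value), offset), sorted by key.
TABLE = [
    (("env", "dev"), 100),
    (("env", "prod"), 180),
    (("job", "api"), 260),
    (("job", "db"), 340),
    (("job", "web"), 420),
]


def sample(table, every):
    """Keep every n-th key in memory together with its position in the table."""
    return [(table[i][0], i) for i in range(0, len(table), every)]


def lookup(table, sampled, name, value):
    """Return the postings offset for a label pair, or None if it is absent."""
    key = (name, value)
    keys = [k for k, _ in sampled]
    # Find the last sampled key that is <= the wanted key.
    idx = bisect.bisect_right(keys, key) - 1
    if idx < 0:
        return None
    start = sampled[idx][1]
    end = sampled[idx + 1][1] if idx + 1 < len(sampled) else len(table)
    # Scan only the entries between two sampled keys.
    for entry_key, offset in table[start:end]:
        if entry_key == key:
            return offset
    return None
```

With a sampling interval of 2, only three of the five keys are kept in memory, and each lookup scans at most two table entries.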

TOC

The table of contents (TOC) serves as an entry point to the entire index and points to various sections in the file. If a reference is zero, it indicates the respective section does not exist and empty results should be returned upon lookup.
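The sketch below reads a TOC from the end of an index file and maps zero references to "section absent". The byte layout is an assumption that should be checked against the Prometheus TSDB index format documentation: six big-endian 8-byte references in the order listed in `SECTIONS`, followed by a 4-byte checksum. The checksum is read but not verified, and `parse_toc` is a hypothetical helper name.

```python
import struct

# Assumed order of the section references in the TOC.
SECTIONS = (
    "symbols",
    "series",
    "label_indices",
    "label_offset_table",
    "postings",
    "postings_offset_table",
)
TOC_FORMAT = ">6QI"  # six big-endian uint64 references + one uint32 checksum
TOC_SIZE = struct.calcsize(TOC_FORMAT)


def parse_toc(index_bytes):
    """Return {section name: offset or None} from the tail of an index file."""
    *refs, _crc = struct.unpack(TOC_FORMAT, index_bytes[-TOC_SIZE:])
    # A zero reference means the section does not exist.
    return {name: (ref or None) for name, ref in zip(SECTIONS, refs)}
```

A reader that gets `None` for a section can return empty results for lookups against it without touching the rest of the file.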

Beyond the index, each native Prometheus TSDB block that Thanos reads and writes contains chunk files and a meta.json file. Chunk files hold a few hundred MB worth of chunks each, and chunks for the same series are sequentially aligned. Series, in turn, are aligned by their metric name. Blocks can be backed up to object storage and later be queried by another component.

The meta.json file holds meta-information about a block, including its time range and compaction level. Thanos extends the meta.json file with a “thanos” section to which Thanos-specific metadata can be added.
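As an illustration of the meta.json structure, the sketch below parses a small example and pulls out the time range, the compaction level, and the labels from the “thanos” section. All values, including the ULID and the label values, are made-up placeholders, only a subset of fields is shown, and `summarize` is a hypothetical helper.

```python
import json

# Illustrative meta.json content; every value here is a placeholder.
META_JSON = """
{
  "version": 1,
  "ulid": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "minTime": 1600000000000,
  "maxTime": 1600007200000,
  "compaction": {"level": 1},
  "thanos": {
    "labels": {"cluster": "example", "replica": "a"},
    "downsample": {"resolution": 0},
    "source": "sidecar"
  }
}
"""


def summarize(meta):
    """Extract the block's time range, compaction level, and Thanos labels."""
    thanos = meta.get("thanos", {})  # absent in plain Prometheus blocks
    return {
        "time_range_ms": (meta["minTime"], meta["maxTime"]),
        "compaction_level": meta["compaction"]["level"],
        "external_labels": thanos.get("labels", {}),
    }


summary = summarize(json.loads(META_JSON))
```

Reading the “thanos” section with a default keeps the helper usable on blocks that were written by Prometheus alone and carry no Thanos metadata.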