Response caching for Thanos

Summary of Closed Issue: Response Caching for Thanos

Issue Overview: Opened by project member bwplotka on October 15, 2019, this umbrella issue tracked the enhancements needed for efficient response caching in Thanos, in particular integration with the Cortex query-frontend, which now supports the Prometheus API.

Key Points:

  1. Query Planning and Block Structure:

    • The query-frontend in Cortex uses daily splitting to handle requests. This approach may not be optimal for Thanos because:
      • Downsampled queries over large time ranges require a more intelligent splitting mechanism.
      • Splitting long ranges by day can lead to inefficient index lookups, so the query split strategy may need to be redesigned.
  2. Proposed Splitting Logic:

    • A custom interval function based on the query step was suggested, so that the size of each sub-query depends on the resolution requested.
    • Example logic was provided that picks the split interval from the step duration (e.g., a 120-day interval for steps longer than 30m).
  3. Challenges Identified:

    • Edge cases need consideration, such as downsampled blocks being unavailable, or query efficiency varying with the number of blocks accessed.
    • Partial responses returned while a StoreAPI is down must be handled so that incomplete results do not end up in the cache.
  4. Caching Mechanism Adjustments:

    • Participants recommended not caching partial responses and suggested special headers that tell the cache to skip a response.
    • Proposals covered how the Querier could obtain information about StoreAPI changes and use it when generating cache keys.
  5. Potential Directions for Future Work:

    • Move the query-frontend caching logic out of the Cortex project to improve modularity.
    • Consider a caching solution built directly into Thanos Querier, backed by a cache store such as Memcached.
  6. Active Contributions and Feedback:

    • Multiple project members contributed feedback and updates, including implementation suggestions and discussion of design approaches.
    • After the initial activity, the issue was marked as stale and closed due to inactivity, with a note that it would be revisited in light of related developments in the project.

Conclusion: The issue captured the main considerations in building an efficient response caching strategy for Thanos on top of Cortex's query-frontend: how to split queries, how to avoid caching partial responses, and where the caching logic should live. The discussion aimed at a mechanism that balances resource efficiency against query performance, and it laid the groundwork for later work.