Discover Good First Issues for thanos-io/thanos
-
UI: Warnings when building react app Created: 2024-01-16T11:23:42Z
Currently there are quite a few warnings when compiling the react-app as seen below.
These should ideally be fixed, afterwards we can remove CI=false from our build step to ensure any new code introduced doesn’t generate new warnings.
List of warnings:
src/components/ListTree.tsx
  Line 30:50:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 30:61:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/components/withStatusIndicator.tsx
  Line 41:36:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/hooks/useFetch.ts
  Line 12:92:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/pages/alerts/AlertContents.tsx
  Line 11:42:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 163:38:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/pages/config/Config.tsx
  Line 30:17:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
src/pages/graph/CMTheme.tsx
  Line 15:7:  Identifier 'outline_fallback' is not in camel case  camelcase
src/pages/graph/GraphControls.tsx
  Line 66:5:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  Line 94:7:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  Line 115:51:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  Line 139:34:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
src/pages/graph/GraphHelpers.ts
  Line 140:46:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
src/pages/graph/GraphTabContent.tsx
  Line 8:9:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/pages/graph/Panel.tsx
  Line 54:9:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 208:7:  Identifier 'partial_response' is not in camel case  camelcase
  Line 308:20:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 370:45:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 417:7:  Identifier 'partial_response' is not in camel case  camelcase
src/pages/graph/SeriesName.tsx
  Line 32:47:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  Line 49:32:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
src/pages/graph/TimeInput.tsx
  Line 36:18:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 63:20:  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  Line 81:48:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/pages/status/Status.tsx
  Line 15:42:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 15:63:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 62:55:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/pages/targets/Targets.tsx
  Line 3:8:  'Filter' is defined but never used  @typescript-eslint/no-unused-vars
  Line 6:10:  'useLocalStorage' is defined but never used  @typescript-eslint/no-unused-vars
src/thanos/pages/blocks/helpers.ts
  Line 155:45:  Array.prototype.map() expects a return value from arrow function  array-callback-return
src/thanos/pages/errorBoundary/ErrorBoundary.tsx
  Line 10:45:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 11:22:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
src/utils/index.ts
  Line 299:28:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
  Line 300:13:  Unexpected any. Specify a different type  @typescript-eslint/no-explicit-any
Comments: 4
Unassigned -
docs: link paths differ on 'tip' vs 'v0.xx' Created: 2023-11-10T14:53:50Z
Thanos, Prometheus and Golang version used: v0.30+
Object Storage Provider: N/A
What happened: Doc links on the website are not consistent between `tip` and versions >=v0.30. For example:
https://thanos.io/tip/proposals-done/202004-embedd-cortex-frontend.md/
vs
https://thanos.io/v0.32/proposals-done/202004-embedd-cortex-frontend/
(The addition of the `.md` suffix)
The links are the same for versions <=v0.29, presumably because of this Hugo section:
This breaks links between pages like on https://thanos.io/v0.33/components/query-frontend.md/#query-frontend, where there is a link to https://thanos.io/v0.33/proposals-done/202004-embedd-cortex-frontend.md/, but the page is actually at https://thanos.io/v0.33/proposals-done/202004-embedd-cortex-frontend/
What you expected to happen: Link paths should be the same so that users can easily switch between viewing a doc entry for different Thanos versions.
How to reproduce it (as minimally and precisely as possible): Test links like the above on the live site.
Full logs to relevant components: N/A
Anything else we need to know: N/A
Comments: 5
Unassigned -
Show error if targets page is not fully loaded Created: 2023-11-07T13:19:58Z
Sometimes the targets page doesn’t fully load because it takes too long to load targets from leaf nodes. In such cases no error is shown, so the user has no way of knowing that the list is incomplete, which leads to confusion. Add an error box for such cases.
Comments: 3
Unassigned -
When ```step``` is a multiple of 1 day, each point will offset multiple hours Created: 2023-07-21T06:35:57Z
Is your proposal related to a problem?
Hi,
The problem is about `Thanos-frontend`. When a range query is created with the step set to 1 day, the frontend actively aligns each point of the response to 00:00 UTC.
In other time zones, however, we would like the points to be aligned to the local time zone.
For example, in Asia/Shanghai with a step of 1d, the points are always aligned to 08:00 every day, because Shanghai time is 8 hours ahead of UTC.
Describe the solution you’d like
In my opinion, if the step is a multiple of 1 day, the frontend should actively offset start and end.
step_align.go line 22

    func (s stepAlign) Do(ctx context.Context, r Request) (Response, error) {
        start := (r.GetStart() / r.GetStep()) * r.GetStep()
        end := (r.GetEnd() / r.GetStep()) * r.GetStep()
        // My new code, used as a local test.
        // If you aggregate by day, you need to consider the time zone: offset once by the time zone.
        if r.GetStep()%day.Milliseconds() == 0 {
            start -= shanghai_offset_ms.Milliseconds()
            end -= shanghai_offset_ms.Milliseconds()
        }
        return s.next.Do(ctx, r.WithStartEnd(start, end))
    }

The time zone could be added as a startup configuration parameter of `thanos-frontend`.
I hope to get your opinion. If you agree with my suggestion, I will implement the code and create a pull request.
Comments: 14
Unassigned -
Rule: remote-write.config should not be the trigger to enable stateless ruler Created: 2023-06-23T11:21:39Z
Is your proposal related to a problem?
The flag `--remote-write.config` automatically enables stateless mode for ruler and no series will be stored in the ruler’s TSDB.
Problem: This makes it impossible for a stateful ruler to remote_write a subset of its generated recording rule metrics to an external system.
This is something we are currently doing in Prometheus and it is impossible to achieve in a stateful ruler.
- Create some recording rules, remote_write only matching series to an external remote-write capable system.
Describe the solution you’d like
I would like to have a dedicated flag to enable stateless mode (and also enforce a remote_write config when it is set to true), so that I can still execute recording rules in stateful mode and remote_write some series to an external system.
Describe alternatives you’ve considered
Migrating the rulers to stateless?: We are already using this architecture for some other rulers but I would like to not use stateless mode for this usecase.
Comments: 6
Unassigned -
Rules UI show recording rules in wrong order Created: 2023-05-12T03:53:50Z
What happened:
The Rules UI displays rules within a rule group in alphabetical order, misleading the user into thinking the recording rules are evaluated in that order. In reality, the rules are evaluated based on the order specified in the yaml configuration file.
What you expected to happen:
The Rules UI should behave similar to Prometheus, showing the recording rules within a rule group in the order they are evaluated.
How to reproduce it (as minimally and precisely as possible):
    groups:
      - name: AGroup
        interval: 30s
        rules:
          - record: good_events:rate_2m
            expr: some_expression
          - record: total_events:rate_2m
            expr: some_expression
          - record: good_events:rate_1h
            expr: sum_over_time(good_events:rate_2m[60m])

The order of rules rendered in the UI (`<Rule Component IP>/rules`):
record: good_events:rate_2m
record: good_events:rate_1h
record: total_events:rate_2m
Version:
0.29.0
Comments: 7
Unassigned -
Increase sync-block-duration default Created: 2023-04-17T20:52:32Z
I think the store API’s `--sync-block-duration` should be increased from its current `3m` to `1h`.
New blocks are normally set to be shipped every `2h` by the sidecars, and the more recent metrics are handled by the queriers.
I self-host my own S3 storage layer and this is what `3m` looks like on the backend:
This is `15m`:
And this is two store APIs set to `45m` (earlier today):
Comments: 4
Unassigned -
Use Partial Response should be enabled by default Created: 2023-04-12T06:35:29Z
Hi Team,
We have a requirement that Use Partial Response should be enabled in the Thanos query UI by default. I tried configuring the “--query.partial-response” flag in thanos-query, but I do not see it enabled in the UI, and when querying, the UI does not send the flag as true either. I need Use Partial Response to be enabled in the Thanos query UI by default. Is there any flag or config that we need to set for this requirement?
Thanos: 0.29.0
Prometheus: 2.40.0.
Thank you!
Comments: 10
Unassigned -
Distributed traces can be excessively verbose Created: 2023-02-02T18:01:53Z
Is your proposal related to a problem?
I’ve noticed that in large deployments, the spans generated by certain thanos components can be extremely verbose. For example, here is a trace generated recently in our production infrastructure:
Note that this trace generated 59,000 spans over 3 seconds.
Describe the solution you’d like
Two solutions could work here:
Thanos could not emit these spans at all
Thanos could conditionally emit these spans based on a config option.
I recognize that these may be quite valuable for development work, but for production systems it can be cost-prohibitive to ingest and process this many spans.
Describe alternatives you’ve considered
The primary alternative we’ve considered is to filter these spans with the OpenTelemetry collector. However, we then still face the overhead of processing these spans, even if we do not ultimately store them.
Additional context
Our deployment is rather large, so the scale of this problem may not be felt by all users of thanos.
Comments: 10
Unassigned -
Need Cardinality Management Support on Thanos Created: 2022-12-30T18:21:35Z
Currently, multiple product teams send their metrics to Thanos (managed by Central Observability Team). When the product teams send high cardinality metrics to Thanos, it chokes the Store Gateway and Queriers to a level where none of the teams would be able to run queries.
When I asked a similar question in the Thanos Slack community, the most common recommendation was to drop metrics/labels on the Prometheus side. While this can help, it is more of a reactive approach.
The pro-active approach would be to have cardinality management. This helps the teams to check the cardinality of the labels and metrics and drop them even before this becomes a problem.
Ideally, we should have a way to see the cardinality numbers of the incoming metrics and labels and provide a way to set alerts when these numbers cross a specific value.
We should also be able to drop the metrics and labels on the Thanos side. Mimir has runtime config which can be updated dynamically, without the need to restart any components.
Solution approach:
I see Mimir has built this support, and it is very helpful for teams managing Central Observability.
https://grafana.com/docs/mimir/latest/operators-guide/reference-http-api/#label-names-cardinality
https://grafana.com/docs/mimir/v2.4.x/operators-guide/configure/configuring-custom-trackers/
Comments: 14
Unassigned -
Document metrics exported by Thanos' components Created: 2022-10-05T14:56:39Z
Is your proposal related to a problem?
Yes. Figuring out which metrics are exported by the various Thanos components and the meaning behind them is often a process that requires diving deep into the source code. This consumes a lot of time and can be challenging for people who aren’t used to the codebase, Prometheus and/or Thanos.
This could be further broken down into various issues for each component.
Describe the solution you’d like
Add to the docs of each component a list of the metrics they export. The list should contain the metric’s name, type, dimensions, and description. Ideally all this information should match the metric definition in the source code.
Describe alternatives you’ve considered
Probably adding a description of the dimensions when applicable is also a good idea. It could be part of the generic description.
Additional context
This idea was initially proposed by @fpetkovski at #5741.
Comments: 13
Unassigned -
Docs: Document different hashing algorithms in receivers Created: 2022-09-09T09:12:11Z
Is your proposal related to a problem?
We recently introduced a new hashing algorithm, and although it feels “semi-experimental” to me at this stage, I think it would be good to document both the existing default algorithm and the new algorithm.
Describe the solution you’d like
As a user, when considering Thanos receiver, I want to be able to tell from the docs:
How Hashmod algorithm and Ketama algorithm work, what are their pros and cons, what are their performance implications, how do they compare when horizontally scaling etc.
How and whether I can migrate from one algorithm to another easily, what is the exact procedure, what are the pitfalls
Additional context
The original proposal for Thanos receiver details some working of the Hashmod algorithm: https://thanos.io/tip/proposals-done/201812-thanos-remote-receive.md/
For Ketama, the recent PR is useful for understanding: https://github.com/thanos-io/thanos/pull/5408
Comments: 10
Unassigned -
Show trace ID in querier UI Created: 2022-08-26T13:48:13Z
Is your proposal related to a problem?
When troubleshooting query performance through the querier, I would like to be able to know the trace ID for that query.
Describe the solution you’d like
Would be nice to output the non-truncated trace ID in the UI, maybe next to the other PromQL related stats.
It would also be great to have the query as part of the tracing tags.
Describe alternatives you’ve considered
The trace ID is currently only a response header, so I have to use the network inspector. It is also truncated for some reason so it’s hard to look it up in a tracing system:
Comments: 13
Unassigned -
Receiver: Logically split router and ingestor mode Created: 2022-08-25T11:36:21Z
Is your proposal related to a problem?
Currently all of the code for receiver ingestion / routing is logically bunched up in the same file. A lot of the features are enabled based on `if` conditions sprinkled throughout the receiver source file. This makes it hard to follow the code and changes, as we’re implementing things that increase the divide between ingestor and router mode.
On top of that, router mode currently runs with multi TSDB and store API instantiated, even though we do not collect metrics on such nodes, nor should they be queryable / expose tenant stats (see https://github.com/thanos-io/thanos/pull/5623)
Describe the solution you’d like
Although they are the same component, we could treat ingestor / router receivers internally as separate components with better logic separation in the code. At the same time, receiver in router mode should not have the overhead of running store API.
Describe alternatives you’ve considered
Exist with both modes intermingled :cry:
Additional context
Previous work in https://github.com/thanos-io/thanos/pull/5623, but I abandoned it in favor of larger refactor.
Comments: 4
Unassigned -
The Table of Contents in the changelog links to Github Created: 2022-04-26T23:43:56Z
On https://thanos.io/tip/thanos/changelog.md/ -> Table of Contents
This link v0.25.2 - 2022.03.24 should actually go towards https://thanos.io/tip/thanos/changelog.md/#v0252httpsgithubcomthanos-iothanostreerelease-025—20220324
Perhaps this page / table of contents could be easier/nicer as well - For example that the table of contents would be a menu on the left or right so you do not have to scroll all the way down.
Furthermore, it would be nice if we link to github, but not towards https://github.com/thanos-io/thanos/tree/release-0.25 but to https://github.com/thanos-io/thanos/releases/tag/v0.25.2
Comments: 6
Unassigned -
UI: Color code stores based on how they are being queried Created: 2022-01-03T15:43:55Z
The idea originated from this comment, see for full context https://github.com/thanos-io/thanos/pull/4908#discussion_r771194974
Basically, we could improve the user experience by providing more information on how the stores are being queried, beyond providing just min / max time or just showing that the min / max time is not available, which does not in fact tell users if that store will be queried.
Comments: 9
Unassigned -
Logging: Simplify and adjust request logging configuration Created: 2021-12-16T09:52:43Z
As seems to stem from the discussion in https://github.com/thanos-io/thanos/pull/4934, request logging implementation and configuration seem to have a couple of confusing features, which might be a bit misleading for the users.
I’ll try to summarize some of my and @PhilipGough’s observations below:
There is no simple way to determine if the request logging for a component has taken effect - perhaps it would make sense to log a line with this information (e.g. “logging for XY has been enabled”)
Configuring request logging for a particular server (HTTP, gRPC) on a component requires specifying a port, but it is not clear which port the user should specify and why this is required. Perhaps specifying the port is not necessary?
It is possible to configure a log level for the request logs, but the logs still do not seem to appear until `log.level=debug` is used. I think the normal user expectation would be to set the level and see the logs being logged at that level, without the need to set `log.level=debug`.
URL path configuration seems to be working on the basis of exact match, per the comment in https://github.com/thanos-io/thanos/pull/4934#discussion_r769768126, but this is insufficient if URL parameters are passed as well.
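To illustrate the last point, here is a small Go sketch of why an exact string match fails once a query string is present, and how comparing only the parsed path avoids that. `pathMatches` is a hypothetical helper written for this example and is not taken from the Thanos request logging code.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathMatches reports whether the path component of requestURI equals
// wantPath, ignoring any query string. Hypothetical helper for illustration.
func pathMatches(requestURI, wantPath string) bool {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return false
	}
	return u.Path == wantPath
}

func main() {
	uri := "/api/v1/query?query=up"

	// Exact match on the raw URI fails because of the query string.
	fmt.Println(uri == "/api/v1/query") // false

	// Matching on the parsed path ignores the parameters.
	fmt.Println(pathMatches(uri, "/api/v1/query")) // true
}
```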
Comments: 9
Unassigned -
Tests: Clean up and automate repository for Swift object store test Docker image Created: 2021-10-28T16:15:25Z
As a part of https://github.com/thanos-io/thanos/pull/4735#issuecomment-953757577, we have now forked https://github.com/thanos-community/docker-swift-onlyone-authv2-keystone which contains code to build a Docker image required for our Swift object store tests. However, we should improve on this, namely by:
Allowing issues on the repo, so we can track things in there (this is not turned on by default for forks)
Cleaning-up the repo (Remove docs / information which is not applicable)
Automating Docker image build + push for each change (right now it has to be done manually)
Comments: 7
Unassigned -
Refactoring: move all http.Transport configuration parameter parsing into one place Created: 2021-09-01T14:17:17Z
These are all functions/structs for configuring the same thing:
Another place being added: https://github.com/thanos-io/thanos/pull/4623
Move them into one struct. This probably belongs in `exthttp`.
Comments: 9
Unassigned -
tools bucket replicate: concurrent replication Created: 2021-05-27T11:56:19Z
Is your proposal related to a problem?
Replication is currently slow.
Describe the solution you’d like
`tools bucket replicate` could upload a few objects at a time instead of one by one.
Describe alternatives you’ve considered
Copying the objects with some other tool, but `tools bucket replicate` is way too convenient because you can use two completely different object storage providers, thus there are no alternatives to this, IMHO.
Additional context
Shouldn’t be too hard to implement :thinking:
Comments: 15
Unassigned -
Add query frontend component to the quickstart script Created: 2020-10-12T14:27:12Z
Feature request
It would be nice to integrate query frontend to the https://github.com/thanos-io/thanos/blob/master/scripts/quickstart.sh so that we can easily test the whole stack.
AC:
In-memory result cache config should be supported
Would be nice to configure Jaeger for query frontend
Comments: 6
Unassigned -
Thanos UI Enhancements Created: 2020-09-08T22:42:08Z
The Thanos project has recently migrated its UI to one built on re-usable and shareable components written in React, with the goal of fostering collaboration with the broader Prometheus community. We would like to propose enhancements to the UI to better surface information in every Thanos component. This includes: exposing build- and run-time configuration; creating an API discovery page to provide living documentation of all of the endpoints exposed by Thanos components; and adding benchmarks for the UI. These enhancements all have the additional benefit of being relevant to and useful in the Prometheus project. As part of this proposal, we would like to further collaborate with the Prometheus community to continue building a shared UI component library and to contribute upstream to Prometheus so that it can leverage these components in its UI.
cc @thanos-io/thanos-maintainers
cc @prmsrswt
Comments: 18
Unassigned -
Improve compatibility testing against Prometheus and TSDB. Created: 2019-01-24T09:08:14Z
Let’s start with acceptance criteria for our compatibility tests:
- I would like to know which versions of Prometheus Thanos supports on each PR.
But what do we mean by “supports”? We have essentially 2 points of contact:
TSDB format (including very low level index and metadata scheme)
HTTP API (`api/v1/flags`, `api/v1/config`, `api/v1/label/`, `api/v1/read`, `api/v1/snapshot` etc.)
Storage format can change but is versioned (index and metadata separately, e.g. the index version changed somewhere between 2.0 and 2.2.1). HTTP API should not change for `v1`, but things get added (e.g. `api/v1/flags` was added in `v2.2.1`, the snapshot endpoint was extended, etc.)
Goal: Support all minor Prometheus versions (e.g. 2.0, 2.2, … 2.7, etc.). There are exceptions, for example broken Prometheus releases like `2.1.x`. This means that we would like to test and support the tip of each minor version (e.g. 2.4.3 for 2.4).
How we test this now?
Now (before https://github.com/improbable-eng/thanos/pull/704 or https://github.com/improbable-eng/thanos/pull/730 lands, depending on which lands first), our current method for testing compatibility is to run the following on CI:

    SUPPORTED_PROM_VERSIONS ?=v2.2.1 v2.3.2 v2.4.3 v2.5.0
    @for ver in $(SUPPORTED_PROM_VERSIONS); do \
        THANOS_TEST_PROMETHEUS_PATH="prometheus-$$ver" THANOS_TEST_ALERTMANAGER_PATH="alertmanager-$(ALERTMANAGER_VERSION)" go test $(shell go list ./... | grep -v /vendor/ | grep -v /benchmark/); \
    done

This runs ALL our tests with a different `THANOS_TEST_PROMETHEUS_PATH` variable, which controls which Prometheus binary is used for our e2e tests (we have quite a few of them: all tests whose names end with the `e2e` suffix). These tests were fine for checking whether we support the common points mentioned above.
The problems we see:
Upgrading the TSDB Golang dependencies (like here) blocks our ability to use any advanced testing methods like injecting blocks here or here. This is because, while new Prometheus versions are backward compatible with old TSDB format versions, old Prometheus versions are not forward compatible with the new format.
We run ALL tests against different Prometheus versions using an external for loop. This means:
We might hit the Go test cache all the time, because a changed environment variable is not seen by the caching logic, which can then assume the code has not changed and reuse the cache.
Something is wrong with signal handling, as some tests are green, but actually should fail: https://circleci.com/gh/improbable-eng/thanos/1935?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
What if config/flags change in some Prometheus version?
Note that upgrading TSDB and Prometheus dependencies is essential to stay up to date with fixes and recent optimizations. We reuse lots of packages.
Extra:
As a nice-to-have, we would like to make sure anyone can take a TSDB block from object storage into Prometheus and use it there. This means that we need to test whether a compactor-produced block is compatible with Prometheus and, if yes, with what version (aiming for just latest is fine). How to test that?
Comments: 16
Unassigned -
s3/aliyunOSS: Support multipart upload without pre known object size. Created: 2018-12-13T14:38:22Z
Discussion for Reference https://github.com/improbable-eng/thanos/pull/617
Acceptance Criteria:
- Minio S3 provider will choose when to use multipart properly even with just the `Upload(ctx context.Context, name string, r io.Reader)` interface.
Comments: 16
Unassigned
-
ThanosRuler is missing follow_redirect setting Created: 2022-12-05T15:27:03Z
Thanos, Prometheus and Golang version used: v0.28.0
Object Storage Provider: local filesystem
What happened: When alertmanager has authentication, ThanosRuler follows redirects to the OAuth provider and accepts its 200 OK status code as confirmation that the alerts were sent, rather than checking whether they were actually sent.
What you expected to happen: Set `alertmanagers.config: follow_redirects=false` to prevent redirects and fail when a 302 code is received.
How to reproduce it (as minimally and precisely as possible): Create an nginx instance with `location /api/v1/alerts { return 302 "https://example.com"; }`, set this instance as an alertmanager in ThanosRuler and generate some alerts.
Expected behaviour - notification is failing, actual behaviour - “alerts are sent”.
Comments: 13
-
*: Automate `objstore` and `promql-engine` dependency update Created: 2022-11-22T08:53:36Z
We often see issues originating from different object storage implementations that can be resolved by a simple `objstore` update. One example for all: https://github.com/thanos-io/thanos/issues/5904.
Although `objstore` is now a separate repository, once changes are made there, we should have an automated way of updating the dependency version, to avoid gaps in version and to preemptively resolve issues related to object storage updates.
We could have a simple periodic job checking for changes in `objstore` that would automatically open a PR with the update. Using dependabot should actually fit squarely for this use case. Unless there are API changes, most of the updates should be a matter of bumping the version and ensuring tests succeed.
Similarly, we should consider other (internal) packages where we want to keep close to upstream, such as https://github.com/thanos-community/promql-engine
Comments: 13