[BanyanDB] Add "sharding_key" to improve TopNAggregation performance #12526

Open
2 of 3 tasks
hanahmily opened this issue Aug 13, 2024 · 1 comment
Labels
  • database (BanyanDB - SkyWalking native database)
  • enhancement (Enhancement on performance or codes)
  • feature (New feature)

Comments

@hanahmily
Contributor

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

The current data distribution, based on the combination of 'name' and 'entity', can cause performance issues when calculating 'TopNAggregation'. Each shard holds only a subset of the top-n list, so the query process must aggregate those per-shard lists to obtain the final result. This adds overhead in both query performance and disk usage.
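
To make the query-side cost concrete, here is a minimal Go sketch of the merge step described above. The `Item` type, the `mergeTopN` function, and the sample values are hypothetical and are not BanyanDB's actual code. The sketch assumes each entity lives on exactly one shard, so the merge only needs to concatenate and re-sort.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical top-N entry: an entity and its aggregated value.
type Item struct {
	Entity string
	Value  int64
}

// mergeTopN combines per-shard top-N lists into one global top-N list.
// Assumption: each entity appears on exactly one shard, so values are
// never summed across shards.
func mergeTopN(perShard [][]Item, n int) []Item {
	var all []Item
	for _, items := range perShard {
		all = append(all, items...)
	}
	// Sort descending by value; break ties by entity for a deterministic order.
	sort.Slice(all, func(i, j int) bool {
		if all[i].Value != all[j].Value {
			return all[i].Value > all[j].Value
		}
		return all[i].Entity < all[j].Entity
	})
	if len(all) > n {
		all = all[:n]
	}
	return all
}

func main() {
	// Instances of service_1 are spread over two shards, so the query
	// has to read both partial lists before it can answer.
	shard0 := []Item{{"service_1-10.0.0.1", 90}, {"service_2-10.0.1.1", 40}}
	shard1 := []Item{{"service_1-10.0.0.2", 70}, {"service_3-10.0.2.1", 10}}
	fmt.Println(mergeTopN([][]Item{shard0, shard1}, 2))
}
```

Every shard has to store and return its own partial list, which is where the extra disk usage and query work come from.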

To address this issue, we propose adding a new optional field called sharding_key to both Stream and Measure. This field will be used to determine the data distribution, and it will default to entity if not specified.

For example, if sharding_key is set to service_id, the new route table would look like this:

  • service_1-10.0.0.1:shard0
  • service_1-10.0.0.2:shard0

This means that instances from the same service will be placed into the same shard, which should improve the performance of the 'TopNAggregation' query.
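
The routing rule can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Schema` struct, the `shardID` function, the tag names, and the FNV hash are hypothetical choices, and BanyanDB's real schema layout and hash function may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Schema is a hypothetical, simplified stand-in for a Stream or Measure.
type Schema struct {
	Name        string
	Entity      []string // tag names that form the entity
	ShardingKey []string // optional; falls back to Entity when empty
}

// shardID hashes the schema name plus the sharding-key tag values.
// shardNum must be greater than zero.
func shardID(s Schema, tags map[string]string, shardNum uint32) uint32 {
	keys := s.ShardingKey
	if len(keys) == 0 {
		keys = s.Entity // default described in the proposal
	}
	h := fnv.New32a()
	h.Write([]byte(s.Name))
	for _, k := range keys {
		h.Write([]byte{0}) // separator so "ab"+"c" hashes differently from "a"+"bc"
		h.Write([]byte(tags[k]))
	}
	return h.Sum32() % shardNum
}

func main() {
	s := Schema{
		Name:        "instance_cpm", // hypothetical measure name
		Entity:      []string{"service_id", "instance"},
		ShardingKey: []string{"service_id"},
	}
	a := map[string]string{"service_id": "service_1", "instance": "10.0.0.1"}
	b := map[string]string{"service_id": "service_1", "instance": "10.0.0.2"}
	// Both instances share service_id, so they map to the same shard.
	fmt.Println(shardID(s, a, 4) == shardID(s, b, 4))
}
```

Leaving `ShardingKey` empty reproduces the current entity-based distribution, which is one way the default could keep existing schemas unchanged.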

Task List

  1. Add the sharding_key field to the Stream and Measure models.
  2. Implement the data distribution logic based on the sharding_key field, with entity as the default if sharding_key is not specified.
  3. Update the 'TopNAggregation' flow to write the result to the same shard as the measure.
  4. Ensure backward compatibility for existing data and queries.
  5. Update the documentation to explain the new sharding_key field and its usage.

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

@hanahmily added the feature, enhancement, and database labels on Aug 13, 2024
@hanahmily added this to the BanyanDB - 0.8.0 milestone on Aug 13, 2024
@wu-sheng
Member

Let us know if this needs OAP-side work. TopN has modes with and without a service ID, right?
