Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[venice-server] Dropping unassigned partitions #1196

Draft
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

kristyelee
Copy link

@kristyelee kristyelee commented Sep 24, 2024

Summary, imperative, start upper case, don't end with a period

The goal is to remove unassigned partitions that are on disk state

This involves storage service and engine, which upon starting (constructed), will go through folders on disk and verify status of folders
Helix's ideal state is used for consulting partitions and their assignment
The change proposed includes defining a function in VeniceServer that goes through partitions in idealState and compares to partitions in storageEngine
StorageEngine object initialization is also being updated to run the new function that performs this verification.

Resolves #650

How was this PR tested?

Will write corresponding test.
EDIT: A unit test has been written.

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

@kristyelee kristyelee marked this pull request as draft September 24, 2024 02:37
@kvargha
Copy link
Contributor

kvargha commented Sep 24, 2024

It looks like spotBugs is failing.

If you run this locally, you'll be able to read the report to see what's wrong.
./gradlew :services:venice-server:spotbugsMain

Make sure to build before running this.

@kristyelee kristyelee force-pushed the kristy_lee/650 branch 2 times, most recently from 3e496b1 to 56e7711 Compare September 25, 2024 18:58
@kristyelee
Copy link
Author

(Suggested addition to code, that seems directly useful but is no longer / not used actively:

In the HelixParticipationService file, similar to the getInstance function, I also wrote a getHelixManager function. I tried using this before in access after HelixParticipationService is instantiated, but have changed this implementation. )

(Reverted code in this file back to original state but thought I should make a note of this).

Set<Integer> storageEnginePartitionIds = storageEngine.getPartitionIds();

for (Integer storageEnginePartitionId: storageEnginePartitionIds) {
if (idealStatePartitionIds.contains(storageEnginePartitionId)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also isn't quite right. The ideal state is the superset of all partitions for the store. All partitions ID's in the storageEngine will be in the superset.

We need to parse more information from the ideal state. Specifically, for each partition that is on this node, we need to determine if the ideal state for that partition id contains the node name.

To better paint the picture, an ideal state is a big json document that looks like this:

"Test_Store_Migration_Demo_v1_0": {
"lor1-app56585.prod.linkedin.com_1690": "LEADER",
"lor1-app56614.prod.linkedin.com_1690": "STANDBY",
"lor1-app110448.prod.linkedin.com_1690": "STANDBY"
},
"Test_Store_Migration_Demo_v1_1": {
"lor1-app56586.prod.linkedin.com_1690": "LEADER",
"lor1-app71895.prod.linkedin.com_1690": "STANDBY",
"lor1-app111181.prod.linkedin.com_1690": "STANDBY"
},

Here, for partitions 1 and 0, we have three hosts assigned roles for that partition. What we should do is, does the current host we're working with reside in these lists?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have written the comparison and retrieved the current host from manager.getInstanceName().

@kristyelee
Copy link
Author

I wrote the related unit test for this PR.
GitHub tests pass for program, except for one Integration test unrelated to my change.

@kristyelee kristyelee force-pushed the kristy_lee/650 branch 2 times, most recently from 6fd4c66 to 377770f Compare October 17, 2024 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Reconcile on disk state with assigned partitions for given resource and delete unused partitions
3 participants