Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refine sysadvisor #86

Merged
merged 3 commits into from
Jun 7, 2023

Conversation

sun-yuliang
Copy link
Collaborator

What type of PR is this?

refactor & bugfix

Features/Bug fixes/Enhancements
see issue #85

What this PR does / why we need it:

see issue #85

Which issue(s) this PR fixes:

#85

Special notes for your reviewer:

pkg/agent/sysadvisor/types/types.go Outdated Show resolved Hide resolved
@@ -140,6 +158,20 @@ type RegionEntries map[string]*RegionInfo
// PodSet stores container names keyed by pod uid
type PodSet map[string]sets.String

// InternalCalculationResult conveys minimal information to cpu server for composing
// calculation result
type InternalCalculationResult struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ControlKnob contains ControlKnobName more than cpuset, what should we do with InternalCalculationResult? do we need to redefine this structure again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No in the short term, because cpu plugin only supports cpuset tweak. Even if we need to extend for more control knobs, adding more data members in InternalCalculationResult works.

}

// calculate headroom of non binding numas according to the corresponding reclaim pool entry
nonBindingNumasHeadroom, ok := ha.calculationResult.GetPoolEntry(state.PoolNameReclaim, cpuadvisor.FakedNUMAID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strongly not suggest to pass calculationResult this way, it's better to be stored in meta cache and fetch explicitly here, it may cause concurrent read and write, and it may cause abuse and nobody knows how to maintain this info

Copy link
Collaborator Author

@sun-yuliang sun-yuliang Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to get reclaim pool size from pool info in metacache, with a period delay compared to calculation result, ignorable under sliding window.

@@ -44,7 +40,7 @@ type QoSRegionDedicatedNumaExclusive struct {
func NewQoSRegionDedicatedNumaExclusive(ci *types.ContainerInfo, conf *config.Configuration, numaID int,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, the difference between dedicated region and shared region, is only the initialization logic (to get previous cpu requirement from meta cache)? will these two regions differ in other functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. they have different control knobs: reclaimed cpu supplied v.s. non reclaimed cpuset size
  2. focus on different system indicator: cpi & mem latency & membw v.s. pool schedwait
  3. dynamic system indicator target takes effect only for dedicated region
  4. other minor difference under rama/borwein policy

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. they have different control knobs: reclaimed cpu supplied v.s. non reclaimed cpuset size
  2. focus on different system indicator: cpi & mem latency & membw v.s. pool schedwait
  3. dynamic system indicator target takes effect only for dedicated region
  4. other minor difference under rama/borwein policy

okay, so those will be implemented along with rama in another pr, not for this one, right?

Copy link
Collaborator Author

@sun-yuliang sun-yuliang Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. for now these regions seem similar.

@sun-yuliang sun-yuliang force-pushed the syl/refine-cpu-advisor branch 2 times, most recently from bb97c52 to 84f23db Compare June 5, 2023 11:20
@codecov
Copy link

codecov bot commented Jun 5, 2023

Codecov Report

Patch coverage: 53.01% and project coverage change: +1.55 🎉

Comparison is base (21d86d9) 51.30% compared to head (d16c0da) 52.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #86      +/-   ##
==========================================
+ Coverage   51.30%   52.86%   +1.55%     
==========================================
  Files         318      354      +36     
  Lines       32418    34529    +2111     
==========================================
+ Hits        16632    18253    +1621     
- Misses      13840    14146     +306     
- Partials     1946     2130     +184     
Flag Coverage Δ
unittest 52.86% <53.01%> (+1.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/evictionmanager/endpoint/endpoint.go 0.00% <ø> (ø)
...ins/cpu/dynamicpolicy/calculator/cpu_assignment.go 77.01% <ø> (ø)
...gins/cpu/dynamicpolicy/cpueviction/cpu_eviciton.go 0.00% <0.00%> (-63.25%) ⬇️
pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy.go 43.73% <ø> (+5.38%) ⬆️
...ins/memory/dynamicpolicy/state/state_checkpoint.go 48.05% <0.00%> (ø)
...ourcemanager/fetcher/kubelet/topology/interface.go 0.00% <0.00%> (ø)
.../agent/resourcemanager/reporter/cnr/cnrreporter.go 61.20% <ø> (-3.06%) ⬇️
pkg/agent/resourcemanager/reporter/converter.go 33.33% <ø> (ø)
pkg/agent/resourcemanager/reporter/manager.go 51.37% <ø> (-6.38%) ⬇️
pkg/agent/resourcemanager/reporter/reporter.go 55.55% <ø> (ø)
... and 133 more

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sun-yuliang sun-yuliang changed the title WIP: refine sysadvisor refine sysadvisor Jun 5, 2023
@sun-yuliang sun-yuliang added the workflow/need-review review: test succeeded, need to review label Jun 5, 2023
@sun-yuliang sun-yuliang self-assigned this Jun 5, 2023
waynepeking348
waynepeking348 previously approved these changes Jun 5, 2023
waynepeking348
waynepeking348 previously approved these changes Jun 6, 2023
@sun-yuliang sun-yuliang added the workflow/merge-ready merge-ready: code is ready and can be merged label Jun 7, 2023
@waynepeking348 waynepeking348 merged commit 06be0f4 into kubewharf:main Jun 7, 2023
luomingmeng pushed a commit to luomingmeng/katalyst-core that referenced this pull request Oct 11, 2024
* refactor(sysadvisor): refine cpu advisor to improve clarity and fix several bugs

* refactor(sysadvisor): abstract provision and headroom assembler for extensibility

* test(sysadvisor): fix cpu advisor tests and bugs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
workflow/merge-ready merge-ready: code is ready and can be merged workflow/need-review review: test succeeded, need to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants