Can katalyst add the scheduling function for LLC and memory bandwidth resources? #102
Comments
Yes, the isolation of LLC and memory bandwidth is also on the roadmap. However, they may not be treated as resources allocated by the scheduler; instead, the plan is to avoid resource contention through interference detection and suppression.
Hello, how do you determine that LLC and memory bandwidth are the resource bottlenecks?
I found that when we co-locate online and offline memory-intensive pods, LLC and memory bandwidth are likely to become bottlenecks for the online pods; if we assign more LLC and memory bandwidth to the online pods, they perform better. We can use Intel's CAT and MBA technologies, and sometimes evict offline memory-intensive pods.
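To make the CAT/MBA idea concrete, here is a minimal sketch of how an online/offline split could be expressed as Linux resctrl `schemata` lines (`L3:<cache_id>=<hex mask>` for CAT, `MB:<cache_id>=<percent>` for MBA). The function names, the number of cache ways, and the bandwidth percentage are assumptions for illustration only, and this is not how katalyst implements isolation. The code only builds the strings; it does not write to `/sys/fs/resctrl`.

```python
def contiguous_mask(first_way: int, n_ways: int) -> str:
    """Hex capacity bitmask with n_ways contiguous bits set, starting at first_way.

    CAT requires the set bits of a capacity bitmask to be contiguous.
    """
    if n_ways < 1:
        raise ValueError("a CAT group needs at least one cache way")
    return format(((1 << n_ways) - 1) << first_way, "x")


def split_schemata(total_ways, online_ways, offline_mb_percent, cache_ids):
    """Build schemata lines for a hypothetical 'online' and 'offline' group.

    Online pods get the upper online_ways of the L3 and full bandwidth;
    offline pods get the remaining lower ways and a throttled MBA value.
    """
    if not 0 < online_ways < total_ways:
        raise ValueError("both groups need at least one cache way")
    offline_ways = total_ways - online_ways
    online_mask = contiguous_mask(offline_ways, online_ways)
    offline_mask = contiguous_mask(0, offline_ways)

    def line(resource, value):
        # One entry per L3 cache domain (typically one per socket).
        return resource + ":" + ";".join(f"{i}={value}" for i in cache_ids)

    return {
        "online": [line("L3", online_mask), line("MB", 100)],
        "offline": [line("L3", offline_mask), line("MB", offline_mb_percent)],
    }


if __name__ == "__main__":
    # Assumed values: 12-way L3, 8 ways for online pods, offline MBA at 30%.
    for group, lines in split_schemata(12, 8, 30, [0, 1]).items():
        print(group, lines)
```

The valid MBA percentages and the number of ways depend on the CPU, so real values would have to be read from the resctrl `info` directory rather than hard-coded.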
I am also considering similar issues, but several problems need to be thought through and solved: 1. For offline workloads, it is necessary to profile in advance to judge whether LLC and memory bandwidth have a large impact on them. With monitoring alone, it is difficult to determine whether a complex offline workload is memory-intensive. 2. Since the LLC and memory bandwidth of a single node are limited, allocating too much of them to the online workload wastes resources. If the goal is only eviction, could pod affinity solve the problem? I hope to have more in-depth communication with you!
For the first issue, that is indeed a complex problem: we have to profile in advance to measure each pod's sensitivity to LLC and MB, and label the pod accordingly. For the second issue, I think it is feasible to use pod affinity. Pods deployed on the same socket share LLC and MB, so if we want to deploy an LLC-sensitive pod, we should avoid placing it on a socket that already hosts an LLC-sensitive pod. This avoids LLC contention without the need to design an eviction policy. It is worth mentioning that katalyst will support this soon: #220
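The socket-level rule described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Socket` type, the `llc_sensitive` flag (standing in for a label produced by offline profiling), and `feasible_sockets` are hypothetical names, and nothing here reflects the actual design in #220.

```python
from dataclasses import dataclass, field


@dataclass
class Socket:
    node: str
    socket_id: int
    # Each entry is (pod_name, llc_sensitive); the flag is assumed to come
    # from a label assigned after profiling the pod in advance.
    pods: list = field(default_factory=list)


def feasible_sockets(sockets, pod_is_llc_sensitive):
    """Return the sockets on which the new pod may be placed.

    An LLC-sensitive pod is kept off any socket that already hosts an
    LLC-sensitive pod; insensitive pods are not restricted by this rule.
    """
    if not pod_is_llc_sensitive:
        return list(sockets)
    return [
        s for s in sockets
        if not any(sensitive for _, sensitive in s.pods)
    ]


if __name__ == "__main__":
    sockets = [
        Socket("node-a", 0, [("cache-svc", True)]),
        Socket("node-a", 1, [("batch-job", False)]),
    ]
    for s in feasible_sockets(sockets, pod_is_llc_sensitive=True):
        print(s.node, s.socket_id)
```

Because the rule only filters placements, it avoids LLC contention at admission time and needs no eviction logic, which matches the trade-off discussed in this comment.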
Good job!
What would you like to be added?
CPU cores are not the only bottleneck: LLC and memory bandwidth are sometimes bottleneck resources too. Can you add scheduling functions for LLC and MB?
Why is this needed?
LLC and memory bandwidth resources are bottleneck resources sometimes.