Offload TaskResultsCompletionStatus from etcd to db or use compression to allow large workflows (~100k) #13783
Labels
- `area/controller` (Controller issues, panics)
- `area/offloading` (Node status offloading)
- `type/feature` (Feature request)
Summary
I am currently evaluating argo-workflows as a go-to solution for scheduling tasks at my company. So far we really like it feature-wise and we think it is a really good fit 👍
The problem is that the number of tasks is expected to be around 100k per workflow, and so far I haven't managed to persuade Argo to handle that.
From what I've observed, there is a limitation imposed by the maximum size of an entity inside the etcd database, which is around 1.5 MB. In my testing this can be reproduced with the following workflow:
You can apply it with:

```shell
ytt -f <manifest_name> | kubectl create -f - -n <argo_namespace>
```

This manifest will get stuck at around the 19177/20177 mark. When I look at the content of the Workflow manifest, it has the state of each job listed like this:
The size of the workflow manifest also roughly corresponds to the etcd limit:
Also, when I decrease the size of the task-name prefix, I am able to schedule more jobs (around 80k with a single-character prefix).
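A back-of-the-envelope sketch of why the prefix length matters: `TaskResultsCompletionStatus` is serialized as a map keyed by task/node name, so every task costs roughly the length of one `"name": true,` entry in the stored object. The limit, overhead, and name patterns below are my own assumed numbers for illustration, not Argo internals:

```python
# All constants here are assumptions for illustration only.
ETCD_LIMIT = 1_500_000    # ~1.5 MB, the default etcd request size limit
FIXED_OVERHEAD = 100_000  # assumed bytes for the rest of the Workflow object

def entry_bytes(task_name: str) -> int:
    # Approximate serialized size of one map entry: "name": true,
    return len(f'"{task_name}": true,')

def max_tasks(task_name: str) -> int:
    # Rough upper bound on tasks before the object exceeds the etcd limit.
    return (ETCD_LIMIT - FIXED_OVERHEAD) // entry_bytes(task_name)

print(max_tasks("large-workflow-test-000001"))  # long prefix  -> 40000
print(max_tasks("x-000001"))                    # short prefix -> 82352
```

Under these assumed numbers, shortening the per-task name roughly quadruples the task count before the limit is hit, which is in the same ballpark as the ~20k vs ~80k observation above.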
What I am proposing is to offload `TaskResultsCompletionStatus` from etcd to the database, similar to what `ALWAYS_OFFLOAD_NODE_STATUS` already does for node status (see `argo-workflows/pkg/apis/workflow/v1alpha1/workflow_types.go`, line 1955 at `c9b1477`), or alternatively to compress it.
Here is my current configuration for argo-workflows: https://github.com/Hnatekmar/kubernetes/blob/a09391109103d5ff9036eed85fd05577fff1c654/manifests/applications/argo-workflows.yaml
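To get a rough sense of how much headroom the compression option alone could buy, here is a sketch that gzips a synthetic map shaped like `TaskResultsCompletionStatus` (the key pattern, entry count, and choice of gzip are my assumptions, not what the controller does):

```python
import gzip
import json

# Synthetic stand-in for TaskResultsCompletionStatus: 100k map entries
# with a long, repetitive name prefix (made-up names).
status = {f"large-workflow-test-{i:06d}": True for i in range(100_000)}

raw = json.dumps(status).encode()
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes "
      f"({len(raw) / len(packed):.1f}x smaller)")
```

Because the keys are highly repetitive, gzip shrinks this kind of map by well over 5x in this toy setup, suggesting compression could push the task ceiling considerably higher even without offloading to a database.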
Use Cases
When scheduling 100k or more jobs in a single workflow.
Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.