
Offload TaskResultsCompletionStatus from etcd to db or use compression to allow large workflows (~100k) #13783

Open
Hnatekmar opened this issue Oct 18, 2024 · 6 comments
Assignees
Labels
area/controller Controller issues, panics area/offloading Node status offloading type/feature Feature request

Comments

@Hnatekmar

Summary

I am currently evaluating argo-workflows as a go-to solution for scheduling tasks at my company. So far we really like it feature-wise and we think it is a really good fit 👍
The problem is that the number of tasks is expected to be around 100k per workflow, and so far I haven't managed to persuade argo to handle that.

From what I've observed, there is a limitation imposed by the maximum size of an entity inside the etcd database, which is around 1.5 MB. From my testing this can be observed with the following workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i
spec:
  podGC:
    strategy: OnPodSuccess
    deleteDelayDuration: 0s
  entrypoint: e
  templates:
  - name: c
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: e1
    steps:
      #@ for i in range(100):
      - - name: #@ "message" + str(i)
          template: c
          arguments:
            parameters:
              - name: message
                value: #@ "istep-" + str(i)
      #@ end
  - name: e
    dag:
      tasks:
    #@ for i in range(1000):
        - name: #@ "Step" + str(i)
          template: e1
    #@ end

You can apply it with ytt -f <manifest_name> | kubectl create -f - -n <argo_namespace>. This manifest will get stuck at around the 19177/20177 mark.

When I look at the content of the Workflow manifest, it holds the state of each job, listed like this:

      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4292625616: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293120823: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293149953: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293305504: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294093307: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294368260: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294498843: true

The size of the workflow manifest also roughly corresponds to the etcd limit:

$ kubectl get workflow -n argo-workflows -o yaml  |  wc -c
 1694314

Also, when I decrease the size of the prefix I am able to schedule more jobs (around 80k with a single-character prefix).

What I am proposing is:

  • Change the format of
    TaskResultsCompletionStatus map[string]bool `json:"taskResultsCompletionStatus,omitempty" protobuf:"bytes,20,opt,name=taskResultsCompletionStatus"`
    to a single key holding a base64-encoded compressed string (see the sketch below).
  • Or we can offload this to the db (when enabled; I don't think anyone would try this without ALWAYS_OFFLOAD_NODE_STATUS).
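
To give a rough sense of the first option, here is a minimal Go sketch (not Argo's actual code; compressStatus/decompressStatus are made-up names) that packs the map into one gzip-compressed, base64-encoded string. Since all keys share the same long node-name prefix, the compressed form should be much smaller than the plain map:

package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// compressStatus packs the per-task completion map into a single base64 string.
func compressStatus(status map[string]bool) (string, error) {
	raw, err := json.Marshal(status)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decompressStatus restores the map from the encoded string.
func decompressStatus(encoded string) (map[string]bool, error) {
	compressed, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	status := map[string]bool{}
	if err := json.NewDecoder(zr).Decode(&status); err != nil {
		return nil, err
	}
	return status, nil
}

func main() {
	// Repeated long node-name prefixes compress very well.
	status := map[string]bool{}
	for i := 0; i < 1000; i++ {
		status[fmt.Sprintf("this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-%d", i)] = true
	}
	encoded, _ := compressStatus(status)
	fmt.Printf("entries: %d, encoded length: %d bytes\n", len(status), len(encoded))
}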

Here is my current configuration for argo-workflows https://github.com/Hnatekmar/kubernetes/blob/a09391109103d5ff9036eed85fd05577fff1c654/manifests/applications/argo-workflows.yaml

Use Cases

When scheduling 100k or more jobs


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@Hnatekmar Hnatekmar added the type/feature Feature request label Oct 18, 2024
@Hnatekmar
Author

Also, it disappeared, but I am willing to work on this :) I just need to discuss how it should be done.

@Hnatekmar
Author

#7121 seems to be relevant to this issue, but the discussion there gravitated towards db optimisations, which won't solve this issue.

@shuangkun
Member

Duplicate of #13213

@shuangkun shuangkun self-assigned this Oct 18, 2024
@shuangkun
Member

I have also encountered this problem before. I implemented ALWAYS_OFFLOAD_TASK_RESULT_STATUS. If the maintainers think this requirement is reasonable, I can contribute it here.

@Joibel
Member

Joibel commented Oct 18, 2024

Offloading task result status to the database along with node status seems like a reasonable thing to do.

I haven't looked into it, but it felt like it might be possible to:

  • Delete workflowTaskResults using a worker pool as they are marked as completed and copied into the node status (rough sketch below).
  • Then delete the taskResultStatus entries for those workflowTaskResults once we know they are all completed; we're only tracking them to ensure we collect them all, and once collected they're done with.

This might be a better solution than offloading, WDYT @shuangkun?

These might be two separate things.
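
To make the worker-pool idea a bit more concrete, here is a rough Go sketch under stated assumptions: deleteTaskResult is a hypothetical stand-in for whatever client call the controller would actually use to remove a completed WorkflowTaskResult, and deleteCompleted just shows the fan-out pattern, not the real reconciliation logic:

package main

import (
	"context"
	"fmt"
	"sync"
)

// deleteTaskResult is a placeholder for the real API call that would delete a
// completed WorkflowTaskResult object from the cluster.
func deleteTaskResult(ctx context.Context, name string) error {
	fmt.Println("deleting", name)
	return nil
}

// deleteCompleted fans completed task result names out to a fixed number of
// workers, so a 100k-task workflow does not issue deletes strictly one by one.
func deleteCompleted(ctx context.Context, names []string, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				if err := deleteTaskResult(ctx, name); err != nil {
					fmt.Println("delete failed, retry later:", name, err)
				}
			}
		}()
	}
	for _, name := range names {
		jobs <- name
	}
	close(jobs)
	wg.Wait()
}

func main() {
	deleteCompleted(context.Background(), []string{"task-a", "task-b", "task-c"}, 2)
}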

@shuangkun
Member

At the time, the task result status field was introduced to serve a key function: helping determine certain states of the workflow, such as whether all tasks are completed (convenient for GC, and for deciding whether the workflow completion status can be set), and whether the task output from the previous step has been parsed (so the next pod can start). I am not sure whether a changed task result status can achieve the same effect; we may need to think about it.

@agilgur5 agilgur5 added area/offloading Node status offloading area/controller Controller issues, panics labels Oct 18, 2024