
Offload TaskResultsCompletionStatus from etcd to db or use compression to allow large workflows (~100k) #13783

Open
Hnatekmar opened this issue Oct 18, 2024 · 6 comments
Assignees
Labels
area/controller Controller issues, panics area/offloading Node status offloading type/feature Feature request

Comments

@Hnatekmar

Summary

I am currently evaluating argo-workflows as a go-to solution for scheduling tasks at my company. So far we really like it feature-wise and we think it is a really good fit 👍
The problem is that the number of tasks is expected to be around 100k per workflow, and so far I haven't managed to persuade argo to handle that.

From what I've observed, there is a limitation imposed by the maximum size of an entity inside the etcd database, which is around 1.5 MB. From my testing this can be observed with the following workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i
spec:
  podGC:
    strategy: OnPodSuccess
    deleteDelayDuration: 0s
  entrypoint: e
  templates:
  - name: c
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: e1
    steps:
      #@ for i in range(100):
      - - name: #@ "message" + str(i)
          template: c
          arguments:
            parameters:
              - name: message
                value: #@ "istep-" + str(i)
      #@ end
  - name: e
    dag:
      tasks:
    #@ for i in range(1000):
        - name: #@ "Step" + str(i)
          template: e1
    #@ end

You can apply it with ytt -f <manifest_name> | kubectl create -f - -n <argo_namespace>. This manifest will get stuck at around the 19177/20177 mark.

When I look at the content of the Workflow manifest, it holds the state of each job, listed like this:

      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4292625616: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293120823: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293149953: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4293305504: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294093307: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294368260: true
      this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-4294498843: true

The size of the workflow manifest also roughly corresponds to the etcd limit:

$ kubectl get workflow -n argo-workflows -o yaml  |  wc -c
 1694314

Also, when I decrease the size of the prefix I am able to schedule more jobs (around 80k with a single-character prefix).

What I am proposing is:

  • Change the format of
    TaskResultsCompletionStatus map[string]bool `json:"taskResultsCompletionStatus,omitempty" protobuf:"bytes,20,opt,name=taskResultsCompletionStatus"`
    to a single key holding a base64-encoded compressed string (see the sketch below).
  • Or we can offload this to the db (when enabled; I don't think anyone would try this without ALWAYS_OFFLOAD_NODE_STATUS).
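
To give a rough sense of the first option, here is a minimal Go sketch (not Argo's actual code; compressStatus/decompressStatus are made-up names) that packs the map into one gzip-compressed, base64-encoded string. Since all keys share the same long node-name prefix, the compressed form should be much smaller than the plain map:

package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// compressStatus packs the per-task completion map into a single base64 string.
func compressStatus(status map[string]bool) (string, error) {
	raw, err := json.Marshal(status)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decompressStatus restores the map from the encoded string.
func decompressStatus(encoded string) (map[string]bool, error) {
	compressed, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	status := map[string]bool{}
	if err := json.NewDecoder(zr).Decode(&status); err != nil {
		return nil, err
	}
	return status, nil
}

func main() {
	// Repeated long node-name prefixes compress very well.
	status := map[string]bool{}
	for i := 0; i < 1000; i++ {
		status[fmt.Sprintf("this-is-extremly-long-prefix-so-i-will-spam-etcd-with-this-i-%d", i)] = true
	}
	encoded, _ := compressStatus(status)
	fmt.Printf("entries: %d, encoded length: %d bytes\n", len(status), len(encoded))
}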

Here is my current configuration for argo-workflows https://github.com/Hnatekmar/kubernetes/blob/a09391109103d5ff9036eed85fd05577fff1c654/manifests/applications/argo-workflows.yaml

Use Cases

When scheduling 100k or more jobs


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@Hnatekmar Hnatekmar added the type/feature Feature request label Oct 18, 2024
@Hnatekmar
Author

Also, it disappeared, but I am willing to work on this :) I just need to discuss how it should be done.

@Hnatekmar
Author

#7121 seems to be relevant to this issue, but the discussion there gravitated towards db optimisations, which won't solve this issue.

@shuangkun
Member

Duplicate of #13213

@shuangkun shuangkun self-assigned this Oct 18, 2024
@shuangkun
Member

I have also encountered this problem before. I implemented ALWAYS_OFFLOAD_TASK_RESULT_STATUS. If the maintainers think this requirement is reasonable, I can contribute it here.

@Joibel
Member

Joibel commented Oct 18, 2024

Offloading task result status to the database along with node status seems like a reasonable thing to do.

I haven't looked into it, but it felt like it might be possible to:

  • Delete workflowTaskResults using a worker pool as they are marked as completed and copied into the node status (rough sketch below).
  • Then delete the taskResultStatus entries for those workflowTaskResults once we know they are all completed; we're only tracking them to ensure we collect them all, and once collected they're done with.

This might be a better solution than offloading, WDYT @shuangkun?

These might be two separate things.
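
To make the worker-pool idea a bit more concrete, here is a rough Go sketch under stated assumptions: deleteTaskResult is a hypothetical stand-in for whatever client call the controller would actually use to remove a completed WorkflowTaskResult, and deleteCompleted just shows the fan-out pattern, not the real reconciliation logic:

package main

import (
	"context"
	"fmt"
	"sync"
)

// deleteTaskResult is a placeholder for the real API call that would delete a
// completed WorkflowTaskResult object from the cluster.
func deleteTaskResult(ctx context.Context, name string) error {
	fmt.Println("deleting", name)
	return nil
}

// deleteCompleted fans completed task result names out to a fixed number of
// workers, so a 100k-task workflow does not issue deletes strictly one by one.
func deleteCompleted(ctx context.Context, names []string, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				if err := deleteTaskResult(ctx, name); err != nil {
					fmt.Println("delete failed, retry later:", name, err)
				}
			}
		}()
	}
	for _, name := range names {
		jobs <- name
	}
	close(jobs)
	wg.Wait()
}

func main() {
	deleteCompleted(context.Background(), []string{"task-a", "task-b", "task-c"}, 2)
}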

@shuangkun
Member

At the time, the task result status field was introduced to serve a key function: helping determine certain states of the workflow, such as whether all tasks are completed (convenient for GC, and for deciding whether the workflow completion status can be set), and whether the task output from the previous step has been parsed (so the next pod can start). I am not sure whether a changed task result status can achieve the same effect; we may need to think about it.

@agilgur5 agilgur5 added area/offloading Node status offloading area/controller Controller issues, panics labels Oct 18, 2024