
Roadmap

Peter Wilcsinszky edited this page Apr 29, 2024 · 35 revisions

Fluent forward

Extend the fluent forward exporter to support our TC -> LO use case.
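For orientation, a rough sketch of a collector configuration using the fluent forward exporter from opentelemetry-collector-contrib (the endpoint and tag values are placeholders, and the exact field names should be checked against the exporter's README):

```yaml
# Illustrative sketch only; verify field names against the
# fluentforwardexporter README in collector-contrib.
exporters:
  fluentforward:
    endpoint:
      tcp_addr: fluentd.logging.svc:24224  # hypothetical forward endpoint
    tag: kubernetes.otlp                   # hypothetical tag

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [fluentforward]
```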

Field customizations

We need a way to provide customization capability for resources the controller creates. The preferred way would be to use the typeoverride solution that we already have for SyslogNG.
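A purely hypothetical sketch of what this could look like (the `typeOverrides` field and its shape are assumptions modeled loosely on the SyslogNG solution, not an existing API):

```yaml
# Hypothetical API sketch: the typeOverrides field and its structure
# are assumptions, not an existing telemetry-controller API.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Collector
metadata:
  name: example
spec:
  typeOverrides:
    daemonSet:
      spec:
        template:
          metadata:
            annotations:
              example.com/scrape: "true"  # user-supplied customization
```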

Enable multiple sources

Currently only one collector can manage a tenant, which we enforce through the tenant status. However, we want to allow multiple different external or internal sources to implement the same tenancy rules. The idea is to dedicate the current Collector resource to the Kubernetes log collection use case and to introduce separate CRDs for use cases such as receiving telemetry from external sources (where we process not just logs, but metrics and traces as well).

Even for the Kubernetes collector there is a use case where the one-to-many relationship implemented currently is too limited: we would need multiple connectors to be able to implement the global tenant configuration (the use case is multiple isolated node groups with a single global infra tenant).
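A purely hypothetical sketch of a dedicated CRD for external telemetry sources, separate from the Kubernetes log collector (the kind, apiVersion, and every field below are assumptions; no such API exists yet):

```yaml
# Hypothetical CRD sketch only; nothing here is an existing API.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: TelemetryReceiver        # hypothetical kind for external sources
metadata:
  name: external-otlp
spec:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  tenantSelector:              # reuse the same tenancy rules
    matchLabels:
      tenant: external
```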

Key flattening

The problem we are facing is that the current way of including pod labels (and other resource attributes) is suboptimal in certain cases:

Resource SchemaURL:
Resource attributes:
     -> k8s.container.name: Str(log-generator)
     -> k8s.namespace.name: Str(tenant-demo-2)
     -> k8s.pod.name: Str(log-generator-7ff5bb5c6f-624pp)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(a39573bd-f899-491f-a37a-3e8e98c5b003)
     -> k8s.pod.labels.app.kubernetes.io/instance: Str(log-generator)
     -> k8s.pod.labels.app.kubernetes.io/name: Str(log-generator)
     -> k8s.pod.start_time: Str(2024-03-21T10:38:02Z)
     -> k8s.node.name: Str(loki)
     -> k8s.pod.labels.pod-template-hash: Str(7ff5bb5c6f)
     -> k8s.deployment.name: Str(log-generator)
     -> loki.resource.labels: Str(k8s.pod.name, k8s.namespace.name)
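One possible direction, sketched with the transform processor's `replace_all_patterns` OTTL function: rewrite problematic label keys (dots, slashes) into flat, backend-friendly names. The regexes below are illustrative only:

```yaml
# Sketch: flatten/sanitize resource attribute keys such as
# k8s.pod.labels.app.kubernetes.io/name before exporting.
processors:
  transform:
    log_statements:
      - context: resource
        statements:
          # replace slashes in keys, e.g.
          # k8s.pod.labels.app.kubernetes.io/name
          #   -> k8s.pod.labels.app.kubernetes.io_name
          - replace_all_patterns(attributes, "key", "/", "_")
```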

Persistent buffering and file position

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/filelogreceiver/README.md

  • pending upstream fix:
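For reference, persisting file read positions (and exporter queues) across collector restarts is done with a storage extension; a minimal sketch using the `file_storage` extension from collector-contrib (paths and exporter names are placeholders):

```yaml
# Sketch: checkpoint filelog offsets to disk so a restart resumes
# where it left off instead of re-reading or losing data.
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    storage: file_storage   # persist offsets via the extension

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
```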

Qs:

Verify and fix the backpressure problem

When we deal with lots of outputs, one slow output can fill up the queues. If the queues are bounded, there will be backpressure, and under backpressure the source will stop. The idea here is to use separate receivers per tenant, but this needs to be verified.
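A minimal sketch of the per-tenant receiver idea: each tenant gets its own receiver and pipeline, so backpressure from one slow output stalls only that tenant's receiver. All names below are illustrative:

```yaml
# Sketch: isolating tenants into separate receivers/pipelines so one
# slow exporter cannot stop log collection for other tenants.
receivers:
  filelog/tenant-a:
    include: [/var/log/pods/tenant-a_*/*/*.log]
  filelog/tenant-b:
    include: [/var/log/pods/tenant-b_*/*/*.log]

service:
  pipelines:
    logs/tenant-a:
      receivers: [filelog/tenant-a]
      exporters: [otlp/tenant-a]
    logs/tenant-b:
      receivers: [filelog/tenant-b]
      exporters: [otlp/tenant-b]
```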

Hot reload

Look at how hot reload could improve the configuration update flow.

Support metrics and traces (discovery)

We want a PoC first through a discovery session. Metrics and traces will most probably require separate pipelines.
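For reference, separate per-signal pipelines in collector configuration look roughly like this (receiver and exporter names are placeholders):

```yaml
# Sketch: one pipeline per signal; each can get its own processors,
# queueing, and outputs.
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      exporters: [otlp]
```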

Evolve the subscription filtering API

Currently we use OTTL to demonstrate the capabilities of the subscription filter, but we want to avoid that in the long run for security and operational maintainability reasons.

A tangible example: instead of using OTTL, the user should provide Kubernetes labels as filter expressions, which should be validated through the API, a webhook, or the controller itself.
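A purely hypothetical sketch of such a label-based filter on the Subscription API (the `podLabels` field and apiVersion below are assumptions, shown next to the OTTL style it would replace):

```yaml
# Hypothetical API sketch: a label-selector filter instead of raw OTTL.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Subscription
metadata:
  name: example
spec:
  # instead of a raw OTTL expression the user supplies labels,
  # which the webhook/controller can validate structurally:
  podLabels:
    matchLabels:
      app.kubernetes.io/name: log-generator
```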

Metrics

We lack a complete solution for collecting byte metrics, although we already plan to use the count connector. There is another approach, implemented in bindplane, that doesn't involve duplicating logs: https://github.com/observIQ/bindplane-agent/tree/release/v1.43.0/processor/metricextractprocessor

We have to keep considering both approaches until we have a good measurement.
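For reference, a minimal sketch of wiring the count connector into the pipelines; note it counts log records, so byte metrics would still need one of the approaches above. Metric and component names are illustrative:

```yaml
# Sketch: the count connector consumes the logs pipeline and emits
# record-count metrics into a separate metrics pipeline.
connectors:
  count:
    logs:
      telemetry.log.count:
        description: number of log records

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [count, otlp]
    metrics/internal:
      receivers: [count]
      exporters: [prometheus]
```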

Qs

  • understand what OpAMP provides in terms of metrics

Support other log sources

  • host logs
  • file based logs through a managed sidecar container
  • logs sent to a network/otlp endpoint directly
  • kubernetes event log
  • metrics and traces

Docker container runtime support

Currently the receiver configuration is tuned to support containerd only.

Short term: add a note in the docs that it only works with containerd for now. Idea to investigate: set up a fallback parser to support both.
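The fallback idea could be sketched with filelog's router operator: route JSON-looking lines to a docker `json_parser` and everything else to a containerd `regex_parser`. The expressions below are illustrative and should be checked against the filelog receiver README; more recent filelog releases also ship a `container` parser operator that reportedly auto-detects the runtime format, which may supersede a hand-rolled router:

```yaml
# Sketch: detect log format per line and parse accordingly.
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    operators:
      - type: router
        id: get-format
        routes:
          - output: parser-docker      # docker json-file lines start with '{'
            expr: 'body matches "^\\{"'
          - output: parser-containerd  # containerd/CRI format otherwise
            expr: 'body matches "^[^{]"'
      - type: json_parser
        id: parser-docker
      - type: regex_parser
        id: parser-containerd
        regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
```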

Optimization by merging subscriptions

We could possibly optimize for the case when subscriptions have lots of overlap in their label selectors, and thus might be sending the same data multiple times to the same destination. Instead of using a routing connector for subscriptions, we could use a single pipeline that applies all the subscriptions as subsequent processors, and then use a routing connector to send the messages, already labeled with subscription ids, to the right outputs.
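The fan-out half of this idea could be sketched with the routing connector; the table syntax below is illustrative and should be checked against the routing connector README, and the `subscription` attribute is an assumption about how records would be labeled upstream:

```yaml
# Sketch: records already carry a subscription id attribute, so a
# single routing connector can fan them out to per-output pipelines.
connectors:
  routing:
    table:
      - statement: route() where attributes["subscription"] == "subs-a"
        pipelines: [logs/out-a]
      - statement: route() where attributes["subscription"] == "subs-b"
        pipelines: [logs/out-b]
```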

Configcheck

Go with the simplest possible solution.

Existing alternatives (and possible improvement ideas):

  • silly config check is available by default
  • there is an option in the collector for syntax check, not implemented for the operator
  • implementing a full config check by running an isolated job (probably not needed for our scenario, more for an aggregator where custom configs are applied by the user)

See the following issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4205
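One candidate for the simplest solution, assuming the rendered config is available on disk: the collector binary's `validate` subcommand performs a parse/validation pass without starting pipelines (binary name and path below are placeholders):

```
# Sketch: lightweight syntax/validation check of a rendered config.
otelcol validate --config /etc/otelcol/config.yaml
```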

Backlog items

  • Output secret management
  • OTTL elimination from Subscription API
  • Output API revamp (OTLP/Loki/Fluent)
  • Shared output
  • Variable size collector to support different node sizes -> daemonset does not support it
    • Multiple daemonset for a single collector vs multiple collectors
  • Loki label -> index support