Handling multiple datasets on a single TCP input with integrations #11465

Open
Oddly opened this issue Oct 18, 2024 · 0 comments

Oddly commented Oct 18, 2024

Currently, the documented and supported approach for Elastic integrations is to use a separate TCP input port for each dataset. However, there is little guidance on how to handle multiple datasets arriving on a single TCP input within an integration.

While developing integrations, I've encountered the need to split incoming logs from one TCP port into different datasets, but I haven't found documentation or best practices on how to do this.

I see two potential approaches:

  1. Single dataset filter:

    • Create one dataset with a TCP input that ingests all traffic.
    • Use agent processors to set the data_stream.dataset field based on the message content.
    • Create additional datasets that contain only assets (no working input) to receive and process those events.
  2. Multiple datasets on the same port:

    • Create multiple datasets, each with its own assets (index templates, ingest pipelines, data streams).
    • Have every dataset's input listen on the same port.
    • Use per-dataset processors to inspect the message field and drop events that do not belong to that dataset.
    • Add the data_stream.dataset key and value to each dataset's input config.
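To make the difference between the two approaches concrete, here is a minimal Python sketch that models only the routing decision, not Elastic Agent itself. The dataset names (`example.firewall`, `example.audit`, `example.log`), the substring match rules, and the function names are all hypothetical and chosen for illustration; socket binding and the real processor engine are out of scope.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]

# Hypothetical match rules: dataset name -> predicate on the raw message.
RULES: Dict[str, Callable[[str], bool]] = {
    "example.firewall": lambda msg: "FW:" in msg,
    "example.audit": lambda msg: "AUDIT:" in msg,
}
DEFAULT_DATASET = "example.log"  # hypothetical catch-all dataset


def approach_single_router(messages: List[str]) -> List[Event]:
    """Approach 1: one input sees all traffic and sets data_stream.dataset."""
    routed: List[Event] = []
    for msg in messages:
        dataset = DEFAULT_DATASET
        for name, matches in RULES.items():
            if matches(msg):
                dataset = name
                break  # first matching rule wins
        routed.append({"message": msg, "data_stream.dataset": dataset})
    return routed


def approach_per_dataset_filter(messages: List[str]) -> List[Event]:
    """Approach 2: each dataset's input receives all traffic and drops
    every event that does not match its own rule."""
    routed: List[Event] = []
    for name, matches in RULES.items():
        for msg in messages:
            if not matches(msg):
                continue  # roughly what a per-input drop processor does
            routed.append({"message": msg, "data_stream.dataset": name})
    return routed


if __name__ == "__main__":
    sample = ["FW: deny tcp", "AUDIT: login ok", "misc line"]
    for event in approach_single_router(sample):
        print(event)
    for event in approach_per_dataset_filter(sample):
        print(event)
```

In this model, approach 1 evaluates each event once and always has a fallback dataset. Approach 2 evaluates every event once per dataset, silently discards events that match no rule (unless a catch-all dataset negates all the other rules), and duplicates events that match more than one rule, so its filters would need to be mutually exclusive.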

At the moment, some integrations solve this problem differently. The f5_bigip integration, for example, uses a single dataset and redirects events into different ingest pipelines. This produces one data stream with many fields, which may degrade performance.

My questions are:

  1. What is the recommended best practice for handling multiple datasets from a single TCP input?
  2. Where is the best place to document this?

Edit: Made the examples clearer
