[RFC] Renovate Streaming Support #5910

Open
yurishkuro opened this issue Aug 31, 2024 · 0 comments
Labels
changelog:new-feature Change that should be called out as new feature in CHANGELOG

Comments

@yurishkuro
Member

Background

One of the challenges of distributed tracing is that spans can arrive from many different places in the architecture at different times. If your only job is to store them (which is primarily what the Jaeger collector does), this is not a big problem, since the storage backends take care of partitioning and indexing the spans by trace ID. But the most interesting applications of traces require looking at a whole trace in one place, so that decisions can be based on the overall call graph rather than on individual spans.

Data streaming is well suited to this kind of whole-trace processing. Historically, Jaeger supported a couple of Java-based data pipelines (one for the basic dependency graph and one for the transitive dependency graph), which were implemented independently on top of the Spark and Flink frameworks. That approach had several problems:

  • The business logic had to be written in Java, meaning we could not reuse the domain model capabilities we have in the primary Go code.
  • We had to duplicate some of the logic, e.g. the all-in-one supported constructing a dependency graph on the fly, which was implemented completely independently from the Java Spark job.
  • The https://github.com/jaegertracing/spark-dependencies and https://github.com/jaegertracing/jaeger-analytics-flink repos have seen very few changes, and the latter doesn't even have a production-grade way of running it.
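
To make the duplicated logic mentioned above concrete: the core of a basic dependency graph computation is small once a whole trace is available in one place. The following is a minimal sketch in Go, not the actual all-in-one or Spark implementation. The `Span` and `Edge` types and the `DependencyEdges` function are hypothetical names introduced for illustration and are not Jaeger's domain model.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a simplified, hypothetical span model (an assumption for this
// sketch, not Jaeger's actual domain type).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for a root span
	Service  string
}

// Edge is a caller -> callee relationship between two services.
type Edge struct {
	Parent string
	Child  string
}

// DependencyEdges counts cross-service parent/child calls within one
// fully assembled trace.
func DependencyEdges(trace []Span) map[Edge]int {
	// Index spans by ID so each span can look up its parent's service.
	serviceBySpan := make(map[string]string, len(trace))
	for _, s := range trace {
		serviceBySpan[s.SpanID] = s.Service
	}

	edges := make(map[Edge]int)
	for _, s := range trace {
		parentService, ok := serviceBySpan[s.ParentID]
		if !ok || parentService == s.Service {
			// Root span, parent not present in this trace,
			// or a call that stays within the same service.
			continue
		}
		edges[Edge{Parent: parentService, Child: s.Service}]++
	}
	return edges
}

func main() {
	trace := []Span{
		{TraceID: "t1", SpanID: "a", Service: "frontend"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Service: "checkout"},
		{TraceID: "t1", SpanID: "c", ParentID: "b", Service: "payments"},
		{TraceID: "t1", SpanID: "d", ParentID: "b", Service: "checkout"},
	}

	edges := DependencyEdges(trace)

	// Sort the edges so the printed order is stable across runs.
	keys := make([]Edge, 0, len(edges))
	for e := range edges {
		keys = append(keys, e)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Parent != keys[j].Parent {
			return keys[i].Parent < keys[j].Parent
		}
		return keys[i].Child < keys[j].Child
	})
	for _, e := range keys {
		fmt.Printf("%s -> %s: %d\n", e.Parent, e.Child, edges[e])
	}
}
```

Note that the function only works on a complete trace: if the parent span has not arrived yet, the edge is silently missed, which is why the spans need to be gathered before this kind of logic runs.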

Proposal

We should bring streaming capabilities into the main Jaeger repo using Go code. This will address many of the problems mentioned above. The main challenge with data streaming is that it is a stateful activity, which requires checkpointing capabilities to avoid data loss and inconsistent results when Jaeger instances are restarted. This is where the well-known streaming frameworks like Spark and Flink come in: they provide the needed orchestration and statefulness. In the past we could not use them with Go, but today there are projects like Apache Beam that provide a unified programming model via well-supported SDKs (including one for Go), which allows implementing the pipeline logic in Go and executing it on a number of runtimes.

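
To illustrate why statefulness and checkpointing matter here, the sketch below shows a toy aggregator that buffers spans by trace ID, snapshots its state, and resumes from the snapshot after a simulated restart. It uses only the Go standard library and is not based on Apache Beam or any Jaeger API. The `Aggregator`, `Checkpoint`, and `Restore` names, the JSON snapshot format, and the offset-based replay are all assumptions made for illustration. In a real deployment the streaming runtime would be expected to manage this for the pipeline.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span is a simplified, hypothetical span model used only in this sketch.
type Span struct {
	TraceID string `json:"traceId"`
	SpanID  string `json:"spanId"`
	Service string `json:"service"`
}

// Aggregator holds the in-flight state: spans seen so far, grouped by
// trace ID, plus the offset of the last input record that was processed.
type Aggregator struct {
	Offset int64             `json:"offset"`
	Traces map[string][]Span `json:"traces"`
}

// NewAggregator returns an empty aggregator that has processed nothing.
func NewAggregator() *Aggregator {
	return &Aggregator{Offset: -1, Traces: make(map[string][]Span)}
}

// Process adds one span to the state. Records at or below the last
// processed offset are skipped, so replaying input after a restart
// does not add the same span twice.
func (a *Aggregator) Process(offset int64, s Span) {
	if offset <= a.Offset {
		return
	}
	a.Traces[s.TraceID] = append(a.Traces[s.TraceID], s)
	a.Offset = offset
}

// Checkpoint serializes the full state so it can be persisted.
func (a *Aggregator) Checkpoint() ([]byte, error) {
	return json.Marshal(a)
}

// Restore rebuilds an aggregator from a previously taken checkpoint.
func Restore(data []byte) (*Aggregator, error) {
	a := NewAggregator()
	if err := json.Unmarshal(data, a); err != nil {
		return nil, err
	}
	if a.Traces == nil {
		a.Traces = make(map[string][]Span)
	}
	return a, nil
}

func main() {
	input := []Span{
		{TraceID: "t1", SpanID: "a", Service: "frontend"},
		{TraceID: "t2", SpanID: "b", Service: "frontend"},
		{TraceID: "t1", SpanID: "c", Service: "checkout"},
	}

	// Process the first two records, then take a checkpoint.
	a := NewAggregator()
	a.Process(0, input[0])
	a.Process(1, input[1])
	snapshot, err := a.Checkpoint()
	if err != nil {
		panic(err)
	}

	// Simulate a restart: rebuild the state from the snapshot and
	// replay the whole input. Offsets 0 and 1 are skipped.
	b, err := Restore(snapshot)
	if err != nil {
		panic(err)
	}
	for i, s := range input {
		b.Process(int64(i), s)
	}

	fmt.Println("spans in t1:", len(b.Traces["t1"]), "last offset:", b.Offset)
}
```

Without the snapshot, a restart would lose the partially assembled traces. Without the offset check, replaying the input would duplicate spans. Both failure modes correspond to the data loss and inconsistent results described above.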

@dosubot dosubot bot added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label Aug 31, 2024