Graceful shutdown of a single subscription #1201
base: master
Conversation
I didn't look at the implementation yet, only docs and tests.
> ## Controlled shutdown
>
> The examples above will keep processing records forever, or until the fiber is interrupted, typically at application shutdown. When interrupted, in-flight records will not be processed fully through all stream stages and offsets may not be committed. For fast shutdown in an at-least-once processing scenario this is fine. zio-kafka also supports a _graceful shutdown_, where the fetching of records for the subscribed topics/partitions is stopped, the streams are ended and all downstream stages are completed, allowing all in-flight messages to be fully processed.
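At a call site, the feature described above could look roughly like this. This is a sketch only: `partitionedStreamWithControl` and its `stream`/`stop` members are the API under discussion in this PR (names and signatures may still change), and `processRecord` is a hypothetical user effect.

```scala
// Sketch: graceful shutdown of a single subscription (proposed API, may change).
ZIO.scoped {
  for {
    streamControl <- Consumer.partitionedStreamWithControl(
                       Subscription.topics("topic150"),
                       Serde.string,
                       Serde.string
                     )
    fiber <- streamControl.stream
               .flatMapPar(n = 8) { case (_, partitionStream) =>
                 partitionStream.mapZIO(record => processRecord(record).as(record.offset))
               }
               .aggregateAsync(Consumer.offsetBatches)
               .mapZIO(_.commit)
               .runDrain
               .fork
    _ <- ZIO.sleep(30.seconds) // ...or wait for a shutdown signal
    _ <- streamControl.stop    // stop fetching, end the streams, let commits complete
    _ <- fiber.join
  } yield ()
}
```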
> allowing all in-flight messages to be fully processed.

I suggest we change these words to:

> allowing commits for in-flight records to complete.

- Let's continue to use the word 'records'.
- Removed the word 'all' because when an interrupt is received, internally queued records are dropped and not passed to the consumer stream.
- The remaining in-flight records have already been passed to the consumer, so processing has already commenced. The only relevant operation from zio-kafka's point of view is that the records can be committed.
```scala
ZIO.scoped {
  for {
    streamControl <- Consumer.partitionedStreamWithControl(Subscription.topics("topic150"), Serde.string, Serde.string)
```
Could this be laid out for smaller screens, please?
```scala
client        <- randomClient

keepProducing <- Ref.make(true)
_             <- produceOne(topic, "key", "value").repeatWhileZIO(_ => keepProducing.get).fork
```
You can also use the test helper method `scheduledProduce`.
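For reference, a sketch of what that could look like, assuming the `scheduledProduce` helper from zio-kafka's test utilities (the exact signature may differ):

```scala
// Sketch: produce a record periodically for the lifetime of the enclosing scope,
// instead of a hand-rolled Ref + repeatWhileZIO loop.
_ <- KafkaTestUtils
       .scheduledProduce(topic, Schedule.fixed(500.millis))
       .runDrain
       .forkScoped
```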
Several resolved review comments on zio-kafka-test/src/test/scala/zio/kafka/consumer/ConsumerSpec.scala (outdated).
Still need more time to digest this.
Resolved review comments on zio-kafka/src/main/scala/zio/kafka/consumer/SubscriptionStreamControl.scala (outdated).
Hmm, should we instead of this:

```scala
Consumer.runWithGracefulShutdown(
  Consumer.partitionedStreamWithControl(Subscription.topics("topic150"), Serde.string, Serde.string)
) { stream => ... }
```

offer this:

```scala
Consumer.partitionedStreamWithGracefulShutdown(Subscription.topics("topic150"), Serde.string, Serde.string) {
  (stream, _) => stream.flatMapPar(...)
}
```

The second parameter would be the …
If I understand it correctly, the proposal allows for more use cases; with it you can also call …
Well, I mean compared to just the …
If resume after …
Well, in both proposals you can call … I don't think you want to do anything after stop, but it would give you more explicit control over when to stop, instead of stopping when the scope ends. We probably need to decide if we want to add pause/resume in the future. If we do, we should add the …
```diff
 override def plainStream[R, K, V](
   subscription: Subscription,
   keyDeserializer: Deserializer[R, K],
   valueDeserializer: Deserializer[R, V],
-  bufferSize: Int
+  bufferSize: RuntimeFlags
```
This is probably an IDE mistake.
```diff
-  bufferSize: RuntimeFlags
+  bufferSize: Int
```
Yeah, happens to me all the time as well.
Hey :) Thanks for the great work! Here's some initial feedback: I'm not a big fan of the `SubscriptionStreamControl` wrapper. To me, functions/methods returning it should return a tuple instead, i.e. drop:

```scala
SubscriptionStreamControl[Stream[Throwable, Chunk[(TopicPartition, ZStream[R, Throwable, CommittableRecord[K, V]])]]]
```

in favor of:

```scala
(Stream[Throwable, Chunk[(TopicPartition, ZStream[R, Throwable, CommittableRecord[K, V]])]], SubscriptionStreamControl)
```

Made the change in a PR to show/study how, to me, it simplifies things: https://github.com/zio/zio-kafka/pull/1207/files
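As an illustration of how the tuple shape could read at a call site, here is a hypothetical sketch following that proposal (the method name and field names are illustrative, not from the PR):

```scala
// Hypothetical call-site sketch for the tuple-returning variant.
ZIO.scoped {
  for {
    streamAndControl <- Consumer.partitionedStreamWithControl(
                          Subscription.topics("topic150"), Serde.string, Serde.string
                        )
    (stream, control) = streamAndControl
    fiber <- stream
               .flatMapPar(n = 8) { case (_, partitionStream) => partitionStream }
               .runDrain
               .fork
    _ <- control.stop // end the streams gracefully when done
    _ <- fiber.join
  } yield ()
}
```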
```diff
-    } yield stream
+    } yield SubscriptionStreamControl(
+      ZStream.fromQueue(partitionAssignmentQueue),
+      withRunloopZIO(requireRunning = true)(_.stopSubscribedTopicPartitions(subscription)) *> partitionAssignmentQueue
```
```diff
-      withRunloopZIO(requireRunning = true)(_.stopSubscribedTopicPartitions(subscription)) *> partitionAssignmentQueue
+      withRunloopZIO(requireRunning = false)(_.stopSubscribedTopicPartitions(subscription)) *> partitionAssignmentQueue
```
Not sure you want to use `true` here. `true` means: if the runloop is not running, start it and apply the `stopSubscribedTopicPartitions` function. In your case, IMO, if the Runloop isn't running, calling `control.stop` should be a no-op, which will be the case if you use `false` instead.
Also, do we want to execute the `partitionAssignmentQueue.offer(Take.end).ignore` code if the Runloop isn't running? If not, then something like this would be more appropriate:

```scala
withRunloopZIO(requireRunning = false) { runloop =>
  runloop.stopSubscribedTopicPartitions(subscription) *>
    partitionAssignmentQueue.offer(Take.end).ignore
}
```
> In your case, IMO, if the Runloop isn't running, calling the `control.stop` should be a no-op

We need to stop the subscription also when the runloop isn't running, right?
Resolved review comments on zio-kafka/src/main/scala/zio/kafka/consumer/internal/RunloopAccess.scala (outdated).
Didn't finish my review yet. I still have some parts of the code to explore/understand, but I have to go. I'll finish it later 🙂
Thanks for the feedback, Jules. Agreed about the extra concept that would be unwanted. Check out my latest interface proposal where there is only a …
Still reading the code...
Resolved review comments on zio-kafka/src/main/scala/zio/kafka/consumer/SubscriptionStreamControl.scala (outdated).
```scala
 * @tparam S
 *   Type of the stream returned from [[stream]]
```
Actually, thinking about it more, the trait doesn't really care what S is, or that it is even a stream at all. That means another abstraction might be hidden ('Stoppable'?). Abstracting further is hard though; the definition of `stop` is pretty specific.

I also noticed that the `stop` method is defined in terms of the consumer and is not related to the stream. Should that be the case? Shouldn't this `stop` only relate to the referred `stream`?

I am trying to weigh this form against @guizmaii's proposal in #1207. I am no longer certain which one I like more.
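To make the observation concrete, a minimal sketch of what such a hidden abstraction could look like (the name `Stoppable` and the exact shape are purely illustrative, not part of the PR):

```scala
import zio.UIO

// Illustrative only: the graceful-stop capability on its own...
trait Stoppable {
  /** Ask the underlying resource to wind down gracefully. */
  def stop: UIO[Unit]
}

// ...and the stream-specific control as a pairing of a stream with that
// capability. Note that the trait indeed does not care what S is.
final case class SubscriptionStreamControl[S](stream: S, stop: UIO[Unit]) extends Stoppable
```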
```diff
@@ -70,6 +70,22 @@ trait Consumer {
     valueDeserializer: Deserializer[R, V]
   ): Stream[Throwable, Chunk[(TopicPartition, ZStream[R, Throwable, CommittableRecord[K, V]])]]

+  /**
+   * Like [[partitionedAssignmentStream]] but wraps the stream in a construct that ensures graceful shutdown
```
```diff
-   * Like [[partitionedAssignmentStream]] but wraps the stream in a construct that ensures graceful shutdown
+   * Like [[partitionedAssignmentStream]] but wraps the stream in a construct that ensures graceful shutdown.
```
```diff
@@ -93,6 +109,22 @@ trait Consumer {
     valueDeserializer: Deserializer[R, V]
   ): Stream[Throwable, (TopicPartition, ZStream[R, Throwable, CommittableRecord[K, V]])]

+  /**
+   * Like [[partitionedStream]] but wraps the stream in a construct that ensures graceful shutdown
```
```diff
-   * Like [[partitionedStream]] but wraps the stream in a construct that ensures graceful shutdown
+   * Like [[partitionedStream]] but wraps the stream in a construct that ensures graceful shutdown.
```
```scala
/**
 * Like [[plainStream]] but wraps the stream in a construct that ensures graceful shutdown
 */
```
Since this method might be the most attractive way to use zio-kafka, let's extend the documentation a bit. E.g.:

```suggestion
/**
 * Like [[plainStream]] but wraps the stream in a construct that ensures graceful shutdown. During a graceful
 * shutdown the consumer is stopped but the stream can complete processing and commit already fetched records.
 *
 * Example usage:
 * {{{
 *   ... todo ...
 * }}}
 */
```

We could also include a reference to the documentation (though I am always extremely happy when the scaladocs are all you need, e.g. scalatest documentation is my benchmark).
```diff
@@ -75,6 +75,13 @@ private[consumer] final class Runloop private (
   private[internal] def removeSubscription(subscription: Subscription): UIO[Unit] =
     commandQueue.offer(RunloopCommand.RemoveSubscription(subscription)).unit

+  private[internal] def stopSubscribedTopicPartitions(subscription: Subscription): UIO[Unit] =
```
This method is stopping (actually ending) streams, not subscriptions. WDYT of:

```diff
-  private[internal] def stopSubscribedTopicPartitions(subscription: Subscription): UIO[Unit] =
+  private[internal] def endStreams(subscription: Subscription): UIO[Unit] =
```

And similarly rename `RunloopCommand.StopSubscribedTopicPartitions` to `RunloopCommand.EndStreamsBySubscription`.
I understand now that when graceful shutdown starts we're ending the subscribed streams. That should work nicely. Let's work out what will happen next to the runloop. The runloop would still be happily fetching records for that stream. When those are offered to the stream, …

We can do slightly better though. We're fetching and storing all these records in the queue for nothing, even potentially causing an OOM for systems that are tuned for the case where processing happens almost immediately. My proposal is to: …

If you want, I can extend this PR with that proposal (or create a separate PR).
@erikvanoosten If you have some time to implement those two things, by all means.
@svroonland Done in commit 1218204. Now I am wondering, how can we test this?
Change looks good. Totally forgot to implement this part.
Force-pushed from 958b0a5 to 8d19e16.
Depends on zio/zio#8804.
By using `ZIO.async`, we no longer need a reference to the ZIO runtime, nor do we need the `exec` trickery anymore.
Also: fix typo and make metric descriptions consistent.
When many hundreds of partitions need to be consumed, an excessive amount of heap can be used for pre-fetching. The `ManyPartitionsQueueSizeBasedFetchStrategy` works similarly to the default `QueueSizeBasedFetchStrategy` but limits total memory usage.
Refactoring of the producer so that it handles errors per record.
Zio-kafka applications always pre-fetch data so that user streams can process the data asynchronously. This is not compatible with auto commit. When auto commit is enabled, the consumer will automatically commit batches _before_ they are processed by the user streams. An unaware user might accidentally enable auto commit and lose data during rebalances. Solves #1289.
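Given that, commits have to happen explicitly after processing. A sketch of the usual explicit-commit pattern with zio-kafka (the topic name and the `processRecord` effect are placeholders):

```scala
// Sketch: commit offsets explicitly after processing, instead of relying on
// the Kafka client's auto commit (which would commit before processing happens).
Consumer
  .plainStream(Subscription.topics("my-topic"), Serde.string, Serde.string)
  .mapZIO(record => processRecord(record).as(record.offset))
  .aggregateAsync(Consumer.offsetBatches) // batch offsets for efficient commits
  .mapZIO(_.commit)
  .runDrain
```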
## About this PR

📦 Updates [org.scalameta:scalafmt-core](https://github.com/scalameta/scalafmt) from `3.8.2` to `3.8.3` — 📜 [GitHub Release Notes](https://github.com/scalameta/scalafmt/releases/tag/v3.8.3)

This is an automated Scala Steward update; it will be kept free of conflicts, and closing the PR skips this version. Scala Steward can be configured per repository with a `.scala-steward.conf` file.

Co-authored-by: zio-scala-steward[bot] <145262613+zio-scala-steward[bot]@users.noreply.github.com>
Implements functionality for gracefully stopping a stream for a single subscription: stop fetching records for the assigned topic-partitions but keep being subscribed so that offsets can still be committed. Intended to replace `stopConsumption`, which did not support multiple-subscription use cases. Implements some of #941.

We should deprecate `stopConsumption` before releasing.