River has published docs on how to shut down gracefully since day one, but the recommendation has always been convoluted: two stop phases with Stop and StopAndCancel, multiple accompanying channels for signals and stop tracking, and a goroutine.
But why don’t you judge for yourself? Here’s what it looked like before today:
    if err := riverClient.Start(signalCtx); err != nil {
        panic(err)
    }

    sigintOrTerm := make(chan os.Signal, 1)
    signal.Notify(sigintOrTerm, syscall.SIGINT, syscall.SIGTERM)

    go func() {
        <-sigintOrTerm

        softStopCtx, softStopCtxCancel := context.WithTimeout(ctx, 10*time.Second)
        defer softStopCtxCancel()

        go func() {
            select {
            case <-sigintOrTerm:
                softStopCtxCancel()
            case <-softStopCtx.Done():
            }
        }()

        err := riverClient.Stop(softStopCtx)
        if err == nil {
            return
        }

        hardStopCtx, hardStopCtxCancel := context.WithTimeout(ctx, 10*time.Second)
        defer hardStopCtxCancel()

        _ = riverClient.StopAndCancel(hardStopCtx)
    }()

    <-riverClient.Stopped()

There’s a lot of boilerplate there, and a lot that could be subtly wrong. I wrote the original version of it myself, and still have to re-read it carefully every time to understand what it’s doing.
Why was it so complicated? Because stopping is more nuanced than it’d appear at first glance, and we want River to do it as efficiently and cleanly as possible. Ideally:
- Any work in progress is given a chance to continue until finished.
- Any work that doesn’t finish in a reasonable timeframe should be cancelled, but cancelled so that it’s accounted for in the database and eligible to run immediately the next time a client starts.
- Jobs that don’t respond to context cancellation shouldn’t gum up the whole program.
SoftStopTimeout
As of v0.38.0, River picks up the new client configuration SoftStopTimeout:
    riverClient, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
        SoftStopTimeout: 10 * time.Second,
        ...
    })

With SoftStopTimeout in the picture, the graceful stop example above reduces to:
    signalCtx, stop := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    if err := riverClient.Start(signalCtx); err != nil {
        panic(err)
    }

    <-riverClient.Stopped()

Much better! Upon the cancellation of its context, Client.Start begins stopping. Producers stop fetching new jobs and running jobs are left to wind down (a “soft” stop). After SoftStopTimeout expires, River internally initiates a “hard” stop by cancelling the work context of outstanding jobs. It’s the same routine as the original example, but with the bulk of the logic moved internally. Both stop calls have been removed. Explicit timeouts are gone.
The API maintains compatibility and continues to provide granular Stop/StopAndCancel functions for more complex cases that need them, but most programs will want to move to the new style for simplification’s sake.