New Pro feature: Sequences

We recently launched River Pro with only the workflows feature, and we're inspired by how many in the Go community have already found it worth paying for. Shortly after the launch, I was also blessed to welcome my second child into the world. Needless to say, it's been a busy and wonderful couple of months. 😅

Of course, workflows were only the starting point and there is so much more to come. Today, River Pro gains its next major feature: Sequences. Sequences allow you to execute a series of jobs in a guaranteed one-at-a-time sequential order relative to other jobs in the same sequence. Read on to learn how they work, or learn more about River Pro and how to get started.

What are sequences?

Under most circumstances, developers want a job queue to process their jobs in parallel as quickly as possible. For some use cases, however, processing a bunch of related jobs at the same time can lead to complicated race conditions due to the jobs operating on the same underlying data, or triggering related side effects that may conflict.

For example, imagine that you're ingesting Stripe webhooks. You may receive bursts of webhooks at the same time, and to process them you may need to mutate the same underlying database rows. Or you may be triggering external events, like sending emails to the customer.

In such cases it would be much easier to reason about the system if you could process the webhooks one-at-a-time, but if you have many customers this is going to be a bottleneck. Instead, what if we could process the webhooks for a given customer in order, but still process webhooks for different customers in parallel? This is what sequences deliver.

By ingesting the webhooks directly into your River queue and then partitioning them into sequences based on customer ID, you can guarantee that all events for a given customer are processed in order, while still allowing events for different customers to be processed in parallel:

Sequences diagram

In each sequence, only one job can be processed at a time. Subsequent jobs wait in pending, and the next job in the sequence is only processed when the current job is finished. Each sequence can run in parallel with others. And unlike a job-level locking strategy, pending jobs do not block worker slots from being able to process other jobs—nor is there excess database churn from solving this with repeated snoozing.

Users of Kafka may recognize this as similar to the concept of a "partition key" in that system, and the use cases are similarly broad.

Fine-grained control

Sequences are partitioned based on a "sequence key" that is computed from job attributes such as the job's kind and args (or a subset of args), similar to unique jobs. Like unique jobs, sequences guarantee one-at-a-time execution, but unlike unique jobs, they allow an unlimited number of jobs to be queued up in a sequence, even though only one job will be worked at a time.
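To make the partitioning idea concrete: a sequence key can be thought of as the job's kind combined with whichever args you choose to partition on. The helper below is a hypothetical illustration of that idea, not River Pro's actual key derivation:

```go
package main

import "fmt"

// sequenceKey is a hypothetical sketch of deriving a partition key from a
// job's kind plus a selected subset of its args. Jobs that produce the same
// key would land in the same sequence and be worked one at a time.
func sequenceKey(kind string, args map[string]string, byArgs ...string) string {
	key := kind
	for _, name := range byArgs {
		key += "|" + name + "=" + args[name]
	}
	return key
}

func main() {
	args := map[string]string{"customer_id": "cus_123", "event": "invoice.paid"}
	// Partition only by customer_id, so all of a customer's webhooks
	// share one sequence regardless of event type.
	fmt.Println(sequenceKey("stripe_webhook", args, "customer_id"))
	// stripe_webhook|customer_id=cus_123
}
```

Partitioning on a subset of args (here, just `customer_id`) is what lets all of one customer's events serialize while other args, like the event type, don't fragment the sequence.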

Available today in River Pro

Sequences are available now in the latest version of River Pro. Check out the docs to learn more about how to use them, and how to get started with River Pro.

River Pro API changes

The latest River Pro release contains some significant design changes in order to facilitate the new sequences feature and other upcoming functionality. In particular, River Pro is now primarily used through a new riverpro.Client type that lets us expose additional methods and provide a deeper integration to customize the underlying river.Client. If you're a River Pro user, make sure to check out the changelog for details on how to migrate to the new API.

Although these API changes are something we wish we could have avoided, they are necessary and we'd rather make them now than later. I don't expect we'll need to make any more significant changes like this going forward. The migration guide in the changelog contains a handful of find-and-replace steps that should make this a quick and easy process for existing customers.

Other recent River improvements

River itself has seen some significant upgrades in recent releases as well.

The unique jobs implementation has been further refined so that the much faster unique index-based implementation is used even if you're customizing the list of unique states. As part of this change, the old advisory lock unique jobs implementation has been deprecated and removed over the past two releases. One benefit of this change is that unique jobs now work even with batch inserts. Going forward we hope that most features will be available regardless of which insert method you choose, and we've consolidated the underlying logic to make this easier to deliver.

The job completer has been refactored so that all post-execution job updates are batched together, rather than only successfully completed jobs. This should lead to a significant improvement in throughput and a reduction in database contention if your app typically has a mix of job results such as snoozes and errors.

Finally, we've added middleware! Middleware allows you to wrap both job insertion and execution with additional logic that can be reused across job types. This can be useful for logging, telemetry, or for building higher-level abstractions on top of base River functionality. Check out the middleware docs to get started, or see the related PR for more details.