Announcing River: Fast and reliable background jobs for Go

For the past several months, I’ve been working with Brandur to build the Postgres background job library that the Go ecosystem needs: it’s called River, and we’re launching it today in beta.

River builds on the state of the art in Postgres-based queues across other language ecosystems to deliver a Go library that’s easy to use, has minimal overhead, and can scale to tens of thousands of jobs per second on commodity hardware, all with full transactional ACID guarantees. By leveraging the latest Go features, including generics, River streamlines job definitions and reduces the need for boilerplate.

Note: Brandur has also written his own announcement post here.

History

About 9 years ago, I released my first Postgres-based background job library in Go: que-go, a port of que from Ruby. At the time I had just ended my tenure at Heroku, and the project combined a few of my interests.

Heroku was initially built as a Ruby on Rails monolith, and throughout my time there much of our effort was spent trying to break it apart into a proper distributed system that could handle our growth. Within the primary API codebase, Heroku engineers used several Postgres-backed queue implementations over time: DelayedJob, then QueueClassic, and finally Que.

As the system continued to scale, my colleague Brandur Leach detailed some of the operational challenges of this design. Despite these issues, Heroku’s primary API codebase never got away from this transactional job queue.

Transactional enqueueing

It was through this work at Heroku that I learned to appreciate the simplicity of transactional job queues within an application’s primary database. Pat Helland detailed some of this in Life Beyond Distributed Transactions (2007):

It would be horribly complex for an application developer to send a message while working on a transaction, have the message sent, and then the transaction abort. This would mean that you have no memory of causing something to happen and yet it does happen! For this reason, transactional enqueuing of messages is de rigueur.

Pat is correct about the complexity and added headaches of non-transactional enqueueing. The challenges with this approach are predictable and well understood, and yet throughout my career it’s felt like non-transactional queues are the default choice for most developers. Even in my most recent role at Mux, I spent several months moving our video webhooks to a transactional setup that ensures they’re not emitted before the transaction commits.
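The guarantee can be illustrated with a toy, in-memory model. This is only a sketch under loud assumptions: `Store`, `Tx`, and `signUp` are hypothetical names, and the buffered slices stand in for what a real Postgres transaction does when the jobs table lives in the same database as the application data.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a toy stand-in for a database holding both data and jobs.
type Store struct {
	Users []string
	Jobs  []string
}

// Tx buffers writes; nothing is visible in Store until Commit.
type Tx struct {
	store *Store
	users []string
	jobs  []string
}

func (s *Store) Begin() *Tx { return &Tx{store: s} }

// InsertUser fails on a duplicate email, like a unique constraint would.
func (tx *Tx) InsertUser(email string) error {
	for _, u := range append(append([]string(nil), tx.store.Users...), tx.users...) {
		if u == email {
			return errors.New("duplicate user: " + email)
		}
	}
	tx.users = append(tx.users, email)
	return nil
}

// EnqueueJob records a job as part of the same transaction.
func (tx *Tx) EnqueueJob(kind string) { tx.jobs = append(tx.jobs, kind) }

// Commit publishes the buffered users and jobs together.
func (tx *Tx) Commit() {
	tx.store.Users = append(tx.store.Users, tx.users...)
	tx.store.Jobs = append(tx.store.Jobs, tx.jobs...)
}

// Rollback discards the buffered users and jobs together.
func (tx *Tx) Rollback() { tx.users, tx.jobs = nil, nil }

// signUp enqueues a welcome email and inserts the user in one transaction.
func signUp(s *Store, email string) error {
	tx := s.Begin()
	tx.EnqueueJob("welcome_email:" + email)
	if err := tx.InsertUser(email); err != nil {
		tx.Rollback() // the enqueued job disappears with everything else
		return err
	}
	tx.Commit()
	return nil
}

func main() {
	s := &Store{}
	fmt.Println(signUp(s, "ada@example.com")) // commits user and job
	fmt.Println(signUp(s, "ada@example.com")) // rolls back; no second job
	fmt.Println(len(s.Users), len(s.Jobs))
}
```

With a non-transactional queue, the second call could leave a welcome email job behind even though no user was created, which is the "no memory of causing something to happen" case from the quote.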

So why do so many developers implement background jobs with Redis, Kafka, or other queue systems instead of their primary Postgres database? I believe it comes down to 3 factors:

Throughput limits at the database level, which improved significantly once Postgres 9.5 introduced SKIP LOCKED.
Inadequacies in earlier queueing libraries. For example, in 2014 Que (and by extension que-go) held a transaction open for the duration of a job, which meant a high memory overhead and difficulty with long-running jobs.
A general lack of awareness about the benefits of transactional enqueuing, and the fact that this model can easily scale to tens of thousands of jobs per second with modern Postgres and a well-designed library.

We believe that a library like River should be the default model for building reliable systems, appropriate for all but the very largest applications. When you’re starting out, you have fewer operational dependencies and a simpler mental model. As you grow, the transactional approach minimizes the time you’ll spend dealing with predictable distributed systems edge cases.

And if you somehow do manage to outgrow this kind of system, congratulations on all your success :) Then you'll get to solve the distributed systems challenges you've been able to avoid this whole time.

Feature rich

River is already packed with features to save developers time, including:

Retries with exponential backoff
Configurable retention
Transactional job completion
Unique jobs
Periodic and cron jobs
Interfaces for error and panic handling
Graceful shutdown

We have a lot more planned, but first we want to ensure we've settled on the right APIs. Let us know about any issues you encounter, including awkward or hard-to-use APIs.

Get started

We're excited to hear the Go community's feedback on what we've built so far. Check out the docs and give River a try today.

If you'd like to get updates on River's progress, including future releases, sign up here.

Credits

Finally, we want to give a special shout-out to some of the libraries that have inspired us over the years to build River:

Oban in Elixir
que, sidekiq, delayed_job and GoodJob in Ruby
Hangfire in .NET