River has a built-in leader election system, used internally so that even with many clients operating concurrently, only one at a time runs queue maintenance tasks.
In charge of maintenance
River operates a number of maintenance services to keep queues healthy. For example, there's a service to remove finished jobs after they've crossed a retention threshold, and another that rescues jobs that appear to be stuck.
Maintenance tasks are performed broadly, and only one client needs to be running them at a time, since more than one would produce unnecessary work and contention. River engages in a leader election process so that even with many concurrently running clients, only one of them is in charge of queue maintenance. Leadership coordination happens through the unlogged river_leader table.
When a leader is shutting down, it resigns leadership and notifies other clients using a Postgres LISTEN/NOTIFY channel, prompting a new leadership election. This happens quickly, so as long as there's more than one River client deployed, there will be few gaps in maintenance operations.
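The resign-and-notify handoff can be illustrated with a small in-memory model. This is only a sketch and not River's implementation: the `leadership` type, its methods, and the Go channel standing in for the Postgres notification channel are all hypothetical names chosen for illustration.

```go
package main

import "sync"

// leadership is a hypothetical in-memory stand-in for shared leadership state.
// River itself coordinates through Postgres, not through a Go struct.
type leadership struct {
	mu       sync.Mutex
	leaderID string
	resigned chan struct{} // stands in for the notification channel
}

func newLeadership() *leadership {
	// A buffer of one lets resign announce without waiting for a listener.
	return &leadership{resigned: make(chan struct{}, 1)}
}

// tryElect makes clientID the leader unless another client already leads.
func (l *leadership) tryElect(clientID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.leaderID != "" && l.leaderID != clientID {
		return false
	}
	l.leaderID = clientID
	return true
}

// resign gives up leadership and announces it, if clientID is the leader.
func (l *leadership) resign(clientID string) {
	l.mu.Lock()
	if l.leaderID != clientID {
		l.mu.Unlock()
		return
	}
	l.leaderID = ""
	l.mu.Unlock()

	// Non-blocking send: an announcement is dropped if one is already pending.
	select {
	case l.resigned <- struct{}{}:
	default:
	}
}

// awaitAndElect blocks until a resignation is announced, then tries to take over.
func (l *leadership) awaitAndElect(clientID string) bool {
	<-l.resigned
	return l.tryElect(clientID)
}
```

In this model a waiting client reacts to the announcement immediately instead of waiting for its next scheduled attempt, which is why a clean shutdown leaves only a short gap.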
River also handles situations where the current leader does not shut down cleanly, for example after a program crash, power outage, or network interruption. If the current leader is unable to renew its leadership within a five-second TTL, a new leader will automatically be elected from the remaining available nodes.
One leader per database and schema
Leadership is per database and schema. If you have many River clients running against different databases and schemas, there will be at most one leader per database and schema combination.
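One way to picture this scoping is a leadership record keyed by the database and schema pair. The `scope` and `leaders` types below are hypothetical and exist only to illustrate the rule. They do not reflect how River stores leadership.

```go
package main

// scope identifies one database and schema combination (hypothetical type).
type scope struct {
	database string
	schema   string
}

// leaders tracks at most one leader ID per scope.
type leaders map[scope]string

// tryElect makes clientID the leader of s unless another client already leads it.
func (l leaders) tryElect(s scope, clientID string) bool {
	if current, ok := l[s]; ok && current != clientID {
		return false
	}
	l[s] = clientID
	return true
}
```

Two clients pointed at the same database but different schemas each win their own election, while two clients sharing both database and schema compete for a single leadership.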