Batches - rselk/sidekiq GitHub Wiki
Batches are Sidekiq's term for a collection of jobs which can be monitored as a group.
This feature is available in Sidekiq Pro only.
Overview
At The Clymb, we upload a lot of Excel spreadsheets to load data into our database. These spreadsheets might have hundreds of rows, each row requiring a few seconds of processing. I don't want to process the file synchronously (the web browser will time out after 60 seconds) and I don't want to spin off the upload as a single Sidekiq job (there's no performance benefit to serial execution). Instead I want to break up the Excel spreadsheet into one job per row and get the benefit of parallelism to massively speed up the data load time. But how do I know when the entire thing is done? How do I track the progress?
This is what batches allow you to do!
batch = Sidekiq::Batch.new
batch.description = "Batch description (this is optional)"
batch.notify(:email, :to => '[email protected]')
batch.jobs do
  rows.each { |row| RowWorker.perform_async(row) }
end
puts "Just started Batch #{batch.bid}"
Here we've created a new Batch, told it to notify us via email when it's complete, and then filled it with jobs to perform. The bid, or Batch ID, is the unique identifier for a Batch.
You can dynamically add jobs to a batch from within an executing job:
class SomeWorker
  include Sidekiq::Worker
  def perform(...)
    b = Sidekiq::Batch.new(bid)
    b.jobs do
      # add more jobs
    end
  end
end
bid is a method on Sidekiq::Worker that gives access to the Batch ID associated with the job.
Watch Out!
Expiry
Batch data expires in Redis after 72 hours. If the jobs in a batch take longer than 72 hours to process, you need to extend the expiration:
batch.expires_in 2.weeks
You may extend batch expiration up to 30 days.
Race Condition
NOTE: this has been fixed as of Sidekiq Pro 1.8.0.
The jobs method has a race condition: as it executes and sends jobs to Redis, Sidekiq will immediately start working on those jobs. If Sidekiq finishes all outstanding jobs before the jobs method has created all of them, you can get premature complete/success notifications. For instance:
b.jobs do
  SomeWorker.perform_async(1)
  sleep 1
  # Uh oh, Sidekiq has finished all outstanding batch jobs
  # and fires the complete message!
  SomeWorker.perform_async(2)
end
If you find your jobs method can't push jobs to Redis fast enough, you can use Sidekiq::Client.push_bulk to gather all necessary jobs and then push them all at once to Redis.
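A minimal sketch of the push_bulk approach, assuming each job takes a single argument. build_bulk_payload is a hypothetical helper, not part of Sidekiq; it only assembles the hash that Sidekiq::Client.push_bulk takes (a 'class' entry plus an 'args' array holding one argument array per job). The push itself appears only as a comment because it needs Sidekiq and Redis, and RowWorker here is an empty stand-in.

```ruby
# Hypothetical helper (not part of Sidekiq): gather the arguments for every
# job up front so the whole set can be sent to Redis in a single call.
def build_bulk_payload(worker_class, rows)
  {
    'class' => worker_class,
    # push_bulk expects an array of argument arrays, one inner array per job.
    'args'  => rows.map { |row| [row] }
  }
end

# Empty stand-in so the sketch runs without Sidekiq loaded.
class RowWorker; end

payload = build_bulk_payload(RowWorker, [1, 2, 3])
puts payload['args'].length

# With Sidekiq loaded, the push would presumably look like:
#   batch.jobs { Sidekiq::Client.push_bulk(payload) }
```

Because all the arguments are collected before anything is sent, the window in which Sidekiq can drain the batch while jobs are still being created shrinks to a single Redis round trip.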
Status
To fetch the status for a Batch programmatically, use Sidekiq::Batch::Status:
status = Sidekiq::Batch::Status.new(bid)
status.total # jobs in the batch => 98
status.failures # failed jobs so far => 5
status.pending # jobs which have not succeeded yet => 17
status.created_at # => 2012-09-04 21:15:05 -0700
status.complete? # if all jobs have executed at least once => false
status.join # blocks until the batch is considered complete, note that some jobs might have failed
status.failure_info # an array of failed jobs
status.data # a hash of data about the batch which can easily be converted to JSON for javascript usage
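As an illustration of how these readers might drive a progress bar, here is a small sketch that turns total and pending into a percentage. FakeStatus is a hypothetical stand-in for Sidekiq::Batch::Status so the code runs without Sidekiq Pro, and percent_done is an illustrative helper, not part of the Sidekiq API.

```ruby
# Hypothetical stand-in for Sidekiq::Batch::Status; it exposes only the two
# readers used below.
FakeStatus = Struct.new(:total, :pending)

# Percentage of jobs in the batch that have succeeded so far.
# pending counts jobs which have not succeeded yet, so the remainder is done.
def percent_done(status)
  return 0.0 if status.total.zero? # avoid dividing by zero for an empty batch
  ((status.total - status.pending) * 100.0 / status.total).round(1)
end

puts percent_done(FakeStatus.new(98, 17))
```

With a real batch, the same helper should accept the object returned by Sidekiq::Batch::Status.new(bid), since it only calls total and pending.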
Callbacks
Sidekiq can notify you when a Batch is complete or successful with batch.on(event, klass, options={}):
- success - when all jobs in the batch have completed successfully.
- complete - when all jobs in the batch have run once, successful or not.
class SomeClass
  def on_complete(status, options)
    puts "Uh oh, batch has failures" if status.failures != 0
  end
  def on_success(status, options)
    puts "#{options['uid']}'s batch succeeded. Kudos!"
  end
end
batch = Sidekiq::Batch.new
batch.expires_in 7.days
batch.on(:success, SomeClass, 'uid' => current_user.id)
Regarding success, if a job fails continually it's possible the success event will never fire. Another caveat: the batch data can expire in Redis before all jobs are successful, meaning callbacks will not fire. By default batch data expires after 72 hours; this is one case where extending batch expiration may be necessary.
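The difference between the two events can be made concrete with a toy model. ToyBatch below is purely illustrative and is not how Sidekiq Pro tracks batches internally; it just encodes the two definitions given above: complete means every job has run at least once, success means every job has eventually succeeded.

```ruby
require 'set'

# Toy model of the two batch events (an assumption-laden sketch, not
# Sidekiq's implementation).
class ToyBatch
  def initialize(job_ids)
    @job_ids   = job_ids.to_set
    @attempted = Set.new
    @succeeded = Set.new
  end

  # Record one execution of a job; a retry simply calls this again.
  def record(job_id, ok)
    @attempted << job_id
    @succeeded << job_id if ok
  end

  # True once every job has run at least once, successfully or not.
  def complete?
    @attempted == @job_ids
  end

  # True once every job has succeeded.
  def success?
    @succeeded == @job_ids
  end
end

batch = ToyBatch.new([1, 2])
batch.record(1, true)
batch.record(2, false)   # job 2 fails on its first run
puts batch.complete?     # every job has run once
puts batch.success?      # job 2 has not succeeded yet
```

In this model a job that keeps failing leaves success? false forever, which mirrors the caveat that the success event may never fire.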
Monitoring
Sidekiq Pro contains extensions for the Sidekiq Web UI, including an overview for Batches which shows the current status of all Batches along with a Batch details page listing any errors associated with jobs in the Batch. Require the Pro extension where you require the standard Web UI:
require 'sidekiq/pro/web'
mount Sidekiq::Web => '/sidekiq'
Polling
You can poll for the status of a batch (perhaps to show a progress bar as the batch is processed) using the built-in Rack endpoint. Add it to your application's config.ru:
require 'sidekiq/rack/batch_status'
use Sidekiq::Rack::BatchStatus
run Myapp::Application
Then you can query the server to get a JSON blob of data about a batch by passing the BID. For example:
http://localhost:3000/batch_status/bc7f822afbb40747.json
{"complete":true,"bid":"bc7f822afbb40747","total":10,"pending":0,"description":null,"failures":0,"created_at":1367700200.1111438,"fail_info":[]}
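For a client written in Ruby, the response can be decoded with the standard json library. The sketch below parses the sample blob above; summarize_batch and SAMPLE_BODY are illustrative names, and reading created_at as seconds since the Unix epoch is an assumption based on the sample value.

```ruby
require 'json'

# The sample response shown above, as it would arrive from the endpoint.
SAMPLE_BODY = '{"complete":true,"bid":"bc7f822afbb40747","total":10,' \
              '"pending":0,"description":null,"failures":0,' \
              '"created_at":1367700200.1111438,"fail_info":[]}'

# Illustrative helper: pick out the fields a progress display would need.
def summarize_batch(json_body)
  data = JSON.parse(json_body)
  {
    bid:        data['bid'],
    complete:   data['complete'],
    succeeded:  data['total'] - data['pending'],
    failures:   data['failures'],
    # Assumes created_at is a Unix timestamp in seconds.
    created_at: Time.at(data['created_at']).utc
  }
end

summary = summarize_batch(SAMPLE_BODY)
puts "#{summary[:succeeded]} jobs succeeded in batch #{summary[:bid]}"
```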
Testing
Batches require server-side middleware to function properly and, as such, don't work in the test environment. If you are using inline testing and want to test your batch callbacks, you'll need to fire the callbacks manually.
options = { 'user_id' => 123 }
batch = Sidekiq::Batch.new
batch.on(:success, MyCallbackClass, options)
batch.jobs do
  # define your work
end
# fire callback manually
MyCallbackClass.new.on_success(batch.status, options)
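If constructing a real batch is inconvenient in a test, the callback can also be exercised on its own with a stand-in status object. This is a sketch under assumptions: ImportCallback and StubStatus are hypothetical names, and the stub provides only the failures reader that this particular callback uses.

```ruby
# Hypothetical callback class; it returns values instead of printing so the
# result is easy to assert on.
class ImportCallback
  def on_complete(status, options)
    status.failures.zero? ? :ok : :has_failures
  end

  def on_success(status, options)
    "#{options['user_id']}'s batch succeeded"
  end
end

# Hypothetical stand-in for the batch status; only #failures is provided.
StubStatus = Struct.new(:failures)

options  = { 'user_id' => 123 }
callback = ImportCallback.new

# Fire the callbacks manually, as a test would.
puts callback.on_complete(StubStatus.new(5), options)
puts callback.on_success(StubStatus.new(0), options)
```

The trade-off is that the stub can drift from the real status object, so it is worth keeping it limited to the readers the callback actually calls.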
Canceling a Batch
If a batch of jobs is no longer valid, can you cancel them or remove them from Redis?
Sidekiq's internal data structures don't make it efficient to remove a job in Redis. Instead I recommend you have each job check if it is still valid when it executes. This way the jobs don't do any extra work and Redis is happy. The Batch API makes this pretty easy to do.
Step 1: Create the batch as normal:
batch = Sidekiq::Batch.new
batch.jobs do
  # define your work
end
# save batch.bid somewhere
Step 2: Cancel the batch due to some user action:
batch = Sidekiq::Batch.new(bid)
batch.invalidate_all
Step 3: Each job verifies its own validity:
class MyWorker
  include Sidekiq::Worker
  def perform
    return unless batch.valid?(jid)
    # do actual work
  end
end