Problems and Troubleshooting - rselk/sidekiq GitHub Wiki
Help!
Read below for tips. If you still need help, you can:
- Ask your question in The Sidekiq Google Group
- Open a GitHub issue. (Don't be afraid to open an issue, even if it's not a Sidekiq bug. An issue is just a conversation, not an accusation!)
You should not email any Sidekiq committer privately. Please respect our time and efforts by sticking to one of the two options above. Remember also that Sidekiq is free, open source software: support is not guaranteed, it's best effort according to the availability of the Sidekiq committers. Sidekiq Pro customers get guaranteed support.
Threading
Sidekiq is multithreaded so your Workers must be thread-safe.
Thread-safe libraries
Most popular Rubygems are thread-safe in my experience. A few exceptions to this rule:
- right_aws
- aws-sdk (According to a somewhat old post in the discussion group, this gem is thread-safe with the exception of the use of autoload. More details here. Explicitly calling AWS.eager_autoload! during initialization should allow it to be used with Sidekiq)
- aws-s3 (For S3 and other AWS work, use Fog instead, it is substantially better and under active development (aws-s3 is old and hasn't been updated in years). There is a guide to using S3 with Fog available, and it has been tested to be thread safe).
- basecamp
Some gems can be troublesome:
- typhoeus has a history of crashing a lot
- pg (the postgres driver, make sure PG::Connection.isthreadsafe returns true)
- RMagick (see #338, try mini_magick instead)
- therubyracer, versions before 0.11 can cause Sidekiq to hang
- libxml-ruby (see #1174)
- ActiveResource - as of v4.0.0, ActiveResource uses a lot of global state. See the issue for more details.
Writing thread-safe code
Well-factored code is typically thread-safe without any changes. Always prefer instance variables and methods to class variables and methods. Require all necessary classes on startup so you aren't requiring code while executing jobs: Ruby's require statement is not atomic, as explained in this Stack Overflow answer.
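As a minimal sketch of the advice above, the following plain Ruby keeps all per-job state in instance variables. The names SumJob and run_jobs are hypothetical, and this is not a real Sidekiq worker; it only illustrates that a fresh instance per job gives each thread its own private state.

```ruby
# Hypothetical job class: all per-job state lives in instance variables,
# so concurrent jobs cannot see or overwrite each other's data.
class SumJob
  def perform(numbers)
    @total = 0
    numbers.each { |n| @total += n }
    @total
  end
end

# Run `count` jobs concurrently, one fresh instance per job,
# and return each job's result in submission order.
def run_jobs(count)
  threads = count.times.map do |i|
    Thread.new { SumJob.new.perform([i, i, i]) }
  end
  threads.map(&:value) # Thread#value joins the thread and returns its result
end

p run_jobs(4)
```

Had the total been kept in a class variable instead, every thread would share it, and each read-modify-write would need a Mutex to stay correct.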
Sidekiq locking up
Use the TTIN signal to get a dump of stack traces from all Threads.
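From a shell, the signal is sent with kill -TTIN <pid>, where <pid> is the ID of the Sidekiq process. To illustrate the general mechanism, here is a hedged sketch of how a plain Ruby process could produce a similar dump. The helper thread_dump is hypothetical and is not Sidekiq's own implementation.

```ruby
# Hypothetical helper: build a text dump with one section per live thread.
def thread_dump
  Thread.list.map do |thread|
    trace = thread.backtrace || [] # backtrace can be nil for a thread that just died
    "Thread #{thread.object_id} status=#{thread.status}\n" + trace.join("\n")
  end.join("\n\n")
end

# Print the dump whenever this process receives TTIN.
# The guard skips platforms that do not define the signal.
if Signal.list.key?('TTIN')
  Signal.trap('TTIN') { $stderr.puts thread_dump }
end
```

A thread stuck on a lock or a slow network call will usually show the blocking call near the top of its backtrace.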
Ruby 2.0 has proven to be more stable than Ruby 1.9 and has solved lockups for several people. Give it a shot if you are on Ruby 1.9.
ActiveRecord
Take care to avoid unsafe code related to the ActiveRecord database connection or connection pool. Calls like verify_active_connections! manipulate the ConnectionPool in a thread-unsafe way. Avoid these calls from inside of your jobs' perform method. See issue #267 for an example.
"Cannot find ModelName with ID=12345"
Sidekiq is so fast that it is quite easy to get transactional race conditions where a job will try to access a database record that has not committed yet. The clean solution is to use after_commit:
class User < ActiveRecord::Base
  after_commit :greet, :on => :create

  def greet
    UserMailer.delay.send_welcome_email(self.id)
  end
end
Note: after_commit will not be invoked in your tests if you have use_transactional_fixtures enabled, but test_after_commit has been written to help out in this case.
If you aren't using ActiveRecord models, use a scheduled perform to run after you can be sure the transaction has committed:
MyWorker.perform_in(5.seconds, 1, 2, 3)
Either way, Sidekiq's retry mechanism's got your back. The first time might fail with RecordNotFound but the retry will succeed.
Job status polling works in development, not in production
If you poll your model periodically (say, from an Ajax request) to determine when your background job has completed, and the job completes in less than a second, you may find that your polling logic works in development mode but only sporadically in production.
This may be caused by Rails' use of Rails.cache. By default, Model.cache_key is only precise to the second. Updates that start and finish during the same second may cause your status polling to return a stale record. In databases that support sub-second time values (such as postgres), set config.active_record.cache_timestamp_format = :nsec in config/application.rb to increase the cache precision and avoid stale records.
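To see why second-level precision produces stale reads, consider this small sketch in plain Ruby. The key builders second_key and nsec_key and the key layout are assumptions made for illustration; they are not Rails' actual cache_key implementation.

```ruby
# Two updates to the same record within one second
# (the 7th argument to Time.utc is microseconds).
FIRST_UPDATE  = Time.utc(2013, 6, 1, 12, 0, 0, 100_000)
SECOND_UPDATE = Time.utc(2013, 6, 1, 12, 0, 0, 900_000)

# Hypothetical cache key that is only precise to the second.
def second_key(id, updated_at)
  "users/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S')}"
end

# Hypothetical cache key that includes nanoseconds.
def nsec_key(id, updated_at)
  "users/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S%9N')}"
end

# Same key for both updates, so the cache would return the stale entry.
puts second_key(1, FIRST_UPDATE) == second_key(1, SECOND_UPDATE) # prints true

# Different keys, so the second update misses the cache and is read fresh.
puts nsec_key(1, FIRST_UPDATE) == nsec_key(1, SECOND_UPDATE)     # prints false
```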
Heroku
"ERR max number of clients reached"
You've hit the max number of Redis connections allowed by your plan.
Limit the number of redis connections per process in config/sidekiq.yml. For example, if you're on Redis To Go's free Nano plan and want to use the Sidekiq web client, you'll have to set the concurrency down to 3.
:concurrency: 3
See #117 for a discussion on the topic.
Why does the Sidekiq Web UI look terrible / not render correctly in production but works fine in development?
Sidekiq Web serves its CSS/JS assets out of the gem. In production, your web server is probably not forwarding CSS/JS requests to your app, where Sidekiq Web could serve them; instead it returns a 404 when the files aren't found on the filesystem.
If you are using Rails 3.1 or 3.2 along with the asset pipeline, try putting the following into your config.ru file instead of specifying the route in routes.rb:
require 'sidekiq/web'
run Rack::URLMap.new(
  "/" => Rails.application,
  "/sidekiq" => Sidekiq::Web
)
If you are using Nginx make sure to uncomment the line:
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'
and comment this one:
# config.action_dispatch.x_sendfile_header = "X-Sendfile"
The workers are not starting
If you are migrating from Resque, make sure the Redis database does not contain any old tasks. You can clear all data with redis-cli flushall.
Another common problem is that you might have defined a namespace in Sidekiq.configure_server but not in Sidekiq.configure_client, or named it something else. Make sure you configure both!
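One way to keep the two blocks in sync is to build the Redis options in a single place, as in the sketch below. It assumes a Sidekiq version that accepts a :namespace key in config.redis. The file path, the Redis URL, the namespace value, and the helper sidekiq_redis_options are all hypothetical.

```ruby
# config/initializers/sidekiq.rb (assumed path; any file loaded at boot works)

# Hypothetical URL and namespace. A fresh hash is returned on each call
# so the server and client never share one mutable object.
def sidekiq_redis_options
  { :url => 'redis://localhost:6379/0', :namespace => 'myapp' }
end

# The defined? guard only lets this sketch load without the sidekiq gem;
# inside a real app Sidekiq is already loaded at this point.
if defined?(Sidekiq)
  Sidekiq.configure_server { |config| config.redis = sidekiq_redis_options }
  Sidekiq.configure_client { |config| config.redis = sidekiq_redis_options }
end
```

Because both blocks read from the same helper, the namespace cannot drift between the process that enqueues jobs and the process that runs them.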
Too many connections to MongoDB
If you are using Mongoid you'll also want to use the kiqstand middleware to properly disconnect workers so your connections aren't overloaded.
My Sidekiq process is disappearing!?
Linux's OOM killer might kill Sidekiq if your machine is running low on memory and can't swap. Use dmesg | egrep -i 'killed process' to search for OOM activity:
[102335.319388] Killed process 6567 (ruby) total-vm:1333004kB, anon-rss:355088kB, file-rss:688kB
The solution is to get more memory or optimize your workers. See Memory Bloat below for tips.
My Sidekiq process is crashing, what do I do?
Only two things can cause a Ruby VM to crash: a VM bug or a native gem bug. Sidekiq is pure Ruby and cannot crash the Ruby VM on its own. You'll need to gather a core file for the crashed process and use GDB to inspect the state of the Ruby process when it crashed. A couple of notes:
- native gem bugs can cause crashes - make sure you are running the latest version of all native gems so you have the latest fixes
- every time the Sidekiq process crashes, any messages being processed are lost. You can avoid this with Sidekiq Pro's reliable fetch feature.
You can get a list of all native gems in your app with this command:
bundle exec ruby -e 'puts Gem.loaded_specs.values.select{ |i| !i.extensions.empty? }.map{ |i| i.name }'
Sidekiq tries to use a connection from a child process without reconnecting
Since 2.9.0, Sidekiq assumes you don't touch redis until the app is booted and forked. Therefore, you'll get Redis::InheritedError if your code or a gem uses the Sidekiq client API before the app server has forked. For example: enqueuing a job upon app startup.
Jobs are mysteriously disappearing or failing without anything in the logs!
Often this is due to an old, left-over Sidekiq process that is still running. Make sure old processes are killed.
Sinatra
Be sure to boot the gems in your application by adding:
Bundler.require(:default)
to the top of your main Sinatra file. Read more about booting Bundler on the Bundler site.
Memory Bloat
If your Sidekiq process bloats from X MB to BIG MB over time, 99% of the time the cause is unoptimized ActiveRecord queries. Something in your Workers might be querying the database and loading tens or hundreds of thousands of ActiveRecord instances. Example:
# See if product search returns no results
return "No results" if Product.search(...).blank?
If the product search returns 10,000 results, this query will create 10,000 objects and then immediately throw them away. Terrible! The right way:
# See if product search returns no results
return "No results" if Product.search(...).count == 0
Unfortunately it's up to you to determine which worker and query is causing the bloat. Sidekiq can't help in this task. Another example:
Wrong, might load millions of user objects in memory:
User.all.each { |u| u.something }
Right, will iterate through 1000 users at a time:
User.find_each { |u| u.something }
In short, it is really easy to use ActiveRecord inefficiently. Read through your queries and make sure you understand exactly what each will do.
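The benefit of batching in the find_each example can be illustrated without a database. The sketch below is ActiveRecord-free: FakeTable and each_in_batches are hypothetical names, and the offset-based paging is a simplification (find_each itself pages by primary key). It only demonstrates that peak memory is bounded by the batch size rather than the table size.

```ruby
# Hypothetical stand-in for a table; it records the largest number of rows
# that any single fetch ever loaded.
class FakeTable
  attr_reader :peak_loaded

  def initialize(row_count)
    @rows = (1..row_count).to_a
    @peak_loaded = 0
  end

  # Load up to `limit` rows starting at `offset`, like a LIMIT/OFFSET query.
  def fetch(offset, limit)
    batch = @rows[offset, limit] || []
    @peak_loaded = [@peak_loaded, batch.size].max
    batch
  end
end

# Yield every row while holding at most `batch_size` rows at a time.
def each_in_batches(table, batch_size)
  offset = 0
  loop do
    batch = table.fetch(offset, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    offset += batch.size
  end
end

table = FakeTable.new(2500)
each_in_batches(table, 1000) { |row| row } # visit every row
puts table.peak_loaded
```

All 2,500 rows are visited, yet no single fetch ever holds more than 1,000 of them.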