Resque Workers Restart - sul-dlss/preservation_catalog Wiki

You can determine the number of workers running by:
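One way to get that count from a worker VM is to filter `ps -ef` output. The sketch below is an illustration rather than a command taken from this page: the function name `count_resque_workers` is hypothetical, and it assumes Resque's default process titles, where a worker shows up as `resque-<version>: ...` and a forked child that is running a job has `Processing` in its title.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count resque worker processes in `ps -ef`-style
# output read from stdin.
# Assumptions (not confirmed by this page):
#   - worker process titles start with "resque-<version>:"
#   - a child forked to run a job has ": Processing " in its title, and
#     is skipped so that a busy worker is counted once
count_resque_workers() {
  # `|| true` keeps the exit status at 0 when nothing matches;
  # grep -c still prints 0 in that case.
  grep -E 'resque-[0-9]' | grep -vc ': Processing ' || true
}

# Usage on a worker VM:
#   ps -ef | count_resque_workers
```

The resque-pool master process is not counted, because its title does not have a version number directly after `resque-`.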

Option 1 (Preferred) Via Capistrano

This option allows running jobs to finish. For SDR objects with very large files or with many small files, some preservation jobs can take multiple hours.

We prefer to let these long jobs finish, and the resque-pool Ruby gem takes care of this for us. Capistrano also lets us restart the workers on all worker VMs with one command:

bundle exec cap [prod|stage|qa] resque:pool:hot_swap

If existing jobs are still running when the new workers come up, the worker count will be higher than expected until the previously running jobs finish.

If you want to gracefully stop the current pool without starting a new set of workers, you can do:

bundle exec cap [prod|stage|qa] resque:pool:stop  # performs a somewhat graceful shutdown of the current worker pool

For a list of which signals have what effect on individual resque-pool worker processes, see: https://github.com/resque/resque#signals

Configuration for Ubuntu

Note that for the hot_swap command to work, the Capistrano configs in the app need:

# for ubuntu to perform resque:pool:hot_swap
set :pty, true

On Worker VMs (When Capistrano Doesn't Work)

  1. You will need to ssh to each VM (preferably using cap xxx ssh to be the correct user for the app)
  2. You will need to run the commands below as the pres user (verify in capistrano configuration and/or puppet)

Worker VMs (as of 2022-05)

  • preservation-catalog-prod-02.stanford.edu
  • preservation-catalog-prod-03.stanford.edu
  • preservation-catalog-prod-04.stanford.edu
  • preservation-catalog-stage-02.stanford.edu
  • preservation-catalog-qa-02.stanford.edu
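When every worker VM in an environment has to be visited by hand, it can help to keep the host list in one place. A minimal sketch, assuming the list above is still current; the function name `worker_hosts` is hypothetical and not part of the app or its Capistrano configuration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the worker VMs for one environment, one per
# line. Hostnames are copied from the list above (as of 2022-05) and may
# be out of date.
worker_hosts() {
  case "$1" in
    prod)
      printf '%s\n' \
        preservation-catalog-prod-02.stanford.edu \
        preservation-catalog-prod-03.stanford.edu \
        preservation-catalog-prod-04.stanford.edu
      ;;
    stage)
      printf '%s\n' preservation-catalog-stage-02.stanford.edu
      ;;
    qa)
      printf '%s\n' preservation-catalog-qa-02.stanford.edu
      ;;
    *)
      echo "unknown environment: $1 (expected prod, stage, or qa)" >&2
      return 1
      ;;
  esac
}

# Usage: list the hosts you still need to visit.
#   worker_hosts prod
```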

Option 2A. Run the resque-pool command manually

This command still allows the existing preservation jobs to finish gracefully, because it is the same command Capistrano runs on the VM.

As the pres user:

cd ~/preservation_catalog/current
RAILS_ENV=production bundle exec resque-pool --daemon --hot-swap

Option 2B. Using systemctl (when you can't run the resque-pool command directly)

NOTE: As of Aug 2022, this is not working reliably, and in particular isn't playing well with Capistrano's management of resque-pool. Please run the command given above instead of using systemctl. See also this pres issue about getting systemd to work for managing resque-pool.

Using kill (when all else fails)

Option 2C. Without Capistrano

From the VM, as the pres user:

  1. Find the pid of resque-pool-master: ps -ef | grep resque-pool-master

  2. Ask the master to shut down gracefully: kill -s QUIT <resque-pool-master-pid>

  3. Use ps -ef | grep resque to determine if the old workers have stopped.

  4. If the workers didn't stop, you may need to send a more aggressive signal with the kill command (for example TERM, or KILL as a last resort).

  5. Start resque-pool, as the pres user:

    cd ~/preservation_catalog/current
    bundle exec resque-pool --daemon --hot-swap --environment production
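Step 1 above can be sketched as a small helper that pulls the pid out of `ps -ef` output, so it does not have to be copied by hand. This is an illustration under stated assumptions: the function name `find_pool_master_pid` is hypothetical, the pid is taken to be the second column of `ps -ef`, and the master's process title is taken to contain `resque-pool-master`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the pid(s) of resque-pool-master found in
# `ps -ef`-style output read from stdin.
# Assumptions: the pid is column 2, and the master's process title
# contains "resque-pool-master".
find_pool_master_pid() {
  # The [m] bracket keeps this awk process's own command line from
  # matching the pattern when it appears in the ps output.
  awk '/resque-pool-[m]aster/ { print $2 }'
}

# Usage, as the pres user on a worker VM (mirrors the numbered steps):
#   pid=$(ps -ef | find_pool_master_pid)
#   kill -s QUIT "$pid"
#   ps -ef | grep resque      # repeat until the old workers are gone
#   cd ~/preservation_catalog/current
#   bundle exec resque-pool --daemon --hot-swap --environment production
```

If more than one master is running (for example, partway through a hot swap), the helper prints one pid per line, so check the output before passing it to kill.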