Configure SGE - VertebrateResequencing/vr-pipe GitHub Wiki
Sun Grid Engine was a job scheduler developed by Sun and released under an open source license. When Oracle purchased Sun, it was renamed Oracle Grid Engine and an updated version was released without source code. As a result multiple forks arose, one of which has built-in support in VRPipe: Son of Grid Engine (the 'SGE' that the rest of this document refers to). The other forks may work with VRPipe's SGE and sge_ec2 schedulers, but only Son of Grid Engine has been tested.
Our public unconfigured VRPipe AMI has SGE installed but not configured. This guide shows you how to complete the installation by configuring SGE and starting up the master host so that it will work well with VRPipe. It is assumed that you will use VRPipe's sge_ec2 scheduler, which creates instance-type-specific queues, along with complex attributes, a parallel environment and groups, and also automatically launches and terminates instances to act as execution hosts as demand dictates. For this reason those topics are not discussed in this guide.
You must have a shared filesystem to use SGE. This guide assumes you have one mounted at /shared; alter the example command lines as appropriate if that isn't the case. It is also assumed that the home directory is not shared; if yours is shared, alter the command lines below to replace /home/ec2-user/ with a location on the local (fast) hard drive or similar.
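Before going further it can help to confirm that the shared filesystem really is mounted where this guide expects it. The following is a minimal sketch, not part of VRPipe or SGE: is_mount_point is a hypothetical helper name, and it assumes a Linux host whose mount table is readable at /proc/mounts (the second field of each line is the mount directory).

```shell
#!/bin/sh
# Sketch: check that a directory is a mount point before configuring SGE.
# is_mount_point is a hypothetical helper; it reads a mounts table in
# /proc/mounts format, where the second field is the mount directory.
is_mount_point() {
    dir="$1"
    table="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d { found = 1 } END { exit found ? 0 : 1 }' "$table"
}

if is_mount_point /shared; then
    echo "/shared is mounted"
else
    echo "/shared is not mounted; adjust the paths in this guide" >&2
fi
```

The optional second argument exists only so the helper can be pointed at a different mounts table; in normal use it can be omitted.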
- Make an SGE configuration file:
mkdir -p /shared/software/VRPipe/sge; mkdir /home/ec2-user/.sge
cp $SGE_ROOT/util/install_modules/inst_template.conf /shared/software/VRPipe/sge/install.conf
Edit /shared/software/VRPipe/sge/install.conf so that the following options are set:
SGE_ROOT="/opt/sge"
SGE_QMASTER_PORT=6444
SGE_EXECD_PORT=6445
SGE_ENABLE_SMF="false"
SGE_CLUSTER_NAME=vrpipe_cluster
CELL_NAME="default"
ADMIN_USER=ec2-user
QMASTER_SPOOL_DIR=/home/ec2-user/.sge/spool/qmaster
EXECD_SPOOL_DIR=/home/ec2-user/.sge/spool
GID_RANGE="20000-20100"
SPOOLING_METHOD="berkeleydb"
DB_SPOOLING_DIR="/home/ec2-user/.sge/spool/spooldb"
PAR_EXECD_INST_COUNT="20"
ADMIN_HOST_LIST=""
SUBMIT_HOST_LIST=""
EXEC_HOST_LIST=""
EXECD_SPOOL_DIR_LOCAL="/home/ec2-user/.sge/spool"
HOSTNAME_RESOLVING="true"
SHELL_NAME="ssh"
COPY_COMMAND="scp"
DEFAULT_DOMAIN="none"
ADMIN_MAIL="[email protected]"
(enter your correct email address here)
ADD_TO_RC="true"
SET_FILE_PERMS="true"
RESCHEDULE_JOBS="wait"
SCHEDD_CONF="1"
SHADOW_HOST=""
EXEC_HOST_LIST_RM=""
REMOVE_RC="true"
- Arrange that a settings.sh file (which doesn't exist yet) will be sourced when you log in:
echo '. /shared/software/VRPipe/sge/default/common/settings.sh' >> /shared/software/.profile
- Set up your cell directory to point to a location on your shared disc (that will be created later):
cd $SGE_ROOT
sudo ln -s /shared/software/VRPipe/sge/default default
- Create an AMI of this instance with a new name like 'unconfigured vrpipe on mounted gluster with SGE'.
- Terminate the instance you were using.
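The option edits in the first step above can be scripted instead of made by hand in an editor. The sketch below is one possible approach, under these assumptions: set_conf_option is a hypothetical helper name, the configuration file holds one KEY=value assignment per line, and values contain no `|` or `&` characters (which would confuse the sed expression). The helper always writes values in double quotes. The demonstration runs on a scratch file whose two starting lines are made-up stand-ins, not the real template contents.

```shell
#!/bin/sh
# Sketch: set KEY="value" in an SGE install configuration file.
# set_conf_option is a hypothetical helper. It replaces an existing
# KEY=... line, or appends the assignment if the key is absent.
set_conf_option() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$file"; then
        # Rewrite via a temporary file so no in-place sed option is needed.
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a scratch file (stand-in contents, not the real template).
conf=$(mktemp)
printf 'SGE_ROOT="unset"\nSGE_QMASTER_PORT="unset"\n' > "$conf"
set_conf_option "$conf" SGE_ROOT /opt/sge
set_conf_option "$conf" SGE_QMASTER_PORT 6444
set_conf_option "$conf" ADMIN_USER ec2-user
cat "$conf"
rm -f "$conf"
```

For real use the same calls would target /shared/software/VRPipe/sge/install.conf, with one call per option listed in the first step.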
Any time you want to start using SGE (this is your first time, or you've previously been using it but had to terminate the instance that held the master and need to launch a new one to replace it), carry out the following steps. The idea is to keep the SGE master instance running continuously, and to launch and terminate execution host instances as needed (VRPipe configured with the sge_ec2 scheduler will do this for you). In this suggested configuration, jobs are not executed on the master.
- Launch a new instance running the AMI you made in step 4 of the previous section. For best results it should have at least 1GB of free memory and 2 CPUs. If you'll run vrpipe-server on the same instance (recommended), we recommend a 4GB machine with 4 CPUs.
- Clear out any existing cell folder:
cd $SGE_ROOT
sudo rm default
sudo rm -fr /shared/software/VRPipe/sge/default
- Run SGE's automated install script to set this as the master host:
sudo ./inst_sge -m -auto /shared/software/VRPipe/sge/install.conf
- Move the cell folder to your shared disc:
mv default /shared/software/VRPipe/sge/default
sudo ln -s /shared/software/VRPipe/sge/default default
- Set self as the one and only submit host:
. /opt/sge/default/common/settings.sh
qconf -as <hostname>
(where hostname is something like ip-10-36-139-102 and appears in your command prompt)
- Delete the unnecessary default queue and host group:
qconf -dq all.q
qconf -dhgrp @allhosts
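Since these steps are repeated every time a replacement master is launched, they could be collected into a small script. The following is only a sketch of the commands listed in this section, in the same order: DRY_RUN, run, start_master and SHARED_SGE are hypothetical names introduced here, and it is an assumption that the output of `hostname` is the name SGE expects for the submit host. By default the script only prints the commands so they can be reviewed; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: the master-start steps from this section in one function.
# DRY_RUN, run, start_master and SHARED_SGE are hypothetical names.
# By default commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
SGE_ROOT="${SGE_ROOT:-/opt/sge}"
SHARED_SGE=/shared/software/VRPipe/sge

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_master() {
    # Assumption: `hostname` yields the name shown in the command prompt.
    host="${1:-$(hostname)}"
    run cd "$SGE_ROOT" || return 1
    # Clear out any existing cell folder.
    run sudo rm default
    run sudo rm -fr "$SHARED_SGE/default"
    # Install this host as the master.
    run sudo ./inst_sge -m -auto "$SHARED_SGE/install.conf" || return 1
    # Move the cell folder to the shared disc and link it back.
    run mv default "$SHARED_SGE/default" || return 1
    run sudo ln -s "$SHARED_SGE/default" default || return 1
    # Load the SGE environment, then set the submit host and
    # delete the default queue and host group.
    run . "$SGE_ROOT/default/common/settings.sh" || return 1
    run qconf -as "$host"
    run qconf -dq all.q
    run qconf -dhgrp @allhosts
}

start_master "$@"
```

Running it once in the default dry-run mode and comparing the printed commands with the steps above is a cheap check before running it with DRY_RUN=0.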