Set Up Stuff on UCL Lab Machines - radical-cybertools/BigJobAsync GitHub Wiki

SSH into one of the lab machines, e.g.,

ssh [email protected]

1. Create a Python Virtualenv

The lab machines don't seem to come with virtualenv preinstalled, so we have to download and unpack it first:

wget --no-check-certificate https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.9.tar.gz
tar xzf virtualenv-1.9.tar.gz

Now we can use it to create a new MDStack virtualenv (use a different name if you want):

python virtualenv-1.9/virtualenv.py $HOME/MDStack
source $HOME/MDStack/bin/activate

NOTE: Every time you log in to the lab machine, run source $HOME/MDStack/bin/activate to activate your virtualenv.
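To double-check that the activation worked, a short script can ask the interpreter itself. The sketch below uses a helper of our own, `in_virtualenv`, which is not part of virtualenv. It assumes that classic virtualenv releases such as the 1.9 used here set `sys.real_prefix`. The newer `venv` module and virtualenv 20+ instead make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter lives inside a virtualenv/venv."""
    # Classic virtualenv releases (such as 1.9) record the original
    # installation in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and virtualenv >= 20 leave sys.base_prefix pointing at the original
    # installation, so it differs from sys.prefix inside an environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
    print("interpreter prefix: %s" % sys.prefix)
```

Save it under any name you like (e.g. check_venv.py) and run it with `python check_venv.py` after sourcing the activate script. It should report `True` and a prefix under $HOME/MDStack.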

2. Install BigJobAsync

Now that we have a local virtualenv, we can install any Python packages without requiring root privileges. We install the latest stable version of BigJobAsync directly from GitHub. The installer will automatically install all required dependencies, including BigJob and SAGA-Python:

pip install --upgrade -e git+https://github.com/radical-cybertools/BigJobAsync.git@master#egg=bigjobasync

Next, we need to install the latest development version of saga-python as it fixes a few issues that I encountered with the ancient Ubuntu 10.x installation on the UCL lab machines:

pip install --upgrade -e git+https://github.com/saga-project/saga-python.git@devel#egg=saga-python
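Both commands above give pip a VCS requirement with three parts:

- the repository URL;
- `@<ref>`, which selects the branch to check out (`master` for BigJobAsync, `devel` for saga-python);
- `#egg=<name>`, which tells pip which project the checkout provides.

The sketch below splits such a string apart to make these pieces explicit. `parse_vcs_requirement` is a hypothetical helper written for this page, not pip's own parser. It accepts both the `git+https://` and the older `git://` form, and it assumes the ref contains no '/'.

```python
def parse_vcs_requirement(spec):
    """Split a pip VCS requirement into its VCS, repository URL, ref and egg name.

    Handles both 'git+https://host/repo.git@ref#egg=name' and the older
    'git://host/repo.git@ref#egg=name' form. Refs containing '/' (for example
    'feature/x') are not supported by this sketch.
    """
    url, _, fragment = spec.partition("#")
    # The fragment holds key=value pairs; 'egg' names the project to install.
    params = dict(p.split("=", 1) for p in fragment.split("&") if "=" in p)

    # 'git+https' -> VCS 'git' plus transport 'https'; plain 'git' is both.
    scheme, sep, rest = url.partition("://")
    vcs, plus, transport = scheme.partition("+")
    if plus:
        url = transport + sep + rest

    # The branch, tag or commit to check out follows the last '@'.
    # An '@' followed by a path is user info (e.g. 'git@github.com/...'), not a ref.
    base, at, ref = url.rpartition("@")
    if not at or "/" in ref:
        base, ref = url, None

    return {"vcs": vcs, "url": base, "ref": ref, "egg": params.get("egg")}


if __name__ == "__main__":
    spec = "git+https://github.com/radical-cybertools/BigJobAsync.git@master#egg=bigjobasync"
    print(parse_vcs_requirement(spec))
```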

Once the installer has finished, make sure everything is in place (your version numbers may differ):

python -c "import saga; print(saga.version)"
0.9.15-13-g32884cd
python -c "import bigjob; print(bigjob.version)"
0.53
python -c "import bigjobasync; print(bigjobasync.version)"
0.2
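The three checks can also be done in one go. This is a minimal sketch, and `report_versions` is a hypothetical helper. Like the one-liners above, it assumes each package exposes a `version` attribute. If that attribute is missing, it falls back to the more common `__version__`.

```python
import importlib


def report_versions(names):
    """Map each module name to its version string, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # The checks above read a 'version' attribute; fall back to the
        # '__version__' convention, then to "unknown".
        versions[name] = getattr(module, "version",
                                 getattr(module, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    found = report_versions(["saga", "bigjob", "bigjobasync"])
    for name, version in sorted(found.items()):
        print("%-12s %s" % (name, version if version is not None else "NOT INSTALLED"))
```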

3. Set Up Access Credentials for Stampede

In order to use BigJobAsync with Stampede, we need to set up passwordless SSH login. First, we create a new RSA key pair.

NOTE: Make sure that you save the key as /home/username/.ssh/mdstack_rsa and that you leave the passphrase empty.

ssh-keygen -t rsa -C "MDStack" 

Generating public/private rsa key pair.
Enter file in which to save the key (/home/oweidner/.ssh/id_rsa): /home/oweidner/.ssh/mdstack_rsa
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/oweidner/.ssh/mdstack_rsa.
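If you would rather script this step, `ssh-keygen` can also run without prompting. `-f` names the output file and `-N ""` sets an empty passphrase, which together answer the two questions in the session above. The sketch below wraps that call. `keygen_command` and `ensure_key` are hypothetical helper names, and the sketch deliberately refuses to overwrite an existing key.

```python
import os
import subprocess


def keygen_command(key_path, comment="MDStack"):
    """Return an ssh-keygen invocation equivalent to the interactive session."""
    # -t rsa: key type, -C: comment, -f: output file, -N "": empty passphrase
    return ["ssh-keygen", "-t", "rsa", "-C", comment, "-f", key_path, "-N", ""]


def ensure_key(key_path):
    """Create the key pair unless key_path already exists; True if created."""
    if os.path.exists(key_path):
        return False  # never overwrite a key that may already be in use
    key_dir = os.path.dirname(key_path)
    if key_dir and not os.path.isdir(key_dir):
        os.makedirs(key_dir)
        os.chmod(key_dir, 0o700)  # ~/.ssh should be private to the user
    subprocess.check_call(keygen_command(key_path))
    return True


if __name__ == "__main__":
    path = os.path.expanduser("~/.ssh/mdstack_rsa")
    print("created" if ensure_key(path) else "already exists: " + path)
```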

Next, in a separate terminal window, log in to your account on Stampede and add the contents of the newly generated /home/username/.ssh/mdstack_rsa.pub to $HOME/.ssh/authorized_keys on Stampede. Once that's done, you can close the connection.
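The edit on Stampede amounts to appending one line to authorized_keys. The sketch below assumes you have already copied mdstack_rsa.pub to your Stampede account (for example with scp). `merge_key` and `add_authorized_key` are hypothetical helpers, not part of BigJobAsync. The sketch skips keys that are already present. It also tightens the file permissions, since sshd's StrictModes check (enabled by default) rejects key files that other users can write to.

```python
import os


def merge_key(existing, key):
    """Return authorized_keys text with key appended, or None if it is already there."""
    key = key.strip()
    if key in (line.strip() for line in existing.splitlines()):
        return None
    if existing and not existing.endswith("\n"):
        existing += "\n"  # make sure the new key starts on its own line
    return existing + key + "\n"


def add_authorized_key(pub_key_path, auth_keys_path):
    """Append the public key in pub_key_path to auth_keys_path; True if added."""
    with open(pub_key_path) as f:
        key = f.read()
    existing = ""
    if os.path.exists(auth_keys_path):
        with open(auth_keys_path) as f:
            existing = f.read()
    merged = merge_key(existing, key)
    if merged is None:
        return False
    ssh_dir = os.path.dirname(auth_keys_path)
    if ssh_dir and not os.path.isdir(ssh_dir):
        os.makedirs(ssh_dir)
        os.chmod(ssh_dir, 0o700)
    with open(auth_keys_path, "w") as f:
        f.write(merged)
    # sshd's StrictModes check rejects key files that others can write to;
    # 0600 (owner read/write only) is the usual setting.
    os.chmod(auth_keys_path, 0o600)
    return True
```

On Stampede, you would then call `add_authorized_key(os.path.expanduser("~/mdstack_rsa.pub"), os.path.expanduser("~/.ssh/authorized_keys"))`. The location of the copied .pub file is only an example.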

Back on the UCL lab machine, try to log in to Stampede with your new key:

ssh -i $HOME/.ssh/mdstack_rsa tacc_username@stampede.tacc.utexas.edu

Once that works, create the file $HOME/.ssh/config (or open it, if it already exists) and add the following entry:

Host *.tacc.utexas.edu
    IdentityFile ~/.ssh/mdstack_rsa
    User tacc_username

Now you should be able to log in to Stampede without specifying a username or an identity file:

ssh stampede.tacc.utexas.edu

If that works, you are all set.
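The `Host *.tacc.utexas.edu` line decides which hosts the config entry applies to. In ssh_config, `*` matches any run of characters and `?` matches exactly one. The pattern is normally compared against the host name you type on the command line. So `ssh stampede.tacc.utexas.edu` picks up the entry, but a bare `ssh stampede` would not. The sketch below approximates that matching to make the rule concrete. `host_matches` is a hypothetical helper. It covers only `*`, `?`, whitespace-separated pattern lists and `!` negation, not every detail of ssh_config(5).

```python
import re


def _pattern_regex(pattern):
    """Translate an ssh_config-style pattern into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")    # any run of characters, including none
        elif ch == "?":
            parts.append(".")     # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def host_matches(host_patterns, hostname):
    """Return True if hostname is selected by a 'Host' line's patterns.

    A matching negated pattern ('!pattern') excludes the host outright,
    regardless of the other patterns on the line.
    """
    matched = False
    for pattern in host_patterns.split():
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if _pattern_regex(pattern).match(hostname):
            if negated:
                return False
            matched = True
    return matched


if __name__ == "__main__":
    for name in ("stampede.tacc.utexas.edu", "stampede"):
        print("%-26s %s" % (name, host_matches("*.tacc.utexas.edu", name)))
```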

4. Run the Example Script

Create a directory $HOME/example and download (or copy) the sample input files and the example script into it:

mkdir $HOME/example
cd $HOME/example
wget https://raw.github.com/radical-cybertools/BigJobAsync/master/examples/loreipsum_pt1.txt
wget https://raw.github.com/radical-cybertools/BigJobAsync/master/examples/loreipsum_pt2.txt
wget https://raw.github.com/radical-cybertools/BigJobAsync/master/examples/01_example_local_input.py

Open the file 01_example_local_input.py and change the following lines to match your own Stampede account:

# CHANGE: Your stampede username
USERNAME    = "tg802352" 
# CHANGE: Your stampede working directory 
WORKDIR     = "/scratch/00988/tg802352/example/"
# CHANGE: Your stampede allocation
ALLOCATION  = "TG-MCB090174"

Now you can run the script:

python 01_example_local_input.py

The output should look similar to the excerpt below. Don't expect the lines to appear in any particular order, though: the individual stages of task execution are interleaved and run highly asynchronously.

 * Task combinator-task-0 state changed from 'New' to 'TransferringInput'.
 * Task combinator-task-1 state changed from 'New' to 'TransferringInput'.
[...]
 * Resource '<_BigJobWorker(_BigJobWorker-9, started daemon)>' state changed from 'New' to 'Pending'.
 * Task combinator-task-0 state changed from 'TransferringInput' to 'WaitingForExecution'.
 * Task combinator-task-1 state changed from 'TransferringInput' to 'WaitingForExecution'.
[...]
 * Task combinator-task-0 state changed from 'WaitingForExecution' to 'Pending'.
 * Task combinator-task-1 state changed from 'WaitingForExecution' to 'Pending'.
[...]
 * Resource '<_BigJobWorker(_BigJobWorker-9, started daemon)>' state changed from 'Pending' to 'Running'.
[...]
 * Task combinator-task-0 state changed from 'Pending' to 'Running'.
 * Task combinator-task-1 state changed from 'Pending' to 'Running'.
[...]
 * Task combinator-task-0 state changed from 'Running' to 'WaitingForOutputTransfer'.
 * Task combinator-task-1 state changed from 'Running' to 'WaitingForOutputTransfer'.
[...]
 * Task combinator-task-0 state changed from 'TransferringOutput' to 'Done'.
 * Task combinator-task-1 state changed from 'TransferringOutput' to 'Done'.
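Although lines from different tasks are interleaved, each individual task should still move through its states in order. The sketch below checks this, assuming you have captured the script's messages to a file. If they go to standard output or standard error, `python 01_example_local_input.py > run.log 2>&1` would do. `task_histories` and `is_contiguous` are hypothetical helpers. They rely only on the ` * Task ... state changed from '...' to '...'.` line format shown above.

```python
import re
import sys
from collections import OrderedDict

# Matches lines like:
#  * Task combinator-task-0 state changed from 'New' to 'TransferringInput'.
TASK_LINE = re.compile(r"\* Task (\S+) state changed from '(\w+)' to '(\w+)'")


def task_histories(lines):
    """Group task state transitions by task name, keeping their order."""
    histories = OrderedDict()
    for line in lines:
        match = TASK_LINE.search(line)
        if match is None:
            continue  # resource messages, '[...]' markers, other output
        task, old_state, new_state = match.groups()
        histories.setdefault(task, []).append((old_state, new_state))
    return histories


def is_contiguous(transitions):
    """True if every transition starts in the state the previous one ended in."""
    return all(prev[1] == nxt[0]
               for prev, nxt in zip(transitions, transitions[1:]))


if __name__ == "__main__":
    with open(sys.argv[1]) as log:
        for task, transitions in task_histories(log).items():
            states = [transitions[0][0]] + [new for _, new in transitions]
            flag = "ok" if is_contiguous(transitions) else "GAP"
            print("%s [%s]: %s" % (task, flag, " -> ".join(states)))
```

Running it as `python check_log.py run.log` (the script name is arbitrary) prints one line per task with its full state sequence.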