ARC clusters for python - ai-se/admin GitHub Wiki

This article describes my workflow for running Python programs on the ARC cluster. Contact [email protected]

Account Preparation

Request access; see https://arcb.csc.ncsu.edu/~mueller/cluster/arc/
The Cisco VPN is required when you are off campus: https://oit.ncsu.edu/campus-it/campus-data-network/vpn/
Download the Anaconda or Miniconda installer script from https://www.anaconda.com/distribution/#download-section or https://docs.conda.io/en/latest/miniconda.html
Copy the installer to your home directory on ARC: scp xx.sh [email protected]:/home/UNITY_ID/
After logging in, install Python (needed to run Python programs):

srun --pty /bin/bash # get 16 cores (1 node) in interactive mode
sh xxxcondaxxx.sh # run the installer; its prompts are shown below

...
Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/home/jchen37/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/jchen37/miniconda3] >>> /home/jchen37/python3

Test the installation by running python3/bin/python3. You should see:

Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

The python program

You can make your Python program runnable from the command line, e.g. python main.py -alg WORTHY -model 0 -r 10 (the code below reads -model as an integer model index).

import sys
import os
path = os.getcwd()
rootpath = path[:path.rfind('FSSE') + 4] # FSSE is the folder name of your program
sys.path.append(rootpath)

if __name__ == '__main__':
    # Parse sys.argv. You can customize the parameter names, etc.
    # For convenience when debugging, give every parameter a default value.
    alg = 'WORTHY'
    model_id = 0
    repeat = 1
    for i, v in enumerate(sys.argv):
        if v == '-alg':
            alg = sys.argv[i + 1].upper()
        if v == '-model':
            model_id = int(sys.argv[i + 1])
        if v == '-r':
            repeat = int(sys.argv[i + 1])
    
    ... # rest of the program
    
    # Append the results to a per-model, per-algorithm file
    with open(f'{rootpath}/results/{model.name}.{alg}.res', 'a+') as f:
        f.write('##\n')
        ...
    sys.exit(0)
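
The manual loop over sys.argv can also be written with the standard argparse module. The following is a minimal sketch, not the original program: the function name parse_args is hypothetical, while the flag names and defaults are taken from the snippet above. argparse accepts single-dash long options such as -alg, so the command line stays the same.

```python
import argparse


def parse_args(argv=None):
    """Parse the same flags as the manual loop: -alg, -model and -r."""
    parser = argparse.ArgumentParser(description="Run one algorithm on one model.")
    # Defaults mirror the original snippet so the script still runs without flags.
    parser.add_argument("-alg", type=str.upper, default="WORTHY",
                        help="algorithm name (converted to upper case)")
    parser.add_argument("-model", dest="model_id", type=int, default=0,
                        help="integer model index")
    parser.add_argument("-r", dest="repeat", type=int, default=1,
                        help="number of repeats")
    # argv=None makes argparse read sys.argv[1:], as a script would.
    return parser.parse_args(argv)
```

In main.py this would be used as args = parse_args(), after which args.alg, args.model_id and args.repeat replace the three variables set by the loop. A non-integer value for -model or -r then produces a usage error instead of a traceback.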

Deployment on ARC

Copy the program from your local machine to ARC: scp -r xxx [email protected]:/home/unity_id
Make sure all required packages are installed on ARC: /home/unity_id/python3/bin/pip install xxx

mkdir arc
cd arc
mkdir out err

In the arc folder, create a batch file yyy.batch:

#!/bin/bash
#
#SBATCH --job-name=run_WORT
#SBATCH --ntasks=1
#SBATCH --time=01:30:00
#SBATCH --error=err/%j.err
#SBATCH --output=out/%j.out

/home/unity_id/python3/bin/python3 /path/to/main.py -alg worthy -model $mid -r 10

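Note: $mid is not defined in the batch file. It is exported by the shell that submits the job (see ignition.sh), and sbatch passes the submitting environment on to the job by default.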

Create ignition.sh:

for mid in {0..6}; do export mid; sbatch yyy.batch; done
for mid in {0..6}; do export mid; sbatch yyy.batch; done

In this example, the worthy algorithm runs on models 0-6 for 20 repeats per model: each of the two loops submits one job per model, and each job performs 10 repeats.
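
The same submission loop can be driven from Python, which may help when the list of models is computed rather than fixed. This is a minimal sketch, assuming sbatch is on the PATH and yyy.batch reads $mid as described above. The names build_jobs and submit_all are hypothetical, and with dry_run=True nothing is submitted.

```python
import os
import subprocess


def build_jobs(model_ids, rounds=2, batch_file="yyy.batch"):
    """Return one (command, extra_env) pair per submission.

    rounds=2 mirrors the two identical loops in ignition.sh.
    """
    jobs = []
    for _ in range(rounds):
        for mid in model_ids:
            jobs.append((["sbatch", batch_file], {"mid": str(mid)}))
    return jobs


def submit_all(jobs, dry_run=True):
    """Submit each job with mid exported, or only print it when dry_run is set."""
    for cmd, extra_env in jobs:
        if dry_run:
            print(extra_env, " ".join(cmd))
            continue
        # The job inherits this environment, so the batch script can read $mid.
        subprocess.run(cmd, env={**os.environ, **extra_env}, check=True)


jobs = build_jobs(range(7))      # models 0..6, two rounds
submit_all(jobs, dry_run=True)   # switch to dry_run=False on ARC to submit
```

Keeping the job list separate from the submission step makes it easy to inspect what will be submitted before anything reaches the queue.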

Run the code with sh ignition.sh
- Monitor your jobs: squeue
- Cancel a job: scancel JOB_ID

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            117339    normal run_WORT  jchen37  R       5:25      1 c99
            117340    normal run_WORT  jchen37  R       5:25      1 c106
            117341    normal run_WORT  jchen37  R       5:25      1 c80
            117342    normal run_WORT  jchen37  R       5:25      1 c81
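
To check job states from a script, the squeue table can be split on whitespace. This is a rough sketch that assumes the column layout shown above and job names without spaces. parse_squeue is a hypothetical helper, and SAMPLE is copied from the output above rather than read from a live queue.

```python
from collections import Counter

SAMPLE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            117339    normal run_WORT  jchen37  R       5:25      1 c99
            117340    normal run_WORT  jchen37  R       5:25      1 c106
"""


def parse_squeue(text):
    """Turn squeue's table into a list of dicts keyed by the header names."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Limit the splits so a multi-word reason stays in the last column.
        fields = ln.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


rows = parse_squeue(SAMPLE)
states = Counter(row["ST"] for row in rows)
print(states)
```

On the cluster, the text could come from subprocess.run(["squeue"], capture_output=True, text=True).stdout instead of SAMPLE.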
