ARC clusters for python - ai-se/admin GitHub Wiki

This article describes how I run Python programs on the ARC cluster. Contact [email protected]

Account Preparation

srun --pty /bin/bash # get 16 cores (1 node) in interactive mode
sh xxxcondaxxx.sh # run the Miniconda installer script; its prompts are shown below
...
Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/home/jchen37/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/jchen37/miniconda3] >>> /home/jchen37/python3
  • Test the installation by running python3/bin/python3; you should see
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
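
The program later in this article writes its results with an f-string, which requires Python 3.6 or newer, so it can help to confirm the version from inside the new interpreter as well. The following is a minimal sketch; the helper name is_supported and the 3.6 threshold are illustrative choices, not part of the original setup.

```python
import sys

MIN_VERSION = (3, 6)  # assumed minimum: f-strings were introduced in Python 3.6


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


print(sys.executable)   # path of the interpreter that is actually running
print(is_supported())   # True on Python 3.6 or newer
```

If the printed path is not the python3/bin/python3 you just installed, a different interpreter is first on your PATH.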

The python program

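You can make your Python program runnable from the command line, e.g. python main.py -alg WORTHY -model 0 -r 10 (the -model value must be an integer, because the code below parses it with int()).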

import sys
import os
path = os.getcwd()
rootpath = path[:path.rfind('FSSE') + 4] # FSSE is the folder name of your program
sys.path.append(rootpath)

if __name__ == '__main__':
    # Parse sys.argv. You can customize the parameter names, etc.
    # For convenience when debugging, each parameter has a default value.
    alg = 'WORTHY'
    model_id = 0
    repeat = 1
    for i, v in enumerate(sys.argv):
        if v == '-alg':
            alg = sys.argv[i + 1].upper()
        if v == '-model':
            model_id = int(sys.argv[i + 1])
        if v == '-r':
            repeat = int(sys.argv[i + 1])
    
    ... # rest of the program
    
    # writing out the results
    with open(f'{rootpath}/results/{model.name}.{alg}.res', 'a+') as f:
        f.write('##\n')
        ...
    sys.exit(0)
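
The manual loop over sys.argv above can also be written with the standard-library argparse module, which adds type checking and a -h help message. This is a sketch of an alternative, not the code used in the original program: the function name parse_cli is hypothetical, while the option names and defaults are copied from the loop above.

```python
import argparse


def parse_cli(argv=None):
    """Parse -alg, -model and -r; argv=None means 'read sys.argv[1:]'."""
    parser = argparse.ArgumentParser(description='Run one algorithm on one model.')
    # str.upper mirrors the .upper() call in the manual loop.
    parser.add_argument('-alg', type=str.upper, default='WORTHY',
                        help='algorithm name (case-insensitive)')
    parser.add_argument('-model', dest='model_id', type=int, default=0,
                        help='integer model id')
    parser.add_argument('-r', dest='repeat', type=int, default=1,
                        help='number of repeats')
    return parser.parse_args(argv)


# Example with an explicit argument list instead of the real command line.
args = parse_cli(['-alg', 'worthy', '-model', '3', '-r', '10'])
print(args.alg, args.model_id, args.repeat)  # prints: WORTHY 3 10
```

In main.py you would call parse_cli() with no arguments so that it reads the real command line. Unlike the manual loop, argparse exits with an error message when a flag is given without a value or when -model is not an integer.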

Deployment on ARC

  • Copy the program from your local machine to ARC: scp -r xxx [email protected]:/home/unity_id
  • Make sure all required packages are installed on ARC: /home/unity_id/python3/bin/pip install xxx
  • Create a working folder arc with out and err subfolders for the job logs:
mkdir arc
cd arc
mkdir out err
  • In the arc folder, create a batch file yyy.batch:
#!/bin/bash
#
#SBATCH --job-name=run_WORT
#SBATCH --ntasks=1
#SBATCH --time=01:30:00
#SBATCH --error=err/%j.err
#SBATCH --output=out/%j.out

/home/unity_id/python3/bin/python3 /path/to/main.py -alg worthy -model $mid -r 10

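Note: $mid is not defined in the batch file. It is exported by the shell that calls sbatch (see ignition.sh in the next step), and sbatch passes the caller's environment variables to the job by default.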

  • Create ignition.sh:
for mid in {0..6}; do export mid; sbatch yyy.batch; done
for mid in {0..6}; do export mid; sbatch yyy.batch; done

In this example, the algorithm worthy runs on models 0-6 with 20 repeats per model: each of the two loops submits one job per model, and each job performs 10 repeats.
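
If you prefer to drive the submissions from Python, the same loops can be sketched with subprocess. This is an illustrative equivalent of ignition.sh, not part of the original workflow: the function names build_jobs and submit and the dry_run switch are hypothetical, and the sketch assumes sbatch is on the PATH and yyy.batch is in the current folder.

```python
import os
import subprocess


def build_jobs(model_ids, batch_file='yyy.batch', passes=2):
    """Return one (command, environment) pair per sbatch submission."""
    jobs = []
    for _ in range(passes):            # two passes, like the two loops in ignition.sh
        for mid in model_ids:
            env = dict(os.environ)     # copy, so the current process is left unchanged
            env['mid'] = str(mid)      # same effect as `export mid` in the shell
            jobs.append((['sbatch', batch_file], env))
    return jobs


def submit(jobs, dry_run=True):
    """Submit every job; with dry_run=True only print what would be run."""
    for cmd, env in jobs:
        if dry_run:
            print('mid=' + env['mid'], ' '.join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)


jobs = build_jobs(range(7))   # models 0..6
submit(jobs, dry_run=True)    # switch to dry_run=False on the cluster
```

The dry run prints 14 lines, one per job, which makes it easy to check the model ids before anything is queued.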

  • Run the code with sh ignition.sh
    • Cancel a job with scancel ###JOB_ID
    • Monitor the jobs with squeue, which prints output like:
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            117339    normal run_WORT  jchen37  R       5:25      1 c99
            117340    normal run_WORT  jchen37  R       5:25      1 c106
            117341    normal run_WORT  jchen37  R       5:25      1 c80
            117342    normal run_WORT  jchen37  R       5:25      1 c81
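
When many jobs are queued, it can be handy to turn the squeue table above into Python data, for example to collect the ids of running jobs. The following is a minimal sketch that assumes the default column layout shown above; parse_squeue is a hypothetical helper, and the sample text is copied from that output.

```python
def parse_squeue(text):
    """Parse squeue's default table into a list of dicts keyed by column name."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split at most len(header) - 1 times so the last column keeps any spaces.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


sample = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            117339    normal run_WORT  jchen37  R       5:25      1 c99
            117340    normal run_WORT  jchen37  R       5:25      1 c106
"""
running = [row['JOBID'] for row in parse_squeue(sample) if row['ST'] == 'R']
print(running)  # prints: ['117339', '117340']
```

On the cluster, the text would come from running squeue (for example through subprocess) instead of the hard-coded sample.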

Advanced topics