ARC
1. Introduction to ARC
ARC (Advanced Research Computing) is the University of Leeds High Performance Computing (HPC) service and is used by researchers across the university for intensive computing tasks. In Chemistry, it is mainly used for electronic structure and molecular modelling, e.g. DFT/ab initio calculations. The primary software we use is Gaussian 09, which is available on ARC3 and ARC4.
General help and details of how to get access can be found on the ARC website: https://arc.leeds.ac.uk/
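Software on ARC is made available through environment modules, which have to be loaded before a program can be run. A minimal sketch of finding and loading Gaussian is below; the exact module name is an assumption here, so check what module avail reports. The program itself is normally invoked from a job script submitted to the queue (see section 6) rather than on the login node.
module avail gaussian   # list the Gaussian modules installed on ARC
module load gaussian    # assumed module name; use whatever module avail reports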
2. Linux
ARC3 and ARC4 run on a Linux platform and everything has to be done from the command line, so look up basic Linux commands (e.g. making, editing and deleting files/folders) before continuing!
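As a quick reference, the kind of commands you will use constantly are sketched below (standard Linux; the file and folder names are just examples):
mkdir project            # make a folder
cd project               # move into it
ls -l                    # list its contents
cp input.com backup.com  # copy a file
mv old.com new.com       # rename/move a file
rm unwanted.out          # delete a file (there is no recycle bin)
nano submit.sh           # edit a file in a terminal text editor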
Useful links:
Some beginners' introduction books for Linux can be found here:
3. BASH programming
The ability to use bash scripts to carry out file operations on ARC is essential for high-throughput computational chemistry. A guide and tutorials can be found here:
https://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
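As a flavour of the sort of file operation this makes easy, the sketch below loops over a set of Gaussian input files and prepends a shared template to each one; the file names and the template are purely illustrative:
#!/bin/bash
# Prepend a shared route section (template.txt, hypothetical) to every .com file
for f in *.com; do
    cat template.txt "$f" > "${f%.com}_prepped.com"
done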
4. Accessing ARC
After getting an account set up (see https://arc.leeds.ac.uk/), use the following to log in:
From a university computer, or via the University VPN or wifi networks:
ssh <username>@arc4.leeds.ac.uk (or arc3.leeds.ac.uk for ARC3)
From off-campus:
ssh <username>@remote-access.leeds.ac.uk (then ssh on to arc4.leeds.ac.uk from the gateway)
5. Uploading/downloading between OneDrive and ARC
For instructions on uploading/downloading large files between OneDrive and ARC4, take a look at the video below.
https://web.microsoftstream.com/video/ae085b90-445a-4382-be50-52418dae6c98
The webpage itself can be found at:
https://foqueiroz.github.io/web/public/arc/index.html
6. Batch Queuing with Task Arrays
In many cases you will want to run many almost identical jobs simultaneously, usually running the same program many times but changing the input data or some argument or parameter. One possible solution is to write a Python script to create all of the qsub (.sh) files and then a bash script to execute them. This is very time consuming and can end up submitting many more jobs to the queue than you actually need: submitting tens of thousands of jobs this way often takes many hours and requires you to stay connected to ARC the entire time.
Using a task array allows you to submit a single job to the queue that contains all of the required tasks, meaning you only need to produce one .sh file instead of thousands. It also appears as just one job in your queue.
An example task array (submit.sh) is shown below for submission of a set of 25,000+ ORCA jobs:
#!/bin/bash
#$ -cwd -V
#$ -l h_vmem=1G
#$ -l h_rt=48:00:00
#$ -pe smp 4
#$ -m be
# Pick the line of dirlist.txt matching this task's ID, then run ORCA on that input
job=$(sed -n "${SGE_TASK_ID}p" dirlist.txt)
/home/home02/pmmass/orca/orca "${job}" > "${job%.*}.out"
where each job name is taken from a text file, dirlist.txt, containing one line per job. This list can easily be generated by running the following command in the required directory:
ls > dirlist.txt
or for all .txt files:
ls *.txt > dirlist.txt
or for all directories:
ls -d */ > dirlist.txt
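Note that because the shell creates dirlist.txt before ls runs, a bare ls > dirlist.txt will also capture dirlist.txt itself. A minimal alternative using find that avoids this (here matching .txt inputs; adjust the pattern to suit) is:
find . -maxdepth 1 -name "*.txt" ! -name "dirlist.txt" -printf "%f\n" > dirlist.txt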
To test the array, run:
qsub -t 1 submit.sh
This will submit only the first task (the first line of dirlist.txt) and should be used to check that the script is working correctly.
To submit all jobs, run:
qsub -t 1-$(wc -l < dirlist.txt) submit.sh
This will submit all tasks, where the number of tasks is the number of lines in dirlist.txt.
In this example every task runs with 4 CPU cores, 1 GB of memory per core (h_vmem is requested per core, so 4 GB in total) and a 48 h time limit. The final line can be changed based on the job/software you wish to run, as in the sketch below.
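For example, a sketch of what the final lines might look like for a set of Gaussian 09 jobs instead of ORCA (the module name and file extensions are assumptions; check module avail on ARC):
module load gaussian                          # assumed module name
job=$(sed -n "${SGE_TASK_ID}p" dirlist.txt)   # nth input file from the list
g09 < "${job}" > "${job%.*}.log"              # run Gaussian 09 on that input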
Once submitted, you should see a single job in the queue, with a count alongside it showing how many tasks are yet to complete. Running tasks should also appear separately in qstat.
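Once tasks start finishing, a quick way to gauge progress from the submission directory is sketched below; the "ORCA TERMINATED NORMALLY" string is what ORCA prints on successful completion, so adjust the check for other software:
qstat -u "$USER"                                   # your jobs, including remaining task IDs
ls *.out | wc -l                                   # how many output files exist so far
grep -l "ORCA TERMINATED NORMALLY" *.out | wc -l   # how many finished successfully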