Getting Started - ucsf-wynton/tutorials GitHub Wiki
- Getting a Wynton HPC Account
- Connecting to Wynton HPC
- Linux operating system on Wynton HPC
- Using the Linux command line
- Storage
- Overview of the different kinds of nodes on Wynton
- A little about Linux environment modules
- Submitting a job to Wynton
- Interactive sessions on Wynton HPC
- Parallel jobs
- GPU scheduling
- Best practices
- Troubleshooting tips
- Getting Additional Help
- Fill out the Wynton account request form
- note: if you are from Gladstone, ask IT for a UID/GID and check the box for "Gladstone" in the form
- After the form is submitted, the Wynton admins will set up your Wynton HPC account and work with you to make sure you can access the cluster
- Read the User Agreement
- If you need to change your password, go to the Wynton password change page
- For password resets, contact the Wynton system administrators
ssh
Open an ssh client. An ssh client is already installed if you are using OS X or Linux. On Windows you might need to download an ssh client application.
- Mac
- Terminal (built-in)
- iTerm2
- Linux
- Terminal (built-in)
- Windows
In the example below, replace `alice` with your actual Wynton user name. Type the first line and your password, when prompted:

```
{local}$ ssh alice@log2.wynton.ucsf.edu
alice@log2.wynton.ucsf.edu's password:
Last login: Thu Jul 16 17:03:28 2020
[alice@wynlog2 ~]$
```
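Optionally, you can save yourself some typing by adding a host alias to the ssh configuration on your own computer. A minimal sketch; the alias `wynton` and user name `alice` are placeholders:

```sh
# Run this on your local machine, not on Wynton
cat >> ~/.ssh/config <<'EOF'
Host wynton
    HostName log2.wynton.ucsf.edu
    User alice
EOF
ssh wynton    # now equivalent to: ssh alice@log2.wynton.ucsf.edu
```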
sftp

sftp is a common method used to transfer files between two computers. If you are using OS X or Linux, an sftp client application is already installed. Under Windows, you might need to download an additional application.

In the example below, replace `alice` with your actual Wynton user name. Type the first line and your password, when prompted:

```
{local}$ sftp alice@log2.wynton.ucsf.edu
alice@log2.wynton.ucsf.edu's password:
Connected to log2.wynton.ucsf.edu.
sftp>
```
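Once connected, files can be uploaded and downloaded from the `sftp>` prompt. A brief sketch; the file names and path are placeholders:

```
sftp> put results.tar.gz                   # upload a local file to the current remote directory
sftp> get /wynton/home/alice/data.csv      # download a remote file (hypothetical path)
sftp> bye                                  # end the session
```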
For more information on how to transfer files on Wynton see Wiki - How to move files
Troubleshooting logging in to Wynton HPC
- If you have difficulty connecting, make sure you have received confirmation that your account has been created and the username you are using is correct.
- Make sure the server hostname you are connecting to is correct. From the outside, you can only log directly into the Wynton login nodes or the data transfer nodes.
- If your password needs to be reset, please contact the Wynton system administrators.
- Wynton HPC runs the Linux operating system, specifically CentOS 7 Linux
- Becoming comfortable with the Linux command line and the bash "shell" is a very useful skill for interacting with the Wynton HPC environment
- A good intro to using the Linux command line is available at Software Carpentry - The Unix Shell
Video recording of Software Carpentry - The Unix Shell (UCSF login required) [running time 2:17:48] by Geoffrey Boushey at UCSF, a member of the Library's Data Science Initiative team
- Topics covered in the 2 hour recording include
- Introducing the Shell
- Navigating Files and Directories
- Working with Files and Directories
- Finding things
IMPORTANT: Wynton storage is NOT backed up. If your data are important, do not keep the only copy on Wynton.
- BeeGFS is the parallel file system used by Wynton. It is optimized for HPC
- Home directory
  - mounted under `/wynton/home`
  - user home directory quotas are 500 GiB
- Group directory
  - mounted under `/wynton/group`
  - to check the quota for group members: `beegfs-ctl --getquota --gid <group>` (see the example below)
  - For example, there is 100 TB of Gladstone space under `/wynton/group/gladstone`
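To see current usage against these quotas, `beegfs-ctl --getquota` can also be pointed at a user; `mygroup` below is a placeholder for your actual Unix group name:

```sh
beegfs-ctl --getquota --uid "$USER"    # your own usage and quota
beegfs-ctl --getquota --gid mygroup    # your group's usage and quota (replace mygroup)
```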
- Global scratch space
  - mounted as `/wynton/scratch` and available as a shared directory from all Wynton nodes
  - If you are copying files that will only be needed temporarily, for example as input to a job, you have the option of copying them directly to the global scratch space at `/wynton/scratch`. There is 492 TiB of space available for this purpose.
  - /wynton/scratch is automatically purged after 2 weeks, but you should go ahead and delete the files when you no longer need them.
  - note: it is good practice to first create your own subdirectory here and copy to that location, for example:

    ```sh
    mkdir /wynton/scratch/my_own_space
    scp filename.tsv alice@log2.wynton.ucsf.edu:/wynton/scratch/my_own_space
    ```
- Local scratch space
  - mounted as `/scratch`
  - each node has its own /scratch directory that is not shared with other nodes
  - it is good practice to create a directory under /scratch to write to (see the sketch below)
  - https://wynton.ucsf.edu/hpc/scheduler/using-local-scratch.html
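A minimal sketch of that staging pattern in a batch script; `input.dat`, `output.dat`, and `my_analysis` are placeholders, and the full recommended recipe is on the local-scratch page linked above:

```sh
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
# Stage data on the node-local /scratch, compute there, then copy results back
WORKDIR=$(mktemp -d /scratch/"$USER"-XXXXXX)
cp input.dat "$WORKDIR"/
cd "$WORKDIR"
my_analysis input.dat > output.dat      # placeholder for the real computation
cp output.dat "$SGE_O_WORKDIR"/         # SGE_O_WORKDIR = directory the job was submitted from
cd /
rm -rf "$WORKDIR"                       # clean up local scratch when done
```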
- There are a few different kinds of nodes (Linux hosts): login, development, data transfer, compute, and GPU compute
- login nodes
  - login nodes can be logged into directly
  - minimal compute resources
  - dedicated solely to basic tasks such as copying and moving files on the shared file system, submitting jobs, and checking the status of existing jobs
  - node names: `log1.wynton.ucsf.edu`, `log2.wynton.ucsf.edu`
- development nodes
  - development nodes cannot be logged into directly; they can be accessed from the login nodes
  - node names: `dev1.wynton.ucsf.edu`, `dev2.wynton.ucsf.edu`, `dev3.wynton.ucsf.edu`, `gpudev1.wynton.ucsf.edu`
  - intended for validating scripts, prototyping pipelines, compiling software, etc.
  - interactive jobs (Python, R, MATLAB)
- data transfer nodes
  - like login nodes, the data transfer nodes can be logged into directly
  - node names: `dt1.wynton.ucsf.edu`, `dt2.wynton.ucsf.edu`
  - have access to the outside internet
  - the data transfer nodes each have 10 Gbps network connections; for comparison, the login nodes have 1 Gbps network connections
  - for large transfers, Globus is the preferred transfer method
  - Gladstone users have additional options for high-speed data transfers to/from Gladstone, local, and Dropbox locations: see the internal Confluence docs.
- compute nodes
  - cannot be logged into directly
  - the scheduler sends jobs to the compute nodes
  - the majority of compute nodes have Intel processors; a few have AMD
  - local /scratch is either a hard disk drive (HDD), a solid state drive (SSD), or a Non-Volatile Memory Express (NVMe) drive
  - each node has a tiny /tmp (4-8 GiB)
- gpu nodes (for GPU computation)
  - cannot be logged into directly
  - as of 2019-09-20:
    - 38 GPU nodes with a total of 132 GPUs available to all users
    - among these, 31 GPU nodes, with a total of 108 GPUs, were contributed by different research groups
    - GPU jobs are limited to 2 hours in length when run on GPUs not contributed by the running user's lab; contributors are not limited to 2-hour GPU jobs on the nodes they contributed
    - there is also one GPU development node that is available to all users
- https://wynton.ucsf.edu/hpc/software/software-modules.html
- available module repositories (need to be loaded)
- CBI: Repository of software shared by Computational Biology and Informatics (http://cbi.ucsf.edu) at the UCSF Helen Diller Family Comprehensive Cancer Center
- Sali: Repository of software shared by the UCSF Sali Lab
- A list of the available modules in the CBI and Sali repositories is available at https://wynton.ucsf.edu/hpc/software/software-repositories.html, or by running `module avail` after loading a repository with `module load`. For example, to list all the modules in the CBI repository, run `module load CBI` followed by `module avail`.
- To load a module, use `module load`. For example, to load the R module from the CBI module repository: `module load CBI r`
- To see what gets set when a module is loaded, use `module show`. For example, to see what gets set when the mpi module is loaded: `module show mpi`
- To see what software modules you currently have loaded, use `module list`
- To see what software modules are currently available (in the software repositories you have loaded), use `module avail`
- To disable ("unload") a previously loaded module, use `module unload`. For example, to unload the R module if it had been loaded previously: `module unload r`
- To disable all loaded software modules and repositories, use `module purge`
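Putting these commands together, a typical session on a development node might look like the following:

```sh
module load CBI          # make the CBI repository's modules visible
module avail             # list the modules that are now available
module load CBI r        # load R from the CBI repository
module list              # show which modules are currently loaded
module unload r          # unload R when finished
module purge             # or unload everything, including the repository
```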
- Other ways of loading software
- CentOS Software Collections (SCL)
- Build the software in your home directory
- Use a Singularity container (similar to Docker and Docker container images can be converted to Singularity images)
- The current job scheduler is SGE 8.1.9 (Son of Grid Engine); however, Wynton will be transitioning to the Slurm job scheduler in Q4 2020
- The scheduler coordinates distributing jobs, which get submitted as batch scripts, to the compute nodes of the cluster
- Example SGE job submission: `qsub -l h_rt=00:01:00 -l mem_free=1G my_job.sge` (replace the time, memory, and file name with your choices)
  - `my_job.sge` = the batch script file to be submitted
  - `-l h_rt` = maximum runtime (hh:mm:ss or seconds)
  - `-l mem_free` = maximum memory (`K` for kilobytes, `M` for megabytes, `G` for gigabytes)
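For reference, a minimal `my_job.sge` script consistent with the submission above might look like this sketch (the echo/date commands are placeholders for real work):

```sh
#!/bin/bash
#$ -S /bin/bash          # run the job with bash
#$ -cwd                  # start in the directory the job was submitted from
#$ -l h_rt=00:01:00      # maximum runtime
#$ -l mem_free=1G        # memory request
echo "Running on $HOSTNAME"
date
```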
- Jobs always run on the compute nodes whether they are submitted from a login node or from a development node.
- To check on the job
  - Current status: `qstat` or `qstat -j 191442` (replace 191442 with the actual SGE job id)
  - After the job ran successfully: `grep "usage" my_job_sge.0284740` (replace the output file name with the actual output file name)
  - After a failed job: `tail -100000 /opt/sge/wynton/common/accounting | qacct -f -j 191442` (replace 191442 with the actual SGE job id)
- How much memory to request when submitting a job?
- With experience and trial & error, you can estimate the memory requirements for various types of jobs
- Logs, reports, and accounting records can help provide clues (see the sketch below)
- Wynton is relatively forgiving on memory estimates
- If unsure, try 8GB and then increase/decrease accordingly
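As one way to gather those clues, the SGE accounting record for a finished job includes its peak memory use (`maxvmem`); the job id below is a placeholder:

```sh
# Look up peak memory and wallclock time for job 191442
tail -100000 /opt/sge/wynton/common/accounting | qacct -f -j 191442 | grep -E "maxvmem|wallclock"
```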
- Tips on submitting jobs
  - For intensive jobs during busy times, you can reserve resources for your job as soon as they become available by including the `-R y` parameter
  - Compute nodes do not have access to the internet, i.e., you cannot run jobs that include steps like downloading files from online resources.
  - Development nodes DO have access to the internet.
  - If your script or pipeline requires access to the internet, consider splitting up the work: run a script on a dev node that retrieves the online files and then submits jobs to be run on the compute nodes (see the sketch below).
  - Cron jobs can also be run on a dev node to periodically download files, separate from the compute-heavy jobs that are submitted to the compute nodes.
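A minimal sketch of that split, run on a development node; the URL, file names, and job script are placeholders:

```sh
#!/bin/bash
# Run on a dev node (it has internet access): fetch the input, then hand off to the scheduler.
wget -O input.fasta "https://example.org/data/input.fasta"   # hypothetical download
qsub -l h_rt=04:00:00 -l mem_free=4G my_job.sge              # the compute happens on a compute node
```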
- To check the job queue metrics of the cluster, go to https://wynton.ucsf.edu/hpc/status/index.html
- https://wynton.ucsf.edu/hpc/scheduler/submit-jobs.html
- A parallel environment for multithreaded (SMP) jobs is available for use on the cluster
- This environment must be used for all multithreaded jobs. Such jobs not running in this PE are subject to being killed by the cluster systems administrator without warning.
- Example submission script for a parallel BLAST job (see below for how to submit it):

  ```sh
  #!/bin/bash
  #
  #$ -S /bin/bash
  #$ -l arch=linux-x64   # Specify architecture, required
  #$ -l mem_free=1G      # Memory usage, required. Note that this is per slot
  #$ -pe smp 2           # Specify parallel environment and number of slots, required
  #$ -R yes              # SGE host reservation, highly recommended
  #$ -cwd                # Current working directory
  blastall -p blastp -d nr -i in.txt -o out.txt -a $NSLOTS
  ```
Notes on the example
- In the above example, the '-a' flag tells blastall the number of processors it should use.
- $NSLOTS is the number of slots requested for the parallel environment
- more information on parallel and MPI jobs, https://salilab.org/qb3cluster/Parallel_jobs
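To submit the example script above, save it to a file and pass the file to `qsub`; the file name here is a placeholder:

```sh
qsub blast_smp.sge    # the parallel environment, memory, and architecture requests are read from the #$ lines
```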
- Compiling GPU applications
- The CUDA Toolkit is installed on the development nodes
- Several versions of CUDA are available via software modules. To see the currently available versions, run the command:
module avail cuda
- more information on GPU jobs, https://wynton.ucsf.edu/hpc/scheduler/gpu.html
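A quick sanity check on the GPU development node might look like the sketch below; whether an unversioned `module load cuda` selects a default version is an assumption, so append a specific version if needed:

```sh
module avail cuda     # list the CUDA toolkit versions installed as modules
module load cuda      # load one (append /<version> to pick a specific release)
nvcc --version        # confirm the CUDA compiler is now on your PATH
```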
- It is currently not possible to request interactive jobs via the scheduler
- There are dedicated development nodes (dev1, dev2, dev3, gpudev1) that can be used for short-term interactive development such as building software and prototyping scripts before submitting them to the scheduler.
- Interactive Python session
  1) ssh to a login node
  2) ssh to a dev node
  3) type `python3` to enter the Python REPL for an interactive session
  4) when done, type `exit()` to quit the session
- Interactive R session
  1) ssh to a login node
  2) ssh to a dev node
  3) type `R` to enter the R interactive session
  4) when done, type `q()` to quit the session
- Interactive MATLAB session
  1) ssh to a login node
  2) ssh to a dev node
  3) type `module load Sali matlab`
  4) type `matlab`
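Putting the steps together, an interactive R session might look like the following sketch (the user name, prompts, and choice of dev node are illustrative):

```
{local}$ ssh alice@log2.wynton.ucsf.edu   # 1) log in to a login node
[alice@wynlog2 ~]$ ssh dev1               # 2) hop to a development node
[alice@dev1 ~]$ module load CBI r         # load R from the CBI repository
[alice@dev1 ~]$ R                         # 3) start the interactive R session
> q()                                     # 4) quit when done
```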
More information on working with
GUI apps / X-Windows / X2Go
- X2Go is accelerated remote desktop software
- It should be significantly faster than using X Windows tunneled through ssh
- To use X2Go on Wynton, you will need to install the X2Go client on your computer
- Backup your data if it is important
- Use login nodes (or dev nodes) to submit batch jobs to the cluster; use dev nodes for interactive work
- Use local scratch for staging data and computations
- If using conda environments in Anaconda Python, this is best done inside a Singularity container
- If writing many files to the file system, for example thousands or more, avoid writing all of the files to a single directory.
- instead, spread out the files into a number of different directories for better performance
- For interactively using GUI applications, using X2Go will have better performance than X-forwarding
- Check the job scheduler logs
  - error log: unless otherwise specified, this will be in the directory that the job was launched from, and the file name will be the job script name followed by `.e<jobid>`
  - output log: unless otherwise specified, this will be in the directory that the job was launched from, and the file name will be the job script name followed by `.o<jobid>`
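To inspect those logs for a particular job, something like this works; the script name and job id are placeholders:

```sh
cat my_job.sge.o191442   # standard output from the job
cat my_job.sge.e191442   # standard error (a good first place to look when a job fails)
```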