HPC resources through CQLS

Contributors: Brodie Pearson & Ara Lee

To log in, use the following terminal command with your ONID username, and follow the resulting prompts:

ssh [username]@hpc.cqls.oregonstate.edu
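
If you connect frequently, you can optionally add a host alias to your ~/.ssh/config; the alias name cqls below is just an example:

# ~/.ssh/config (the alias "cqls" is a hypothetical example)
Host cqls
    HostName hpc.cqls.oregonstate.edu
    # replace with your ONID username
    User your_onid_username

After saving this, ssh cqls is equivalent to the full command above.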

Viewing node details & logging into a node

Now that you are logged into the CQLS cluster, you can use

sinfo

to see a list of nodes and their usage (or idle) status. You can use

scontrol show nodes [optional node name]

to see more detail about a specific node.
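
As a concrete sketch using standard Slurm options (youmu is the node used as an example later in this guide):

sinfo -N -l                  # one line per node, with state, CPU, and memory details
scontrol show nodes youmu    # full details for just the youmu node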

Once you know which node you want to use, run a command like the one below to log into a specific node interactively (in this case youmu, a Grace Hopper Superchip node with 72 CPUs and a GH200 GPU):

srun -n 1 -N 1 -w youmu -p ceoas-arm --propagate=NONE --pty /bin/bash
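
For reference, here is the same command with each Slurm flag annotated:

# -n 1               run a single task
# -N 1               on a single node
# -w youmu           request the specific node named youmu
# -p ceoas-arm       use the ceoas-arm (Grace Hopper) partition
# --propagate=NONE   do not propagate the login shell's resource limits
# --pty /bin/bash    allocate a pseudo-terminal and start bash
srun -n 1 -N 1 -w youmu -p ceoas-arm --propagate=NONE --pty /bin/bash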

You should now be in an interactive session on the youmu node. If you are interested in using GPUs, you can run

nvidia-smi

to get information about the GPU(s) available on the node, and their current usage.
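
For a more compact summary, nvidia-smi also accepts query flags; a minimal sketch:

# one CSV line per GPU, showing name, memory usage, and utilization
nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv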

Opening software on a node (Interactive session via command line)

You can open specific software by typing its name, for example python. Note that for Julia usage on the Grace Hopper nodes, you must run two extra commands before opening Julia, because the GH200's novel architecture requires specific drivers:

export PATH="/local/cluster/CEOAS/aarch64/opt/julia/julia-1.10.0/bin:${PATH}"
export LD_LIBRARY_PATH=/local/cluster/CEOAS/aarch64/opt/julia/julia-1.10.0/lib
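
After running those exports, you can confirm that the correct Julia launches. The second line is an optional GPU check and assumes the CUDA.jl package is installed in your Julia environment:

julia --version                            # should report julia version 1.10.0
julia -e 'using CUDA; CUDA.versioninfo()'  # optional GPU check; assumes CUDA.jl is installed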

Submitting a job to a specific node

sbatch run_script.sh

where run_script.sh is a file containing a batch script (like the one below, with your own username and paths substituted) that queries the ewg node's GPU status (nvidia-smi) and Julia version, and then runs the simulation in the file test/test_sim.jl. The script also records the start time, end time, and execution time of the job. It is likely more complex than scripts you would create, as we needed to specify internet access (the http_proxy and https_proxy settings) and specific Julia paths for this node's architecture.

#!/bin/bash
#SBATCH --job-name=name
#SBATCH --output=output.out
#SBATCH --partition=ewg
#SBATCH --nodelist=ewg

START=$(date +%s.%N)  # record the start time (seconds since epoch)
echo "  Started on:           " `/bin/hostname -s` 
echo "  Started at:           " `/bin/date` 
echo "--------------------------------------------" 

export http_proxy=http://proxy-internal.ceoas.oregonstate.edu:3128
export https_proxy=http://proxy-internal.ceoas.oregonstate.edu:3128

# Set the PATH to the specific Julia version
# julia-1.8.5
# export PATH=/local/cluster/julia-1.8.5/bin:$PATH
# export LD_LIBRARY_PATH=/local/cluster/julia-1.8.5/lib:$LD_LIBRARY_PATH
# export CPATH=/local/cluster/julia-1.8.5/include:$CPATH

# # julia-1.10.1
# export PATH=/local/cluster/bin:$PATH
# export LD_LIBRARY_PATH=/local/cluster/lib:$LD_LIBRARY_PATH
# export CPATH=/local/cluster/include:$CPATH

# julia-1.10.5
export PATH="/fs1/local/cqls/software/x86_64/julia-1.10.5/envs/julia/bin:$PATH"
export LD_LIBRARY_PATH="/fs1/local/cqls/software/x86_64/julia-1.10.5/envs/julia/lib:$LD_LIBRARY_PATH"

nvidia-smi
julia --version # checking Julia version
julia test/test_sim.jl 

echo "" 
echo "--------------------------------------------" 
echo "  Finished at:          " `date` 
END=$(date +%s.%N)  # record the end time

DIFF=$(echo "$END - $START" | bc)           # elapsed time in seconds
HOURS=$(echo "scale=2; $DIFF / 3600" | bc)  # convert to hours (2 decimal places)
echo "  Execution time:        $HOURS hours"