The High Profile Computer at UNBC - Michael-D-Preston/PrestonLab GitHub Wiki

Introduction

Hi! Genome Quebec gave us a bunch of data and my baby little computer isn't cool enough to process it all in a meaningful amount of time, so I (we) need something a bit bigger...

Quick reference for how I'll format this document:

  • This is the step
This is the code you'll run, note the copy button --->

This is the output

  • Step 2
    • These are bonus facts that I want to say,
    • or sub steps

Getting HPC access

You'll need UNBC high profile computer (HPC) access. Go here and request access.

Getting into the HPC

First and foremost, be on the UNBC Wi-Fi. I know it sucks, but none of this will work unless you are. What if you're not at the university, you say!? Well, load up a UNBC desktop over a VPN and then follow along.

Download PuTTY on your computer. This is a program that lets us talk to the HPC. In PuTTY, under Host Name, enter 'username'@klinaklini.unbc.ca and click Open.

How familiar are you with the nervous system of a cephalopod? Octopi have 9 brains: a central one and then one for each arm. You have just entered the central brain, ['name'@klinaklini]$. It's bad form to import files, download stuff, or run commands in the central brain, so we want to go to one of the arm brains. Type in the command prompt:

ssh compute1

['name'@compute1]$

to go to arm brain 1. (There are 16 compute nodes total.) Sweet! What if you don't want to be in compute1?

exit

And if you want to leave the HPC, again just type

exit
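If you ever lose track of which brain you're in, a tiny check like this can tell you. It's only a sketch: it assumes the login node's hostname starts with "klinaklini" (going by the prompt above), and the function name on_login_node is made up for this example.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the given hostname looks like the login node.
# Assumption: the login node's hostname starts with "klinaklini".
on_login_node() {
  case "$1" in
    klinaklini*) return 0 ;;
    *)           return 1 ;;
  esac
}

# uname -n prints the name of the machine you are currently on.
if on_login_node "$(uname -n)"; then
  echo "You are on the login node: ssh to a compute node before doing real work."
else
  echo "Not on the login node ($(uname -n))."
fi
```

You could paste the function into your shell and call it whenever you're unsure, before starting anything heavy.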

Depending on who's online with you, different compute nodes will be busy. Type

sinfo

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq*     up    infinite      1 down* compute3
defq*     up    infinite     15 idle  compute[1-2,4-16]

to see whether any of the compute nodes are busy (and then don't go to those ones). For me, 15 of the nodes are idle right now and compute3 is down.

So I will go to compute1 for the next couple things.
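If reading the table by eye gets old, the idle rows can be pulled out with awk. This is a rough sketch that assumes the six-column layout shown above (STATE in column 5, NODELIST in column 6); the function name idle_nodes is mine, and the sample text is just the output from this page.

```shell
#!/usr/bin/env bash
# Print the NODELIST column for every row of sinfo-style output whose STATE is "idle".
# Reads from stdin and assumes the six-column layout shown above.
idle_nodes() {
  awk '$5 == "idle" { print $6 }'
}

# Sample text copied from the sinfo output on this page.
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 1 down* compute3
defq* up infinite 15 idle compute[1-2,4-16]'

printf '%s\n' "$sample" | idle_nodes
# On the cluster itself you would pipe the real command instead:
#   sinfo | idle_nodes
```

With the sample above this prints compute[1-2,4-16], meaning compute1, compute2, and compute4 through compute16 are free.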

Penguins and moving around

You are on a Linux cluster. Linux is an operating system, like Windows or macOS, and here you only get command-line functionality. I'm gonna assume you know the common Linux commands to move around and create files, but if not, here you go. And y'know what, just do this tutorial, but honestly this tutorial is really good too and has a lot of in-depth commands.

So where are you? (this prints the current working directory)

pwd

/home/'name'

You are in the home directory, and just like being in the central brain, you don't want to store or do stuff here. Go to your research folder! (cd 'directory' moves you to a new directory. Uhh, go do a quick bash intro class to find out how to use this command effectively.)

cd /data/researchHome/'name'/

Any cool files in here? (this lists files)

ls

:(

Nothing! ugh
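Here's a minimal sketch of the moving-around commands in one go, so the empty folder gets a bit of structure. The folder names my_project and raw_data are made up, and RESEARCH_DIR is a stand-in for your research folder. It defaults to a throwaway temp folder so you can try it anywhere without touching real data.

```shell
#!/usr/bin/env bash
# Assumption: RESEARCH_DIR stands in for your research folder. On the cluster you
# would set it to your own folder under /data/researchHome/; by default it is a
# throwaway temp folder so this sketch is safe to try anywhere.
RESEARCH_DIR="${RESEARCH_DIR:-$(mktemp -d)}"

cd "$RESEARCH_DIR" || exit 1   # move into the research folder
mkdir -p my_project/raw_data   # -p makes parent folders too and is fine if they exist
cd my_project || exit 1        # step into the new project folder
pwd                            # where am I now?
ls -la                         # -l long format, -a also show hidden files
cd ..                          # go back up one level
```

After this, ls in the research folder should finally show something: my_project.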

Side point: the usual copy and paste shortcuts won't work in the PuTTY command prompt. Instead, copy normally elsewhere and right-click to paste. Also, highlighting text in PuTTY copies it.

How to import files into your HPC folder so you can do cool stuff

You can download WinSCP to move files into the HPC. I haven't done this, so I don't know how it works. Or you can use Windows' built-in functionality: open File Explorer, go to "This PC", click the three dots, and then click "Map network drive". In the Folder box write

\\research-files.unbc.ca\researchHome\'username'

Click Finish. Now you should be able to put files here and then see them in your research space!

On a mac? (boo) Follow the directions under mount 'researchHome' locally Here
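Sequencing files are big, so it can be worth checking that a copy arrived intact. This is an optional sketch, not something the steps above require: it compares SHA-256 checksums of two files with sha256sum. The function name same_file and the tiny .fastq files are invented for the demo.

```shell
#!/usr/bin/env bash
# Succeed only if the two files have the same SHA-256 checksum.
same_file() {
  local a b
  a=$(sha256sum "$1" | awk '{ print $1 }')
  b=$(sha256sum "$2" | awk '{ print $1 }')
  [ "$a" = "$b" ]
}

# Demo with a made-up file and a copy of it in a temp folder.
tmp=$(mktemp -d)
printf 'ACGT\n' > "$tmp/original.fastq"
cp "$tmp/original.fastq" "$tmp/copied.fastq"

if same_file "$tmp/original.fastq" "$tmp/copied.fastq"; then
  echo "copy matches the original"
else
  echo "copy differs from the original"
fi
```

In practice the original lives on your own computer, so you would run sha256sum on the file in your research folder and compare the result by eye with a checksum computed locally (for example with Get-FileHash in PowerShell on Windows).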

Running R

Finally, time to run R, and know that we are getting old-school command-line R, not any cringe RStudio kinda stuff. To run R, just... type "R".

R

output:

R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

And you're in! You can now act like this is R (because it is!). I'll see you next episode, when I figure out how to run R and the DADA2 pipeline!
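You don't have to sit in the interactive prompt, either. As a hedged sketch: Rscript (it ships with R) runs a script file from bash and then hands you back the command line. The file name hello.R is made up, and the check for Rscript is only there so the sketch doesn't fall over on a machine without R.

```shell
#!/usr/bin/env bash
# Write a tiny R script into a temp folder (hello.R is just an illustrative name).
script="$(mktemp -d)/hello.R"
cat > "$script" <<'EOF'
# Print the R version so you can confirm which R you are running.
cat(R.version.string, "\n")
EOF

# Run the script without opening the interactive R prompt, if Rscript is available.
if command -v Rscript >/dev/null 2>&1; then
  Rscript "$script"
else
  echo "Rscript not found on this machine; try this on a compute node."
fi
```

The same pattern should work for a longer analysis script: save it as a .R file in your research folder and run Rscript on it.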

PS. Really do a quick run-through of bash commands. If you wanna see how long your process has been running or how many CPUs it's using, feel free to do:

top

output:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 169285 aball     20   0   20080   4608   3456 R   0.7   0.0   0:00.05 top
      1 root      20   0  171308  13452   9216 S   0.0   0.0   0:37.00 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:06.00 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq

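PID is the process ID

USER is, uh, you (or whoever owns the process)

PR is the priority, NI is the nice value, VIRT, RES and SHR are virtual, resident (actually in RAM) and shared memory, and S is the process state (R = running, S = sleeping, I = idle). %CPU is how many CPUs you're using! 100% = 1 CPU, 1500% = 15 CPUs!!

%MEM is the share of the node's physical memory your process is using

TIME+ is how much CPU time your program has used, in minutes:seconds.hundredths. For a busy single-CPU job this is close to how long it has been running; with several CPUs it grows faster than the clock.

COMMAND is what you're running, you'll probably see R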

Note: if you want to see R here, you have to run the top command in bash WHILE you're running R. HOW? Easy, just open another PuTTY window, log in like normal, go to the same node, and see!
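To turn the %CPU rule of thumb (100% = 1 CPU) into a number, here's a small sketch that adds up the %CPU column for one user and divides by 100. It assumes the column layout shown above (USER in column 2, %CPU in column 9). The function name cores_in_use is mine, and the busy R row in the sample is invented for illustration; it is not real output.

```shell
#!/usr/bin/env bash
# Sum the %CPU column (field 9) for one user from top-style rows on stdin,
# then divide by 100 to get an approximate number of CPUs in use.
cores_in_use() {
  awk -v user="$1" '$2 == user { total += $9 } END { printf "%.2f\n", total / 100 }'
}

# Sample rows in the layout shown above. The R row (PID 1001) is made up.
sample='    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1001 aball     20   0 9000000 800000  12000 R 1500.0   2.5  42:10.33 R
 169285 aball     20   0   20080   4608   3456 R    0.7   0.0   0:00.05 top
      1 root      20   0  171308  13452   9216 S    0.0   0.0   0:37.00 systemd'

printf '%s\n' "$sample" | cores_in_use aball
# On the cluster, one batch-mode snapshot of top could be piped in instead:
#   top -b -n 1 | cores_in_use "$USER"
```

For the sample rows this prints 15.01, i.e. roughly 15 CPUs busy for aball. Treat the live number as approximate, since it is only a single snapshot.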

How to download packages

Unfinished

On the main compute node, run module load r/4.3.0, then R, then download the packages. See the service request ticket.