Supercomputer Basics

Tutorial Video

https://youtu.be/43iRcwxHqvk

Supercomputer SLURM commands

See the Office of Research Computing's website.

Creating a supercomputer account

Before you can access the supercomputer, you will need to create an account. This can be done at:

https://rc.byu.edu/

After following the link, click Request an Account in the top right corner. Agree to the various terms and conditions and fill out the application.

Sponsor: jpp6 (Joe Price)

When asked for a justification, the main points to include are that you need to use the supercomputer's GPUs for image segmentation and handwriting recognition, and that you are part of Dr. Price's computer vision research team. My original justification was:

“I am working on the computer vision team for Dr. Joe Price's research group. I'll need access to the fslg_census, fslg_handwriting, and fslg_death groups. I will mainly be using Python to do image segmentation and handwriting recognition. I'll need access to GPUs on occasion, maybe 12GB of VRAM if that's available.”

Yours does not need to be this specific.

You will likely want access to the following file sharing groups:

fslg_census

fslg_census1940

fslg_death

fslg_handwriting

fslg_JoePriceResearch

After you complete the form, Dr. Price will approve the request, which should give you access to the supercomputer.

(There are a few other steps once you've been approved, including setting up Duo for your supercomputer account. I don't quite remember the exact steps following approval, but they should be fairly straightforward.)

Logging onto and navigating the supercomputer

Typically, I do my work on the supercomputer through the command line.

If you are using Windows you will want to open the command prompt. If using Mac or Linux you will want to open the terminal.

In the terminal type in:

ssh your_username@ssh.rc.byu.edu

your_username is your supercomputer account username, which will most likely be your BYU Net ID.

You will be prompted to type in your password and a Duo verification code. When typing your password, you will not see any characters appear on the screen; this is normal. Type it in and hit Enter. You should now be on the supercomputer.

The supercomputer is laid out as a series of directories and subdirectories (essentially folders that contain files and other folders).

To navigate the supercomputer (or really do anything on it), you type commands into the terminal.

Useful Navigation Commands:

pwd – prints your current location in the supercomputer

ls – lists everything contained in the directory (files and subdirectories; I remember this by thinking of ls as "list stuff")

  • ls - lists everything in your current directory

  • ls fsl_groups – lists everything contained in the subdirectory fsl_groups

cd path/to/subdirectory – Moves into a subdirectory

  • cd – moves you to the home directory (a good command if you get lost)

  • cd fsl_groups – moves you into the fsl_groups directory

  • cd fsl_groups/fslg_census/compute/projects – moves you into the projects directory, a subdirectory of compute (which is a subdirectory of fslg_census, etc.)

  • cd .. – moves you out of the subdirectory you are currently in

  • cd ../.. – moves you out of two subdirectories

  • cd ../../fslg_census1940 – moves you out of two subdirectories and into fslg_census1940

  • cd ./compute/projects – moves you into the projects directory from your current directory. The . refers to the current directory; it is not necessary when navigating normally, but may be needed in Python or sbatch scripts

Knowing how to use cd is essential to navigating the supercomputer. pwd and ls are good for figuring out where you are and where you need to go.

Often the first command you will run on the supercomputer is cd fsl_groups, followed by moving to wherever the files for your task are located.
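
For example, a short navigation session might look like this (the group and paths are just illustrations; substitute your own):

cd fsl_groups/fslg_census/compute/projects    # move into the projects directory
pwd                                           # confirm where you are
ls                                            # see what files and subdirectories are here
cd ../../..                                   # back up three levels, landing in fsl_groups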

Creating, editing, copying, moving, or deleting a script/file/directory

Creating

Different methods are used to create files and directories

touch filename – create a file

  • touch segmentation.py – creates a python script called segmentation.py

  • touch run_segmentation.sh – creates a shell script called run_segmentation.sh

mkdir directoryname – create a directory

  • mkdir 1920_census_train – creates a directory named 1920_census_train
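
Putting these together, you might set up space for a new task like this (the names are just examples):

mkdir 1920_census_train                       # make a directory for the task
cd 1920_census_train                          # move into it
touch segmentation.py run_segmentation.sh     # create an empty Python script and shell script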

Editing

There are a variety of text editors that can be used to edit a file once it has been created (or immediately after it is created). The two most common are nano and vim.

Nano

Nano is the easier of the two. It also displays instructions on how to use it at the bottom of the terminal once it is open.

nano filename – opens a file in nano to be edited

  • nano segmentation.py – opens segmentation.py in nano

In nano, ^ is equivalent to the Ctrl button on your keyboard

  • ^X – ctrl + x (allows you to exit nano)

Because the nano commands are listed at the bottom of terminal when opened (and because I am not as familiar with nano), I will not list the remainder of the nano commands here.

Vim

The other commonly used text editor is vim. It can be a bit strange (or frustrating) to get used to at first, but it has its benefits

vim filename - opens a file in vim to be edited

  • vim run_segmentation.sh – opens run_segmentation.sh in vim

Vim requires that you type letters or combinations of letters in order to edit and save files. If you are unable to type commands, press Esc to return to command mode so vim will accept them.

  • i – enters insert mode, which allows you to edit the text of the file (press Esc when you are done to return to command mode)

  • :q! – quit out of vim (without saving)

  • :wq – save the file and quit out of vim

  • Shift + G – skip to the bottom of the script

There are many other nifty commands in vim (which you can look up online), but these are the most important ones

Copying

cp filename name_of_copy – makes a copy of a file with a new name (in the same directory)

  • cp segmentation.py new_segmentation.py – makes a copy of segmentation.py called new_segmentation.py

cp filename filepath/of/copy – makes a copy of the file in the location given

  • cp segmentation.py ../../projects/census_scripts – makes a copy of segmentation.py in the census_scripts directory

cp filename filepath/of/copy/name_of_copy – makes a copy of the file in the location given, with the new name

  • cp segmentation.py ../../projects/census_scripts/new_segmentation.py – makes a copy of segmentation.py in the census_scripts directory called new_segmentation.py

In order to run cp commands with a directory, you need to include -r, -R, or --recursive (copy directories recursively)

  • cp -r census_scripts census_scripts_copy – makes a copy of the directory census_scripts called census_scripts_copy

Moving

mv filename filepath/for/move – moves filename to the designated filepath (Works the same for directories)

  • mv segmentation.py ../../projects/census_scripts – moves segmentation.py to the census_scripts directory

mv can also be used to rename files (and directories) when the destination name does not already exist

mv filename new_filename

  • mv segmentation.py better_segmentation.py – renames segmentation.py to better_segmentation.py

Deleting

rm filename – deletes the file

  • rm segmentation.py – deletes segmentation.py

rm -r directoryname – deletes the directory and all files in it

  • rm -r census_scripts – deletes census_scripts and all files in it
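
As a quick illustration of how copying, moving, and deleting fit together (all of the names here are hypothetical):

cp segmentation.py segmentation_backup.py     # keep a backup copy of a script
mv segmentation_backup.py ../old_scripts      # move the backup into another directory
rm slurm-3876598.out                          # delete a single file you no longer need
rm -r old_images                              # delete a directory and everything in it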

Creating and using environments

An environment gives you a sandbox that allows you to install packages and use them on the supercomputer without installing them everywhere on the supercomputer. It lets you install packages while removing the risk of breaking something somewhere else.

Because we do not have install permissions for the entire supercomputer, we must use an environment for any packages that are not already on the supercomputer.

It is typically easiest to copy an environment that already contains the packages you need (or alternatively, just use that environment directly). The environment is simply a directory, so copying an environment works the same way as copying a directory:

cp -R env_name new_env_copy

  • cp -R census_env ../projects/census_env_copy – makes a copy of census_env named census_env_copy in the projects directory

To create an environment, move into the directory where you would like the environment and type the following commands:

module load python/3.8 – makes Python 3 available as the base language for your environment

virtualenv env_name – Creates a virtual environment

  • module load python/3.8

  • virtualenv segment_env – Creates a virtual environment named segment_env

In order to activate (open) your environment, move to the directory where your environment is located, then use the following command:

source env_name/bin/activate – opens your environment

  • source segment_env/bin/activate – opens the environment segment_env

To install Python packages into your environment, use the pip command:

pip install package_name – installs a package into your environment

  • pip install pandas – installs the pandas package into your environment

  • pip install torch==1.8 – installs version 1.8 of the torch package into your environment

The process of installing detectron2 and some related packages is a bit more complicated, but should be covered in other pipeline documents.

If you ever want to exit your environment, simply use the command:

deactivate – Exit the environment you are currently in

Deleting an environment follows the same process as removing a directory:

rm -r env_name – deletes the environment

  • rm -r segment_env – deletes the environment segment_env
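
Putting the whole environment workflow together in one place (the environment name and packages are just examples):

module load python/3.8                  # make Python 3 available
virtualenv segment_env                  # create the environment
source segment_env/bin/activate         # activate it
pip install pandas torch==1.8           # install the packages you need
python segmentation.py                  # run your script inside the environment
deactivate                              # leave the environment when you are done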

A much simpler way: Using Conda Distributions

Conda (from Anaconda) is a great data science distribution that has pretty much any package you will ever need. When creating a new environment, it even gives you an up-to-date Python interpreter. This method can save a lot of headache when you need specialized libraries and packages like Tesseract, because it is constantly updated by developers to ensure it includes dependencies that may be missed when using regular pip installs.

To create an environment in a shared folder:

  1. $ module load miniconda3/latest
  2. $ conda init bash (if in VS Code you will probably need to close the terminal and reopen it to initialize)
  3. $ conda activate to activate the base conda environment
  4. $ conda create -p ~/fsl_groups/path/env_name – put the path where you want to save the environment, with the name you want to give it as the last part (env_name)
  5. $ conda activate ~/fsl_groups/path/env_name activates your new environment
  6. $ mamba install -c conda-forge opencv or whatever package you need. Note: mamba is a faster version of Conda and works with all Conda commands; you can use Conda or Mamba interchangeably. conda-forge is crowd-sourced and most likely to have all of the correct dependencies you need. You can use plain conda install or pip install, but those packages aren't as likely to be updated or otherwise work well on the supercomputer.

Also note that Python is automatically installed or updated with any package you install from Conda.

To deactivate the conda environment use:

$ conda deactivate
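
Putting the Conda steps together in one place (the path and package here are only examples):

module load miniconda3/latest
conda init bash                                                      # first time only; reopen your terminal afterwards
conda activate                                                       # activate the base environment
conda create -p ~/fsl_groups/fslg_census/compute/envs/census_env     # create the environment at a shared path
conda activate ~/fsl_groups/fslg_census/compute/envs/census_env
mamba install -c conda-forge opencv                                  # install whatever packages you need
conda deactivate                                                     # leave the environment when finished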

Mounting the supercomputer in VS Code

  1. Go to Extensions on the left sidebar.

  2. Search for and install the "Remote - SSH" extension.

  3. Then, go to the new "Remote Explorer" icon on the left sidebar and click the "+" icon next to "SSH Targets".

  4. Log into the supercomputer as you would in the terminal with "ssh your_username@ssh.rc.byu.edu".

  5. Hit Enter on the next drop-down menu (e.g. C:\Users\schoolID7\.ssh\config) and hit Connect on the next popup menu.

  6. Go back and hit the "new window" icon next to ssh.rc.byu.edu and a new window will pop up.

  7. Select Linux if it asks what operating system the remote machine uses.

  8. (Somewhere in here it may ask for your password and Duo verification; just give it what it wants. If it crashes, close VS Code and start again.)

  9. Hit Open Folder like you would open a regular folder, and hit Enter again if it prompts for the file path. Then type in your password and Duo security code.

  10. Trust all the authors

And Bob's your uncle, you should be in!
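
As an optional extra, if you would rather not retype the host every time, you can add an entry to the SSH config file that the drop-down in step 5 points at (the host alias and username below are placeholders):

Host byu-sc
    HostName ssh.rc.byu.edu
    User your_username

With that in place, byu-sc shows up as a target in Remote Explorer, and ssh byu-sc also works from a terminal.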

"Have I Made Myself Clear?"

"...Like BUDDA"

Shawn Spencer

Running basic scripts

Python scripts are fairly easy to run. However, if your Python scripts require certain packages, you will need to activate an environment that contains those packages in order for the script to run properly. (Creating and activating an environment is covered in the Creating and using environments section above.) After activating your environment, run your script by typing python before the name of the script.

python script.py – runs the python script

  • python directory_prep.py – runs the python script directory_prep.py

Shell scripts can be run in a variety of ways. To run a shell script that doesn’t utilize the supercomputer’s computing power:

sh script.sh

  • sh compress.sh – runs the shell script compress.sh

bash script.sh

  • bash compress.sh – runs the shell script compress.sh
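
For example, a typical sequence for running a Python script that needs packages from an environment might look like this (the names are illustrative):

source segment_env/bin/activate     # activate the environment that has your packages
python directory_prep.py            # run the Python script
sh compress.sh                      # run a small shell script that doesn't need extra compute
deactivate                          # leave the environment when finished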

Creating sbatch scripts

If you are running a script that requires more computing power (e.g. a GPU for segmentation or HWR), you need to run an sbatch shell script.

When creating an sbatch script it is typically easiest to copy an already existing sbatch script that performs a similar function. However, if you would like to create a new sbatch script from scratch, there is a useful job script generator at https://rc.byu.edu/documentation/slurm/script-generator that will give you code that you should include at the top of your script.

Some baseline parameters that work well for segmentation and HWR when creating a script:

  • Limit this job to one node: checked
  • Number of GPUs: 1
  • Memory per processor: 8192 MB
  • Walltime: 24 hours
  • Job is a test job: Leave unchecked
  • Job is preemptable: Leave unchecked
  • I am in a file sharing group and my group members need to read/modify my output files: Leave unchecked
  • Need Licenses?: Leave unchecked
  • Job name: Pick a name that isn’t too long but is descriptive enough to know what is being run
  • Receive email for job events: You can check any of those boxes, but I would recommend checking end and abort, with only checking abort as a bare minimum
  • Email address: type in your email address
  • Features: Don’t bother with any of those

Once you have entered all this information, copy and paste the code at the bottom of the webpage into your script

The basic structure of an sbatch script is as follows

  1. The job script header copied and pasted from the supercomputer website
  2. module load python/3.8
  3. activate your environment
  4. Run your (most likely) python script(s) along with any other needed shell commands

If you are running a split job (many of the segmentation jobs use these), you will want to place the following line near the start of your script

#SBATCH --array=1-1000 # number of albums

Place it with the rest of the #SBATCH lines of code

# Compatibility variables for PBS. Delete if not needed.
export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
export PBS_JOBID=$SLURM_JOB_ID
export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"

Place after the #SBATCH lines of code but before the line that says # Set the max number of threads to use…

When an sbatch script runs, it may not start in the directory you expect, so make sure to write out the filepaths to the scripts you want to run within the sbatch script.

Looking at existing sbatch scripts is the best way to understand what is going on in them, and they can be used as base templates for new sbatch scripts.
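
As a rough sketch only, a minimal sbatch script following this structure might look like the following. The resource lines mirror the baseline parameters above, but the exact #SBATCH options, paths, environment name, and email address are assumptions; replace them with the script generator's output and your own locations:

#!/bin/bash
#SBATCH --nodes=1                          # limit this job to one node
#SBATCH --gres=gpu:1                       # request 1 GPU
#SBATCH --mem-per-cpu=8192M                # memory per processor
#SBATCH --time=24:00:00                    # walltime of 24 hours
#SBATCH --job-name=1920_census_seg         # short but descriptive job name
#SBATCH --mail-user=your_email@byu.edu     # where job emails are sent
#SBATCH --mail-type=END,FAIL               # email on end and abort

module load python/3.8                                                    # step 2: load Python
source ~/fsl_groups/fslg_census/compute/envs/segment_env/bin/activate     # step 3: activate your environment
python ~/fsl_groups/fslg_census/compute/projects/segmentation.py          # step 4: run your script with its full filepath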

Running sbatch scripts

To run an sbatch script:

sbatch script.sh – runs the sbatch shell script

  • sbatch run_segmentation.sh – runs the sbatch script run_segmentation.sh

After running an sbatch script you will be given a job number that will help you get information on the status of the script you are running

squeue -u username – returns the job number and status of all the jobs you are running

  • squeue -u lojin – returns the job number and status of all jobs that user lojin is running

Sometimes you will realize that you ran the wrong script, your script had errors, or it did something you did not intend. You will want to cancel these jobs in order to conserve computing power and save time.

scancel job_number – cancels an sbatch job on the supercomputer

  • scancel 3876598 – cancels job number 3876598

Jobs can also take user input. If you include $1 in your sbatch script, $1 will be equal to your first input, $2 equal to your second input, etc.

sbatch script.sh input – runs the script using the given input

  • sbatch run_segmentation.sh ohio – runs the sbatch script run_segmentation.sh with input ohio (most likely will segment images in an ohio folder, depending on how the script is coded)
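
For instance, inside run_segmentation.sh a (hypothetical) line like the one below would pass the argument through to your Python script:

python segmentation.py $1    # $1 is replaced by "ohio" when you run: sbatch run_segmentation.sh ohio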

File management

File management is one of the most important aspects of effective supercomputer use (and often the most neglected). Compressing unused files, deleting compressed files, and having an organized set of directories helps things stay running smoothly

fslquota – command to check filespace on the shared folders

tar commands are used to compress and decompress directories in order to save space. A directory with 100,000 images could be packed into a single tar file, which counts as only one file toward the file-count quota.

tar -xvf directory.tar – extracts the files in a tar file and lists the name of each file extracted (good for checking that the command is working properly)

  • tar -xvf 1920_census_seg_img.tar – extracts the files in 1920_census_seg_img.tar and lists the name of each file extracted

tar -cvf directory.tar directory – compresses the files in the directory into a tar file and lists each file that is compressed (good for checking that the command is working properly)

  • tar -cvf 1920_census_seg_img.tar 1920_census_seg_img – compresses the files in 1920_census_seg_img into a tar file named 1920_census_seg_img.tar and lists the name of each file compressed

tar -xf directory.tar – extracts the files in a tar file without listing the name of each file extracted (runs faster)

  • tar -xf 1920_census_seg_img.tar – extracts the files in 1920_census_seg_img.tar

tar -cf directory.tar directory – compresses the files in the directory into a tar file without listing each file that is compressed (runs faster)

  • tar -cf 1920_census_seg_img.tar 1920_census_seg_img – compresses the files in 1920_census_seg_img into a tar file named 1920_census_seg_img.tar

It is important to note that creating the tar file on its own does not conserve space; the files that were compressed must be deleted afterwards. A good pipeline for file compression might look like this:

tar -cvf 1920_census_seg_img.tar 1920_census_seg_img

rm -r 1920_census_seg_img

The tar file containing the compressed images can be extracted later in order to recover the deleted images.
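
For example, to get the images from the pipeline above back:

tar -xvf 1920_census_seg_img.tar    # re-extracts the images that were compressed and deleted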

Other helpful tips

This section contains other useful information in relation to the supercomputer that doesn’t seem to fit in any particular category. If you discover something that you think is useful, please put it here!

Permissions

Sometimes you will not be able to move, delete, edit, etc. files because you do not have the correct permissions to do so. Generally, we will want to give everyone full permissions to edit our files (for convenience, so please be careful), but full permissions are not always the default when creating files.

To check the permission status of all the files in a directory, use the command:

ls -l – lists all files in the directory and provides additional information about each one

You will see something like the following:

drwxrwsrwx 98 lojin fslg_census 36864 Jun 22 14:52 images

The first portion is the permissions, where each position contains either a letter or a dash.

The leading d indicates that the entry is a directory.

r stands for read, w stands for write, and x stands for execute. The first set of rwx refers to the owner (creator) of the file, the second set of three is the permissions for those in the file sharing group, and the last set of three refers to everyone else.

drwxrw-r-- would be a directory where the owner has all permissions, the file sharing group has read and write permissions, and others have only read permissions.

-rwxrwxrwx would be a file to which everyone has all permissions.

Following the permissions are: the number of links to the entry (for a directory this grows with the number of subdirectories; for a plain file it is usually 1), the owner of the file, the file sharing group it belongs to, the size in bytes, the date and time it was last modified, and finally the name of the file or directory.

To give permissions to everyone for a file:

chmod 777 filename – gives all permissions for the selected file

  • chmod 777 directory_prep.py – gives all permissions for directory_prep.py

You can change the numbers to grant certain types of permissions (what the numbers correspond to can be found online by looking up how to use chmod), but chmod 777 is typically enough for the lab's purposes.
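
As a quick reference, each digit is the sum of read (4), write (2), and execute (1), and the three digits apply to the owner, the group, and others respectively. For example:

chmod 777 segmentation.py        # 7 = 4+2+1: read, write, and execute for owner, group, and others
chmod 750 run_segmentation.sh    # owner can do everything, the group can read and execute, others have no access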

The * Character

The * character matches any sequence of characters and can be used to select a group of files, or all files at once, greatly reducing the number of commands you need to type.

rm slurm*.out – removes all files that begin with slurm and end with .out

mv *.jpeg ../images – moves all jpeg files into directory images

chmod 777 * – gives all permissions for all files in the directory

There are lots of little tricks for file manipulations that involve *. Just think logically and find uses that save time.
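
A couple of other common patterns (the filenames and directories are just examples):

ls *.tar                      # list every tar file in the current directory
cp *.py ../backup_scripts     # copy every Python script into a backup directory
rm 1920_census_*.out          # delete every .out file whose name starts with 1920_census_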

Supercomputer GUI (Visual Supercomputer)

If you are having trouble with the supercomputer, the visual supercomputer might be able to help you complete some basic tasks on the supercomputer. It is also a great way to double check if scripts ran properly because you can see image files.

The visual supercomputer can be accessed through https://viz.rc.byu.edu/

You will be asked once again for your username, password, and a duo code.

Once logged in press the blue + in the top left corner of the screen and choose an option to access the supercomputer. MATE and Cinnamon appear to be the easiest to navigate.

From this point, you should be able to navigate the supercomputer in a way similar to using a regular computer.