Getting access to BioHPC Linux nodes - labordynamicsinstitute/replicability-training GitHub Wiki
You may be asked to compute on BioHPC's ECCO Linux nodes (for instance, for very long-running jobs or for jobs that need a very large amount of memory).
Request an account
Go to the BioHPC account request page, and create an account on the BioHPC cluster.
Then contact BioHPC support, requesting to join the ECCO group and lv39 (Lars') "lab" (ecco_lv39).
Reserve a node
- Go to the BioHPC Reservations page, choose "Restricted", and reserve a node:
- cbsuecco02: up to 7 days
- all others: up to 3 days
- in both cases, renewable
- Then go to 'My Reservations' and share the reservation with Lars (lv39) and others, if necessary.
Access a node
See Getting Started Guide and Remote Access. SSH is the best path (if you don't need graphical applications).
Note that, for off-campus access, you will need to use Cornell VPN. Instructions can be found here.
Setting up the right permissions
Run this ONCE the first time you ever access BioHPC:
echo "umask 007" >> $HOME/.bash_profile
You can check whether it's there by running this command:
grep umask $HOME/.bash*
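Because the echo command above appends a line every time it runs, running it more than once leaves duplicate entries. The following is a minimal sketch of a variant that only appends when the line is missing. The function name ensure_umask is hypothetical and not part of BioHPC.

```shell
# Append "umask 007" to a profile file only if it is not already there.
# ensure_umask is a hypothetical helper name; the default target is the
# same $HOME/.bash_profile used above.
ensure_umask() {
    profile="${1:-$HOME/.bash_profile}"
    if grep -qx 'umask 007' "$profile" 2>/dev/null; then
        echo "umask 007 already present in $profile"
    else
        echo "umask 007" >> "$profile"
        echo "added umask 007 to $profile"
    fi
}

# Usage:
#   ensure_umask              # edits $HOME/.bash_profile
#   ensure_umask /some/file   # edits another file instead
```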
Fixing permission issues
Sometimes, permissions get out of sync and prevent your collaborators from accessing your files. Do this to fix it.
cd /to/the/right/location
chmod -R g+rwX problematic_directory
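To see which files are affected before and after the fix, something like the sketch below may help. It lists every entry that lacks group read or write permission. The name list_group_blocked is hypothetical. Note that chmod only changes files you own, so anything still listed after the fix probably belongs to a collaborator, who would need to run the fix themselves.

```shell
# List everything under a directory that is missing group read or group
# write permission (mode bits 060). list_group_blocked is a hypothetical name.
list_group_blocked() {
    find "$1" ! -perm -060
}

# Usage, run from the parent of the problematic directory:
#   list_group_blocked problematic_directory   # entries collaborators cannot use
#   chmod -R g+rwX problematic_directory       # the fix from above
#   list_group_blocked problematic_directory   # prints nothing once all is fixed
```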
Notes
Shared directory
Your default home directory (/home/NETID) is not shared among group users (same as on CISER). Use /home/ecco_lv39 instead.
Using Stata
- To use Stata version 16, either call it by its full path:
/usr/local/stata16/stata-mp
or add its directory to your PATH before running stata-mp:
export PATH=/usr/local/stata16:$PATH
- Ensure that the temporary directory Stata uses is on the BioHPC /workdir space, by running the following commands before executing your program(s):
export STATATMP=/workdir/netid/tmp
mkdir -p $STATATMP
- Don't run Stata interactively via SSH. Instead, run the program in batch mode:
stata-mp -b do master.do
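The Stata steps above can be combined into one helper, sketched below under a few assumptions. The function name run_stata_batch and the STATA_BIN override variable are hypothetical. The default temporary directory uses $(id -nu) for your NetID, as the Docker section of this page does.

```shell
# Run a do-file in batch mode with STATATMP pointing at /workdir.
# run_stata_batch and STATA_BIN are hypothetical names, not BioHPC tools.
#   $1: the do-file to run
#   $2: optional temporary directory (default: /workdir/<your netid>/tmp)
run_stata_batch() {
    dofile="$1"
    tmpdir="${2:-/workdir/$(id -nu)/tmp}"
    stata_bin="${STATA_BIN:-/usr/local/stata16/stata-mp}"
    mkdir -p "$tmpdir" || return 1
    export STATATMP="$tmpdir"
    "$stata_bin" -b do "$dofile"
}

# Usage:
#   run_stata_batch master.do
```

Since the function is not run in a subshell, STATATMP stays exported in the calling shell afterwards.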
Utilize tmux
Cheatsheet: https://gist.github.com/MohamedAlaa/2961058
- Login via SSH
- Launch tmux with a session name that makes sense, e.g.
tmux new -s AEAREP-xxxx
- Launch your Matlab, Stata, etc. job
- Disconnect from tmux: ctrl-b d. Do not press all the keys at the same time: first press "Ctrl+b", release, and then press "d".
- Log out of SSH
Next time:
- Login via SSH
- Reconnect to your tmux session:
tmux a -t AEAREP-xxxx
- If you forgot the session name, list your sessions:
tmux ls
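The "first time" and "next time" steps above can be folded into a single helper, sketched below. It reattaches when the session already exists and creates it otherwise. The function name attach_or_create is hypothetical, and AEAREP-xxxx stands for your own session name as above.

```shell
# Attach to a named tmux session if it exists, otherwise create it.
# attach_or_create is a hypothetical helper name.
attach_or_create() {
    session="$1"
    if tmux has-session -t "$session" 2>/dev/null; then
        tmux attach -t "$session"
    else
        tmux new -s "$session"
    fi
}

# Usage:
#   attach_or_create AEAREP-xxxx
```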
Sharing Tmux session
This must be done when launching tmux:
tmux -S /tmp/shareds new -s AEAREP-xxxx
chgrp ecco_lv39 /tmp/shareds
The second user can now connect to the first user's tmux session by typing
tmux -S /tmp/shareds attach -t AEAREP-xxxx
Note: When logged into the compute node, you can call ps ux to see all your running jobs.
Saving Tmux output
See https://unix.stackexchange.com/questions/26548/write-all-tmux-scrollback-to-a-file
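As a minimal sketch of one possible approach, tmux's capture-pane command can print a pane's scrollback, which can then be redirected to a file. The function name save_tmux_history is hypothetical. How far back the output reaches is bounded by tmux's history-limit option, so very long logs may be truncated.

```shell
# Write the scrollback of a tmux session's active pane to a file.
# save_tmux_history is a hypothetical helper name.
#   $1: tmux target (for example the session name)
#   $2: output file
save_tmux_history() {
    target="$1"
    outfile="$2"
    # -p prints to stdout; -S - starts at the beginning of the history
    tmux capture-pane -p -S - -t "$target" > "$outfile"
}

# Usage:
#   save_tmux_history AEAREP-xxxx tmux-output.txt
```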
Using Docker
The BioHPC docker command is docker1; see here for more details. All files that are shared via the -v option must reside on /workdir/NETID and cannot be shared across nodes. To get the files to /workdir/NETID, the following commands can be used, assuming that your files are in /home/ecco_lv39/Workspace/aearep-$AEAREP:
- Sync to workdir:
AEAREP=12345
[ -d /workdir/$(id -nu) ] || mkdir /workdir/$(id -nu)
rsync -auv /home/ecco_lv39/Workspace/aearep-$AEAREP/ /workdir/$(id -nu)/aearep-$AEAREP/
- Sync back to shared drive (once computations are done, or at any time):
AEAREP=12345
rsync -auv /workdir/$(id -nu)/aearep-$AEAREP/ /home/ecco_lv39/Workspace/aearep-$AEAREP/
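The two sync directions above can be wrapped in one helper, sketched below, which also allows a dry run (rsync's -n flag) to preview what would be copied. The function name sync_project and the SHARED_ROOT and WORK_ROOT override variables are hypothetical; their defaults are the paths used on this page.

```shell
# sync_project is a hypothetical wrapper around the two rsync calls above.
#   $1: "to" (shared drive -> workdir) or "back" (workdir -> shared drive)
#   $2: the AEAREP number
#   $3: optional extra rsync flag, e.g. -n for a dry run
# SHARED_ROOT and WORK_ROOT are hypothetical overrides of the default paths.
sync_project() {
    direction="$1"
    aearep="$2"
    extra="${3:-}"
    shared="${SHARED_ROOT:-/home/ecco_lv39/Workspace}/aearep-$aearep/"
    work_root="${WORK_ROOT:-/workdir/$(id -nu)}"
    work="$work_root/aearep-$aearep/"
    case "$direction" in
        to)
            mkdir -p "$work_root" || return 1
            rsync -auv ${extra:+"$extra"} "$shared" "$work"
            ;;
        back)
            rsync -auv ${extra:+"$extra"} "$work" "$shared"
            ;;
        *)
            echo "usage: sync_project to|back AEAREP [rsync-flag]" >&2
            return 2
            ;;
    esac
}

# Usage:
#   sync_project to 12345 -n    # preview only, nothing is copied
#   sync_project to 12345       # copy to /workdir
#   sync_project back 12345     # copy results back to the shared drive
```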
Using Conda for Python package management
See Python tips
Transfer of Data to BioHPC (possibly OBSOLETE)
- The BioHPC instructions for using FileZilla are great for moving data from your personal workspace to BioHPC.
- In the event that you need to transfer data from CISER to BioHPC (CISER does not have FileZilla installed):
- First, open up a bash shell in the directory that holds the folder which you want to transfer to BioHPC.
- SFTP into BioHPC:
sftp [email protected]
Your password is the same one that you use to log in to the cbsuecco02 node (or the login node).
- cd into the desired directory on the BioHPC node.
- You need to first create the directory on BioHPC:
mkdir data
- Use the put command to place the desired folder (e.g. "data") on BioHPC:
put -r data/
- If you run into an error along the lines of
Can't find request for ID 31425
try zipping up the files and transferring just the zip file. Once transferred, you can unzip on BioHPC (if you run into issues with the "unzip" command, try using 7z, i.e. /programs/bin/util/7z x (ZIPFILE)).
- Give Lars access to your /workdir/netid directory with
chmod -R a+rwX /workdir/netid
(this command is not permanent and should be run again after any edits to the directory).