3: Server tricks to make your life easier - SIWLab/Lab_Info GitHub Wiki
There's a lot to keep track of with 4 (or 5) servers, multiple software packages, multiple projects, multiple species, etc. But there are a few things you can do to make dealing with everything a bit easier.
To log into each server, you could type out the whole server address and specify the port every time (don't do this), or go through your history to copy your last login.
The latter gets messy if you do a lot of work locally.
An easier way is to have ssh remember your logins for you by listing them in ~/.ssh/config.
This file is probably already at this spot on your computer, but you can create it if it's not.
Add a block like this to ~/.ssh/config for each of your logins:
Host capsicum
HostName capsicum.eeb.utoronto.ca
Port 22
User username
Host grandiflora
HostName grandiflora.eeb.utoronto.ca
Port 24
User username
Host mustang
HostName mustang.biol.mcgill.ca
Port 26
User username
Host ohta
HostName ohta.eeb.utoronto.ca
User username
Host gustave
HostName gustave.eeb.utoronto.ca
User username
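Replace username with the username you were given for each server (it should correspond to your utor email).
You don't need to reload anything after editing this file: ssh reads ~/.ssh/config every time it runs, so changes take effect on your next connection. Don't run source ~/.ssh/config; the file is not a shell script, and sourcing it only produces errors.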
Then to connect to a server, run ssh host and type your password when prompted, replacing host with your desired host, e.g.:
esc3025-kent-wright:~ tvk$ ssh capsicum
Saving password to keychain failed
username@capsicum.eeb.utoronto.ca's password:
Note: On OSX, when you run ssh host, a window pops up asking to add your password to the keychain, but if you enter it you'll either get an error message or the window will keep popping up at every login. You can just press return when this window pops up and continue by entering your password on the command line. I haven't found a solution to this yet.
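If you would rather script these entries than paste them by hand, something like the following could work. It is only a sketch: add_ssh_host is a made-up helper name (not part of OpenSSH), and the demo writes to a scratch file rather than your real ~/.ssh/config.

```shell
# add_ssh_host is a hypothetical helper: it appends a Host block to an ssh
# config file unless a block for that alias is already present.
add_ssh_host() {
    config="$1"; alias_name="$2"; hostname="$3"; port="$4"; user="$5"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"    # keep the config readable by you only
    if grep -q "^Host $alias_name\$" "$config"; then
        echo "Host $alias_name already in $config"
        return 0
    fi
    printf 'Host %s\n    HostName %s\n    Port %s\n    User %s\n' \
        "$alias_name" "$hostname" "$port" "$user" >> "$config"
}

# Demo against a scratch file so the real ~/.ssh/config is not touched.
demo_config="$(mktemp)"
add_ssh_host "$demo_config" capsicum capsicum.eeb.utoronto.ca 22 username
add_ssh_host "$demo_config" capsicum capsicum.eeb.utoronto.ca 22 username  # second call is a no-op
cat "$demo_config"
```

Once the output looks right, pass "$HOME/.ssh/config" as the first argument instead of the scratch file.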
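If you're getting a lot of broken pipes, add ServerAliveInterval to the host's entry in your config, as below. It makes ssh send a keepalive message whenever the connection has been silent for that many seconds. (Use nohup, screen, or tmux for actually running jobs though...)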
Host capsicum
HostName capsicum.eeb.utoronto.ca
Port 22
User username
ServerAliveInterval 60
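To avoid repeating the keepalive line in every entry, ssh also accepts a wildcard Host * block. ssh uses the first value it finds for each option, so a catch-all block like this usually goes at the end of the file. A small sketch that appends it to a scratch file (demo_config is a placeholder; point it at ~/.ssh/config once you're happy with the result):

```shell
# Append a catch-all block that applies the keepalive to every host.
demo_config="$(mktemp)"    # placeholder for ~/.ssh/config
cat >> "$demo_config" <<'EOF'
Host *
    ServerAliveInterval 60
EOF
cat "$demo_config"
```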
Tmux is a terminal multiplexer - basically it lets you open a terminal window, detach it, and keep your job running in the background so you can log off the server and close your laptop. Screen is similar and also good (see below). Here is a guide to getting started with tmux (which is already installed).
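A basic tmux workflow might look like the sketch below. The session name mapping and the sleep 30 command are placeholders for your own job, and the block skips itself if tmux isn't available.

```shell
session=mapping    # hypothetical session name
if command -v tmux >/dev/null 2>&1; then
    # Start a detached session running a placeholder command.
    if tmux new-session -d -s "$session" 'sleep 30'; then
        tmux_status=started
        tmux ls                           # list running sessions
        # tmux attach -t "$session"       # re-attach; detach again with ctrl + B, then d
        tmux kill-session -t "$session"   # clean up the demo session
    else
        tmux_status=failed
    fi
else
    tmux_status=skipped
fi
echo "tmux demo: $tmux_status"
```

For interactive work you would normally run tmux new -s mapping, start your job, and detach with ctrl + B followed by d.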
If you've spent a significant amount of time on any of the servers, you know that you get broken pipes often.
This is bad if you're running a job and you get a broken pipe while it's still running.
You can end up losing all of your progress or falsely thinking that you have results when your job never actually finished.
In order to save lost time and protect against accidentally using unfinished data, you can use screen.
Screen functions by opening a new shell that is separate from your login session and therefore survives broken pipes. This means it can run jobs uninterrupted in the background.
You can begin a screen session simply by running
screen
You will see a refreshed terminal window. So what does this look like on the server? We can run top
and see
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
71644 tyler.k+ 20 0 1683960 1.519g 1076 R 99.7 0.6 43:48.33 samtools
76600 tyler.k+ 20 0 130972 2840 1280 R 1.0 0.0 0:21.38 top
69207 tyler.k+ 20 0 135244 2228 1032 S 0.0 0.0 0:00.77 sshd
69211 tyler.k+ 20 0 116636 3484 1796 S 0.0 0.0 0:00.43 bash
71180 tyler.k+ 20 0 129976 1600 932 S 0.0 0.0 0:00.02 screen
71181 tyler.k+ 20 0 116660 3452 1788 S 0.0 0.0 0:00.09 bash
71643 tyler.k+ 20 0 113116 1448 1244 S 0.0 0.0 0:00.00 bash
PID 71180 shows screen is running. You can also see that my current server session (sshd) is running separately, and that the job I'm running inside screen (samtools) shows up as its own process. So screen itself uses hardly any CPU or memory, but it functions as a separate shell. Once you have a job running in screen, you can detach from that shell and let it run in the background by pressing ctrl + A and then d. The first part lets you give screen commands, and the second part says to detach. You can re-attach to a screen session with
screen -r
If you're running more than one screen session, you may see something like
There are several suitable screens on:
43972.pts-10.mustang (Detached)
39276.pts-10.mustang (Detached)
36362.pts-6.mustang (Detached)
34525.pts-6.mustang (Detached)
10156.pts-6.mustang (Detached)
Type "screen [-d] -r [pid.]tty.host" to resume one of them.
We can see that I have 5 screen sessions running, and I can re-attach to any of them by specifying the name at the end of my re-attach command:
screen -r 43972.pts-10.mustang
will reattach to this specific screen.
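Numeric session IDs are hard to tell apart, so it can help to name sessions when you start them. The sketch below assumes a session name of hapcut_run (a placeholder) and uses sleep 30 in place of a real job; it skips itself if screen isn't installed.

```shell
session=hapcut_run    # hypothetical session name
if command -v screen >/dev/null 2>&1; then
    # -dmS starts a named session already detached, running the given command.
    if screen -dmS "$session" sleep 30; then
        screen_status=started
        screen -ls || true                      # some versions return non-zero here
        # screen -r "$session"                  # re-attach by name instead of by PID
        screen -S "$session" -X quit || true    # end the demo session
    else
        screen_status=failed
    fi
else
    screen_status=skipped
fi
echo "screen demo: $screen_status"
```

Interactively, screen -S hapcut_run opens a named session that you can later resume with screen -r hapcut_run.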
You can also kill screens that you're no longer using by re-attaching to the screen and typing exit.
It is good practice to reattach to screens in which you had a job running and exit them when you're done, to avoid a buildup of idle screen sessions.
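Screen keeps a job alive through a broken pipe, but it can't tell you whether the job actually finished. One common guard against using unfinished output is to write to a temporary name and rename the file only on success. A minimal sketch, where run_to_completion is a hypothetical helper and echo/false stand in for real jobs:

```shell
# run_to_completion is a hypothetical helper: it writes the command's output to
# <file>.partial and renames it to <file> only if the command exits with status 0.
run_to_completion() {
    out="$1"
    shift
    if "$@" > "$out.partial"; then
        mv "$out.partial" "$out"
    else
        echo "job failed; leaving $out.partial for inspection" >&2
        return 1
    fi
}

workdir="$(mktemp -d)"
run_to_completion "$workdir/ok.txt" echo "finished"    # succeeds, so ok.txt appears
run_to_completion "$workdir/bad.txt" false || true     # fails, so only bad.txt.partial exists
ls "$workdir"
```

With this pattern, a file without the .partial suffix means the command exited cleanly.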
You'll probably have to install some specialized software on one or more servers at some point, and you may run into permissions issues if the software you're installing normally installs into /usr/local/bin (usually C/C++ software that needs to be configured, made, and installed).
You can install software to another folder by overriding the install prefix:
make prefix=path-to-install-in install
instead of just running make install. Note that make itself has no --prefix option; the variable name is defined by the software's Makefile and is usually prefix or PREFIX, so check its README. For some programs like vcftools, you may have to run
./configure --prefix=path-to-install
make
make install
instead.
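To see why overriding the prefix on the make command line works, here is a throwaway example. Everything in it is made up for illustration: a one-line hello script, a tiny Makefile that follows the common prefix convention, and scratch directories in place of a real source tree and install location.

```shell
# Build a throwaway "project" whose Makefile uses the conventional prefix variable.
demo_src="$(mktemp -d)"
demo_prefix="$(mktemp -d)"    # stands in for a folder you own, e.g. under your home directory
printf '#!/bin/sh\necho hello\n' > "$demo_src/hello"
printf 'prefix = /usr/local\n\ninstall:\n\tmkdir -p $(prefix)/bin\n\tcp hello $(prefix)/bin/hello\n\tchmod 755 $(prefix)/bin/hello\n' \
    > "$demo_src/Makefile"

if command -v make >/dev/null 2>&1; then
    # A variable given on the command line overrides the assignment in the Makefile,
    # so the files land under $demo_prefix instead of /usr/local.
    make -C "$demo_src" prefix="$demo_prefix" install
    sh "$demo_prefix/bin/hello"
fi
```

Real packages differ in which variable they honour, which is why the ./configure --prefix route is needed for some of them.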
You can then call this program from any directory if you add the folder containing its executable to the PATH variable in your ~/.bashrc:
cd
nano ~/.bashrc
If you don't have one, make the file (it should be in your home directory, but remember it is a hidden file, so you won't see it with plain ls; use ls -a).
Add the folder to your PATH in this file like this:
export PATH=<path-to-software-bin>:$PATH
as an example, replacing <path-to-software-bin> with your path. Then save and reload the file you just edited:
source ~/.bashrc
You can now call this program in any directory.
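Because the file is re-read every time you source it, a plain export line adds the same folder to PATH again and again. If that bothers you, a guard like the one below avoids duplicates. This is a sketch: path_prepend is a hypothetical helper name and $HOME/software/bin is a placeholder folder.

```shell
# path_prepend is a hypothetical helper: it adds a folder to the front of PATH
# only if that folder is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

demo_bin="$HOME/software/bin"        # placeholder install location
path_prepend "$demo_bin"
path_prepend "$demo_bin"             # second call changes nothing
export PATH
printf '%s\n' "$PATH" | tr ':' '\n' | head -n 3
```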
If your software doesn't need to be installed, e.g. it is run with executables in its directory, you can still add the program to your .bash_profile with an alias, such as
alias <EXECUTABLE>="<path-to-executable>"
replacing <EXECUTABLE> with your desired alias name (I recommend using the name of the executable) and <path-to-executable> with the path to your executable. Remember to source the file after saving any changes. Now you can call this executable in any directory.
Sometimes it's better to have your own version of Python if you're going to be using a specific version or a lot of packages not already installed on the server. The easiest way to do this is to use Anaconda, a distribution of Python that already includes a ton of useful science/maths packages and is easy to install and update. Just run the Anaconda installer shell script and follow the instructions to install locally. The Anaconda documentation also provides instructions for installing packages easily.
You can check on any jobs you're running using top.
This will display an updating list of jobs, the memory and CPU they're using, how long they've been running, etc. This looks like
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24802 aplatts 20 0 4572 820 440 R 100.0 0.0 344:36.93 gzip
39796 tyler.k+ 20 0 9883252 9.375g 852 R 100.0 1.9 99:05.67 HAPCUT
45458 tyler.k+ 20 0 130704 2544 1260 R 1.0 0.0 0:00.11 top
40076 tyler.k+ 20 0 130704 2584 1280 S 0.7 0.0 0:36.88 top
90 root 20 0 0 0 0 S 0.3 0.0 176:10.45 rcu_sched
3624 emilio.+ 20 0 267632 1532 1052 S 0.3 0.0 27:44.79 postgres
1 root 20 0 58828 4608 2812 S 0.0 0.0 1:22.07 systemd
2 root 20 0 0 0 0 S 0.0 0.0 1:01.59 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 1:31.01 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
If there are a lot of people running jobs (like on grandiflora), you can make yours easier to see by running
top -u <username>
instead, replacing <username> with your username on the server. This cleans up the output to something like
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
39796 tyler.k+ 20 0 9883252 9.375g 852 R 100.0 1.9 101:17.34 HAPCUT
40076 tyler.k+ 20 0 130704 2584 1280 S 0.7 0.0 0:37.69 top
45572 tyler.k+ 20 0 130704 2560 1272 R 0.7 0.0 0:00.12 top
10156 tyler.k+ 20 0 129976 1600 928 S 0.0 0.0 0:00.07 screen
10157 tyler.k+ 20 0 116536 3424 1788 S 0.0 0.0 0:00.11 bash
34220 tyler.k+ 20 0 133280 2252 1040 S 0.0 0.0 0:01.22 sshd
Now I only see the jobs that I'm running.
To make things even easier, you probably don't want to type out the -u option and your username every time you want to check on a job.
You can make an alias for this instead and save it in your .bash_profile (remember to source it after you save). I use
alias topu="top -u tyler.kent"
so I can just run topu and see my jobs.
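A variant that doesn't hard-code the username might look like this. It is a sketch that assumes the Linux (procps) version of top found on the servers; topsnap is a hypothetical helper name.

```shell
# Same alias, but the username is looked up each time the alias runs.
alias topu='top -u "$(id -un)"'

# topsnap is a hypothetical helper: it prints one non-interactive snapshot of
# your jobs (-b is batch mode, -n 1 means a single iteration), which is handy
# for pasting into a message or saving to a log.
topsnap() {
    top -b -n 1 -u "$(id -un)" | head -n 20
}
```

Because nothing in it is user-specific, the same two definitions can be shared between lab members as-is.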