3: Server tricks to make your life easier - SIWLab/Lab_Info GitHub Wiki

There's a lot to keep track of with 4 (or 5) servers, multiple software packages, multiple projects, multiple species, etc. But there are a few things you can do to make dealing with everything a bit easier.

~/.ssh/config

To log into each server, you need to either type out the whole server address and specify the port every time (don't do this), or you can go through your history to copy your last login. The latter gets messy if you do a lot of work locally. An easier way is to add to your ~/.ssh/config. This file is probably already at this spot on your computer, but you can add it if it's not.
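
If the file isn't there yet, a minimal sketch for creating it follows. The permission values are the conventional ones for OpenSSH, which can refuse to use a config file that other users are able to write to; treat them as an assumption and adjust if your setup differs.

```shell
# Create ~/.ssh and an empty config file if they are missing.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"          # directory accessible only by you
touch "$HOME/.ssh/config"       # leaves an existing file's contents alone
chmod 600 "$HOME/.ssh/config"   # file readable and writable only by you
```

You can then open ~/.ssh/config in any text editor and add your hosts.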

You can tell your system to remember your ssh logins so you don't have to by adding this to ~/.ssh/config for each of your logins:

Host capsicum
        HostName capsicum.eeb.utoronto.ca
        Port 22
        User username
Host grandiflora
        HostName grandiflora.eeb.utoronto.ca
        Port 24
        User username
Host mustang
        HostName mustang.biol.mcgill.ca
        Port 26
        User username
Host ohta
        HostName ohta.eeb.utoronto.ca
        User username
Host gustave
        HostName gustave.eeb.utoronto.ca
        User username

replacing username with the username you were given for each (should correspond to your utor email). You don't need to run source on this file: ssh reads ~/.ssh/config every time it starts, so your edits apply to the next connection you make.

Then to connect to a server, run ssh host and type your password when prompted, replacing host with your desired host e.g.:

esc3025-kent-wright:~ tvk$ ssh capsicum
Saving password to keychain failed
username@capsicum.eeb.utoronto.ca's password:

Note: On OSX, when you run ssh host, a window pops up asking to add your password to keychain, but if you enter it you'll either get an error message or it will just continue to pop up at every login. You can just press return when this window pops up and continue by entering your password on the command line. I haven't found a solution to this yet.

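Server alive interval to prevent broken pipes

Add a ServerAliveInterval line to a host's entry in your config if you're getting a lot of broken pipes (use nohup, screen, or tmux for actually running jobs though...). With a value of 60, ssh sends a keep-alive message whenever 60 seconds pass without any data from the server: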

Host capsicum
        HostName capsicum.eeb.utoronto.ca
        Port 22
        User username
        ServerAliveInterval 60
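
For the nohup option mentioned above, a minimal sketch follows. The job here is a placeholder (sleep and echo standing in for a real command), and job.log is an assumed file name.

```shell
# Start a placeholder job that ignores hangups, sending all output to job.log.
nohup sh -c 'sleep 1; echo finished' > job.log 2>&1 &
job_pid=$!                       # PID of the background job
echo "started job with PID $job_pid"

# Only for this demonstration: wait for the job, then show its log.
# In real use you would log out here and check job.log later.
wait "$job_pid"
cat job.log
```

Because the job ignores the hangup signal, it should keep running if your connection drops; you just can't re-attach to it the way you can with screen or tmux.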

Using tmux to avoid broken pipes

Tmux is a terminal multiplexer - basically it lets you open a terminal window, detach it, and keep your job running in the background so you can log off the server and close your laptop. Screen is similar and also good (see below). Here is a guide to getting started with tmux (which is already installed).

Using screen to avoid broken pipes

If you've spent a significant amount of time on any of the servers, you know that you get broken pipes often. This is bad if you're running a job and you get a broken pipe while it's still running. You can end up losing all of your progress or falsely thinking that you have results when your job never actually finished. In order to save lost time and protect against accidentally using unfinished data, you can use screen.

Screen functions by opening a new shell that is protected from broken pipes, and is thus separate from your shell session. This means it can run jobs uninterrupted in the background. You can begin a screen session simply by running

screen

You will see a refreshed terminal window. So what does this look like on the server? We can run top and see

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                      
71644 tyler.k+  20   0 1683960 1.519g   1076 R  99.7  0.6  43:48.33 samtools                                                                                                                                                     
76600 tyler.k+  20   0  130972   2840   1280 R   1.0  0.0   0:21.38 top                                                                                                                                                          
69207 tyler.k+  20   0  135244   2228   1032 S   0.0  0.0   0:00.77 sshd                                                                                                                                                         
69211 tyler.k+  20   0  116636   3484   1796 S   0.0  0.0   0:00.43 bash                                                                                                                                                         
71180 tyler.k+  20   0  129976   1600    932 S   0.0  0.0   0:00.02 screen                                                                                                                                                       
71181 tyler.k+  20   0  116660   3452   1788 S   0.0  0.0   0:00.09 bash                                                                                                                                                         
71643 tyler.k+  20   0  113116   1448   1244 S   0.0  0.0   0:00.00 bash 

PID 71180 shows screen is running. My current server session (sshd) is running separately, and the job I'm running in screen (samtools) shows up as its own process. So screen itself uses almost no CPU or memory, but it functions as a separate shell. Once you have a job running in screen, you can detach yourself from that shell and let it run in the background by pressing ctrl + A and then d. The first part lets you give screen commands, and the second part says to detach. You can re-attach to a screen session with

screen -r

If you're running more than one screen, you may see something like

There are several suitable screens on:
	43972.pts-10.mustang	(Detached)
	39276.pts-10.mustang	(Detached)
	36362.pts-6.mustang	(Detached)
	34525.pts-6.mustang	(Detached)
	10156.pts-6.mustang	(Detached)
Type "screen [-d] -r [pid.]tty.host" to resume one of them.

We can see that I have 5 screen sessions running, and I can re-attach to any of them by specifying the name at the end of my re-attach command:

screen -r 43972.pts-10.mustang

will reattach to this specific screen.

You can also kill screens that you're no longer using by re-attaching to the screen and typing exit. It is good practice to re-attach to screens in which you had a job running and exit them when you're done, to avoid a buildup of idle screens.
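
Screen keeps a job running, but it doesn't tell you whether the job finished successfully. One way to guard against the unfinished-results problem described above is to write a marker file only when the command exits successfully. This is a sketch, not something the servers provide: the function name run_and_mark and the .done file names are made up for illustration.

```shell
# run_and_mark MARKER COMMAND [ARGS...]
# Runs COMMAND and creates MARKER only if COMMAND exits with status 0.
run_and_mark() {
    marker="$1"
    shift
    rm -f "$marker"               # clear any marker left from an earlier run
    if "$@"; then
        touch "$marker"
        echo "finished OK: $*"
    else
        status=$?                 # exit status of the failed command
        echo "FAILED (exit status $status): $*" >&2
        return "$status"
    fi
}

# Placeholder commands standing in for real jobs:
run_and_mark good_job.done true
run_and_mark bad_job.done false || echo "bad_job did not finish"
```

Inside a screen session you would wrap your real command the same way, then check that the marker file exists before using the output.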

Installing software locally & using PATH & alias

You'll probably have to install some specialized software on one or more servers at some point, and you may run into permissions issues if the software you're installing normally installs into /usr/local/bin (usually C/C++ software that needs to be configured, made, and installed). You can often install software to another folder by setting the install prefix as a make variable (it is usually called prefix or PREFIX; check the software's Makefile or README):

make prefix=path-to-install-in install

instead of just running make install. For some programs like vcftools, you may have to run

./configure --prefix=path-to-install
make
make install

instead. You can then call this program from any directory if you add its bin folder to the PATH in your ~/.bashrc

cd
nano ~/.bashrc

If you don't have one, make the file (it should be in your home directory, but remember it is a hidden file, so you won't see it with plain ls; use ls -a instead). Add the PATH to this file like this:

export PATH=<path-to-software-bin>:$PATH

as an example, replacing <path-to-software-bin> with your path. Then save and run:

source ~/.bashrc

You can now call this program in any directory. If your software doesn't need to be installed, e.g. it is run with executables in its directory, you can still add the program to your ~/.bashrc with an alias, such as

alias <EXECUTABLE>="<path-to-executable>"

replacing <EXECUTABLE> with your desired alias name (I recommend using the name of the executable) and <path-to-executable> with the path to your executable. Remember to source after saving any changes. Now you can call this executable in any directory.
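
Because the startup file is read by every new shell, it can be worth guarding the PATH line so the same folder isn't added over and over. A sketch, assuming a hypothetical install location of $HOME/software/bin (replace it with your own):

```shell
# Hypothetical install location; replace with your own.
SOFTWARE_BIN="$HOME/software/bin"

# Prepend the folder to PATH only if it is not already there.
case ":$PATH:" in
    *":$SOFTWARE_BIN:"*) ;;                   # already present, do nothing
    *) export PATH="$SOFTWARE_BIN:$PATH" ;;
esac

# Show the resulting search path, one folder per line.
echo "$PATH" | tr ':' '\n'
```

After sourcing, running command -v followed by the program name prints the executable (or alias definition) your shell will actually use, which is a quick way to confirm the change took effect.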

Installing python and python packages locally

Sometimes it's better to have your own version of python if you're going to be using a specific version or using a lot of packages not already installed on the server. The easiest way to do this is to use Anaconda. This is a distribution of python that already includes a ton of useful science/maths packages and is incredibly easy to install and update. Just run the anaconda shell script and follow the instructions to install locally. The anaconda documentation also provides instructions for installing packages easily.

Using top to monitor your jobs

You can check on any jobs you're running using top. This will display an updating list of jobs, memory they're using, CPU they're using, how long they've been running, etc. This looks like

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                      
24802 aplatts   20   0    4572    820    440 R 100.0  0.0 344:36.93 gzip                                                                                                                                                         
39796 tyler.k+  20   0 9883252 9.375g    852 R 100.0  1.9  99:05.67 HAPCUT                                                                                                                                                       
45458 tyler.k+  20   0  130704   2544   1260 R   1.0  0.0   0:00.11 top                                                                                                                                                          
40076 tyler.k+  20   0  130704   2584   1280 S   0.7  0.0   0:36.88 top                                                                                                                                                          
   90 root      20   0       0      0      0 S   0.3  0.0 176:10.45 rcu_sched                                                                                                                                                    
 3624 emilio.+  20   0  267632   1532   1052 S   0.3  0.0  27:44.79 postgres                                                                                                                                                     
    1 root      20   0   58828   4608   2812 S   0.0  0.0   1:22.07 systemd                                                                                                                                                      
    2 root      20   0       0      0      0 S   0.0  0.0   1:01.59 kthreadd                                                                                                                                                     
    3 root      20   0       0      0      0 S   0.0  0.0   1:31.01 ksoftirqd/0                                                                                                                                                  
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H    

If there are a lot of people running jobs (like on grandiflora), you can make yours easier to see by running

top -u <username>

instead, replacing <username> with your username on the server. This cleans up the output to something like

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                      
39796 tyler.k+  20   0 9883252 9.375g    852 R 100.0  1.9 101:17.34 HAPCUT                                                                                                                                                       
40076 tyler.k+  20   0  130704   2584   1280 S   0.7  0.0   0:37.69 top                                                                                                                                                          
45572 tyler.k+  20   0  130704   2560   1272 R   0.7  0.0   0:00.12 top                                                                                                                                                          
10156 tyler.k+  20   0  129976   1600    928 S   0.0  0.0   0:00.07 screen                                                                                                                                                       
10157 tyler.k+  20   0  116536   3424   1788 S   0.0  0.0   0:00.11 bash                                                                                                                                                         
34220 tyler.k+  20   0  133280   2252   1040 S   0.0  0.0   0:01.22 sshd 

Now I only see the jobs that I'm running. To make things even easier, you probably don't want to type out the -u option and your username every time you want to check on a job. You can make an alias for this instead and save it in your ~/.bashrc (remember to source after you save). I use

alias topu="top -u tyler.kent"

So I can just run topu and see my jobs.
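
If you copy your startup file between servers, a variant that doesn't hard-code the username may be handy. This sketch assumes the USER environment variable holds your login name on the server, which is usual but not guaranteed, so it falls back to id -un when USER is unset.

```shell
# Use the current login name instead of a hard-coded one.
# Single quotes delay the lookup until the alias is actually used.
alias topu='top -u "${USER:-$(id -un)}"'

# Print the alias definition to confirm it was registered.
alias topu
```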
