unix_intro_tutorial - trinityrnaseq/BerlinTrinityWorkshop2018 GitHub Wiki

Introduction to Unix

This introduction to unix is based on the 'unixtut' tutorial developed by '[email protected], 19 October 2001' (http://www.ee.surrey.ac.uk/Teaching/Unix/), adapted here and made available under https://creativecommons.org/licenses/by-nc-sa/2.0/.

Below, unix commands will be shown prefixed with a unix prompt character '%'.

Files and directories

pwd - print working directory

When you first login, you will be in your home directory.

To determine your current working directory, use the print working directory 'pwd' command:

%  pwd


/home/training

The root or base directory is '/', and the set of nested directories stemming from the root path are separated by '/' characters.

/home
     /training

The current directory can be referenced as '.', and the parent directory can be referenced as '..' . You'll often see this notation being used in commands.

Your home directory can be referenced as '~'.

ls - list files and directories

List the files and directories located in your current directory by using the 'ls' command:

% ls

The 'ls' command has many options to provide more information about the files and directories. You can add the '--color' parameter so that it will color files and directories differently:

% ls --color

You can generate 'long' listings like so:

% ls -l

where you'll be able to see many attributes of each file (which we'll separately cover below), including permission settings, group ownership, file size, and the date it was last modified.

Certain files and directories may be hidden - essentially, any file or directory that begins with a '.' is hidden. You'll recall above that '.' and '..' are special notations for current and parent directories. You can reveal these hidden paths by listing all files with the 'ls -a' parameter invocation:

%  ls -a

and you can combine parameters like so:

% ls -al --color

A very popular invocation of 'ls' is 'ls -ltr', which provides a long listing and orders files by modification time, oldest first (so the newest files are at the bottom of the list).

% ls -ltr

By default, ls will list the files and directories in the current working directory (as reported by 'pwd'). You can also name the directory whose contents you want to list. For example, from any location on the file system you can list the files in your home directory like so:

% ls ~

What would the following list correspond to?

% ls ~/..

mkdir - making directories

Create a directory by using the 'mkdir' command:

% mkdir mydir

Now list your files to verify that the directory 'mydir' has been created (hint: use the 'ls --color' command)

cd - changing directories

Use the 'cd' command to change directories. To enter the new 'mydir' directory, type:

%  cd mydir

Verify that you've changed directories (hint: use the 'pwd' command)
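The navigation commands so far can be strung together in a short script. This is only a minimal sketch, not part of the original exercise: it works inside a throwaway directory created with 'mktemp -d' (an assumption, so that nothing in your home area is touched), and the variable names 'scratch', 'here' and 'parent' are made up for illustration.

```shell
# Create a throwaway directory (assumption: mktemp is available) and enter it.
scratch=$(mktemp -d)
cd "$scratch"

# Make a directory, enter it, and record where we are.
mkdir mydir
cd mydir
here=$(pwd)

# '..' is the parent directory; '.' is the current one.
cd ..
parent=$(pwd)
cd ./mydir          # same as 'cd mydir'

echo "parent:  $parent"
echo "current: $here"
```

The second path printed should be the first one with '/mydir' appended.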

Summary - Files and Directories

Command      Action
ls           list files and directories
ls -a        list all files and directories, including hidden files
ls -l        long listing of files and directories
ls --color   color files and directories differently
mkdir        make a new directory
cd 'dir'     change to directory 'dir'
cd ~         change to home directory
cd ..        change to the parent directory
pwd          print current working directory

Operating on Files

cp - copying files

The 'cp' copy command with syntax 'cp srcFile destFile' will copy the contents of srcFile to destFile.

To demonstrate this, we'll copy a file that already exists in our home area to our new 'mydir' directory.

Check that you're currently in your 'mydir' directory (hint: use 'pwd'). If you're not, then be sure to cd to it (hint: 'cd ~/mydir').

% cp ~/shared_ro/science.txt .

The above command will copy the 'science.txt' file from ~/shared_ro (the 'shared_ro' directory in our home directory '~') to our current directory ('.'), which should be 'mydir'.

Verify that the file has been copied (hint: use 'ls')

Create a backup copy of this file for safe keeping:

% cp science.txt  science.txt.bak

Again, verify that you now see this new backup file in your current directory.

'ln -s' - symbolic links

Instead of copying a file, especially if the file is very large and takes up a lot of disk space, it's often useful to create a shortcut to it (a.k.a. a symbolic link or symlink). To do this, use the 'ln' command with the '-s' (symbolic) parameter.

For example, we can create a symlink to our science.txt file called 'science.txt.shortcut' like so:

%  ln -s science.txt science.txt.shortcut

This 'science.txt.shortcut' takes up effectively no disk space but provides you with full access to the contents of the file it links to. Removing or renaming the symlink does not change the original file, but opening and editing the contents is the same as opening and editing the original file, since it's simply a name that points to the original file.

Execute 'ls' and 'ls -l' to examine how the symlink is represented in both contexts.
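To see the difference between a copy and a symlink in action, here is a small sketch. It runs in a throwaway directory made with 'mktemp -d', and the file contents are invented for the demonstration; only the file names follow the text above.

```shell
# Work in a throwaway directory (assumption) with a made-up science.txt.
scratch=$(mktemp -d)
cd "$scratch"
echo "original text" > science.txt

cp science.txt science.txt.bak             # an independent copy
ln -s science.txt science.txt.shortcut     # just a name pointing at science.txt

# Writing through the symlink changes the original, but not the copy.
echo "appended via the link" >> science.txt.shortcut
cat science.txt
cat science.txt.bak

# The long listing shows where the link points ('->').
ls -l science.txt.shortcut

# Removing the symlink leaves the original file in place.
rm science.txt.shortcut
ls
```

After the script runs, 'science.txt' should hold two lines, while 'science.txt.bak' still holds only the first.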

Moving or Renaming Files

The 'mv' command is used to move (i.e., relocate) or rename files or directories. The syntax is 'mv src dest' where 'src' is the path of the source file or directory and 'dest' is the destination (new location or new name of the file or directory).

Create a 'backups' directory in your home area (hint: 'mkdir ~/backups') and move your science.txt.bak backup file to it for safe keeping.

% mv science.txt.bak ~/backups

Note, the above command could have referenced the destination as '~/backups/' or '~/backups/.', and they all mean the same thing. The choice here is stylistic, and you'll often see paths specified in different ways.

Verify that the file has been relocated (hint: 'ls' and 'ls ~/backups')

Instead of 'science.txt.bak', perhaps we decide to name the file more simply as 'science.bak'. Rename the file using the 'mv' command like so:

% mv ~/backups/science.txt.bak ~/backups/science.bak

Verify this filename changed.

Removing files and directories

The 'rm' and 'rmdir' commands can be used to remove files and directories, respectively.

To demonstrate the use of rm and rmdir, let's first create a temporary directory and a temporary file. We'll do this like so:

%  mkdir tmpdir
%  cp science.txt tmpdir/tmpfile.txt

Verify that you've created this directory and that it contains the new file.

Now, remove this new 'tmpfile.txt' using the 'rm' command:

% rm tmpdir/tmpfile.txt

Verify that this file no longer exists.

Remove this empty 'tmpdir' directory using the 'rmdir' command:

% rmdir tmpdir

Verify it no longer exists.

It's important to note that 'rmdir' will only allow you to remove directories that are completely empty. If any file (including a hidden file) exists, it will not let you. For example, let's repeat what we did above, but this time we'll make tmpfile.txt a hidden file by prefixing the file name with '.':

%  mkdir tmpdir
%  cp science.txt tmpdir/.hidden.tmpfile.txt

Verify that the directory and the hidden file exist (hint: 'ls -a tmpdir/')

Now, try removing that 'tmpdir' using 'rmdir'

% rmdir tmpdir

Did it work? Unlikely...

To remove a directory and all its contents, including all nested subdirectories and their contained files, you can use the 'rm -r' command.

% rm -r tmpdir

Note, use of 'rm -r' is powerful and dangerous. Be absolutely certain of what you're deleting with this command because once it's gone, it's gone, and a simple mistake can have serious consequences. So, be careful using this command, and be sure to back up important data in a safe place so if such a mistake does happen, you can gracefully recover from it.
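The difference between 'rmdir' and 'rm -r' can be sketched as a script. This is an illustration only: it works in a throwaway directory made with 'mktemp -d', and the variable 'result' is a made-up name used to record whether 'rmdir' succeeded.

```shell
# Work in a throwaway directory (assumption) so nothing important is at risk.
scratch=$(mktemp -d)
cd "$scratch"

mkdir tmpdir
echo "data" > tmpdir/.hidden.tmpfile.txt

# rmdir refuses to remove a directory that still contains a (hidden) file.
if rmdir tmpdir 2>/dev/null; then
    result="removed"
else
    result="refused"
fi
echo "rmdir: $result"

# Look at what is inside before deleting it for good.
ls -a tmpdir

# rm -r removes the directory together with everything in it.
rm -r tmpdir
```

Listing the contents with 'ls -a' right before 'rm -r' is a cheap habit that catches many mistakes.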

Examining the contents of a file

There are several ways to explore the contents of a file, and some are more interactive than others.

cat

The simplest way is to just dump the contents of a file to the screen using the 'cat' (concatenate) command.

% cat science.txt

Then, you can scroll the window to identify the content of interest. This is not generally the best way to do this when you have large text files.

Instead, you can use a pager utility such as the 'more' or 'less' unix utilities.

more

Examine the file using 'more':

% more science.txt

This will show you one page at a time, and you can press the space bar to go to the next page. Press 'q' to quit and return to the command prompt.

less

A more powerful tool than 'more' is 'less' (ironically enough). Try viewing this document using the 'less' utility:

% less science.txt

Here, you can use the space bar to go to the next page, but you can also scroll up and down in the document using the up/down arrow keys.

The 'head' and 'tail' utilities are useful for quickly examining the first or last lines of the file, respectively.

To view the first ten (default) lines of the file, run:

% head science.txt

To view the last ten (default) lines of the file, run:

% tail science.txt

If you have a specified number of lines that you want to view, use the '-n' parameter to indicate that number of lines to either head or tail. (ex. 'head -n5 science.txt')
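As a quick illustration of the '-n' parameter, the sketch below builds a six-line file and pulls out its first and last two lines. The file 'sample.txt' and its contents are invented for the example, and '$( ... )' is used to capture a command's output in a variable.

```shell
# Work in a throwaway directory (assumption) with a made-up six-line file.
scratch=$(mktemp -d)
cd "$scratch"
printf '%s\n' line1 line2 line3 line4 line5 line6 > sample.txt

# Keep only the first two and the last two lines.
first_two=$(head -n 2 sample.txt)
last_two=$(tail -n 2 sample.txt)

echo "$first_two"     # line1 and line2
echo "$last_two"      # line5 and line6
```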

Searching the contents of a file

Several tools exist for searching contents of the file.

less is more

The enormously useful 'less' utility also enables you to search for content in the document. Typing the '/' character will give you a search prompt within the 'less' application, and you can type in any text that you want to search, press enter, and it will bring you to the first occurrence of that search string in your file. Press 'n' to go to the next occurrence, and eventually hit the 'escape' key to exit the search. Also, press 'q' to quit the 'less' application and return to the unix command prompt.

For example:

% less science.txt

Search for the occurrence of the word 'science':

/science

Press 'n' to go to the next occurrence.

Once you've had enough, hit escape to exit the search, and type 'q' to quit the 'less' application.

grep

If you want to retrieve all lines of a file that contain a match to some string or pattern, you can use 'grep'. For example, to retrieve all lines containing the word 'science' you would:

% grep science science.txt

You can include the '-i' parameter to ignore upper/lower case of letters:

% grep -i science science.txt

The 'grep' utility has many options and can be quite powerful.

To search for a phrase or pattern that contains spaces or other special characters, enclose it in quotes (single quotes, the apostrophe symbol, are the safest choice). For example, to search for spinning top, type

% grep -i 'spinning top' science.txt

Some of the other options of grep are:

-v   display those lines that do NOT match
-n   precede each matching line with the line number
-c   print only the total count of matched lines

Try some of them and see the different results. Don't forget, you can use more than one option at a time, for example, the number of lines without the words science or Science is

% grep -ivc science science.txt
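The options above can be compared side by side on a small file. The sketch below uses a made-up four-line file, 'notes.txt', in a throwaway directory (both are assumptions for illustration, not the workshop's 'science.txt').

```shell
# Work in a throwaway directory (assumption) with a made-up four-line file.
scratch=$(mktemp -d)
cd "$scratch"
printf '%s\n' \
    "Science is fun" \
    "I like art" \
    "science needs data" \
    "a spinning top" > notes.txt

exact=$(grep -c science notes.txt)             # case-sensitive count
any_case=$(grep -ic science notes.txt)         # count, ignoring case
without=$(grep -ivc science notes.txt)         # count of lines that do NOT match
numbered=$(grep -n 'spinning top' notes.txt)   # matching line with its number

echo "$exact $any_case $without"    # prints: 1 2 2
echo "$numbered"                    # prints: 4:a spinning top
```

Only the third line contains lower-case 'science', which is why the case-sensitive count is 1 while the '-i' count is 2.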

wc (word count)

A handy little utility is the wc command, short for word count. To do a word count on science.txt, type

% wc -w science.txt

To find out how many lines the file has, type

% wc -l science.txt

Summary - Operations on Files

Command               Action
cp file1 file2        copy file1 and call it file2
mv file1 file2        move or rename file1 to file2
rm file               remove a file
rmdir directory       remove a directory
rm -r directory       recursively remove a directory
cat file              display a file
more file             display a file a page at a time
head file             display the first few lines of a file
tail file             display the last few lines of a file
grep 'keyword' file   search a file for keywords
wc file               count number of lines/words/characters in file

Input / Output, Redirection and Pipes

In the previous section, we learned about how to view the contents of files. Here, you'll learn how to write files and feed the content of files as input to other programs.

echo - (print)

To print a string of text, you would use the 'echo' command.

% echo I love science


I love science

You should notice the above output printed on your terminal. Printing text this way is useful if you want to communicate it to other tools or write it to a file.

'>' output to file operator

To write to a file, you would use the output redirection operator '>'.

For example:

% echo Science is great >  myfile.txt

and then view the contents of your new file 'myfile.txt' using any of the methods you've learned about earlier. What do you find?

'>>' append to file operator

You can append text to a file using the 'append' operator ('>>'). For example:

% echo What happens in Berlin stays in Berlin. >> myfile.txt

Now examine the contents of this file. How has it changed?
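The difference between '>' and '>>' is easy to miss, so here is a small sketch that makes it visible. It runs in a throwaway directory, the text written to the file is invented, and '$( ... )' is used to capture the file contents at each step.

```shell
# Work in a throwaway directory (assumption).
scratch=$(mktemp -d)
cd "$scratch"

echo "Science is great" > myfile.txt       # '>' creates the file
echo "A second line" >> myfile.txt         # '>>' adds to the end
after_append=$(cat myfile.txt)

echo "Starting over" > myfile.txt          # '>' again replaces the old contents
after_overwrite=$(cat myfile.txt)

echo "$after_append"
echo "---"
echo "$after_overwrite"
```

Because '>' silently replaces an existing file, it is worth double-checking the file name before pressing enter.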

'<' input from file operator

Many commands take their input from the standard input (that is, by default they read it from the keyboard).

For example, let's look at the unix 'sort' command. The command 'sort' alphabetically or numerically sorts a list and can read from standard input.

Type

% sort

Then type in the names of some vegetables. Press [Return] after each one.

carrot
beetroot
artichoke
^D (control d to stop)

The output will be

artichoke
beetroot
carrot

We use the '<' symbol to redirect the input of a command. Using '<' you can redirect the input to come from a file rather than the keyboard. For example, to sort a list of fruit and vegetables, first create a file containing your list:

% echo "artichoke
beetroot
carrot
strawberry
pineapple" > biglist

Then sort it like so:

% sort < biglist

and the sorted list will be output to the screen.

To output the sorted list to a file, type,

% sort < biglist > slist

Examine the contents of file 'slist', which should now reflect a sorted version of 'biglist'.
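Input and output redirection can be combined in one script, as sketched below. The list is deliberately written out of order so the effect of 'sort' is visible; the throwaway directory is an assumption for illustration.

```shell
# Work in a throwaway directory (assumption) with an unsorted list.
scratch=$(mktemp -d)
cd "$scratch"
printf '%s\n' carrot strawberry artichoke pineapple beetroot > biglist

# Read from biglist, write the sorted result to slist.
sort < biglist > slist
cat slist

# Caution: do not use the same file for input and output, e.g.
#   sort < biglist > biglist
# The shell empties the output file before sort gets to read it.
```

The file 'slist' should then contain artichoke, beetroot, carrot, pineapple and strawberry, in that order, while 'biglist' is left unchanged.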

Pipes

Instead of writing data to files, we might have the output of one process directly feed to the input of another process. We do this using pipes ('|').

For example, using two tools we discussed earlier - 'grep' and 'wc', let's cat the contents of file 'science.txt' to the standard input of grep, capture the lines that contain the word 'science', and output those lines into the 'wc' utility to count the number of lines that match.

% cat science.txt | grep -i science | wc -l

Using pipes is an efficient way to communicate between programs or steps in a computational workflow, and we'll see many examples of this in bioinformatics.
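The pipeline above can be checked against grep's own counting option. The following sketch uses a made-up three-line file, 'notes.txt', in a throwaway directory; both are assumptions for illustration.

```shell
# Work in a throwaway directory (assumption) with a made-up file.
scratch=$(mktemp -d)
cd "$scratch"
printf '%s\n' "Science is fun" "I like art" "science needs data" > notes.txt

# Three programs connected by pipes: cat feeds grep, grep feeds wc.
piped=$(cat notes.txt | grep -i science | wc -l)

# The same count, using grep's -c option instead of a pipe into wc.
direct=$(grep -ic science notes.txt)

echo "$piped"
echo "$direct"
```

Both approaches should report two matching lines; the pipe version is more general, because any program that reads standard input can take the place of 'wc'.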

Advanced

  • Logical operators

Input / Output can also be controlled through the logical operators && (logical AND) and || (logical OR). 'cmd1 && cmd2' runs cmd2 only if cmd1 succeeds, so the pair returns a successful exit code only if both commands succeed. 'cmd1 || cmd2' runs cmd2 only if cmd1 fails, so the pair returns a successful exit code if either command succeeds. Here are two examples: 1) test for the presence of a file and, upon success, echo a message; 2) test for the presence of a directory and, upon failure, echo a message.

% test -f biglist && echo file exists

% test -d non_existing_dir || echo "no such directory"
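A short sketch of how the two operators behave, including a combined form. It runs in a throwaway directory; the file 'missing_file' and the variables 'msg1' to 'msg3' are hypothetical names chosen for the illustration.

```shell
# Work in a throwaway directory (assumption) that contains only 'biglist'.
scratch=$(mktemp -d)
cd "$scratch"
touch biglist

# '&&': the echo runs because the test succeeds.
msg1=$(test -f biglist && echo "file exists")

# '||': the echo runs because the test fails.
msg2=$(test -d non_existing_dir || echo "no such directory")

# Combined: the first echo is skipped, so the fallback after '||' runs.
msg3=$(test -f missing_file && echo "found it" || echo "not found")

echo "$msg1"    # prints: file exists
echo "$msg2"    # prints: no such directory
echo "$msg3"    # prints: not found
```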

  • Other flow controls

In the console, running a command will take the prompt away; i.e., you won't be able to interact with the console as long as the process you started is running. It is possible to have processes (especially long ones) run in the background, so that you can go on interacting with the environment and perform other tasks. To do so, append an & to the end of your command.

% sleep 100000 &

The previous command will have a process sleeping for 100 thousand seconds in the background. It is possible to restore jobs running in the background to the foreground. This is done using the fg command.

% fg

Now the sleep process is in the foreground and you can't interact with your environment anymore. To send the process to the background again, we first need to suspend it (Ctrl-Z) before using the bg command.

% Ctrl-Z
% bg

The job is now again in the background. To inspect all the current jobs in your environment, you can use the jobs -l command.

% jobs -l

Finally, let's bring that useless sleep process back to the foreground and terminate it.

% fg
% Ctrl-C
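The key presses above only work at an interactive prompt. Inside a script, a rough equivalent is sketched below using '$!' (the process ID of the most recent background job), 'kill' and 'wait'. None of these three are covered in the text above, so treat this as an optional extra; the variable names are made up.

```shell
# Start a background job and remember its process ID.
sleep 30 &
pid=$!

# List the background jobs of this shell.
jobs -l

# 'kill -0' sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then state_before="running"; else state_before="gone"; fi

# Terminate the job (the scripted counterpart of fg followed by Ctrl-C),
# then wait for it to finish. '|| true' ignores the job's non-zero exit status.
kill "$pid"
wait "$pid" 2>/dev/null || true

if kill -0 "$pid" 2>/dev/null; then state_after="running"; else state_after="gone"; fi
echo "$state_before -> $state_after"
```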

Summary - Input / Output, Redirection and Pipes

Command                   Action
command > file            redirect standard output to a file
command >> file           append standard output to a file
command < file            redirect standard input from a file
cat file1 file2 > file0   concatenate file1 and file2 to file0
sort                      sort data
cmd1 && cmd2              execute cmd2 after cmd1 only if cmd1 succeeded
cmd1 || cmd2              execute cmd2 after cmd1 only if cmd1 failed

File system security (access rights)

In your 'mydir' directory, type

% ls -l

You will see that you now get lots of details about the contents of your directory, similar to the example below.

Each file (and directory) has associated access rights, which may be found by typing ls -l. The listing also shows which group owns the file (beng95 in the following example); some older systems only show the group with ls -lg:

-rwxrw-r-- 1 ee51ab beng95 2450 Sept29 11:52 file1

In the left-hand column is a 10 symbol string consisting of the symbols d, r, w, x, -, and, occasionally, s or S. If d is present, it will be at the left hand end of the string, and indicates a directory: otherwise - will be the starting symbol of the string.

The 9 remaining symbols indicate the permissions, or access rights, and are taken as three groups of 3.

  • the left group of 3 gives the file permissions for the user that owns the file (or directory) (ee51ab in the above example);

  • the middle group gives the permissions for the group of people to whom the file (or directory) belongs (beng95 in the above example);

  • the rightmost group gives the permissions for all others.

The symbols r, w, etc., have slightly different meanings depending on whether they refer to a simple file or to a directory.

Access rights on files.

  • r (or -), indicates read permission (or otherwise), that is, the presence or absence of permission to read and copy the file
  • w (or -), indicates write permission (or otherwise), that is, the permission (or otherwise) to change a file
  • x (or -), indicates execution permission (or otherwise), that is, the permission to execute a file, where appropriate

Access rights on directories:

  • r allows users to list files in the directory;
  • w means that users may delete files from the directory or move files into it;
  • x means the right to access files in the directory. This implies that you may read files in the directory provided you have read permission on the individual files.

So, in order to read a file, you must have execute permission on the directory containing that file, and hence on any directory containing that directory as a subdirectory, and so on, up the tree.

Some examples

-rwxrwxrwx	a file that everyone can read, write and execute (and delete).
-rw-------	a file that only the owner can read and write - no-one else
            can read or write and no-one has execution rights (e.g. your
            mailbox file).

Changing access rights

chmod (changing a file mode)

Only the owner of a file (or the superuser) can use chmod to change the permissions of a file. The options of chmod are as follows:

Symbol   Meaning
u        user
g        group
o        other
a        all
r        read
w        write (and delete)
x        execute (and access directory)
+        add permission
-        take away permission

For example, to remove read, write and execute permissions on the file biglist for the group and others, type

% chmod go-rwx biglist

This will leave the other permissions unaffected.

To give read and write permissions on the file biglist to all,

% chmod a+rw biglist

Try changing access permissions on the file science.txt and on the directory backups

Use ls -l to check that the permissions have changed.

Advanced usage

The permissions consist of three rwx blocks, one for the user, one for the group and one for others. If we think like a computer, giving a permission (+) amounts to setting a bit (a unit of information expressed as either a 0 or 1 in binary notation) to 1. Removing the permission (-) corresponds to setting that bit to 0. Hence, we can represent an rwx block as three bits (r)0/1 (w)0/1 (x)0/1.

A small binary to base10 conversion table before we proceed (more at https://en.wikipedia.org/wiki/Binary_number#Binary_counting):

binary base10
000 0
001 1
010 2
011 3
100 4
101 5
110 6
111 7

How is that useful to us? Well, we can replace every rwx block by the corresponding base10 number to change the permissions. For example:

% chmod 666 biglist

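will leave biglist with the same permissions as the last command we ran (chmod a+rw biglist), i.e. read and write permissions for all. Note that the numeric form sets all nine permission bits at once (6 = 110 = rw- for the user, the group and others), whereas a+rw only adds to whatever was already set.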
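To connect the symbolic and numeric forms, the sketch below applies a few modes to a scratch file and prints the resulting permission string each time. The helper 'show_mode' is a hypothetical convenience defined here (it cuts the first ten characters out of 'ls -l'), and the throwaway directory is an assumption.

```shell
# Work in a throwaway directory (assumption) with a scratch file.
scratch=$(mktemp -d)
cd "$scratch"
echo "demo" > biglist

# Hypothetical helper: print only the 10-symbol permission string.
show_mode() { ls -l "$1" | cut -c1-10; }

chmod 640 biglist       # 6 = 110 = rw-, 4 = 100 = r--, 0 = 000 = ---
m1=$(show_mode biglist)

chmod go-rwx biglist    # remove everything from group and others
m2=$(show_mode biglist)

chmod a+rw biglist      # add read and write for everyone
m3=$(show_mode biglist)

chmod 755 biglist       # 7 = 111 = rwx, 5 = 101 = r-x
m4=$(show_mode biglist)

echo "$m1"    # -rw-r-----
echo "$m2"    # -rw-------
echo "$m3"    # -rw-rw-rw-
echo "$m4"    # -rwxr-xr-x
```

Each digit is simply the sum of r=4, w=2 and x=1 for one of the three blocks, so rw- is 4+2=6 and r-x is 4+1=5.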

Process monitoring

The 'top' utility is useful for monitoring running processes. Launch 'top' from the command line:

% top


top - 00:23:17 up  5:12,  2 users,  load average: 1.05, 0.62, 0.52
Tasks:  14 total,   2 running,  12 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.9 us,  0.7 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26414144+total, 26185260+free,   752588 used,  1536232 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 26246793+avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
4322 training  20   0  250596  86236   2768 R 238.9  0.0   6:31.59 hmmscan
5064 training  20   0   38296   3628   3124 R   0.3  0.0   0:00.01 top
1 root      20   0   57492  18404   7228 S   0.0  0.0   0:00.77 supervisord
9 root      20   0  315656  29124   8272 S   0.0  0.0   0:00.63 gateone
38 root      20   0   65520   3180   2460 S   0.0  0.0   0:00.00 sshd
46 root      20   0   73692   4384   3164 S   0.0  0.0   0:00.10 apache2
50 www-data  20   0  362848   6296   2716 S   0.0  0.0   0:00.97 apache2
51 www-data  20   0  362848   6296   2716 S   0.0  0.0   0:00.97 apache2
296 root      20   0   95392   6620   5688 S   0.0  0.0   0:00.03 sshd
305 training  20   0   95392   3992   3016 S   0.0  0.0   0:00.30 sshd
306 training  20   0   19928   3720   3212 S   0.0  0.0   0:00.12 bash
5025 root      20   0   95392   6896   5964 S   0.0  0.0   0:00.02 sshd
5050 training  20   0   95392   3304   2376 S   0.0  0.0   0:00.00 sshd
5053 training  20   0   19920   3688   3196 S   0.0  0.0   0:00.00 bash

Type ctrl-c to exit 'top'.

Environmental variables

The unix environment includes variables that are set and accessed within software and are a convenient way to store information that may be shared by multiple processes. To access the list of currently set environmental variables, use the 'env' utility.

% env

You'll see a list of key=value pairs. To access the value of a variable from the terminal, prefix the variable name with a '$'. For example:

%  echo $SHELL

You can set your own environmental variables using the 'export' command. For example:

%  export MYVAR="my new environmental variable value"

Check that you've set your new environmental variable in a couple of ways (hint: echo or combine env and grep).

Note, environmental variables are set for the life of the current terminal session. To retain the setting for future terminal sessions, you'll need to add the above export command to your unix user start-up file (with the bash shell, it would be ~/.bashrc, and with tcsh it would be ~/.cshrc).
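What 'export' actually changes can be seen by starting a child process, as in the sketch below. 'MYLOCAL' is a made-up variable that is deliberately not exported, and 'sh -c' is used here only as a convenient way to launch a child shell.

```shell
# A plain shell variable (not exported) and an exported one.
MYLOCAL="only visible in this shell"
export MYVAR="my new environmental variable value"

# Exported variables show up in the environment listing.
env | grep '^MYVAR='

# A child process inherits exported variables only.
child_sees_export=$(sh -c 'echo "$MYVAR"')
child_sees_local=$(sh -c 'echo "$MYLOCAL"')

echo "exported: [$child_sees_export]"
echo "local:    [$child_sees_local]"
```

The second pair of brackets should come out empty, since the child shell never received 'MYLOCAL'.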

Aliases

Aliases can be thought of as shortcuts to unix commands. For example, a commonly set alias is 'll' for the 'ls -ltr' command, simply because 'ls -ltr' is used very frequently and 'll' has fewer characters to type. To set this, you would:

% alias ll='ls -ltr'

and then type:

% ll

Similarly to environmental variables above, set this in your start-up file in order to make it more permanent.

Using a text editor in unix

Popular text editors used in unix are emacs, vim, and nano (or pico). Emacs and vim are considered to be 'expert friendly' and each has a nontrivial learning curve. Those who spend considerable time in the linux environment will tend to learn one or both of these editors. The nano editor (a linux clone of the unix pico editor) is simple to use and more accessible to those just getting started in the unix environment.

Use nano to create a text file 'my_nano.txt':

% nano my_nano.txt

Add any text you'd like. At the bottom of the screen, you'll see a menu of options including ctrl-X to exit. When you're ready, type ctrl-X to exit. It should prompt you to save the file: press 'y' for yes, of course, and then press Enter to confirm the file name.

Verify this file now exists in your directory and view it to verify its contents.

File Compression

The 'gzip' and 'gunzip' utilities can be used for compressing and decompressing files. Compressing files saves disk space, particularly for large text files (e.g., large numbers of next-generation sequencing reads).

To compress a file:

%  gzip science.txt

This will generate a compressed version of the file as 'science.txt.gz' with a '.gz' extension.

How much space does this file consume on the file system? (hint: ls -l)

To decompress such a file, use the 'gunzip' utility:

% gunzip science.txt.gz

and you'll see that it is restored.

How much space does the decompressed file consume? Do you notice space savings via compression?

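You can view gzipped files directly, without decompressing them first, using the handy 'zcat' or 'zless' utilities.

Since we just decompressed the file, first compress it again (hint: 'gzip science.txt'), then try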

%  zcat science.txt.gz

and then

% zless science.txt.gz

These are very convenient. Alternatively, you could use 'gunzip -c' to decompress the contents of the file to standard output and pipe it into the 'less' utility:

% gunzip -c science.txt.gz |  less
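The whole compress, inspect and decompress cycle is sketched below. The file 'reads.txt' is a made-up stand-in for a large sequence file (1000 identical lines, which compress very well), and the throwaway directory is an assumption for illustration.

```shell
# Work in a throwaway directory (assumption) with a repetitive made-up file.
scratch=$(mktemp -d)
cd "$scratch"
i=0
while [ "$i" -lt 1000 ]; do
    echo "ACGTACGTACGTACGT"
    i=$((i + 1))
done > reads.txt

orig_size=$(wc -c < reads.txt)      # size in bytes before compression
gzip reads.txt                      # replaces reads.txt with reads.txt.gz
gz_size=$(wc -c < reads.txt.gz)     # size in bytes after compression

# Read the compressed file without unpacking it on disk.
lines=$(gunzip -c reads.txt.gz | wc -l)

gunzip reads.txt.gz                 # restores reads.txt, removes reads.txt.gz

echo "original: $orig_size bytes, compressed: $gz_size bytes, lines: $lines"
```

Note that 'gzip' and 'gunzip' replace the file they are given, so only one of 'reads.txt' and 'reads.txt.gz' exists at any time.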

Command-line retrieval of files from the web:

Sometimes you have a URL to a large file that you want to retrieve from the command line. There are handy utilities for doing this, including 'wget' and 'curl'. You'll see these used often in bioinformatics.

For example, say we have a file on an ftp site that we want to retrieve from the command line (ex. ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt), we could pull it down using 'wget' like so:

% wget ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt

Does this file now appear in your working directory? Can you view the contents?

File validation

Transferring files from device to device or over the internet always carries the risk that data gets corrupted, which is definitely not something you want to happen to your precious experiment results. Commonly, core/sequencing facilities will provide, alongside your files, additional file(s) containing checksums of those files. Most commonly, these will be md5 checksums. Let's retrieve the md5 for our example file:

% wget ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt.md5

Next, let's use the md5sum utility to check that the file was successfully transferred.

% md5sum -c example_file.txt.md5

You should get an output similar to the following:

example_file.txt: OK

The md5sum tool can also be used to create md5 checksums, e.g.

% md5sum example_file.txt

Saving this output to a file is the way we created the example_file.txt.md5 file and the way you can ensure a safe transfer of your data to your colleagues; i.e. if you are on the sender rather than the receiver end.
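The sender and receiver sides can be tried out locally, without any download. In the sketch below, 'results.txt' is a made-up file in a throwaway directory, and the variables 'intact' and 'after_change' are hypothetical names that record whether the check passed.

```shell
# Work in a throwaway directory (assumption) with a made-up results file.
scratch=$(mktemp -d)
cd "$scratch"
echo "precious experiment results" > results.txt

# Sender side: store the checksum next to the file.
md5sum results.txt > results.txt.md5

# Receiver side: verify the file against the stored checksum.
if md5sum -c results.txt.md5 > /dev/null 2>&1; then intact="yes"; else intact="no"; fi

# Simulate corruption by changing the file, then verify again.
echo "one stray line" >> results.txt
if md5sum -c results.txt.md5 > /dev/null 2>&1; then after_change="yes"; else after_change="no"; fi

echo "intact before change: $intact"
echo "intact after change:  $after_change"
```

The check should pass the first time and fail the second time, since any change to the file changes its checksum.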

Manual for unix tools (man pages)

To access the manual for unix commands, use the 'man' command. For example:

% man cat

which will open up the manual in your default pager (which would be 'more' or 'less'), and you can explore the documentation for that command.

The 'screen' utility

The 'screen' utility is useful for encapsulating your running environment in a protected shell such that you can detach from your environment but keep it alive and running well on the server, allowing you to later reattach to it and continue working as if you never left it in the first place.

To start a screen session, type:

%  screen -S name

where 'name' is the name you want to give to your session.

After you do some work and have some processes running, you can detach from the screen session safely using the following combination:

%  Ctrl-a, d

To check if you're within a screen session, since it's not at all obvious when you are, you can check to see if there is an environmental variable 'STY' set. Any time you're within a screen session, this value will be set to the session identifier, which includes the name of the screen session.

%  echo $STY

If you're outside of a screen session and want to see what screen sessions are currently detached and running, you can type:

%  screen -ls

To resume any session, type:

%  screen -r session_name

where 'session_name' is the name of the session that you want to reattach to.

To exit a screen session, simply type 'exit' from within the session (instead of detaching from it).

Extra reading

  • Command line Reference: Excellent PDF summarising the most important UNIX commands.
  • Advanced Bash scripting guide: http://www.tldp.org/LDP/abs/html/
  • Nice online tutorial at Codecademy (the basic tutorial content is free but requires an account)