unix_intro_tutorial - trinityrnaseq/BerlinTrinityWorkshop2018 GitHub Wiki
This introduction to unix tutorial is based on unixtut developed by '[email protected], 19 October 2001' http://www.ee.surrey.ac.uk/Teaching/Unix/, adapted here and continued to be made available under https://creativecommons.org/licenses/by-nc-sa/2.0/.
Below, unix commands will be shown prefixed with a unix prompt character '%'.
When you first login, you will be in your home directory.
To determine your current working directory, use the print working directory 'pwd' command:
% pwd
/home/training
The root or base directory is '/', and the set of nested directories stemming from the root path are separated by '/' characters.
/home
/training
The current directory can be referenced as '.', and the parent directory can be referenced as '..' . You'll often see this notation being used in commands.
Your home directory can be referenced as '~'.
List the files and directories located in your current directory by using the 'ls' command:
% ls
The 'ls' command has many options to provide more information about the files and directories. You can add the '--color' parameter so that it will color files and directories differently:
% ls --color
You can generate 'long' listings like so:
% ls -l
where you'll be able to see many attributes of each file (which we'll separately cover below), including permission settings, group ownership, file size, and the date it was created or last modified.
Certain files and directories may be hidden - essentially, any file or directory that begins with a '.' is hidden. You'll recall above that '.' and '..' are special notations for current and parent directories. You can reveal these hidden paths by listing all files with the 'ls -a' parameter invocation:
% ls -a
and you can combine parameters like so:
% ls -al --color
A very popular invocation of 'ls' is 'ls -ltr', which provides a long listing ordered by modification time, oldest first (so the newest files are at the bottom of the list).
% ls -ltr
By default, ls will list the files and directories in the current working directory (as reported by 'pwd'). You can also specify which directory to list. For example, from any location on the file system you can list the files in your home directory like so:
% ls ~
What would the following list correspond to?
% ls ~/..
Create a directory by using the 'mkdir' command:
% mkdir mydir
Now list your files to verify that the directory 'mydir' has been created (hint: use the 'ls --color' command)
Use the 'cd' command to change directories. To enter the new 'mydir' directory, you would:
% cd mydir
Verify that you've changed directories (hint: use the 'pwd' command)
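The directory commands above can be strung together as a short script. The following is a minimal sketch, written without the '%' prompt so it can be pasted as-is. It works inside a throwaway directory created with 'mktemp -d' (a utility not covered in this tutorial) so that your real home area is left alone, and the variable names are illustrative only.

```shell
# Work in a scratch directory so nothing in your home area is touched
workdir=$(mktemp -d)
cd "$workdir"

mkdir mydir          # create the new directory
cd mydir             # enter it
inside=$(pwd)        # remember where we are now

cd ..                # '..' is the parent directory
parent=$(pwd)

echo "inside: $inside"
echo "parent: $parent"
```

The first path printed should be the second one with '/mydir' appended.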
Command | Action |
---|---|
ls | list files and directories |
ls -a | list all files and directories, including hidden files |
ls -l | long listing of files and directories |
ls --color | color files and directories differently |
mkdir | make a new directory |
cd 'dir' | change to directory 'dir' |
cd ~ | change to home directory |
cd .. | change to the parent directory |
pwd | print current working directory |
The 'cp' copy command with syntax 'cp srcFile destFile' will copy the contents of srcFile to destFile.
To demonstrate this, we'll copy a file that already exists in our home area to our new 'mydir' directory.
Check that you're currently in your 'mydir' directory (hint: use 'pwd'). If you're not currently in your 'mydir' directory, then be sure to cd to it (hint: 'cd ~/mydir').
% cp ~/shared_ro/science.txt .
The above command will copy the 'science.txt' file from ~/shared_ro (the 'shared_ro' directory in our home directory '~') to our current directory ('.'), which should be 'mydir'.
Verify that the file has been copied (hint: use 'ls')
Create a backup copy of this file for safe keeping:
% cp science.txt science.txt.bak
Again, verify that you now see this new backup file in your current directory.
Instead of copying a file, especially if the file is very large and takes up a lot of disk space, it's useful to instead create a shortcut to it (aka. symbolic link or symlink). To do this, you would use the 'ln' command with the '-s' (symbolic) parameter.
For example, we can create a symlink to our science.txt file called 'science.txt.shortcut' like so:
% ln -s science.txt science.txt.shortcut
This 'science.txt.shortcut' takes up effectively no disk space but provides you with full access to the contents of the file it links to. Removing or renaming the symlink does not change the original file, but opening and editing the contents is the same as opening and editing the original file, since it's simply a name that points to the original file.
Execute 'ls' and 'ls -l' to examine how the symlink is represented in both contexts.
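The behaviour of a symlink described above can be checked with a small script. This is a sketch under stated assumptions: it uses a scratch directory from 'mktemp -d' and a one-line stand-in file rather than the real science.txt, and the variable name is illustrative.

```shell
# Scratch directory with a small stand-in for science.txt
workdir=$(mktemp -d)
cd "$workdir"
echo "original contents" > science.txt

ln -s science.txt science.txt.shortcut    # create the symlink
ls -l science.txt.shortcut                # long listing shows 'science.txt.shortcut -> science.txt'

via_link=$(cat science.txt.shortcut)      # reading the link reads the original file
echo "read through the link: $via_link"

rm science.txt.shortcut                   # removing the link leaves the original in place
ls
```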
The 'mv' command is used to move (ie. relocate) or rename files or directories. The syntax is 'mv src dest' where 'src' is the path of the source file or directory and 'dest' is the destination (new location or new name of the file or directory).
Create a 'backups' directory in your home area (hint: 'mkdir ~/backups') and move your science.txt.bak backup file to it for safe keeping.
% mv science.txt.bak ~/backups
Note, the above command could have referenced the destination as '~/backups/' or '~/backups/.', and they all mean the same thing. The choice here is stylistic, and you'll often see paths specified in different ways.
Verify that the file has been relocated (hint: 'ls' and 'ls ~/backups')
Instead of 'science.txt.bak', perhaps we decide to name the file more simply as 'science.bak'. Rename the file using the 'mv' command like so:
% mv ~/backups/science.txt.bak ~/backups/science.bak
Verify this filename changed.
The 'rm' and 'rmdir' commands can be used to remove files and directories, respectively.
To demonstrate the use of rm and rmdir, let's first create a temporary directory and a temporary file. We'll do this like so:
% mkdir tmpdir
% cp science.txt tmpdir/tmpfile.txt
Verify that you've created this directory and that it contains the new file.
Now, remove this new 'tmpfile.txt' using the 'rm' command:
% rm tmpdir/tmpfile.txt
Verify that this file no longer exists.
Remove this empty 'tmpdir' directory using the 'rmdir' command:
% rmdir tmpdir
Verify it no longer exists.
It's important to note that 'rmdir' will only allow you to remove directories that are completely empty. If any file (including a hidden file) exists, it will not let you. For example, let's do the following, similar to what we did above, but we'll make the tmpfile.txt a hidden file by prefixing the file name with '.':
% mkdir tmpdir
% cp science.txt tmpdir/.hidden.tmpfile.txt
Verify that the directory and the hidden file exist (hint: 'ls -a tmpdir/')
Now, try removing that 'tmpdir' using 'rmdir'
% rmdir tmpdir
Did it work? Unlikely...
To remove a directory and all its contents, including all nested subdirectories and their contained files, you can use the 'rm -r' command.
% rm -r tmpdir
Note, use of 'rm -r' is powerful and dangerous. Be absolutely certain of what you're deleting with this command because once it's gone, it's gone, and a simple mistake can have serious consequences. So, be careful using this command, and be sure to back up important data in a safe place so if such a mistake does happen, you can gracefully recover from it.
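One cautious habit is to inspect a directory before removing it recursively. The sketch below replays the 'rmdir' versus 'rm -r' exercise inside a scratch directory (created with 'mktemp -d', which this tutorial does not cover). The file contents and the variable name are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir tmpdir
echo "data" > tmpdir/.hidden.tmpfile.txt   # hidden file inside the directory

# rmdir refuses because the directory is not empty
if rmdir tmpdir 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir: $rmdir_result"

ls -a tmpdir      # look at exactly what is about to be deleted
rm -r tmpdir      # then remove the directory and everything in it
```

The '-i' option ('rm -ri tmpdir') asks for confirmation before each removal, which can be a useful safety net when working interactively.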
There are several ways to explore the contents of a file, and some are more interactive than others.
The simplest way is to just dump the contents of a file to the screen using the 'cat' (concatenate) command.
% cat science.txt
Then, you can scroll the window to identify the content of interest. This is not generally the best way to do this when you have large text files.
Instead, you can use a pager utility such as the 'more' or 'less' unix utilities.
Examine the file using 'more':
% more science.txt
This will show you one page at a time, and you can press the space bar to go to the next page. Press 'q' to quit and return to the command prompt.
A more powerful tool than 'more' is 'less' (ironically enough). Try viewing this document using the 'less' utility:
% less science.txt
Here, you can use the space bar to go to the next page, but you can also scroll up and down in the document using the up/down arrow keys.
The 'head' and 'tail' utilities are useful for quickly examining the first or last lines of the file, respectively.
To view the first ten (default) lines of the file, run:
% head science.txt
To view the last ten (default) lines of the file, run:
% tail science.txt
If you want to view a specific number of lines, use the '-n' parameter to pass that number to either head or tail (ex. 'head -n 5 science.txt').
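To see '-n' in action without depending on science.txt, here is a minimal sketch that builds a six-line stand-in file with 'printf' (an addition, not part of this tutorial) in a scratch directory. File and variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Six-line stand-in file, one word per line
printf '%s\n' one two three four five six > lines.txt

first_two=$(head -n 2 lines.txt)   # first two lines
last_one=$(tail -n 1 lines.txt)    # last line only

echo "$first_two"
echo "$last_one"
```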
Several tools exist for searching contents of the file.
The enormously useful 'less' utility also enables you to search for content in the document. Typing the '/' character will give you a search prompt within the 'less' application, and you can type in any text that you want to search, press enter, and it will bring you to the first occurrence of that search string in your file. Press 'n' to go to the next occurrence, and eventually hit the 'escape' key to exit the search. Also, press 'q' to quit the 'less' application and return to the unix command prompt.
For example:
% less science.txt
Search for the occurrence of the word 'science':
/science
Press 'n' to go to the next occurrence.
Once you've had enough, hit escape to exit the search, and type 'q' to quit the 'less' application.
If you want to retrieve all lines of a file that contain a match to some string or pattern, you can use 'grep'. For example, to retrieve all lines containing the word 'science' you would:
% grep science science.txt
You can include the '-i' parameter to ignore upper/lower case of letters:
% grep -i science science.txt
The 'grep' utility has many options and can be quite powerful.
To search for a phrase or pattern, you must enclose it in single quotes (the apostrophe symbol). For example to search for spinning top, type
% grep -i 'spinning top' science.txt
Some of the other options of grep are:
-v display those lines that do NOT match
-n precede each matching line with the line number
-c print only the total count of matched lines
Try some of them and see the different results. Don't forget, you can use more than one option at a time, for example, the number of lines without the words science or Science is
% grep -ivc science science.txt
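The effect of each option is easiest to see on a tiny file whose contents you know. The following sketch uses a made-up four-line sample file (not the real science.txt) in a scratch directory. Variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Four-line sample: two lines mention science, with different capitalisation
printf '%s\n' 'Science is fun' 'I like cooking' 'science matters' 'a spinning top' > sample.txt

count_exact=$(grep -c science sample.txt)        # case-sensitive count of matching lines
count_any_case=$(grep -ic science sample.txt)    # case-insensitive count
count_without=$(grep -ivc science sample.txt)    # lines NOT matching, ignoring case
numbered=$(grep -n 'spinning top' sample.txt)    # matching line prefixed by its line number

echo "$count_exact $count_any_case $count_without"
echo "$numbered"
```

With this sample the counts are 1, 2 and 2, and the last command prints '4:a spinning top'.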
A handy little utility is the wc command, short for word count. To do a word count on science.txt, type
% wc -w science.txt
To find out how many lines the file has, type
% wc -l science.txt
Command | Action |
---|---|
cp file1 file2 | copy file1 and call it file2 |
mv file1 file2 | move or rename file1 to file2 |
rm file | remove a file |
rmdir directory | remove a directory |
rm -r directory | recursively remove a directory |
cat file | display a file |
more file | display a file a page at a time |
head file | display the first few lines of a file |
tail file | display the last few lines of a file |
grep 'keyword' file | search a file for keywords |
wc file | count number of lines/words/characters in file |
In the previous section, we learned about how to view the contents of files. Here, you'll learn how to write files and feed the content of files as input to other programs.
To print a string of text, you would use the 'echo' command.
% echo I love science
I love science
You should notice the above output printed on your terminal. Printing text this way is useful if you want to communicate it to other tools or write it to a file.
To write to a file, you would use the output redirection operator '>'.
For example:
% echo Science is great > myfile.txt
and then view the contents of your new file 'myfile.txt' using any of the methods you've learned about earlier. What do you find?
You can append text to a file using the 'append' operator ('>>'). For example:
% echo What happens in Berlin stays in Berlin. >> myfile.txt
Now examine the contents of this file. How has it changed?
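The difference between '>' and '>>' matters: '>' replaces the file's contents, whereas '>>' adds to them. A minimal sketch, run in a scratch directory so that no existing myfile.txt is overwritten. The second line of text and the variable names are made up, and 'tr' is used only to strip padding from the 'wc' output.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo Science is great > myfile.txt       # creates (or overwrites) the file
echo A second line >> myfile.txt         # appends to it
lines_after_append=$(wc -l < myfile.txt | tr -d ' ')

echo Starting over > myfile.txt          # '>' again: the previous contents are lost
lines_after_overwrite=$(wc -l < myfile.txt | tr -d ' ')

echo "after append: $lines_after_append, after overwrite: $lines_after_overwrite"
```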
Many commands take their input from the standard input (that is, by default they read it from the keyboard).
For example, let's look at the unix 'sort' command. The command 'sort' alphabetically or numerically sorts a list and can read from standard input.
Type
% sort
Then type in the names of some vegetables. Press [Return] after each one.
carrot
beetroot
artichoke
^D (control d to stop)
The output will be
artichoke
beetroot
carrot
We use the '<' symbol to redirect the input of a command. Using '<' you can redirect the input to come from a file rather than the keyboard. For example, to sort a list of vegetables and fruit, first create a file containing your list:
% echo "artichoke
beetroot
carrot
strawberry
pineapple" > biglist
Then sort it like so:
% sort < biglist
and the sorted list will be output to the screen.
To output the sorted list to a file, type,
% sort < biglist > slist
Examine the contents of file 'slist', which should now reflect a sorted version of 'biglist'.
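The same redirections can be scripted. The sketch below recreates 'biglist' in a scratch directory and also illustrates the numeric sorting mentioned earlier using the '-n' option of sort, which this tutorial does not otherwise show. The 'numbers' file and the variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' artichoke beetroot carrot strawberry pineapple > biglist
sort < biglist > slist            # input from biglist, output to slist
cat slist

# Plain sort compares text character by character; -n compares numeric values
printf '%s\n' 10 9 100 > numbers
first_as_text=$(sort < numbers | head -n 1)
first_as_number=$(sort -n < numbers | head -n 1)
echo "text order starts with $first_as_text, numeric order starts with $first_as_number"
```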
Instead of writing data to files, we might have the output of one process directly feed to the input of another process. We do this using pipes ('|').
For example, using two tools we discussed earlier - 'grep' and 'wc', let's cat the contents of file 'science.txt' to the standard input of grep, capture the lines that contain the word 'science', and output those lines into the 'wc' utility to count the number of lines that match.
% cat science.txt | grep -i science | wc -l
Using pipes is an efficient way to communicate between programs or steps in a computational workflow, and we'll see many examples of this in bioinformatics.
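As a quick check of the pipeline above, the following sketch runs it on a small made-up sample file and compares the result with the equivalent single 'grep -ic' command. The sample text and variable names are illustrative, and 'tr' is used only to strip padding from the 'wc' output.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'Science is fun' 'I like cooking' 'science matters' > sample.txt

# Three programs connected by pipes: each reads the previous one's output
via_pipe=$(cat sample.txt | grep -i science | wc -l | tr -d ' ')

# grep can count matching lines itself, so this gives the same number
direct=$(grep -ic science sample.txt)

echo "pipeline: $via_pipe, grep -c: $direct"
```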
- Logical operators
The flow of commands can also be controlled through the logical operators && (logical AND) and || (logical OR). With 'cmd1 && cmd2', cmd2 only runs if cmd1 succeeded, and the whole expression returns a successful exit code if both commands succeed. With 'cmd1 || cmd2', cmd2 only runs if cmd1 failed, and the whole expression returns a successful exit code if either command succeeds. Here are two examples: 1) test for the presence of a file and, upon success, echo a message; 2) test for the presence of a directory and, upon failure, echo a message.
% test -f biglist && echo file exists
% test -d non_existing_dir || echo "no such directory"
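The two operators are often chained to pick one of two outcomes. A minimal sketch in a scratch directory follows. The variable names and messages are made up, and 'touch' (not covered in this tutorial) simply creates an empty file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch biglist                       # make sure the file exists for this demo

# && : the right-hand side runs only when the test succeeds
test -f biglist && file_msg="file exists"

# || : the right-hand side runs only when the test fails
test -d non_existing_dir || dir_msg="no such directory"

# Chained: report one outcome or the other
test -d non_existing_dir && status="found" || status="missing"

echo "$file_msg / $dir_msg / $status"
```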
- Other flow controls
In the console, running a command will take the prompt away; i.e. you won't be able to interact with the console as long as the process you started is running. It is possible to run processes (especially long ones) in the background, so that you can go on interacting with the environment and perform other tasks. To do so, append an & at the end of your command.
% sleep 100000 &
The previous command will have a process sleeping for 100 thousand seconds in the background. It is possible to restore jobs running in the background to the foreground. This is done using the fg command.
% fg
Now the sleep process is in the foreground and you can no longer interact with your environment. To send the process to the background again, we first need to suspend it (Ctrl-Z) before using the bg command.
% Ctrl-Z
% bg
The job is now again in the background. To inspect all the current jobs in your environment, you can use the jobs -l command.
% jobs -l
Finally, let's resume that useless sleep process to the foreground and terminate it.
% fg
% Ctrl-C
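The fg, bg and Ctrl-Z steps rely on an interactive terminal. Inside a script, a rough counterpart is to keep the process ID of the background job and manage it with 'kill' and 'wait'. These two commands and the '$!' variable are not covered in this tutorial, so treat the following as a sketch. The 30-second duration and the variable names are arbitrary.

```shell
sleep 30 &            # start a long-running process in the background
pid=$!                # $! holds the process ID of the most recent background job

# The script carries on immediately; 'kill -0' only checks that the process exists
if kill -0 "$pid" 2>/dev/null; then
    state_before="running"
else
    state_before="gone"
fi

kill "$pid"                        # terminate it (the scripted counterpart of fg + Ctrl-C)
wait "$pid" 2>/dev/null || true    # collect its exit status, which is non-zero because it was killed

if kill -0 "$pid" 2>/dev/null; then
    state_after="running"
else
    state_after="gone"
fi
echo "before: $state_before, after: $state_after"
```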
Command | Action |
---|---|
command > file | redirect standard output to a file |
command >> file | append standard output to a file |
command < file | redirect standard input from a file |
cat file1 file2 > file0 | concatenate file1 and file2 to file0 |
sort | sort data |
cmd1 && cmd2 | execute cmd2 only if cmd1 succeeded |
cmd1 || cmd2 | execute cmd2 only if cmd1 failed |
In your 'mydir' directory, type
% ls -l
You will see that you now get lots of details about the contents of your directory, similar to the example below.
Each file (and directory) has associated access rights, which may be found by typing ls -l. The long listing also shows which user and which group own the file (ee51ab and beng95 in the following example):
-rwxrw-r-- 1 ee51ab beng95 2450 Sept29 11:52 file1
In the left-hand column is a 10 symbol string consisting of the symbols d, r, w, x, -, and, occasionally, s or S. If d is present, it will be at the left hand end of the string, and indicates a directory: otherwise - will be the starting symbol of the string.
The 9 remaining symbols indicate the permissions, or access rights, and are taken as three groups of 3.
- The left group of 3 gives the file permissions for the user that owns the file (or directory) (ee51ab in the above example);
- the middle group gives the permissions for the group of people to whom the file (or directory) belongs (beng95 in the above example);
- the rightmost group gives the permissions for all others.
The symbols r, w, etc., have slightly different meanings depending on whether they refer to a simple file or to a directory.
For files:
- r (or -), indicates read permission (or otherwise), that is, the presence or absence of permission to read and copy the file
- w (or -), indicates write permission (or otherwise), that is, the permission (or otherwise) to change a file
- x (or -), indicates execution permission (or otherwise), that is, the permission to execute a file, where appropriate
For directories:
- r allows users to list files in the directory;
- w means that users may delete files from the directory or move files into it;
- x means the right to access files in the directory. This implies that you may read files in the directory provided you have read permission on the individual files.
So, in order to read a file, you must have execute permission on the directory containing that file, and hence on any directory containing that directory as a subdirectory, and so on, up the tree.
Some examples
-rwxrwxrwx a file that everyone can read, write and execute (and delete).
-rw------- a file that only the owner can read and write - no-one else can read or write and no-one has execution rights (e.g. your mailbox file).
chmod (changing a file mode)
Only the owner of a file can use chmod to change the permissions of a file. The options of chmod are as follows
Symbol | Meaning |
---|---|
u | user |
g | group |
o | other |
a | all |
r | read |
w | write (and delete) |
x | execute (and access directory) |
+ | add permission |
- | take away permission |
For example, to remove read, write and execute permissions on the file biglist for the group and others, type
% chmod go-rwx biglist
This will leave the other permissions unaffected.
To give read and write permissions on the file biglist to all,
% chmod a+rw biglist
Try changing access permissions on the file science.txt and on the directory backups
Use ls -l to check that the permissions have changed.
The permissions consist of three rwx blocks, one for the user, one for the group and one for others. If we think like a computer, giving a permission (+) is equivalent to setting a bit (a unit of information expressed as either a 0 or 1 in binary notation) to 1. Removing the permission (-) corresponds to setting that bit to 0. Hence, we can represent an rwx block as three bits (r)0/1 (w)0/1 (x)0/1.
Some small binary to base10 conversion before we proceed (more [there](https://en.wikipedia.org/wiki/Binary_number#Binary_counting)):
binary | base10 |
---|---|
000 | 0 |
001 | 1 |
010 | 2 |
011 | 3 |
100 | 4 |
101 | 5 |
110 | 6 |
111 | 7 |
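How is that useful to us? Well, we can replace every rwx block by the corresponding base10 number to change the permissions. For example:
% chmod 666 biglist
will leave biglist with the same permissions as the last command we ran (chmod a+rw biglist); i.e. read and write permissions for all. Note that the numeric form sets all nine permission bits at once (here rw-rw-rw-), whereas a+rw only adds the read and write bits and leaves any execute bits untouched.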
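To connect the symbolic and numeric forms, here is a minimal sketch that applies both to an empty stand-in file in a scratch directory and records the permission string after each step. The 'cut -c1-10' call (not covered in this tutorial) simply keeps the first ten characters of the long listing. Variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch biglist
chmod 644 biglist                                # known starting point: rw-r--r--

chmod go-rwx biglist                             # take everything away from group and others
after_remove=$(ls -l biglist | cut -c1-10)

chmod a+rw biglist                               # add read and write for all
after_add=$(ls -l biglist | cut -c1-10)

# Numeric form: r=4, w=2, x=1, summed per block -> 7 (rwx), 5 (r-x), 0 (---)
chmod 750 biglist
after_numeric=$(ls -l biglist | cut -c1-10)

echo "$after_remove"
echo "$after_add"
echo "$after_numeric"
```

On a typical Linux file system the three lines printed are '-rw-------', '-rw-rw-rw-' and '-rwxr-x---'.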
The 'top' utility is useful for monitoring running processes. Launch 'top' from the command line:
% top
top - 00:23:17 up 5:12, 2 users, load average: 1.05, 0.62, 0.52
Tasks: 14 total, 2 running, 12 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.9 us, 0.7 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26414144+total, 26185260+free, 752588 used, 1536232 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 26246793+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4322 training 20 0 250596 86236 2768 R 238.9 0.0 6:31.59 hmmscan
5064 training 20 0 38296 3628 3124 R 0.3 0.0 0:00.01 top
1 root 20 0 57492 18404 7228 S 0.0 0.0 0:00.77 supervisord
9 root 20 0 315656 29124 8272 S 0.0 0.0 0:00.63 gateone
38 root 20 0 65520 3180 2460 S 0.0 0.0 0:00.00 sshd
46 root 20 0 73692 4384 3164 S 0.0 0.0 0:00.10 apache2
50 www-data 20 0 362848 6296 2716 S 0.0 0.0 0:00.97 apache2
51 www-data 20 0 362848 6296 2716 S 0.0 0.0 0:00.97 apache2
296 root 20 0 95392 6620 5688 S 0.0 0.0 0:00.03 sshd
305 training 20 0 95392 3992 3016 S 0.0 0.0 0:00.30 sshd
306 training 20 0 19928 3720 3212 S 0.0 0.0 0:00.12 bash
5025 root 20 0 95392 6896 5964 S 0.0 0.0 0:00.02 sshd
5050 training 20 0 95392 3304 2376 S 0.0 0.0 0:00.00 sshd
5053 training 20 0 19920 3688 3196 S 0.0 0.0 0:00.00 bash
Type ctrl-c to exit 'top'.
The unix environment includes variables that are set and accessed within software and are a convenient way to store information that may be shared by multiple processes. To access the list of currently set environmental variables, use the 'env' utility.
% env
You'll see a list of key=value pairs. To access the value of a variable from the terminal, prefix the variable name with a '$'. For example:
% echo $SHELL
You can set your own environmental variables using the 'export' command. For example:
% export MYVAR="my new environmental variable value"
Check that you've set your new environmental variable in a couple of ways (hint: echo or combine env and grep).
Note, environmental variables are set for the life of the current terminal session. To retain the setting for future terminal sessions, you'll need to add the above export command to your unix user start-up file (with the bash shell, it would be ~/.bashrc, and with tcsh it would be ~/.cshrc).
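What 'export' adds is visibility to child processes. The sketch below checks MYVAR both ways suggested above and contrasts it with a plain shell variable that is not exported. NOT_EXPORTED is a made-up name for illustration, and 'sh -c' is used here only to start a child process.

```shell
export MYVAR="my new environmental variable value"

from_echo=$(echo "$MYVAR")              # check 1: echo the value
from_env=$(env | grep '^MYVAR=')        # check 2: find it in the environment listing

# A plain shell variable (no export) stays private to the current shell
NOT_EXPORTED="only in this shell"
child_sees_exported=$(sh -c 'echo "$MYVAR"')
child_sees_plain=$(sh -c 'echo "$NOT_EXPORTED"')

echo "$from_env"
echo "child sees exported: '$child_sees_exported'"
echo "child sees plain:    '$child_sees_plain'"
```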
Aliases can be thought of as shortcuts to unix commands. For example, a commonly set alias is 'll' for the 'ls -ltr' command, simply because 'ls -ltr' is used very frequently and 'll' has fewer characters to type. To set this, you would:
% alias ll='ls -ltr'
and then type:
% ll
Similarly to environmental variables above, set this in your start-up file in order to make it more permanent.
Popular text editors used in unix are emacs, vim, and nano (or pico). Emacs and vim are considered to be 'expert friendly' and each has a nontrivial learning curve. Those that spend considerable time in the linux environment will tend to learn one or both of these editors. The nano editor (a linux clone of the unix pico editor) is simple to use and more accessible to those just getting started in the unix environment.
Use nano to create a text file 'my_nano.txt':
% nano my_nano.txt
Add any text you'd like. At the bottom of the screen, you'll see a menu of options including ctrl-X to exit. When you're ready, type ctrl-X to exit. It should prompt you to save the file - answer 'y' for yes, of course, and press enter to confirm the file name.
Verify this file now exists in your directory and view it to verify its contents.
The 'gzip' and 'gunzip' utilities can be used for compressing and decompressing files. Compressing files saves disk space, particularly for large text files (eg. large numbers of next generation read sequences).
To compress a file:
% gzip science.txt
This will generate a compressed version of the file as 'science.txt.gz' with a '.gz' extension.
How much space does this file consume on the file system? (hint: ls -l)
To decompress such a file, use the 'gunzip' utility:
% gunzip science.txt.gz
and you'll see that it is restored.
How much space does the decompressed file consume? Do you notice space savings via compression?
You can easily 'cat' or 'less' gzipped files using the handy 'zcat' or 'zless' utilities.
Try (after first re-compressing the file with 'gzip science.txt', since we decompressed it above)
% zcat science.txt.gz
and then
% zless science.txt.gz
These are very convenient, but alternatively, you could use 'gunzip -c' to decompress the contents of the file to standard output and pipe it into the 'less' utility:
% gunzip -c science.txt.gz | less
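The whole compress, inspect and decompress cycle can be scripted to compare sizes directly. This is a sketch under stated assumptions: it builds a repetitive 500-line stand-in for science.txt in a scratch directory, and uses 'wc -c' to count bytes and 'tr' to strip padding, neither of which is shown elsewhere in this tutorial.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a repetitive stand-in file (repetitive text compresses well)
i=0
while [ "$i" -lt 500 ]; do
    echo "Science is great and this line repeats"
    i=$((i + 1))
done > science.txt

size_before=$(wc -c < science.txt | tr -d ' ')
gzip science.txt                                   # replaces science.txt with science.txt.gz
size_gz=$(wc -c < science.txt.gz | tr -d ' ')

# Read the compressed file through a pipe without decompressing it on disk
lines_via_pipe=$(gunzip -c science.txt.gz | wc -l | tr -d ' ')

gunzip science.txt.gz                              # restores science.txt
echo "bytes before: $size_before, compressed: $size_gz, lines: $lines_via_pipe"
```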
Sometimes you have a URL to a large file that you want to retrieve from the command line. There are handy utilities for doing this, including 'wget' and 'curl'. You'll see these used often in bioinformatics.
For example, say we have a file on an ftp site that we want to retrieve from the command line (ex. ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt), we could pull it down using 'wget' like so:
% wget ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt
Does this file now appear in your working directory? Can you view the contents?
Transferring files from device to device or over the internet always involves the risk that data gets corrupted. Definitely not something you want to happen to your precious experiment results. Commonly, core/sequencing facilities will provide, alongside your files, other file(s) that contain a checksum of these files. Most commonly, these will be md5 checksums. Let's retrieve the md5 for our example file:
% wget ftp://ftp.broadinstitute.org/pub/users/bhaas/example_file.txt.md5
Next, let's use the md5sum utility to check that the file was successfully transferred.
% md5sum -c example_file.txt.md5
You should get an output similar to the following:
example_file.txt: OK
The md5sum tool can also be used to create md5 checksums, e.g.
% md5sum example_file.txt
Saving this output to a file is the way we created the example_file.txt.md5 file and the way you can ensure a safe transfer of your data to your colleagues; i.e. if you are on the sender rather than the receiver end.
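Both the sender and the receiver side can be tried locally. The sketch below uses a made-up stand-in file in a scratch directory rather than the real example_file.txt, and then deliberately alters the file to show the check failing. It assumes the GNU 'md5sum' utility used throughout this section. Variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "precious experiment results" > example_file.txt    # stand-in, not the real file

# Sender side: record the checksum next to the file
md5sum example_file.txt > example_file.txt.md5

# Receiver side: verify the file against the recorded checksum
check_result=$(md5sum -c example_file.txt.md5)
echo "$check_result"

# Simulate corruption: any change to the file makes the check fail
echo "unexpected extra bytes" >> example_file.txt
if md5sum -c example_file.txt.md5 > /dev/null 2>&1; then
    after_change="OK"
else
    after_change="FAILED"
fi
echo "after modification: $after_change"
```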
To access the manual for unix commands, use the 'man' command. For example:
% man cat
which will open up the manual in your default pager (which would be 'more' or 'less'), and you can explore the documentation for that command.
The 'screen' utility is useful for encapsulating your running environment in a protected shell such that you can detach from your environment but keep it alive and running well on the server, allowing you to later reattach to it and continue working as if you never left it in the first place.
To start a screen session, type:
% screen -S name
where 'name' is the name you want to give to your session.
After you do some work and have some processes running, you can detach from the screen session safely using the following combination:
% ctrl-a, d
To check if you're within a screen session, since it's not at all obvious when you are, you can check to see if there is an environmental variable 'STY' set. Any time you're within a screen session, this value will be set to the name of the screen session.
% echo $STY
If you're outside of a screen session and want to see what screen sessions are currently detached and running, you can type:
% screen -ls
To resume any session, type:
% screen -r session_name
where 'session_name' is the name of the session that you want to reattach to.
To exit a screen session, simply type 'exit' from within the session (instead of detaching from it).
- Command line Reference: Excellent PDF summarising the most important UNIX commands.
- Advanced Bash scripting [guide](http://www.tldp.org/LDP/abs/html/)
- Nice online tutorial at Codecademy (the basic tutorial content is free but requires an account)