Unix I: Command line first steps - BDC-training/VT25 GitHub Wiki

Course: VT25 Unix applied to genomic data (SC00036)

Most of these exercises are based on: Unix Tutorial and Learn Unix with applications to NGS data


Connect to the server

Open a Terminal (Mac users) or MobaXterm (Windows users). Connect to the server using the account name and password handed out during the practical. Use the program ssh:

ssh -Y your_account@server_address

Now let's practice some commands.

Changing your password

Change your password using passwd. Follow the instructions on the screen.

Manual pages

Try the man command with the ls command as input:

man ls

Q1. Which flag would you use to colorize the output?

To exit press q.

Graphical interfaces

It is possible to run programs with graphical interfaces on the server, provided you enabled X11 forwarding when connecting (the -Y option used above; -X also works). Try the following to get the graphical interface of a clock:

xclock

To close the clock, press Ctrl + C at the prompt; this terminates the clock program.

There is a wide range of text editors: some have a graphical user interface and others a command line interface. Here we will try nedit for its simplicity. Type:

 nedit &

A window will open with the editor. Here you can type as with any text editor such as Notepad for instance. Save what you have typed (File-> Save or Ctrl+S) and close the window.

Q2. What does the & mean?

Listing files and directories

When you first log in, your current working directory is your home directory. Check which files you have; you should see the file you just saved. Type:

 ls

Q3. What is the difference between "ls" and "ls -a"?

Making directories

Create a subdirectory in your home directory, type:

 mkdir unix
 ls

Q4. What are the permissions of the unix directory? Hint: use ls -l

Changing to a different directory

To change to the directory you have just made, type:

 cd unix

Check the content of the directory; it should be empty. Hint: use ls

Now, create another directory within the unix directory called backups. Check the content of the directory with ls -a. You should see two special directories called the current directory (.) and the parent directory (..).

Typing:

cd .

means stay where you are, in the unix directory. This may not seem very useful now, but using (.) as the name of the current directory saves a lot of typing, as we will see later on.

If you now type:

cd ..

this will take you one directory up in the hierarchy, so you will be back in your home directory. Check with ls. NOTE: typing cd with no argument always returns you to your home directory. Useful when you are lost in the system!
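The three forms of cd described above can be tried safely in a throwaway directory. This is only a sketch: the scratch location created with mktemp is an assumption made for illustration, so nothing in your real home directory is touched.

```shell
# Scratch directory (hypothetical, created only for this demonstration)
demo=$(mktemp -d)
mkdir -p "$demo/unix/backups"

cd "$demo/unix"
cd .                 # "." is the current directory: you stay where you are
here=$(pwd)

cd ..                # ".." is the parent directory: one level up
parent=$(pwd)

cd                   # no argument: back to your home directory
home_now=$(pwd)

echo "after 'cd .'  : $here"
echo "after 'cd ..' : $parent"
echo "after 'cd'    : $home_now"

rm -r "$demo"        # remove the scratch directory again
```

Comparing the three printed paths shows which command moved you and which did not.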

Pathnames

Pathnames enable you to work out where you are in relation to the whole file-system. For example, to find out the absolute pathname of your home-directory, type cd to get back to your home-directory and then type:

pwd 

Now type:

 ls backups

You will get a message like this:

 ls: cannot access backups: No such file or directory

Why? backups is not in your current working directory. To use a command on a file (or directory) that is not in the current working directory (the directory you are currently in), you must either cd to the correct directory, or specify its pathname, either relative to where you are or absolute (starting from /). To list the contents of your backups directory from your home directory, you must type:

 ls unix/backups 
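To see the difference between a path that is resolved from where you are and one that starts at the root, here is a small sketch. It assumes a scratch directory standing in for your home directory, and science.bak is just a placeholder file. The 2>/dev/null part only hides the error message so the script can print its own.

```shell
# Scratch directory standing in for a home directory (hypothetical)
demo=$(mktemp -d)
mkdir -p "$demo/unix/backups"
touch "$demo/unix/backups/science.bak"

cd "$demo"
# Fails: there is no "backups" directly inside the current directory
ls backups 2>/dev/null || echo "backups: not found from $(pwd)"

# Relative path: resolved starting from the current directory
relative=$(ls unix/backups)

# Absolute path: starts at /, so it works from anywhere
cd /
absolute=$(ls "$demo/unix/backups")

echo "relative: $relative"
echo "absolute: $absolute"
rm -r "$demo"
```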

Home directories can also be referred to by the tilde ~ character. It can be used to specify paths starting at your home directory. So typing:

ls ~/unix

will list the contents of your unix directory, no matter where you currently are in the file system.
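The tilde is replaced by the shell itself before the command runs. A quick way to convince yourself, assuming the HOME variable is set as it normally is after login:

```shell
# Let the shell expand ~ and capture the result
tilde=$(echo ~)
echo "~ expands to  : $tilde"     # inside double quotes ~ is printed literally
echo "HOME is set to: $HOME"

# A path that starts with ~ therefore means the same thing from anywhere
cd /
echo ~/unix
```

The last command only prints the expanded path; it does not check whether the directory exists.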

Q5. What do you think the following commands will list?

 ls ~
 ls ~/..

Copying files

cp file1 file2 makes a copy of file1 in the current working directory and calls it file2.

Let's copy a file to your unix directory. First, cd to your unix directory:

 cd ~/unix

Then at the UNIX prompt, type:

 cp /home/courses/Unix/intro/science.txt .
 ls

Note: Don't forget the dot at the end. Remember, in UNIX, the dot means the current directory. The above command means copy the file science.txt to the current directory, keeping the name the same.
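The same two uses of cp can be practised on a throwaway file. This is a sketch under assumptions: the scratch directory and the file notes.txt are hypothetical and exist only for the demonstration.

```shell
# Scratch area with a source directory and a working directory (hypothetical)
demo=$(mktemp -d)
mkdir "$demo/source" "$demo/unix"
echo "some text" > "$demo/source/notes.txt"

cd "$demo/unix"
cp "$demo/source/notes.txt" .   # "." = copy into the current directory, same name
cp notes.txt notes.bak          # copy within the directory under a new name

copied=$(ls)
echo "$copied"

cd /
rm -r "$demo"
```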

Now create a backup of your science.txt file by copying it to a file called science.bak and check it is there by using ls

Moving files

mv file1 file2 moves (or renames) file1 to file2

This can also be used to rename a file, by moving the file to the same directory, but giving it a different name.

We are now going to move the file science.bak to your backups directory.

First, change directories to your unix directory. Then, inside the unix directory, type

 mv science.bak backups/.

Type ls and ls backups to see if it has worked.
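Whether mv renames or moves depends only on the target. A minimal sketch, using a scratch directory and placeholder file names that are assumptions for illustration:

```shell
# Scratch directory with a backups subdirectory (hypothetical)
demo=$(mktemp -d)
mkdir "$demo/backups"
cd "$demo"
touch draft.txt

mv draft.txt final.txt    # target is a new name in the same directory: a rename
mv final.txt backups/     # target is an existing directory: a move, name kept

top_level=$(ls)
in_backups=$(ls backups)
echo "top level : $top_level"
echo "in backups: $in_backups"

cd /
rm -r "$demo"
```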

Removing files and directories

rm (remove), rmdir (remove directory)

Inside your unix directory, type:

 cp science.txt tempfile.txt
 ls
 rm tempfile.txt
 ls

Try to remove the backups directory with rmdir. You will not be able to, since rmdir will not remove a non-empty directory.

Create a directory called temp using mkdir, then remove it using the rmdir command.
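The behaviour of rmdir on empty and non-empty directories can be checked in isolation. The sketch below assumes a scratch directory with two placeholder subdirectories, full and empty. The if statement is only there to record whether rmdir succeeded.

```shell
# Scratch directory with one empty and one non-empty subdirectory (hypothetical)
demo=$(mktemp -d)
cd "$demo"
mkdir full empty
touch full/keep.txt

rmdir empty                       # succeeds: nothing inside

# rmdir refuses to delete a directory that still contains files
if rmdir full 2>/dev/null; then refused=no; else refused=yes; fi
echo "rmdir refused non-empty directory: $refused"

rm full/keep.txt                  # empty the directory first ...
rmdir full                        # ... then rmdir succeeds
remaining=$(ls)

cd /
rm -r "$demo"
```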

Displaying the contents of a file on the screen

clear (clear screen)

This command is quite handy when you have a busy screen. At the prompt, type:

 clear

This will clear all text and leave you with just the prompt at the top of the window.

cat (concatenate)

Let's display our science.txt file:

 cat science.txt

As you can see, the file is longer than the size of the window, so it scrolls past, making it unreadable.

Try out the less command:

 less science.txt

Press the [space-bar] if you want to see another page, and type [q] if you want to quit reading. As you can see, less is used in preference to cat for long files.

It is common that we just want to see the first lines of a file. The head command helps us with this:

 head science.txt

Q6. How would you display the first 20 lines? Remember to use the man command to learn more about head

To have a look at the last lines, you can use the tail command. Have a try:

 tail science.txt
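When called without any flags, head and tail each show ten lines. A numbered test file makes this easy to see. In this sketch, seq is used only to generate a hypothetical 50-line file called numbers.txt; it is not part of the exercise.

```shell
# Scratch directory with a 50-line file containing 1, 2, ..., 50 (hypothetical)
demo=$(mktemp -d)
cd "$demo"
seq 1 50 > numbers.txt

first=$(head numbers.txt)    # default: the first 10 lines
last=$(tail numbers.txt)     # default: the last 10 lines

echo "head shows:"
echo "$first"
echo "tail shows:"
echo "$last"

cd /
rm -r "$demo"
```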

Searching the contents of a file

grep is one of many standard UNIX utilities. It searches files for specified words or patterns. First clear the screen, then type:

grep science science.txt

As you can see, grep has printed out each line containing the word science, in theory. Try typing:

 grep Science science.txt

Remember that the grep command is case sensitive; it distinguishes between Science and science.
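Case sensitivity is easiest to see on a file small enough to read at a glance. The three-line file sample.txt below is a made-up example, not the course file.

```shell
# Scratch directory with a tiny three-line file (hypothetical content)
demo=$(mktemp -d)
cd "$demo"
printf 'Science is fun\nI like science\nnothing here\n' > sample.txt

lower=$(grep science sample.txt)   # matches only the lowercase spelling
upper=$(grep Science sample.txt)   # matches only the capitalised spelling

echo "grep science -> $lower"
echo "grep Science -> $upper"

cd /
rm -r "$demo"
```

Each search returns a different line, and the third line is never printed.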

Q7. How many instances of science and Science do you find in the file? Remember to use the man command if you don't remember the flags that can be used with grep

Run the following command

 grep -ivc science science.txt

Q8. What is this command doing?

File permissions

Let's check how the permissions are set in /etc/passwd

 ls -l /etc/passwd

Q9. Who has access to read the file? Who can modify the file?

Check the permissions of /usr/bin/python3

Q10. Is everyone allowed to run the program?

Using the command touch create an empty file called test.sh. Then type the following:

 ./test.sh

This command is calling our file to be executed. Note that using ./ lets us point to the file in this directory. You should get a Permission denied error. If you check the permissions of test.sh you'll see that nobody can execute the file. Let's make it executable and run it:

 chmod +x test.sh
 ./test.sh  

Now you should not get any error! However, nothing happens, since our file is empty.

Open test.sh with a text editor. Remember to add & at the end of the command line, so we can use the server in parallel. Write the following text:

  echo "Hi there! testing to run a program"

Save the changes and, on the command line, run the program again:

  ./test.sh
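The whole sequence (create the file, make it executable, run it) can also be done without an editor. This is a sketch in a scratch directory, which is an assumption for illustration; echo with > writes the single line into the file, and ls -l is used to show how the permission string changes.

```shell
# Scratch directory so that your own test.sh is left alone (hypothetical)
demo=$(mktemp -d)
cd "$demo"

# Write the one-line script straight into the file
echo 'echo "Hi there! testing to run a program"' > test.sh

ls -l test.sh                 # no x in the permission string yet
chmod +x test.sh
perms_after=$(ls -l test.sh)  # now the permission string contains x
echo "$perms_after"

output=$(./test.sh)           # run the script and capture what it prints
echo "$output"

cd /
rm -r "$demo"
```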

Now apply what you have learned in the following exercises

Selecting top genes from a TSV file

  1. Create a directory called Some_exercises
  2. Create the following subdirectories within Some_exercises: Raw and Results
  3. Copy the VL_vs_Ctrl_DE.txt file from /home/courses/Unix/intro into the newly created directory
  4. Make a copy of the file under the Raw directory. Add original to the file name
  5. Inspect the file and describe the format of the file: number of columns and rows, type of data
  6. Extract the following genes:
    ENSG00000133742, ENSG00000196565, ENSG00000237568, ENSG00000119630, ENSG00000233705, 
    ENSG00000206178, ENSG00000166450, ENSG00000124749, ENSG00000251095, ENSG00000084734, 
    ENSG00000268460, ENSG00000178752, ENSG00000204644, ENSG00000174358
    
  7. Save them in a file called highly_expressed.tsv. Hint: this can be done one by one or all at the same time, which is faster. Don't forget to check the man pages of grep and give it a try
  8. Extract the following genes:
    ENSG00000142408, ENSG00000105366, ENSG00000234449, ENSG00000237647, ENSG00000138395, 
    ENSG00000170558, ENSG00000114744 
    
  9. Save them in a file called lowly_expressed.tsv.
  10. Merge both files into a file called top_list_genes.tsv.
  11. Move this file under Results

Q11. How many genes do you have in this file?

Selecting sequences from a FASTA file

  1. Copy the rt.fa file from /home/courses/Unix/intro
  2. Inspect the file and describe the format
  3. How many sequences are there in the file? Hint: you can use grep, just find a suitable pattern
  4. Retrieve the sequence description and the nucleotide sequence of the gene annotated as LA17.RT. Hint: check the man pages of grep, there is a flag that can display X amount of lines after the matching lines
  5. Save it in a file called LA17.RT.fasta under the Results directory.
  6. Save the fasta sequences from all V clones to a file called V_clones.fasta, also under Results.
  7. Copy split_fasta.awk from /home/courses/Unix/intro. This is a tiny script that will save each sequence of a FASTA-formatted file into its own file.
  8. Open the script in a text editor and replace * with rt.fa, so we process only this file and not other .fa files we might have in this location.
  9. At the end of the script, add echo done!, or some similar message so we know the program finished.
  10. Save the file and run it. Note: Remember to check the permissions! It has to be executable
  11. How many files were created? Hint: you can use wildcard patterns with ls, for instance ls *.txt will list all the files that end in .txt, like VL_vs_Ctrl_DE.txt
  12. Extract this motif TTCTGGGAAGT from all the newly created files.

Q12. Is the motif found in all sequences (in this case files)?
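The ls *.txt hint from step 11 works because the shell expands the pattern into matching file names before ls runs. A small sketch with made-up file names in a scratch directory:

```shell
# Scratch directory with three empty placeholder files (hypothetical names)
demo=$(mktemp -d)
cd "$demo"
touch notes.txt table.txt reads.fa

txt_files=$(ls *.txt)    # * matches any run of characters, so both .txt files
fa_files=$(ls *.fa)      # only the single .fa file

echo "*.txt matches:"
echo "$txt_files"
echo "*.fa matches:"
echo "$fa_files"

cd /
rm -r "$demo"
```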

A little more info

  1. Make a list of all your files and save them in a file called Files.txt

    Q13. How many files are there in your directory?

  2. Display all the commands you have been using and save them in a file called [Todays_date].history. Hint: use the history command. If you want to check the date, just type date. You can also format it in different ways, try date +%m%d%Y

    Q14. How many times did you use the ls command?

  3. Check who is logged in to the server. Hint: use the who command

    Q15. List up to 5 users

  4. What are they running? Hint: use the top command

    Q16. List up to 5 commands
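The date formats mentioned in the hint for step 2 can be compared side by side. In this sketch the variable names are arbitrary, and the second format is just one more example of combining the same % codes.

```shell
# %m = month, %d = day, %Y = four-digit year
stamp=$(date +%m%d%Y)      # the format from the hint: 8 digits, no separators
iso=$(date +%Y-%m-%d)      # same codes in a different order, with dashes

echo "compact: $stamp"
echo "dashed : $iso"
```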




Modified by Marcela Dávila, 2018, 2021