Lesson 6: Computational Exercise 2 - joslynnlee/CHEM-454 GitHub Wiki

Updates

Here are a few examples from the previous lesson:

Find all files (-type f) with the extension .pdb (-name "*.pdb"):

find ./ -type f -name "*.pdb" 

Find all files (-type f) with the extension .sh (-name "*.sh") in the directory called shell-lesson-data:

find ./shell-lesson-data/ -type f -name "*.sh"
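To try these patterns without touching your real data, here is a minimal sketch that builds a small scratch directory and counts the matches by piping the output of find into wc -l. The directory layout and file names (a.pdb, b.pdb, run.sh) are made up for illustration.

```shell
# Build a scratch directory with a few made-up files (names are illustrative only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/structures" "$demo_dir/scripts"
touch "$demo_dir/structures/a.pdb" "$demo_dir/structures/b.pdb" "$demo_dir/scripts/run.sh"

# List regular files ending in .pdb anywhere under the scratch directory.
find "$demo_dir" -type f -name "*.pdb"

# Count the matches by piping the list into wc -l (one line per file found).
pdb_count=$(find "$demo_dir" -type f -name "*.pdb" | wc -l)
echo "Number of .pdb files: $pdb_count"
```

The quotes around "*.pdb" matter: they stop the shell from expanding the wildcard itself, so find receives the pattern unchanged.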

How To Loop Through Files in a Directory

Move into the directory whose files you want to loop through. For the example from Friday, we wanted to look in the creatures directory:

for filename in basilisk.dat minotaur.dat unicorn.dat
> do
>     head -n 2 $filename | tail -n 1
> done

We'll use the wildcard * in place of the list of files. Enter this at the command prompt:

for filename in *
> do
>     head -n 2 $filename | tail -n 1
> done

We can use the append operator (>>) inside a for loop to write the output to a file. As the loop goes through each file name, it adds that file's output to the end of the new file creatures.txt:

for filename in *
> do
>     head -n 2 $filename | tail -n 1 >> creatures.txt
> done

To view the output, you can use cat on your new file:

cat creatures.txt
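One caveat with for filename in *: if you run the loop a second time, creatures.txt is itself matched by * and its second line gets appended as well. Below is a minimal sketch of one way around this, assuming the data files all end in .dat. It uses a narrower pattern, quotes the variable, and starts from an empty output file. The scratch directory and the file contents are made up for illustration.

```shell
# Scratch directory with made-up .dat files standing in for the creatures data.
work_dir=$(mktemp -d)
cd "$work_dir"
printf 'COMMON NAME: basilisk\nCLASSIFICATION: basiliscus vulgaris\nUPDATED: 1745-05-02\n' > basilisk.dat
printf 'COMMON NAME: unicorn\nCLASSIFICATION: equus monoceros\nUPDATED: 1738-11-24\n' > unicorn.dat

# Start with an empty output file so reruns do not pile up duplicate lines.
: > creatures.txt

# Match only .dat files so creatures.txt is never read as input.
for filename in *.dat
do
    # Quote the variable so file names containing spaces are handled correctly.
    head -n 2 "$filename" | tail -n 1 >> creatures.txt
done

cat creatures.txt
```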

Let's run a shell script from the Protein Data Bank

To download a PDB structure, you would go to https://www.rcsb.org/ and enter the PDB ID in the search box. Then toggle over to the Download Files option and select the format type. Now suppose you have a whole list of PDB IDs, so that repeating this process for each one would take a long time:

7REE 3UGC 7REK 5F62 5F63 5F60 5F61 4PS5 6N7A 4O7A

All of these structures are crystal structures of the JAK2 kinase domain.

Thankfully, someone wrote a script called batch_download.sh that downloads the files from the PDB for you. All you need as input is a file containing a comma-separated list of PDB IDs:

7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A

Let's see how you can run the script from your command line.

First, you will need to go to the website: https://www.rcsb.org/docs/programmatic-access/batch-downloads-with-shell-script

Here you will download the script batch_download.sh by clicking on “Obtain-the-batch-download script”. The file will probably be saved to your Downloads folder.

In the terminal, make a new directory on your M: drive called pdb-script:

mkdir pdb-script

Move the downloaded batch_download.sh file into your pdb-script directory. You may need to do this outside of the terminal because moving around your M: drive is a bit tricky.

Now move into your pdb-script directory:

cd pdb-script

Check that the script is in your directory:

ls -lh

Once the script is in place, make sure it has execute permission by entering:

chmod +x batch_download.sh

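Type ls -lh again and compare the permissions column (the first column) with the previous output. The entry for the script should now include x, which marks it as executable.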
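To see what chmod +x changes, here is a minimal sketch on a stand-in file. The name hello.sh is made up and is not the PDB script. The sketch saves the ten-character permissions string from ls -l before and after the change so the two can be compared side by side. The exact strings depend on your system settings, so none are shown here.

```shell
# Create a tiny stand-in script (hello.sh is a made-up name, not the PDB script).
perm_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello"\n' > "$perm_dir/hello.sh"

# Keep only the first 10 characters of ls -l: the file type and permission bits.
perms_before=$(ls -l "$perm_dir/hello.sh" | cut -c1-10)

# Add execute permission, then read the permission bits again.
chmod +x "$perm_dir/hello.sh"
perms_after=$(ls -l "$perm_dir/hello.sh" | cut -c1-10)

echo "before chmod +x: $perms_before"
echo "after  chmod +x: $perms_after"
```

In the "after" line, the letter x should appear where there was a dash before.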

Next, you will need to create a file called list_file.txt that contains the comma-separated list of PDB IDs:

nano list_file.txt

In the nano editor, copy the following:

7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A

Press CTRL-X, then type Y to save the file. The name you entered, list_file.txt, will appear as the File Name to Write. Hit Enter.

You can type ls to check that list_file.txt was created, or use cat to view the contents of the file:

ls
cat list_file.txt
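If you would rather not use nano, the same file can be written straight from the command line. This is a minimal sketch, run in a scratch directory so that it does not overwrite anything. It writes the list with printf and then counts the IDs as a quick sanity check. The variable names list_dir and id_count are made up for illustration.

```shell
# Work in a scratch directory so an existing list_file.txt is not overwritten.
list_dir=$(mktemp -d)
cd "$list_dir"

# Write the comma-separated IDs in one step instead of opening nano.
printf '%s\n' "7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A" > list_file.txt
cat list_file.txt

# Sanity check: turn commas into newlines, then count lines that contain an ID.
id_count=$(tr ',' '\n' < list_file.txt | grep -c '[0-9A-Za-z]')
echo "IDs in list_file.txt: $id_count"
```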

To test that the script is ready, type:

./batch_download.sh -h

This will give information about the input and the available flags (-f, -s, etc.). To download .pdb.gz files, enter the following:

./batch_download.sh -f list_file.txt -p

Type ls to see if you got all the files downloaded!

ls
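With ten files it is easy to miss one by eye. The sketch below defines check_downloads, a hypothetical helper that is not part of batch_download.sh. It reads the ID list and reports any file that is missing. It assumes each download is saved as <ID>.pdb.gz in the given directory, so compare this against what ls shows you. The demo at the end uses empty placeholder files in a scratch directory rather than real downloads.

```shell
# check_downloads is a hypothetical helper, not part of batch_download.sh.
# Assumption: each download is saved as <ID>.pdb.gz in the given directory.
check_downloads() {
    list_file=$1
    dir=$2
    missing=0
    # Split the list on commas and strip spaces so each word is one ID.
    for id in $(tr ',' '\n' < "$list_file" | tr -d ' '); do
        if [ -f "$dir/$id.pdb.gz" ]; then
            echo "found:   $id.pdb.gz"
        else
            echo "MISSING: $id.pdb.gz"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
    # Succeed only if nothing is missing.
    [ "$missing" -eq 0 ]
}

# Demo on a scratch directory with empty placeholder files (no real downloads).
check_dir=$(mktemp -d)
printf '%s\n' "7REE, 3UGC, 7REK" > "$check_dir/list_file.txt"
touch "$check_dir/7REE.pdb.gz" "$check_dir/3UGC.pdb.gz"
check_downloads "$check_dir/list_file.txt" "$check_dir" || echo "Some downloads are missing."
```

In your pdb-script directory, the equivalent call would be check_downloads list_file.txt . (the final dot means the current directory).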

If you were successful in downloading the list of files, take a screenshot of your directory showing the list of files.

What to turn in

You can save a record of the last 25 commands you entered in the shell:

history 25 > PDB-submission.txt

Upload to Canvas:

  • a screenshot/photo of the directory with the downloaded batch files
  • the history printout, PDB-submission.txt

There are several steps here, but downloading and running scripts for computational tools is something you may well do again in the future, so this is a useful workflow to understand.