Lesson 6: Computational Exercise 2 - joslynnlee/CHEM-454 GitHub Wiki
Updates
Here are a few examples that follow up on the previous lesson.
Find all files (-type f) with the extension .pdb (-name "*.pdb"):
find ./ -type f -name "*.pdb"
Find all files (-type f) with the extension .sh (-name "*.sh") in your directory called shell-lesson-data:
find ./shell-lesson-data/ -type f -name "*.sh"
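To try both find commands without touching your real files, here is a minimal sketch that builds a throwaway directory first. The names demo_dir, cubane.pdb, notes.txt, and scripts/run.sh are made up for illustration; only the two find commands come from the lesson.

```shell
# Build a small throwaway directory tree so the find commands have something to match.
# Every file and directory name below is a made-up example.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/shell-lesson-data/scripts"
touch "$demo_dir/cubane.pdb" "$demo_dir/notes.txt"
touch "$demo_dir/shell-lesson-data/scripts/run.sh"

cd "$demo_dir"

# All regular files ending in .pdb, searching from the current directory down.
pdb_files=$(find ./ -type f -name "*.pdb")
echo "$pdb_files"

# All regular files ending in .sh, but only inside shell-lesson-data.
sh_files=$(find ./shell-lesson-data/ -type f -name "*.sh")
echo "$sh_files"
```

The quotes around "*.pdb" and "*.sh" matter: they keep the shell from expanding the wildcard itself, so find receives the pattern and applies it in every subdirectory.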
How To Loop Through Files in a Directory
Move into the directory whose files you want to loop through. For the example from Friday, we wanted to look at the files in the creatures directory:
for filename in basilisk.dat minotaur.dat unicorn.dat
> do
> head -n 2 $filename | tail -n 1
> done
We'll use the wildcard * for the list of files. Enter this at the command prompt:
for filename in *
> do
> head -n 2 $filename | tail -n 1
> done
We can use the append operator (>>) inside a for loop to write the output to a file. As the loop goes through each file name, it appends that file's output to the new file creatures.txt:
for filename in *
> do
> head -n 2 $filename | tail -n 1 >> creatures.txt
> done
To view the output, you can use cat on your new file:
cat creatures.txt
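One thing to watch for: because creatures.txt is written into the same directory, running the loop a second time with * would also pick up creatures.txt itself. A minimal sketch of one way around this is to match only the data files with *.dat and to quote the variable. The files a.dat and b.dat and their contents are stand-ins invented for this example, not the real lesson data.

```shell
# Work in a throwaway directory with two small stand-in data files.
cd "$(mktemp -d)"
printf 'header A\nsecond line of A\nthird line of A\n' > a.dat
printf 'header B\nsecond line of B\nthird line of B\n' > b.dat

# Loop over the .dat files only, so the output file is never read as input.
# Quoting "$filename" keeps names that contain spaces in one piece.
for filename in *.dat
do
    head -n 2 "$filename" | tail -n 1 >> creatures.txt
done

# Show the collected second lines, one per input file.
cat creatures.txt
```

Since >> always appends, delete creatures.txt before rerunning the loop if you do not want the lines to be added a second time.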
Let's run a shell script from the Protein Data Bank
To download a PDB structure, you would go to https://www.rcsb.org/ and enter the PDB ID in the search box. Then toggle over to the Download Files option and select the format type. Now let's say you have a whole list of PDB IDs, so repeating this process for each one would take a long time:
7REE 3UGC 7REK 5F62 5F63 5F60 5F61 4PS5 6N7A 4O7A
All of these structures are crystal structures of the JAK2 kinase domain.
Thankfully, someone wrote a script called batch_download.sh that downloads the files from the PDB for you. All you need as input is a file containing a comma-separated list of PDB IDs:
7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A
Let's see how you can run the script in your command line.
First, you will need to go to the website: https://www.rcsb.org/docs/programmatic-access/batch-downloads-with-shell-script
There you will download the script batch_download.sh by clicking on “Obtain-the-batch-download script”. It will probably be saved to your Downloads folder.
In the terminal, make a new directory called pdb-script in your M: drive:
mkdir pdb-script
Move the downloaded batch_download.sh file into your pdb-script directory. You may need to do this outside of the terminal because moving around your M: drive is a bit tricky.
Now move into your pdb-script directory:
cd pdb-script
Check if you have the script available in your directory.
ls -lh
Once the script is in place, make sure it has execute permission by entering:
chmod +x batch_download.sh
Type ls -lh again and compare it to the previous output. The permissions in the first column should now include x (execute) for the script.
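To see what chmod +x changes, here is a small sketch using a stand-in script. The file hello.sh and the variables before and after are hypothetical names for this example; the same comparison applies to batch_download.sh.

```shell
# Work in a throwaway directory with a tiny stand-in script.
cd "$(mktemp -d)"
printf '#!/bin/sh\necho "hello from the script"\n' > hello.sh

# Make sure we start without execute permission, then record the permission string.
chmod -x hello.sh
before=$(ls -l hello.sh | cut -c1-10)

# Add execute permission and record the permission string again.
chmod +x hello.sh
after=$(ls -l hello.sh | cut -c1-10)

# Print both so the difference in the first column is easy to spot.
echo "before: $before"
echo "after:  $after"
```

The x characters that appear in the second string are what allow you to run the script as ./hello.sh.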
Next, you will need to create a file called list_file.txt that contains the comma-separated list of PDB IDs:
nano list_file.txt
In the nano editor, copy the following:
7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A
Press CTRL-X, then type Y to save the file. The name you entered, list_file.txt, will appear as the File Name to Write. Hit Enter.
You can type ls to see if list_file.txt was created, or use cat to view the contents of the file.
ls
cat list_file.txt
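If nano gives you trouble, the same file can be written straight from the command line. This is a sketch of an alternative, not part of the original instructions; it also counts the IDs as a quick sanity check. The variable name id_count is made up for this example.

```shell
# Work in a throwaway directory; in the lesson you would already be inside pdb-script.
cd "$(mktemp -d)"

# Write the comma-separated PDB IDs into list_file.txt in one step.
printf '%s\n' '7REE, 3UGC, 7REK, 5F62, 5F63, 5F60, 5F61, 4PS5, 6N7A, 4O7A' > list_file.txt

# Confirm the contents.
cat list_file.txt

# Split on commas (one ID per line) and count the lines that contain an ID.
id_count=$(tr ',' '\n' < list_file.txt | grep -c '[0-9A-Za-z]')
echo "$id_count"
```

The count should match the number of IDs you meant to download, which is ten for the list above.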
To test if the script is ready, type:
./batch_download.sh -h
This will give information about the input and the flags (-f, -s, etc.) available. To download .pdb.gz files, enter the following:
./batch_download.sh -f list_file.txt -p
Type ls to see if you got all the files downloaded!
ls
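The downloaded files are gzip-compressed, so cat will not show readable text. As a sketch that reuses the for loop from earlier, you can peek at the first line of each file without unpacking it on disk. The file example.pdb.gz and its contents are placeholders created only for this example; your real file names will come from the PDB IDs in your list.

```shell
# Work in a throwaway directory with one stand-in compressed file.
cd "$(mktemp -d)"
printf 'HEADER    PLACEHOLDER STRUCTURE\nEND\n' | gzip > example.pdb.gz

# For every .pdb.gz file, decompress to the screen (-c) and keep only the first line.
for gz_file in *.pdb.gz
do
    first_line=$(gunzip -c "$gz_file" | head -n 1)
    echo "$gz_file: $first_line"
done
```

Because gunzip -c writes to standard output, the .pdb.gz files themselves are left unchanged.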
If you were successful in downloading the list of files, take a screenshot of your directory showing the list of files.
What to turn in
You can save a list of the last 25 commands you entered in the shell to a file:
history 25 > PDB-submission.txt
Upload to Canvas:
- screenshot/photo of the directory with the downloaded files
- history printout (the PDB-submission.txt output file)
There are multiple steps to perform here, but downloading and running scripts is something you will likely see again with many computational tools. It is a useful skill to have.