Lab_1 - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki
Lab 1 - Introduction to Command Line
- Create a new screen named "LAB_1".
screen -S LAB_1
- Create a new directory named "Commandline_Lab".
mkdir Commandline_Lab
- Change into the directory you just created.
cd Commandline_Lab
- Check the contents of the directory.
ls
- Make a text file named "Text_1.txt" and write "Welcome to Applied Genome Analytics" into it.
nano Text_1.txt
Typed "Welcome to Applied Genome Analytics" into the file in the nano editor.
- Save the file.
ctrl+o --> enter --> ctrl+x
- Rename the file to "Myfile_1.txt".
mv Text_1.txt Myfile_1.txt
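The create-and-rename steps can also be done without opening an editor. The following is a minimal sketch, assuming a scratch directory made with `mktemp` (not part of the lab) so that nothing in the real working directory is touched:

```shell
# Work in a scratch directory (an assumption for this sketch, not the lab's Commandline_Lab).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the sentence into the file in one step instead of typing it in nano.
printf '%s\n' "Welcome to Applied Genome Analytics" > Text_1.txt

# Rename the file, then list the directory and show the content to confirm.
mv Text_1.txt Myfile_1.txt
ls
cat Myfile_1.txt
```

Using `printf` with a redirect is convenient in scripts, while nano remains the easier choice for longer edits.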
- Export the file from Sphinx to your local computer.
Used WinSCP
- Delete the file "Myfile_1.txt".
rm Myfile_1.txt
- Import the file "Myfile_1.txt" from local to Sphinx.
Used WinSCP
- Download the data onto your computer. Data
Used Google Chrome and PeaZip
- Import the folder to the current working directory.
Used WinSCP
- Count the number of files in the folder.
cd Data
ls | wc -l
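Note that `ls | wc -l` counts every entry in the folder, subdirectories included. If only regular files should be counted, `find` can filter by type. A small sketch, assuming a scratch folder with made-up file names standing in for Data:

```shell
# Scratch folder standing in for Data (file names are made up for this sketch).
datadir=$(mktemp -d)
touch "$datadir/a.txt" "$datadir/b.txt" "$datadir/c.txt"
mkdir "$datadir/subfolder"

# ls counts every entry, including the subdirectory.
all_entries=$(ls "$datadir" | wc -l | tr -d ' ')

# -type f keeps regular files only; -maxdepth 1 stops find from descending.
files_only=$(find "$datadir" -maxdepth 1 -type f | wc -l | tr -d ' ')

echo "entries: $all_entries, regular files: $files_only"
```

The two numbers agree whenever the folder contains no subdirectories, which may well be the case for the lab data.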
- Peek into the file named "mgGenes.txt".
more mgGenes.txt
- Count the number of genes in the file "mgGenes.txt".
cut -f 1 mgGenes.txt | wc -w
525 genes
- List the first 10 and last 10 lines of file "mgGenes.txt".
sed -n '1,10p;516,525p' mgGenes.txt
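The `sed` command above relies on knowing that the file has 525 lines. When the line count is not known in advance, `head` and `tail` give the same result. A minimal sketch, using a hypothetical 25-line sample file in place of mgGenes.txt:

```shell
# Hypothetical stand-in for mgGenes.txt: 25 lines named gene_1 .. gene_25.
workdir=$(mktemp -d)
seq 1 25 | sed 's/^/gene_/' > "$workdir/sample.txt"

# First 10 and last 10 lines, with no need to know the total line count.
first_last=$( { head -n 10 "$workdir/sample.txt"; tail -n 10 "$workdir/sample.txt"; } )
printf '%s\n' "$first_last"
```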
- Count the number of genes on the positive strand in "mgGenes.txt".
grep -o '+' mgGenes.txt | wc -l
299 genes
- Count the number of genes on the negative strand in "mgGenes.txt".
grep -o '-' mgGenes.txt | wc -l
228 genes
- Compare the sum to the number of genes.
299 + 228 = 527. This is 2 more than the 525 genes counted earlier.
- Find the flaw.
cut -f1,2,5- mgGenes.txt > test.txt
grep -e '-' -e '+' test.txt > output.txt
The two extra matches come from the hyphens in the gene names rpmG-2 and polC-2 in the 5th column. Restricting the earlier searches to column 4 would have avoided the miscount, but would not have revealed these entries.
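The difference between searching whole lines and searching only the strand column can be reproduced on a small example. This is a sketch under assumptions: the sample file and its first three columns are made up, and only the layout described above (strand in column 4, gene name in column 5, tab-separated) is taken from the lab.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample_genes.txt"

# Hypothetical rows: columns 1-3 are placeholders, column 4 is the strand, column 5 the gene name.
printf '1\t100\t400\t+\tdnaA\n'     >  "$sample"
printf '2\t500\t900\t-\trpmG-2\n'   >> "$sample"
printf '3\t950\t1500\t+\tpolC-2\n'  >> "$sample"
printf '4\t1600\t1900\t-\tgyrB\n'   >> "$sample"
printf '5\t2000\t2300\t+\trecA\n'   >> "$sample"

# Whole-line search: hyphens inside gene names are counted as well.
naive_minus=$(grep -o -e '-' "$sample" | wc -l | tr -d ' ')

# Column-4 search: -F treats the pattern literally, -x requires the whole field to match.
plus=$(cut -f 4 "$sample" | grep -c -x -F -e '+')
minus=$(cut -f 4 "$sample" | grep -c -x -F -e '-')
total=$(wc -l < "$sample" | tr -d ' ')

echo "naive minus: $naive_minus, plus: $plus, minus: $minus, total: $total"
```

Here the naive count overshoots by exactly the two hyphenated names, while the column-restricted counts add up to the number of rows.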
- Edit the file "mgGenes.txt" in the text editor, repeat steps 17 & 18 (the positive- and negative-strand counts), and count the number of genes again.
nano mgGenes.txt
grep -o '+' mgGenes.txt | wc -l
grep -o '-' mgGenes.txt | wc -l
The strand counts now add up to the number of genes.
- Extract the "+" genes excluding the gene names into new file "Plus_genes.txt".
cut -f 1-4 mgGenes.txt | grep '+' > Plus_genes.txt
- Extract the "-" genes excluding the gene names into new file "Minus_genes.txt".
cut -f 1-4 mgGenes.txt | grep '-' > Minus_genes.txt
- Count the number of genes again in both files.
grep -o '+' Plus_genes.txt | wc -l
grep -o '-' Minus_genes.txt | wc -l
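As an alternative to the `cut | grep` pipelines, the split can be done in one pass with `awk`, which tests column 4 exactly instead of matching a character anywhere on the line. A minimal sketch, assuming a made-up tab-separated sample file with the strand in column 4 and writing its output to a scratch directory:

```shell
workdir=$(mktemp -d)
sample="$workdir/sample_genes.txt"

# Hypothetical rows: columns 1-3 are placeholders, column 4 is the strand, column 5 the gene name.
printf '1\t100\t400\t+\tdnaA\n'     >  "$sample"
printf '2\t500\t900\t-\trpmG-2\n'   >> "$sample"
printf '3\t950\t1500\t+\tpolC-2\n'  >> "$sample"
printf '4\t1600\t1900\t-\tgyrB\n'   >> "$sample"
printf '5\t2000\t2300\t+\trecA\n'   >> "$sample"

# Route each row by its strand field and keep only columns 1-4 (gene names dropped).
awk -F '\t' -v OFS='\t' -v dir="$workdir" '
  $4 == "+" { print $1, $2, $3, $4 > (dir "/Plus_genes.txt") }
  $4 == "-" { print $1, $2, $3, $4 > (dir "/Minus_genes.txt") }
' "$sample"

# One gene per line, so the line count of each file is its gene count.
plus_n=$(wc -l < "$workdir/Plus_genes.txt" | tr -d ' ')
minus_n=$(wc -l < "$workdir/Minus_genes.txt" | tr -d ' ')
total_n=$(wc -l < "$sample" | tr -d ' ')
echo "plus: $plus_n, minus: $minus_n, total: $total_n"
```

Because the test is on the field rather than the line, hyphenated gene names would not affect the result even if column 5 were kept.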
- Extract line 250 from the file "mgGenes.txt".
sed -n '250p' mgGenes.txt
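On a large file, `sed -n '250p'` still reads to the end after printing. Two common variants stop as soon as the line is found. A sketch using a hypothetical 300-line sample file, with the line number held in a variable `n` chosen for illustration:

```shell
# Hypothetical stand-in for mgGenes.txt: 300 lines named gene_1 .. gene_300.
workdir=$(mktemp -d)
seq 1 300 | sed 's/^/gene_/' > "$workdir/sample.txt"

n=250

# sed: print line n, then quit immediately.
line_sed=$(sed -n "${n}{p;q;}" "$workdir/sample.txt")

# awk: same idea, using the record number NR.
line_awk=$(awk -v n="$n" 'NR == n { print; exit }' "$workdir/sample.txt")

echo "sed: $line_sed"
echo "awk: $line_awk"
```

For a 525-line file the saving is negligible, so the plain `sed -n '250p'` is perfectly adequate here.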
- Exit the screen.
screen -D LAB_1