Lab_1 - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki

Lab 1 - Introduction to Command Line

  1. Create a new screen named "LAB_1".
screen -S LAB_1
  1. Create a new directory named "Commandline_Lab".
mkdir Commandline_Lab
  1. Change into the directory you just created.
cd Commandline_Lab
  1. Check the contents of the directory.
ls
  1. Make a text file named "Text_1.txt" and write "Welcome to Applied Genome Analytics" into it.
nano Text_1.txt

Typed "Welcome to Applied Genome Analytics" into the file using nano.

  1. Save the file.

ctrl+o --> enter --> ctrl+x

  1. Rename the file to "Myfile_1.txt".
mv Text_1.txt Myfile_1.txt
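
The create, save, and rename steps can also be done without opening an editor. The following is a minimal sketch of one non-interactive equivalent; the scratch directory made with `mktemp -d` is an assumption added here so the example does not touch the real lab folder.

```shell
# Scratch directory (an assumption for this example, not part of the lab).
workdir=$(mktemp -d)

# Write the sentence straight into the file instead of typing it in nano.
printf '%s\n' "Welcome to Applied Genome Analytics" > "$workdir/Text_1.txt"

# Rename the file, then read it back to confirm the content is unchanged.
mv "$workdir/Text_1.txt" "$workdir/Myfile_1.txt"
content=$(cat "$workdir/Myfile_1.txt")
echo "$content"
```

Using `>` overwrites any existing file of that name, so this suits a fresh file better than one that already holds notes.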
  1. Export the file from Sphinx to your Local computer.

Used WinSCP

  1. Delete the file "Myfile_1.txt".
rm Myfile_1.txt
  1. Import the file "Myfile_1.txt" from local to Sphinx.

Used WinSCP

  1. Download the data onto your computer. Data

Used Google Chrome and PeaZip

  1. Import the folder to the current working directory.

Used WinSCP

  1. Count the number of files in the folder.
cd Data
ls | wc -l
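
Note that `ls | wc -l` counts every entry in the folder, subdirectories included. If only regular files should be counted, `find` is one possible alternative. The sketch below uses a made-up scratch folder (its file names are hypothetical) to show the difference; `-maxdepth` is available in GNU and BSD `find`.

```shell
# Scratch folder standing in for Data; the contents are invented.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"
mkdir "$workdir/subfolder"

# ls lists every entry, so the subdirectory is part of the count.
entries=$(ls "$workdir" | wc -l)

# find limited to regular files at depth 1 leaves the subdirectory out.
files=$(find "$workdir" -maxdepth 1 -type f | wc -l)

echo "entries: $entries, regular files: $files"
rm -rf "$workdir"
```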
  1. Peek into the file named "mgGenes.txt".
more mgGenes.txt
  1. Count the number of genes in the file "mgGenes.txt".
cut -f 1 mgGenes.txt | wc -w

525 genes
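
The `wc -w` count above equals the number of genes only as long as no value in column 1 contains a space. Counting lines is a possible alternative, assuming one gene per line and no header line. The sketch below compares both on a made-up three-gene table; the column layout is an assumption and the real mgGenes.txt may differ.

```shell
# Made-up tab-separated table with three genes (layout is an assumption).
sample=$(mktemp)
printf '100\t400\t301\t+\tgeneA\n'   > "$sample"
printf '500\t900\t401\t-\tgeneB\n'  >> "$sample"
printf '950\t1200\t251\t+\tgeneC\n' >> "$sample"

# Word count of column 1, as in the step above.
by_words=$(cut -f 1 "$sample" | wc -w)

# Line count: one line per gene, regardless of spaces inside a field.
by_lines=$(wc -l < "$sample")

echo "words: $by_words, lines: $by_lines"
rm -f "$sample"
```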

  1. List the first 10 and last 10 lines of file "mgGenes.txt".
sed -n '1,10p;516,525p' mgGenes.txt
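
The `sed` command above relies on knowing that the file has 525 lines. A sketch of an alternative with `head` and `tail`, which does not need the line count, is shown below on a stand-in file of numbered lines (the stand-in is an assumption for illustration).

```shell
# Stand-in file with 525 numbered lines; the real file holds gene records.
sample=$(mktemp)
seq 1 525 > "$sample"

# head and tail take the number of lines directly, so the total length
# of the file does not have to be known in advance.
first_ten=$(head -n 10 "$sample")
last_ten=$(tail -n 10 "$sample")

printf '%s\n' "$first_ten" "$last_ten"
rm -f "$sample"
```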
  1. Count the number of genes on the positive strand in "mgGenes.txt".
grep -o '+' mgGenes.txt | wc -l

299 genes

  1. Count the number of genes on the negative strand in "mgGenes.txt".
grep -o '-' mgGenes.txt | wc -l

228 genes

  1. Compare it to the number of genes.

299 + 228 = 527. This is 2 greater than the number of genes.

  1. Find the flaw.
cut -f 1,2,5- mgGenes.txt > test.txt
grep -e '-' -e '+' test.txt > output.txt

The two extra matches come from the gene names rpmG-2 and polC-2 in the 5th column, whose hyphens were counted as negative strands. Searching only within column 4 in the previous steps would have avoided the miscount, but would not have revealed the flaw.
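
The column-4 search mentioned above could look like the following sketch. It uses a made-up table in which the strand is in column 4 and the gene name in column 5, as described in this step; the gene names and coordinates are invented.

```shell
# Made-up table: strand in column 4, gene name in column 5.
# The name "geneB-2" contains a hyphen, like rpmG-2 and polC-2.
sample=$(mktemp)
printf '100\t400\t301\t+\tgeneA\n'    > "$sample"
printf '500\t900\t401\t+\tgeneB-2\n' >> "$sample"
printf '950\t1200\t251\t-\tgeneC\n'  >> "$sample"

# Whole-line search: the hyphen in the gene name is counted as well.
whole_line=$(grep -o -F -e '-' "$sample" | wc -l)

# Column 4 only: -x requires the whole field to be exactly "-" or "+",
# and -F treats the pattern as a fixed string.
minus=$(cut -f 4 "$sample" | grep -c -x -F -e '-')
plus=$(cut -f 4 "$sample" | grep -c -x -F -e '+')

echo "whole-line '-': $whole_line, column-4 '-': $minus, column-4 '+': $plus"
rm -f "$sample"
```

The `-e` option is used so that the pattern `-` cannot be mistaken for a command-line option.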

  1. Edit the file "mgGenes.txt" in the text editor, repeat steps 17 and 18, and count the number of genes again.
nano mgGenes.txt
grep -o '+' mgGenes.txt | wc -l
grep -o '-' mgGenes.txt | wc -l

After the edit, the positive and negative counts add up to the number of genes.

  1. Extract the "+" genes excluding the gene names into new file "Plus_genes.txt".
cut -f 1-4 mgGenes.txt | grep '+' > Plus_genes.txt
  1. Extract the "-" genes excluding the gene names into new file "Minus_genes.txt".
cut -f 1-4 mgGenes.txt | grep '-' > Minus_genes.txt
  1. Count the number of genes again in both the files.
grep -o '+' Plus_genes.txt | wc -l
grep -o '-' Minus_genes.txt | wc -l
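
As a sanity check, the two counts can be compared with the total number of lines in the source file. The sketch below repeats the extraction on a made-up table (column layout as described in the steps above, gene names invented) and tests whether the two files together account for every gene.

```shell
# Made-up table and scratch output folder (assumptions for illustration).
sample=$(mktemp)
outdir=$(mktemp -d)
printf '100\t400\t301\t+\tgeneA\n'    > "$sample"
printf '500\t900\t401\t+\tgeneB-2\n' >> "$sample"
printf '950\t1200\t251\t-\tgeneC\n'  >> "$sample"

# Drop the gene names first, then split by strand as in the steps above.
cut -f 1-4 "$sample" | grep -F -e '+' > "$outdir/Plus_genes.txt"
cut -f 1-4 "$sample" | grep -F -e '-' > "$outdir/Minus_genes.txt"

plus=$(wc -l < "$outdir/Plus_genes.txt")
minus=$(wc -l < "$outdir/Minus_genes.txt")
total=$(wc -l < "$sample")

# Every gene should land in exactly one of the two files.
if [ $((plus + minus)) -eq "$total" ]; then
    echo "counts agree: $plus + $minus = $total"
else
    echo "counts differ: $plus + $minus != $total"
fi
rm -rf "$sample" "$outdir"
```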
  1. Extract line 250 from the file "mgGenes.txt".
sed -n '250p' mgGenes.txt
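
For larger files, it may help to stop reading once the wanted line has been printed. The sketch below shows three possible ways to extract line 250, run on a stand-in file of numbered lines (an assumption for illustration).

```shell
# Stand-in file with 525 numbered lines.
sample=$(mktemp)
seq 1 525 > "$sample"

# sed: print line 250, then quit instead of scanning the rest of the file.
with_sed=$(sed -n '250{p;q;}' "$sample")

# awk: same idea, NR is the current line number.
with_awk=$(awk 'NR == 250 { print; exit }' "$sample")

# head and tail: keep the first 250 lines, then take the last of those.
with_head_tail=$(head -n 250 "$sample" | tail -n 1)

echo "$with_sed $with_awk $with_head_tail"
rm -f "$sample"
```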
  1. Exit the screen.
screen -D LAB_1