EPP531_Applied Genome Analytics - AlinaPokhrel/Pokhrel-EPP531 GitHub Wiki
Lab 1: 05-02-2024
- created a new screen named “LAB_1”
screen -S LAB_1
- Created a new directory named “Commandline_Lab”
mkdir Commandline_Lab
- changed into the directory ‘Commandline_Lab’
cd Commandline_Lab
- Checked the content of the directory
ls; ls -all; ls -a
- Created a text file named ‘Text_1.txt’ and wrote ‘welcome to Applied Genome Analytics’ into it
nano Text_1.txt
welcome to Applied Genome Analytics
control+x, y
- Renamed the file ‘Text_1.txt’ to ‘Myfile_1.txt’
mv Text_1.txt Myfile_1.txt
- Exported the file ‘Myfile_1.txt’ from Sphinx to local computer desktop
In a new terminal:
scp [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab/Myfile_1.txt /Users/alinapokhrel/desktop
- Deleted the file ‘Myfile_1.txt’ from server
rm Myfile_1.txt
- Again imported file ‘Myfile_1.txt’ from local desktop to Sphinx
scp ./Myfile_1.txt [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab
- Downloaded a folder named DATA onto the computer
- Imported the folder DATA to current working directory i.e. Commandline_Lab
scp -r ./DATA [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab
- Counted the number of files in the folder
ls -l | grep "^-" | wc -l
ls -l gives detailed information about files; grep "^-" keeps only the lines starting with “-”, which are regular files; wc -l counts those lines, giving the number of files
- Peek into the file named ‘mgGenes.txt’
less mgGenes.txt
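The file count above (ls -l | grep "^-" | wc -l) can also be done with find, which selects regular files by type instead of by the first character of the ls output. This is only a sketch: the scratch directory and file names are made up for illustration, and unlike plain ls -l, find also counts hidden files.

```shell
# Hypothetical scratch directory standing in for the DATA folder
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"
mkdir "$workdir/subdir"          # a directory, which should not be counted

# -maxdepth 1 stays in this directory; -type f keeps regular files only
n_files=$(find "$workdir" -maxdepth 1 -type f | wc -l)
echo "$n_files"

rm -r "$workdir"
```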
- Count the number of genes in the file ‘mgGenes.txt’
cat mgGenes.txt|grep "MG"| wc -l
525
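The same count can be obtained without cat and wc, since grep -c counts matching lines itself. The sketch below assumes that every gene line starts with an ID of the form MG_…, as in the MG_228 line shown at the end of this lab. Only that row comes from the lab output; the other two lines are made up.

```shell
# Hypothetical sample in the layout of mgGenes.txt
# (ID, start, end, strand, gene name; tab-separated)
sample=$(mktemp)
printf 'MG_228\t274653\t275135\t+\tdhfR\n'      > "$sample"
printf 'MG_900\t100\t200\t-\tgeneA\n'          >> "$sample"   # made-up row
printf 'note: a non-gene line mentioning MG\n' >> "$sample"   # made-up row

# ^MG_ anchors the match to the start of the line, so the note is not counted
n_genes=$(grep -c '^MG_' "$sample")
echo "$n_genes"

rm "$sample"
```

Here grep "MG" would report 3 lines, while the anchored pattern reports 2.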
- Listed the first 10 and last 10 lines of ‘mgGenes.txt’
head mgGenes.txt; tail mgGenes.txt
- Count the number of genes on positive strand in ‘mgGenes.txt’
cat mgGenes.txt|grep "+"| wc -l
299
- Count the number of genes on negative strand in ‘mgGenes.txt’
cat mgGenes.txt|grep "-"| wc -l
227
- Compared the number of positive and negative strands to the total number of genes
echo "299+227" | bc
526
- There was a flaw in counting either the positive (+) or the negative (-) strands: the total number of genes was 525, but the positive and negative counts summed to 526. Steps #16 and #17 (the two strand counts above) counted the lines containing “+” or “-” anywhere in the file, but the strand information is only in the fourth column. Peeking into the file shows that the problem was in column five (the gene names) of ‘mgGenes.txt’. So, to find out whether there is any “+” or “-” sign in column five, that column was cut out and searched with the following code:
cut -f 5 mgGenes.txt|grep "+"
there were no extra “+”
cut -f 5 mgGenes.txt|grep "-"
rpmG-2 and polC-2
- The file ‘mgGenes.txt’ was then edited in nano text editor to correct the flaw
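An alternative to hand-editing the file is to test the strand column itself, so that characters in the gene names cannot affect the count. This is a minimal sketch, assuming the file is tab-separated with the strand in column 4 as described above. The rows, IDs and gene names are made up for illustration and are not taken from mgGenes.txt.

```shell
# Hypothetical rows; the gene name abc-2 contains "-" on purpose
sample=$(mktemp)
printf 'ID_1\t10\t90\t+\tabc-2\n'   > "$sample"
printf 'ID_2\t100\t190\t-\tdef\n'  >> "$sample"
printf 'ID_3\t200\t290\t+\tghi\n'  >> "$sample"

# Whole-line search: ID_1 is on the "+" strand but still matches "-"
naive_minus=$(grep -c -e '-' "$sample")

# Column-aware count: compare field 4 exactly
plus=$(awk -F'\t' '$4 == "+" {n++} END {print n+0}' "$sample")
minus=$(awk -F'\t' '$4 == "-" {n++} END {print n+0}' "$sample")

echo "naive minus: $naive_minus"
echo "plus=$plus minus=$minus total=$((plus + minus))"

rm "$sample"
```

On this sample the whole-line search reports 2 lines with “-”, while the column-aware counts are 2 plus and 1 minus, which sum to the 3 lines in the file.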
- The “+” genes were extracted into a new file called ‘Plus_genes.txt’ excluding the gene names i.e. excluding column 5
cut -f 1-4 mgGenes.txt |grep "+" > Plus_genes.txt
- The “-” genes were extracted into a new file called ‘Minus_genes.txt’ excluding the gene names i.e. excluding column 5
cut -f 1-4 mgGenes.txt |grep "-" > Minus_genes.txt
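The two extractions above can also be done in a single pass with awk, selecting rows by the value of column 4 and writing columns 1 to 4 to one file per strand. This is a sketch under the same assumption of a tab-separated file with the strand in column 4. The input rows and the genes.txt name are hypothetical; the output names follow the ones used in this lab.

```shell
# Hypothetical input in a scratch directory
workdir=$(mktemp -d)
printf 'ID_1\t10\t90\t+\tabc-2\n'   > "$workdir/genes.txt"
printf 'ID_2\t100\t190\t-\tdef\n'  >> "$workdir/genes.txt"
printf 'ID_3\t200\t290\t+\tghi\n'  >> "$workdir/genes.txt"

# Route each row by its strand; only columns 1-4 are printed
awk -F'\t' -v OFS='\t' \
    -v plus="$workdir/Plus_genes.txt" -v minus="$workdir/Minus_genes.txt" '
    $4 == "+" { print $1, $2, $3, $4 > plus }
    $4 == "-" { print $1, $2, $3, $4 > minus }
' "$workdir/genes.txt"

plus_rows=$(cat "$workdir/Plus_genes.txt")
minus_rows=$(cat "$workdir/Minus_genes.txt")
printf '%s\n' "$plus_rows"
printf '%s\n' "$minus_rows"

rm -r "$workdir"
```

Because the row is chosen by an exact comparison on column 4, the result does not depend on column 5 having been corrected first.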
- To extract line 250 from the file ‘mgGenes.txt’
head -n 250 mgGenes.txt|tail -n1
MG_228 274653 275135 + dhfR
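The head | tail combination used above is one of several ways to print a single line by number; sed and awk can address the line directly. The sketch below uses a made-up 300-line file in which line N contains the number N, so the three results are easy to compare.

```shell
# Hypothetical test file: line N holds the number N
sample=$(mktemp)
seq 1 300 > "$sample"

line_headtail=$(head -n 250 "$sample" | tail -n 1)   # approach used in this lab
line_sed=$(sed -n '250p' "$sample")                  # print only line 250
line_awk=$(awk 'NR == 250 {print; exit}' "$sample")  # stop reading after line 250

echo "$line_headtail $line_sed $line_awk"

rm "$sample"
```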