EPP531_Applied Genome Analytics - AlinaPokhrel/Pokhrel-EPP531 GitHub Wiki

Lab 1: 05-02-2024

  1. Created a new screen named “LAB_1”
    screen -S LAB_1
  2. Created a new directory named “Commandline_Lab”
    mkdir Commandline_Lab
  3. Changed into the directory ‘Commandline_Lab’
    cd Commandline_Lab
  4. Checked the content of the directory
    ls; ls --all; ls -a
  5. Created a text file named ‘Text_1.txt’ and wrote ‘welcome to Applied Genome Analytics’ into it
    nano Text_1.txt

    welcome to Applied Genome Analytics
    Ctrl+X, then Y, then Enter (save and exit)
  6. Renamed the file ‘Text_1.txt’ to ‘Myfile_1.txt’
    mv Text_1.txt Myfile_1.txt
  7. Exported the file ‘Myfile_1.txt’ from Sphinx to local computer desktop
    In a new terminal:
    scp [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab/Myfile_1.txt /Users/alinapokhrel/desktop
  8. Deleted the file ‘Myfile_1.txt’ from server
    rm Myfile_1.txt
  9. Again imported file ‘Myfile_1.txt’ from local desktop to Sphinx
    scp ./Myfile_1.txt [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab
  10. Downloaded a folder named DATA onto the computer
  11. Imported the folder DATA to current working directory i.e. Commandline_Lab
    scp -r ./DATA [email protected]:/pickett_sphinx/projects/EPP531_AGA/Pokhrel/Commandline_Lab
  12. Counted the number of files in the folder
    ls -l | grep "^-" | wc -l 

    ls -l gives detailed information about each entry, one per line; grep "^-" keeps only the lines starting with “-”, i.e. the regular files; wc -l counts those lines, which equals the number of files
  13. Peeked into the file named ‘mgGenes.txt’
    less mgGenes.txt
  14. Counted the number of genes in the file ‘mgGenes.txt’
    cat mgGenes.txt|grep "MG"| wc -l

    525
  15. Listed the first 10 and last 10 lines of ‘mgGenes.txt’
    head mgGenes.txt
    tail mgGenes.txt
  16. Counted the number of genes on the positive strand in ‘mgGenes.txt’
    cat mgGenes.txt|grep "+"| wc -l

    299
  17. Counted the number of genes on the negative strand in ‘mgGenes.txt’
    cat mgGenes.txt|grep "-"| wc -l 

    227
  18. Compared the sum of the positive- and negative-strand counts with the total number of genes
    echo "299+227" | bc

    526
  19. There was a flaw in counting either the positive (+) or the negative (-) strands: the total number of genes was 525, but the positive and negative counts summed to 526. Steps 16 and 17 counted the lines containing “+” or “-” anywhere in the file, whereas the strand information is only in the fourth column. A peek into the file showed that the problem was in column five of ‘mgGenes.txt’. To check whether column five contains any “+” or “-” sign, the following commands were run:
    cut -f 5 mgGenes.txt|grep "+"

    There were no extra “+” signs.
    cut -f 5 mgGenes.txt|grep "-"

    rpmG-2 and polC-2
  20. The file ‘mgGenes.txt’ was then edited in nano text editor to correct the flaw
  21. The “+” genes were extracted into a new file called ‘Plus_genes.txt’ excluding the gene names i.e. excluding column 5
    cut -f 1-4 mgGenes.txt |grep "+" > Plus_genes.txt
  22. The “-” genes were extracted into a new file called ‘Minus_genes.txt’ excluding the gene names i.e. excluding column 5
    cut -f 1-4 mgGenes.txt |grep "-" > Minus_genes.txt
  23. Extracted line 250 from the file ‘mgGenes.txt’
    head -n 250 mgGenes.txt|tail -n1

    MG_228 274653 275135 + dhfR
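
The miscount in steps 16 to 19 came from matching “+” and “-” anywhere on the line. A minimal sketch of a column-aware count follows, assuming the file is tab-separated with the strand in column 4 as in ‘mgGenes.txt’. The sample rows, gene names, and variable names are made up for illustration and are not taken from the lab data.

```shell
#!/bin/sh
# Hypothetical five-column, tab-separated sample in the same layout as mgGenes.txt
# (ID, start, end, strand, gene name). These rows are invented for illustration.
sample=$(mktemp)
printf 'MG_001\t100\t200\t+\tgeneA\n'   >  "$sample"
printf 'MG_002\t300\t400\t-\tgeneB\n'   >> "$sample"
printf 'MG_003\t500\t600\t+\tgeneC-2\n' >> "$sample"

# Whole-line match: the "-" in the name "geneC-2" is counted as well.
naive_minus=$(grep -c -e "-" "$sample")

# Column-aware match: only field 4 (the strand) is tested.
plus=$(awk -F '\t' '$4 == "+" {n++} END {print n+0}' "$sample")
minus=$(awk -F '\t' '$4 == "-" {n++} END {print n+0}' "$sample")

# Printing one line by number, an alternative to head -n N | tail -n 1.
line2=$(sed -n '2p' "$sample")

echo "naive minus count: $naive_minus"   # 2 (one too many)
echo "plus strand:       $plus"          # 2
echo "minus strand:      $minus"         # 1
echo "line 2:            $line2"

rm -f "$sample"
```

With this approach the plus and minus counts should add up to the number of gene lines without editing the file first, as long as the strand column holds only “+” or “-”.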