Microbiome Helper 2 Useful code - LangilleLab/microbiome_helper GitHub Wiki

Authors: Robyn Wright
Modifications by: NA

Please note: We think that everything here should work, but we are still testing/developing this so use with caution :)

Introduction

The code shown here is a collection of snippets that I have used at various points. They have lived in a "Useful_code" document of my own for a long time, so I thought that they may as well migrate here.

Random useful code

htop

Check processes:

htop

Exit: F10 or fn+F10 (Mac)

Change file/folder permissions

sudo chmod -R ugo+rw folder_path #the -R flag will do this recursively to everything inside this folder
chmod -R ugo-rw folder_path #same, but removes read/write permissions

Change file/folder owner:

sudo chown -R USER folder_path 

Count lines in a file

wc -l < file_name.txt

Or files in a directory:

ls directory | wc -l
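
Line counts are often used as a stand-in for read counts. A minimal sketch, assuming every FASTQ record is exactly four lines (no wrapped sequences); the function names here are made up for illustration:

```shell
# Count reads in an uncompressed FASTQ file (number of lines / 4)
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Same for a gzipped FASTQ, streaming it rather than unzipping it on disk
count_fastq_gz_reads() {
    echo $(( $(gunzip -c "$1" | wc -l) / 4 ))
}
```

Usage would look like `count_fastq_reads sample.fastq` or `count_fastq_gz_reads sample.fastq.gz`.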

See most recent files added to current directory

ls -Artlh | tail -n 10

Show a certain number of files in a directory (head shows 10 by default)

ls | head -n 20

List size of files

du -h | sort -h #size of each folder below the current one, smallest first
du -sh #total size of the current folder

Show free space on server

df -h

Zipping files

Unzip files:

gunzip raw_data/*gz
tar -xf filename

Zip files:

gzip file_to_zip
tar -czvf name-of-archive.tar.gz /path/to/directory-or-file
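
Before unpacking a large archive it can help to see what is inside it and to control where it lands. A small sketch using standard tar flags (-t lists the contents, -C changes into a directory before extracting); the function names are hypothetical:

```shell
# List the contents of a .tar.gz archive without extracting anything
tar_list() {
    tar -tzf "$1"
}

# Extract a .tar.gz archive into a chosen directory (created if missing)
tar_extract_to() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, `tar_list name-of-archive.tar.gz` followed by `tar_extract_to name-of-archive.tar.gz unpacked/`.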

rsync

Using rsync to copy files:

rsync --partial --progress W0.tar.bz2 USER@HOST:destination_path #USER, HOST and destination_path are placeholders

Combine files:

cat folder/*.fasta > combined.fasta
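
A quick sanity check after combining is to count the sequences, since every FASTA record starts with a ">" header line. A minimal sketch (count_fasta_seqs is a made-up helper name):

```shell
# Count sequences in a FASTA file by counting the header lines that start with ">"
count_fasta_seqs() {
    grep -c '^>' "$1"
}
```

The count from `count_fasta_seqs combined.fasta` should equal the sum of the counts for the individual input files.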

Convert fastq to fasta

sed -n '1~4s/^@/>/p;2~4p' cat_reads/cDNA-N1-neg.fastq > cat_reads/cDNA-N1-neg.fasta
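
The `1~4` address form is a GNU sed extension, so the command above may not work with the BSD sed that ships with macOS. An awk sketch that should do the same job, again assuming four lines per record; fastq_to_fasta is a hypothetical name:

```shell
# Convert FASTQ to FASTA: on header lines replace the first character ("@") with ">",
# keep the sequence lines, and drop the "+" and quality lines
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}
```

Usage would be `fastq_to_fasta reads.fastq > reads.fasta`.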

Convert bam to fastq

samtools bam2fq SAMPLE.bam > SAMPLE.fastq

Split a file

split -b 200G hash.k2d hash_split #by size
split -l 1000 hash.k2d hash_split #by line number
#last argument here is the prefix to give the new files (no suffix given)
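
The pieces can be put back together with cat, because the default suffixes (aa, ab, ac, ...) sort in the order they were written. A sketch, with rejoin_split as a made-up name; give the output a name that does not start with the prefix:

```shell
# Rejoin pieces made by split: the shell expands prefix* in sorted order,
# which matches the order in which split wrote the pieces
rejoin_split() {
    cat "$1"* > "$2"
}
```

For example, `rejoin_split hash_split hash_rejoined.k2d`, after which `cmp hash.k2d hash_rejoined.k2d` prints nothing if the two files are identical.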

Make md5 sums

md5sum opts.k2d taxo.k2d unmapped.txt > kraken2_RefSeqV205_Complete_500GB_2.md5  
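
The saved checksums can later be verified with md5sum -c, for example after downloading or copying the files. A minimal sketch, assuming the files sit in the same directory as the .md5 file (verify_md5 is a hypothetical helper):

```shell
# Re-compute the checksum of every file listed in an .md5 file and compare.
# Prints "OK" or "FAILED" per file; the exit status is non-zero if any file fails.
verify_md5() {
    ( cd "$(dirname "$1")" && md5sum -c "$(basename "$1")" )
}
```

Usage would be `verify_md5 kraken2_RefSeqV205_Complete_500GB_2.md5`.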

Edit text document with vi

vi $file_name #enter text editor
i #enter insert mode - make any changes
esc #exit insert mode
:x #save changes and exit document
:q #exit document (no changes made)
:q! #exit document without saving changes

BLAST

Make database:

makeblastdb -in TARA_004_DCM_0.22-1.6.16SrRNA.miTAG.fna -dbtype nucl

BLAST:

blastn -db TARA/test_blast/TARA_004_DCM_0.22-1.6.16SrRNA.miTAG.fna -query Bacillus_16S.fna -out bacillus_test.txt -perc_identity 90 -outfmt 6
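
With -outfmt 6 the output is tab-separated with no header line; the default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A sketch for keeping only the top-scoring hit per query (best_hits is a made-up name, and ties are not handled specially):

```shell
# Keep the highest-bitscore hit for each query in BLAST tabular (-outfmt 6) output.
# Column 1 = qseqid, column 12 = bitscore.
best_hits() {
    tab=$(printf '\t')
    # Sort by query, then by bitscore (largest first), and keep the first line per query
    sort -t "$tab" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}
```

Usage would be `best_hits bacillus_test.txt > bacillus_best_hits.txt`.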

Barrnap

ssu-align dereplicated_marref_assembly_16S.fasta marref_align_DNA --dna #align
ssu-mask marref_align_DNA --pf 0.001 --pt 0 #mask
ssu-mask -a --stk2afa marref_align_DNA #stockholm > fasta
hmmbuild marref.hmm marref_align_DNA/marref_align_DNA.bacteria.mask.stk #build HMM with bacterial 16S

#looking at identifying 16S
cd tools/
git clone https://github.com/tseemann/barrnap.git
cd barrnap/bin
./barrnap --help

RAxML

conda install -c genomedk raxml-ng
raxmlHPC -s sequence_file -n new_folder_name -m GTRGAMMA
raxmlHPC -s marref_align_DNA.bacteria.mask.afa -n marref_tree_2 -m GTRGAMMA
raxml-ng --evaluate --msa $REF_MSA --tree $TREE --prefix info --model GTR+G --threads 2

Build and run HMM

#first align sequence file using https://www.ebi.ac.uk/Tools/msa/clustalo/ (choose stockholm alignment)
hmmbuild output_file.hmm input_aligned_sequences.sto
hmmsearch hmm_file.hmm fasta_input.fa > output_file.out