UC Davis 2021 Exercise 1 - GenomeRIK/workshop_tutorials GitHub Wiki
Setting up and running TAMA Collapse on aligned cluster/polished reads
cp -r /home/genomerik/exercises_tama .
git clone https://github.com/GenomeRIK/tama.git
Download hg38.fa reference: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz
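The collapse script later in this exercise points at an uncompressed hg38.fa, while the download link above is a gzip archive, so a decompression step is needed in between. The following is a minimal sketch of that step. It uses a tiny made-up FASTA (demo.fa, a hypothetical stand-in) so it runs offline; for the real reference the same gunzip call would be applied to hg38.fa.gz after fetching it with a tool such as wget or curl.

```shell
#!/bin/sh
# Hypothetical stand-in for hg38.fa.gz so the commands can be tried offline.
workdir=$(mktemp -d)
printf '>chr1\nACGT\n>chr1_KI270706v1_random\nGGCC\n' > "${workdir}/demo.fa"
gzip "${workdir}/demo.fa"            # produces demo.fa.gz and removes demo.fa

# Decompress to a plain FASTA file; the .gz archive is left in place.
gunzip -c "${workdir}/demo.fa.gz" > "${workdir}/demo.fa"

# Sanity check: count the sequence headers in the decompressed file.
n_seqs=$(grep -c '^>' "${workdir}/demo.fa")
echo "Sequences: ${n_seqs}"

rm -rf "${workdir}"
```

Counting the header lines is a quick way to confirm that the file decompressed completely before it is passed to other tools.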
Load modules
module load samtools/1.10 biopython/1.71
TAMA Collapse for mapped FLNC sequences
Go to the folder
cd exercises_tama
Look at the TAMA Collapse bash script. This script runs TAMA Collapse.
run_tama_collapse.sh
Script should contain
spath='/home/genomerik/tama/'
pscript='tama_collapse.py'
fpath='/home/genomerik/test/'
sam='mm2_alz_flnc_hg38_sort.bam'
fasta='/home/genomerik/ref_files/hg38.fa'
prefix=`echo ${sam} | sed 's/\.bam//' | awk '{print "tc_nc_lde220_"$1}' `
capflag='no_cap'
echo "python ${spath}${pscript} -s ${fpath}${sam} -f ${fasta} -p ${prefix} -d merge_dup -x ${capflag} -a 100 -z 100 -sj sj_priority -lde 2 -sjt 20 -log log_off -b BAM"
python ${spath}${pscript} -s ${fpath}${sam} -f ${fasta} -p ${prefix} -d merge_dup -x ${capflag} -a 100 -z 100 -sj sj_priority -lde 2 -sjt 20 -log log_off -b BAM
The script uses paths from GenomeRIK's folder; change them to match the locations in your own folder structure.
Run the script
sh run_tama_collapse.sh
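The prefix line in run_tama_collapse.sh sets the output file names, which is why the summary step reads tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed. The sketch below isolates that one line so you can see what it produces before changing the input file name. The parameter-expansion variant (prefix_alt) is an alternative shown for comparison, not something the workshop script uses.

```shell
#!/bin/sh
# Input file name as set in run_tama_collapse.sh.
sam='mm2_alz_flnc_hg38_sort.bam'

# Same pipeline as the workshop script: sed strips ".bam", awk prepends the tag.
prefix=$(echo "${sam}" | sed 's/\.bam//' | awk '{print "tc_nc_lde220_"$1}')

# Alternative using POSIX parameter expansion (removes a trailing ".bam" only).
prefix_alt="tc_nc_lde220_${sam%.bam}"

echo "${prefix}"
echo "${prefix_alt}"
```

Both forms give the same prefix for this file name. If you rename the input BAM, the prefix and therefore the bed file name used in the later steps change with it.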
Summary bash script. This provides a summary of the resulting bed12 annotation file.
Script should contain
file=$1
echo "Genes"
cat "${file}" | awk -F "\t" '{print $4}' | awk -F ";" '{print $1}' | sort | uniq | wc -l
echo "Transcripts"
cat "${file}" | awk -F "\t" '{print $4}' | awk -F ";" '{print $2}' | sort | uniq | wc -l
echo "Multi-exon Transcripts"
cat "${file}" | awk -F "\t" '{if($10>1)print $4}' | awk -F ";" '{print $2}' | sort | uniq | wc -l
echo "Multi-exon Genes"
cat "${file}" | awk -F "\t" '{if($10>1)print $4}' | awk -F ";" '{print $1}' | sort | uniq | wc -l
Run the script, passing the bed file as its argument:
sh run_summary_bed.sh tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed
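To see what the four counts mean, it can help to run the same pipelines on a file small enough to check by hand. The sketch below builds a hypothetical three-line bed12 file (the coordinates and the IDs G1, G1.1 and so on are made up) and applies the counting logic from the summary script. It relies on column 4 holding "gene_id;transcript_id", as the summary script assumes, and on column 10 being the bed12 block count, that is, the number of exons.

```shell
#!/bin/sh
# Hypothetical bed12 file: gene G1 has two transcripts, gene G2 has one.
demo=$(mktemp)
printf 'chr1\t100\t500\tG1;G1.1\t40\t+\t100\t500\t255,0,0\t2\t100,100\t0,300\n' > "${demo}"
printf 'chr1\t100\t600\tG1;G1.2\t40\t+\t100\t600\t255,0,0\t1\t500\t0\n' >> "${demo}"
printf 'chr2\t50\t250\tG2;G2.1\t40\t-\t50\t250\t255,0,0\t1\t200\t0\n' >> "${demo}"

# Unique gene IDs (part of column 4 before the ";").
genes=$(awk -F "\t" '{print $4}' "${demo}" | awk -F ";" '{print $1}' | sort -u | wc -l | tr -d ' ')
# Unique transcript IDs (part of column 4 after the ";").
transcripts=$(awk -F "\t" '{print $4}' "${demo}" | awk -F ";" '{print $2}' | sort -u | wc -l | tr -d ' ')
# Same counts restricted to rows with more than one block (exon) in column 10.
multi_tx=$(awk -F "\t" '$10 > 1 {print $4}' "${demo}" | awk -F ";" '{print $2}' | sort -u | wc -l | tr -d ' ')
multi_genes=$(awk -F "\t" '$10 > 1 {print $4}' "${demo}" | awk -F ";" '{print $1}' | sort -u | wc -l | tr -d ' ')

echo "Genes: ${genes}"
echo "Transcripts: ${transcripts}"
echo "Multi-exon Transcripts: ${multi_tx}"
echo "Multi-exon Genes: ${multi_genes}"

rm -f "${demo}"
```

Only G1.1 has two blocks, so the demo file has one multi-exon transcript and one multi-exon gene out of three transcripts and two genes.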
Filter the annotation file to keep only chromosome-level scaffolds.
Note that this is not a part of TAMA but we are doing this to make some results easier to understand.
Bash script should contain
file='tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed'
outfile='tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed'
cat ${file} | grep -v "_" > ${outfile}
Run the bash script
sh run_chrom_cleanup.sh
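The cleanup works because hg38 names its unplaced, random and alternate scaffolds with underscores (for example chr1_KI270706v1_random), while the primary chromosomes have none. The sketch below checks the filter on a hypothetical four-line file with made-up coordinates and IDs. Note that grep -v "_" drops a line if an underscore appears anywhere in it, so the awk variant that tests only column 1 is shown as a stricter alternative; it is an assumption about what you may prefer, not part of the workshop script.

```shell
#!/bin/sh
# Hypothetical input: two primary chromosomes and two underscore-named scaffolds.
demo=$(mktemp)
printf 'chr1\t100\t500\tG1;G1.1\n' > "${demo}"
printf 'chr1_KI270706v1_random\t10\t90\tG2;G2.1\n' >> "${demo}"
printf 'chrUn_GL000195v1\t10\t90\tG3;G3.1\n' >> "${demo}"
printf 'chrM\t5\t60\tG4;G4.1\n' >> "${demo}"

total_lines=$(wc -l < "${demo}" | tr -d ' ')

# Filter used in run_chrom_cleanup.sh: drop any line containing "_".
kept_lines=$(grep -v "_" "${demo}" | wc -l | tr -d ' ')

# Stricter alternative: only look for "_" in the chromosome column.
kept_awk=$(awk -F "\t" '$1 !~ /_/' "${demo}" | wc -l | tr -d ' ')

# List which sequence names survive the filter.
kept_names=$(grep -v "_" "${demo}" | cut -f 1 | LC_ALL=C sort -u | paste -s -d ' ' -)

echo "Total: ${total_lines}, kept (grep): ${kept_lines}, kept (awk): ${kept_awk}"
echo "Remaining sequences: ${kept_names}"

rm -f "${demo}"
```

Running the same cut and sort commands on the real cleanup output is a quick way to confirm that only chromosome-level names remain.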