UC Davis 2021 Exercise 4 - GenomeRIK/workshop_tutorials GitHub Wiki
Using TAMA Read Support Levels to get read count info and transcript filtering
Let's take a look at the input filelist file:
filelist_read_support.txt
It should look like this:
flnc tc_nc_lde220_mm2_alz_flnc_hg38_sort_trans_read.bed trans_read
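Each line of the filelist describes one input sample. Based on the example above, the three columns appear to be a source name, the `*_trans_read.bed` file produced by TAMA Collapse, and the file type; in the actual file they should be tab-separated. A minimal sketch of writing it (column layout inferred from the example, not from the TAMA docs):

```shell
# Write the one-line filelist; columns assumed to be:
# source name, trans_read bed file, file type (tab-separated).
printf 'flnc\ttc_nc_lde220_mm2_alz_flnc_hg38_sort_trans_read.bed\ttrans_read\n' \
    > filelist_read_support.txt

# Sanity check: each line should have exactly three tab-separated fields.
awk -F'\t' '{print NF}' filelist_read_support.txt
# prints: 3
```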
Take a look at the bash script for running tama_read_support_levels.py
```sh
spath='/home/genomerik/tama/tama_go/read_support/'
pscript='tama_read_support_levels.py'

filelist='filelist_read_support.txt'
prefix='rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort'

python ${spath}${pscript} -f ${filelist} -o ${prefix} -m no_merge
```
Run the bash script:
sh run_read_support.sh
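This produces a read-support table (here `rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort_read_support.txt`) with per-gene and per-transcript read counts. A quick way to eyeball which transcripts are weakly supported, sketched on a mock table (the real column names and order may differ; here I assume gene ID, transcript ID, gene read count, transcript read count, and supporting reads):

```shell
# Mock read-support table for illustration only; the real TAMA output
# columns are assumed, not confirmed.
printf 'gene_id\ttrans_id\tgene_read_count\ttrans_read_count\tsource_line\n' >  mock_read_support.txt
printf 'G1\tG1.1\t3\t2\tflnc:read1,read2\n'                                  >> mock_read_support.txt
printf 'G1\tG1.2\t3\t1\tflnc:read3\n'                                        >> mock_read_support.txt
printf 'G2\tG2.1\t5\t5\tflnc:read4,read5\n'                                  >> mock_read_support.txt

# List transcripts with fewer than 2 supporting reads
# (these are what the next filtering step removes).
awk -F'\t' 'NR > 1 && $4 < 2 {print $2}' mock_read_support.txt
# prints: G1.2
```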
Now let's filter out all transcript models supported by fewer than 2 reads. Take a look at the bash script:
run_filter_read_counts.sh
It should look like this:
```sh
spath='/home/genomerik/tama/tama_go/filter_transcript_models/'
pscript='tama_remove_single_read_models_levels.py'

bed='tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed'
readsupport='rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort_read_support.txt'
prefix='fsm_tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup'
level='transcript'
multi='remove_multi'

python ${spath}${pscript} -b ${bed} -r ${readsupport} -o ${prefix} -l ${level} -k ${multi} -n 2
```
Let's run the script:
sh run_filter_read_counts.sh
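Since a TAMA BED annotation has one line per transcript model, a quick before/after comparison is just a line count on the input and filtered BED files. A sketch on mock files standing in for the real annotations from the steps above:

```shell
# Mock stand-ins for the pre-filter and post-filter BED files
# (the real files are the chrom_cleanup bed and the fsm_ output above).
printf 'chr1\t0\t100\tG1;G1.1\nchr1\t0\t100\tG1;G1.2\nchr1\t200\t300\tG2;G2.1\n' > before.bed
printf 'chr1\t0\t100\tG1;G1.1\nchr1\t200\t300\tG2;G2.1\n' > after.bed

# One line per transcript model, so line counts show how many were removed.
echo "before: $(wc -l < before.bed) after: $(wc -l < after.bed)"
```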
Let's run the summary script to see what has changed:
sh run_summary.sh fsm_tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed
As an extra, let's look at the output from running the TAMA ORF/NMD pipeline on this Iso-Seq annotation. Open the file below in IGV to view it:
map_cds_tc_nc_nolde_mm2_alz_flnc_hg38.bed