UC Davis 2021 Exercise 4 - GenomeRIK/workshop_tutorials GitHub Wiki

Using TAMA Read Support Levels to get read count information and to filter transcripts

Let's take a look at the input filelist file:

filelist_read_support.txt

It should look like this:

  flnc    tc_nc_lde220_mm2_alz_flnc_hg38_sort_trans_read.bed      trans_read
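
The filelist is tab-separated with three columns: a source name, the path to the read/annotation file, and the file format. As a sketch, you could generate and sanity-check it like this (the filename and fields are the ones from this tutorial):

```sh
#!/bin/sh
# Write the three tab-separated columns used by tama_read_support_levels.py:
# source name, file path, and file format.
printf 'flnc\ttc_nc_lde220_mm2_alz_flnc_hg38_sort_trans_read.bed\ttrans_read\n' \
    > filelist_read_support.txt

# Sanity check: every line should have exactly 3 tab-separated fields.
awk -F'\t' 'NF != 3 { print "bad line: " $0; exit 1 }' filelist_read_support.txt
```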

Take a look at the bash script for running tama_read_support_levels.py:

  spath='/home/genomerik/tama/tama_go/read_support/'
  pscript='tama_read_support_levels.py'
  filelist='filelist_read_support.txt'
  prefix='rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort'
  python ${spath}${pscript} -f ${filelist} -o ${prefix} -m no_merge

Run the bash script:

  sh run_read_support.sh
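
This produces a read-support table named after the prefix (rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort_read_support.txt), which is the input to the filtering step below. As a sketch of what the next step does, here is how transcripts could be screened by read count on a toy table; the column layout here (transcript ID, read count) is illustrative only, so check the header of your real read support file:

```sh
#!/bin/sh
# Toy read-support table with hypothetical columns: transcript ID, read count.
printf 'trans_id\tread_count\nG1.1\t5\nG1.2\t1\nG2.1\t2\n' > toy_read_support.txt

# List transcripts supported by at least 2 reads (the threshold used in the
# filtering step below).
awk -F'\t' 'NR > 1 && $2 >= 2 { print $1 }' toy_read_support.txt
```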

Now let's filter out all transcript models supported by fewer than 2 reads. Let's take a look at the bash script:

  run_filter_read_counts.sh

It should look like this:

  spath='/home/genomerik/tama/tama_go/filter_transcript_models/'
  pscript='tama_remove_single_read_models_levels.py'
  bed='tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed'
  readsupport='rs_tc_nc_lde220_mm2_alz_flnc_hg38_sort_read_support.txt'
  prefix='fsm_tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup'
  level='transcript'
  multi='remove_multi'
  python ${spath}${pscript} -b ${bed} -r ${readsupport} -o ${prefix} -l ${level} -k ${multi} -n 2

Let's run the script:

  sh run_filter_read_counts.sh
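
A quick way to see how many models the filter removed is to compare line counts, since each BED line is one transcript model. A minimal sketch, using toy stand-ins for the real input and output BED files from this exercise:

```sh
#!/bin/sh
# One transcript model per BED line, so wc -l counts models.
# Toy stand-ins for the pre- and post-filter BED files:
printf 'chr1\t0\t100\tG1;G1.1\nchr1\t0\t100\tG1;G1.2\nchr1\t200\t300\tG2;G2.1\n' > before.bed
printf 'chr1\t0\t100\tG1;G1.1\n' > after.bed

before=$(wc -l < before.bed)
after=$(wc -l < after.bed)
echo "removed $((before - after)) models"
```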

Let's run the summary script to see what has changed:

  sh run_summary.sh fsm_tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed

As an extra, let's look at the output from running the TAMA ORF/NMD pipeline on this Iso-Seq annotation. Open the file below in IGV to view it:

  map_cds_tc_nc_nolde_mm2_alz_flnc_hg38.bed
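
In a BED12 file like this, columns 7 (thickStart) and 8 (thickEnd) mark the CDS, and IGV draws that region as the thick part of each transcript model. A sketch of pulling out the CDS span per transcript, using a toy BED12 line as a stand-in for the real file:

```sh
#!/bin/sh
# Toy BED12 line (hypothetical coordinates) standing in for the map_cds_* file.
# Columns 7 and 8 (thickStart/thickEnd) delimit the CDS.
printf 'chr1\t100\t900\tG1;G1.1\t40\t+\t200\t700\t0\t2\t300,200\t0,600\n' > toy_cds.bed

# Report each transcript name with its CDS span:
awk -F'\t' '{ print $4, "CDS:", $7 "-" $8 }' toy_cds.bed
```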