Output - Oshlack/necklace GitHub Wiki

Alignment and Sequence Files:

  • superTranscriptome/SuperDuper.fasta - the assembled superTranscriptome sequences for each gene
  • mapped_reads/<sample_name>.bam - reads mapped back to the superTranscriptome for each sample
  • counts/blocks.gtf - the segmentation of superTranscripts based on splice junctions (similar to exons)

These files can be used, for example, to visualise read coverage across a gene in IGV. To do this, load SuperDuper.fasta as the reference sequence using "Load Genome from File", then load the sample BAM files and blocks.gtf as usual ("Load from File").
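If you would rather inspect the block annotation programmatically than in IGV, the following is a minimal Python sketch that groups the features in a GTF file by sequence name. It relies only on the standard nine-column, tab-separated GTF layout (1-based, inclusive coordinates). The assumption that the first column of counts/blocks.gtf holds the superTranscript (gene) name, the function name `read_blocks`, and the example records are all illustrative and not taken from necklace itself.

```python
from collections import defaultdict


def read_blocks(lines):
    """Group GTF features by sequence name.

    Returns a dict mapping each sequence name to a sorted list of
    (start, end) pairs, using the 1-based inclusive GTF coordinates.
    """
    blocks = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GTF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])
        blocks[seqname].append((start, end))
    return {name: sorted(spans) for name, spans in blocks.items()}


# Hypothetical records, for illustration only (not real necklace output).
example = [
    'geneA\tnecklace\texon\t251\t400\t.\t+\t.\tgene_id "geneA";',
    'geneA\tnecklace\texon\t1\t250\t.\t+\t.\tgene_id "geneA";',
    'geneB\tnecklace\texon\t1\t600\t.\t+\t.\tgene_id "geneB";',
]
blocks = read_blocks(example)

# With a real run you would pass an open file handle instead:
#   with open("counts/blocks.gtf") as handle:
#       blocks = read_blocks(handle)
```

Each entry of `blocks` then lists the block boundaries for one sequence, which can be compared against the coverage seen in IGV.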

Count Files:

  • counts/gene.counts - gene-level counts for use in differential gene expression testing
  • counts/block.counts - block-level counts for use in differential transcript usage testing

For follow-on analyses, please refer to the relevant Bioconductor packages. For example, edgeR, voom (in limma) or DESeq for gene expression testing, or, for differential transcript usage, diffSpliceDGE (in edgeR), diffSplice (in limma, used with voom) and DEXSeq.

The count tables can be used exactly like the "count.txt" tables generated by genome-only analyses, such as those created by featureCounts, for which there are many examples on the internet. Here are a few examples of R code to get you started with analysing the count data: differential gene expression with edgeR, differential transcript usage with DEXSeq, and differential transcript usage with diffSplice.
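For a quick look at a count table outside R, here is a minimal Python sketch that reads a featureCounts-style table and computes counts per million (CPM) for each sample. It assumes the standard featureCounts layout: an optional comment line starting with "#", then a header with the Geneid, Chr, Start, End, Strand and Length columns followed by one column per sample. Whether counts/gene.counts matches this layout exactly is an assumption here, and the function names and example values are hypothetical.

```python
import csv
import io

# Annotation columns in the standard featureCounts header.
ANNOTATION_COLUMNS = ("Geneid", "Chr", "Start", "End", "Strand", "Length")


def read_counts(handle):
    """Read a featureCounts-style table into (sample names, {feature: counts})."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    # Every column that is not an annotation column is treated as a sample.
    sample_idx = [i for i, name in enumerate(header)
                  if name not in ANNOTATION_COLUMNS]
    samples = [header[i] for i in sample_idx]
    counts = {}
    for row in reader:
        if not row:
            continue
        counts[row[0]] = [int(row[i]) for i in sample_idx]
    return samples, counts


def counts_per_million(counts):
    """Scale each sample's counts by its library size (total counts)."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(values[j] for values in counts.values())
              for j in range(n_samples)]
    return {
        feature: [1e6 * c / t if t else 0.0 for c, t in zip(values, totals)]
        for feature, values in counts.items()
    }


# Hypothetical table, for illustration only (not real necklace output).
example = io.StringIO(
    "# hypothetical comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n"
    "geneA\tgeneA\t1\t1000\t+\t1000\t30\t10\n"
    "geneB\tgeneB\t1\t500\t+\t500\t70\t90\n"
)
samples, counts = read_counts(example)
cpm = counts_per_million(counts)

# With a real run you would pass an open file handle instead:
#   with open("counts/gene.counts") as handle:
#       samples, counts = read_counts(handle)
```

This is only a sanity check on library sizes and relative abundance. For statistical testing, use the Bioconductor packages listed above.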

General Information:

  • stats/gene_info.txt - Number of genes found in the reference annotation, genome-based superTranscriptome (reference annotation and genome-guided assembly), and full superTranscriptome (reference annotation, genome-guided assembly and de novo assembly)
  • stats/mapping_info.txt - As above, but for the number of reads aligning
  • stats/size_info.txt - As above, but for the number of base pairs