Home - Magdoll/cDNA_Cupcake GitHub Wiki
What is cDNA_Cupcake
This is a GitHub repo of (mostly independent) Python/R scripts that I personally use on a daily basis to make my life easier. Most of the code is very simple, less than 20 lines, and does some kind of basic processing like reverse complementing a sequence.
This repo + wiki is meant to supplement the official Iso-Seq software.
How to use the scripts
The simplest way to use the scripts is to clone the GitHub repository, then add the repo path to your $PATH variable. The scripts are organized into different sub-directories (e.g. sequence/, rarefaction/, etc.), so you will have to add each one individually.
git clone https://github.com/Magdoll/cDNA_Cupcake.git
export PATH=$PATH:<path_to_Cupcake>/sequence/
And so on...
However, if you wish to use scripts such as collapse_isoforms_by_sam.py and chain_samples.py, you will need to install Cupcake. See Cupcake: supporting scripts for Iso-Seq after clustering step.
# only if you need to use certain scripts
python setup.py build
python setup.py install
Installing Cupcake (optional)
Most scripts can be run directly from the cloned repository. The only exception is Cupcake: supporting scripts for Iso-Seq after clustering step, which requires compiling and installation.
Basic Nucleotide Assumption
All the scripts assume that the input/output sequences consist only of: A, T, C, G.
Other nucleotides such as N, U, or R might cause incorrect behavior. Use at your own risk.
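Because of this assumption, it can help to check an input file before running any script on it. The following is a minimal sketch, not part of cDNA_Cupcake: the function name `find_non_atcg` is hypothetical, and treating lowercase bases as acceptable is an assumption made here for illustration.

```python
import re

# Hypothetical helper, not part of cDNA_Cupcake.
# Assumption: lowercase a/t/c/g are treated the same as uppercase.
_NON_ATCG = re.compile(r"[^ACGT]")


def find_non_atcg(fasta_lines):
    """Return {record_id: sorted list of unexpected characters} for every
    FASTA record containing anything other than A, T, C, G."""
    bad = {}
    current_id = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
        elif current_id is not None:
            hits = set(_NON_ATCG.findall(line.upper()))
            if hits:
                bad.setdefault(current_id, set()).update(hits)
    return {rid: sorted(chars) for rid, chars in bad.items()}


example = [">seq1", "ATCGATCG", ">seq2 some description", "ATCNNU", "RRAT"]
print(find_non_atcg(example))  # {'seq2': ['N', 'R', 'U']}
```

To check a real file, pass an open file handle instead of a list, e.g. `find_non_atcg(open("input.fasta"))`, where `input.fasta` is a placeholder name.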
Current list of maintained scripts
Annotation and Rarefaction
make_file_for_subsampling_from_collapsed.py: Prepare file for running subsampling (rarefaction curve).
subsample.py and subsample_with_category.py: Run subsampling. Results can be plotted with Excel graphics, R, etc.
See Annotation and Rarefaction Wiki for usage details.
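For readers unfamiliar with rarefaction, the idea can be sketched as follows: draw reads at increasing depths and count how many distinct isoforms are observed at each depth. This is an illustration of the general technique only, not the code in subsample.py; the function name, its parameters, and the example isoform IDs and counts are all assumptions.

```python
import random


def rarefaction_curve(isoform_counts, step, iterations=10, seed=0):
    """Subsample reads without replacement at increasing depths and report
    the mean number of distinct isoforms seen at each depth.

    isoform_counts: dict of isoform ID -> FL read count (hypothetical input).
    Returns a list of (depth, mean_distinct_isoforms) tuples.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    # Expand the counts into one entry per read, labelled by its isoform.
    reads = [iso for iso, n in isoform_counts.items() for _ in range(n)]
    total = len(reads)
    # Depths step, 2*step, ... and always the full read set as the last point.
    depths = list(range(step, total, step)) + [total]
    curve = []
    for depth in depths:
        distinct = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve.append((depth, sum(distinct) / iterations))
    return curve


# Made-up counts for illustration only.
counts = {"PB.1.1": 50, "PB.1.2": 5, "PB.2.1": 1}
for depth, mean_distinct in rarefaction_curve(counts, step=20):
    print(depth, mean_distinct)
```

The last point always uses every read, so it reports the total number of isoforms with a non-zero count. A curve that flattens before that point suggests sequencing depth is close to saturating isoform discovery.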
Targeted Iso-Seq Analysis
calc_probe_hit_from_sam.py: Calculate on-target rate based on FL read alignment + probe BED file.
See Targeted Iso-Seq Wiki for usage details.
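As a rough illustration of what an on-target rate means, the sketch below counts a read as on-target if its alignment overlaps at least one probe interval on the same chromosome. This definition is an assumption; calc_probe_hit_from_sam.py may use a different criterion, and the function names and example coordinates here are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, half-open intervals (BED convention) overlap."""
    return a_start < b_end and b_start < a_end


def on_target_rate(alignments, probes):
    """Fraction of alignments overlapping at least one probe.

    alignments, probes: iterables of (chrom, start, end) tuples.
    Assumption: any overlap of one base or more counts as a hit.
    """
    alignments = list(alignments)
    if not alignments:
        return 0.0
    # Group probe intervals by chromosome for lookup.
    by_chrom = {}
    for chrom, start, end in probes:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = sum(
        1
        for chrom, start, end in alignments
        if any(overlaps(start, end, ps, pe) for ps, pe in by_chrom.get(chrom, []))
    )
    return hits / len(alignments)


# Made-up coordinates for illustration only.
probes = [("chr1", 100, 200), ("chr2", 500, 600)]
reads = [
    ("chr1", 150, 400),  # overlaps the chr1 probe
    ("chr1", 200, 300),  # starts exactly where the probe ends: no overlap
    ("chr2", 0, 501),    # overlaps the chr2 probe by one base
    ("chr3", 100, 200),  # no probes on chr3
]
print(on_target_rate(reads, probes))  # 0.5
```

The linear scan over probes is fine for small probe sets; a sorted list or interval tree would be a better fit for large panels.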
Sequence Manipulation
get_seq_stats.py: Summarize length distribution of a FASTA/FASTQ file.
rev_comp.py: Reverse complement a sequence from command line.
fa2fq.py and fq2fa.py: Convert between FASTA and FASTQ format.
sort_fasta_by_len.py: Sort FASTA file by length (increasing or decreasing).
get_seqs_from_list.py: Extract list of sequences given a FASTA file and a list of IDs.
err_correct_w_genome.py: Generate FASTA sequences given genome and SAM file.
calc_expected_accuracy_from_fastq.py: Calculate expected accuracy from FASTQ file. Can be used to calculate expected accuracies in Quiver/Arrow-polished low-quality isoform sequences.
sam_to_bam.py: Quick script to run SAM to BAM conversion. Assumes samtools is installed.
sam_to_gff3.py: Use BCBio and BioPython to convert SAM file into GFF3 format.
group_ORF_sequences.py: Group identical ORF sequences from different isoforms.
See Sequence Manipulation Wiki for usage details.
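To give a feel for the kind of processing these scripts do, here is a minimal sketch of two of the operations listed above: reverse complementing a sequence, and estimating expected accuracy from FASTQ quality values. Neither function is taken from the repository. The accuracy formula (one minus the mean per-base error probability, with Phred+33 encoding) is the standard Phred interpretation and is assumed here; the actual script may compute it differently.

```python
# Complement table for A, T, C, G only, matching the nucleotide assumption above.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement a DNA sequence.
    Characters other than A/T/C/G are passed through unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_accuracy(quality_string, offset=33):
    """Expected accuracy of one read: 1 - mean per-base error probability.

    Assumes Phred+33 encoding, where Q = ord(char) - 33 and the
    per-base error probability is 10 ** (-Q / 10).
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    errors = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    return 1 - sum(errors) / len(errors)


print(reverse_complement("ATGC"))        # GCAT
print(round(expected_accuracy("IIII"), 4))  # 0.9999 (every base is Q40)
```

In a FASTQ record the quality string is the fourth line, so `expected_accuracy` would be applied to that line for each read.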
Cupcake ToFU: supporting scripts for Iso-Seq after clustering step
collapse_isoforms_by_sam.py: Collapse HQ isoform results to unique isoforms (based on genome alignment).
get_abundance_post_collapse.py: Obtain count information post collapse to unique isoforms.
filter_by_count.py: Filter collapse result by FL count information.
filter_away_subset.py: Filter away 5' degraded isoforms.
simple_stats_post_collapse.py: Generate a simple stats file to plot in R later.
chain_samples.py: Chain together multiple samples.
fusion_finder.py: Find fusion genes.
fusion_collate_info.py: Collate fusion information after running SQANTI(3).
color_bed12_post_sqanti.py: Color BED12 files using FL counts after running SQANTI(3).
See Cupcake: supporting scripts for Iso-Seq after clustering step for usage details.
List of Other Useful Tools
A list of useful tools that complement Cupcake: