Predicting putative lncRNAs with CPC2 - labbces/sugarcane_RNAome GitHub Wiki
Calculating coding potential with CPC2
CPC2 is a fast and accurate coding potential calculator based on sequence intrinsic features.
Version: CPC2 standalone-1.0.1
The first step in obtaining putative non-coding sequences from the sugarcane transcriptomes was carried out by this script, which runs CPC2 and generates a CPC2_output.txt file for every transcriptome.
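The per-transcriptome run could look roughly like the sketch below. It is not the script linked above: the directory layout (one subdirectory per transcriptome holding a single FASTA file), the `*/*.fasta` pattern, and the function names are assumptions made for illustration. The `-i` and `-o` options are the input and output options of CPC2 standalone, which is expected to append `.txt` to the output name.

```python
import subprocess
import sys
from pathlib import Path


def cpc2_command(cpc2_script, fasta_path, out_prefix):
    """Build the CPC2 command line for one transcriptome (hypothetical helper)."""
    return [sys.executable, str(cpc2_script), "-i", str(fasta_path), "-o", str(out_prefix)]


def run_cpc2_on_all(cpc2_script, transcriptome_dir, pattern="*/*.fasta", dry_run=False):
    """Run CPC2 once per FASTA file found under transcriptome_dir.

    Assumed layout: transcriptome_dir/<name>/<name>.fasta, with the result
    written next to each FASTA file as CPC2_output(.txt).
    Returns the list of commands that were built.
    """
    commands = []
    for fasta in sorted(Path(transcriptome_dir).glob(pattern)):
        out_prefix = fasta.parent / "CPC2_output"
        cmd = cpc2_command(cpc2_script, fasta, out_prefix)
        commands.append(cmd)
        if not dry_run:
            # check=True stops the loop if CPC2 fails on any transcriptome
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_cpc2_on_all("CPC2_standalone-1.0.1/bin/CPC2.py", "transcriptomes", dry_run=True)` first (both paths are placeholders) lists the commands without launching anything, which is a cheap way to check the paths before a long run.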
Extracting CPC2 non-coding sequences
CPC2_output.txt contains the following features:
| ID | peptide_length | Fickett_score | isoelectric_point | ORF_integrity | coding_probability | coding_label |
I wrote a simple Python script to extract only the sequences that CPC2 labelled as noncoding (it checks only the coding_label column).
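A minimal sketch of such a filter is shown below. It is not the original script: the function names are hypothetical, and it assumes that CPC2_output.txt is tab-separated with a header row (possibly starting with `#`), that the sequence ID is in the first column, and that FASTA header IDs end at the first whitespace. The label column name is a parameter because it may differ between CPC2 versions.

```python
import csv


def noncoding_ids(cpc2_table_path, label_column="coding_label"):
    """Return the set of sequence IDs whose label column reads 'noncoding'."""
    ids = set()
    with open(cpc2_table_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # The header may start with '#', e.g. '#ID'; strip it before matching names
        header = [name.lstrip("#").strip() for name in next(reader)]
        label_idx = header.index(label_column)
        for row in reader:
            if row and row[label_idx].strip() == "noncoding":
                ids.add(row[0])
    return ids


def write_noncoding_fasta(fasta_path, ids, out_path):
    """Copy only the FASTA records whose ID is in ids; return how many were kept."""
    kept = 0
    keep = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                parts = line[1:].split()
                seq_id = parts[0] if parts else ""
                keep = seq_id in ids
                kept += keep
            if keep:
                dst.write(line)
    return kept
```

Reading the table into a set first keeps the FASTA pass to a single streaming read, which matters with millions of sequences.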
Of the 16,268,762 initial sequences, CPC2 classified 11,178,089 (about 68.7%) as non-coding.