Predicting putative lncRNAs with CPC2 - labbces/sugarcane_RNAome GitHub Wiki

Calculating coding potential with CPC2

CPC2 is a fast and accurate coding potential calculator based on sequence intrinsic features.

Version: CPC2 standalone-1.0.1

The first step in obtaining putative non-coding sequences from the sugarcane transcriptomes was executed by this script, which runs CPC2 and generates a CPC2_output.txt file for each transcriptome.
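For readers who want to reproduce this step, here is a minimal sketch of how one CPC2 call per transcriptome could be assembled. It is not the script linked above. The directory layout, the `*.fasta` extension, the output naming, the `bin/CPC2.py` location, and the helper name `build_cpc2_commands` are all assumptions; `-i` and `-o` are the input and output options as I understand them from the CPC2 standalone README, so check them against your installation.

```python
from pathlib import Path


def build_cpc2_commands(fasta_dir, out_dir, cpc2="bin/CPC2.py"):
    """Return one CPC2 command (as an argument list) per FASTA file in fasta_dir.

    Hypothetical helper: paths, file extension and output naming are assumptions.
    """
    commands = []
    # sorted() keeps the order of the commands reproducible between runs.
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        # One result table per transcriptome, named after the input file.
        out_path = Path(out_dir) / f"{fasta.stem}_CPC2_output.txt"
        # -i / -o: input sequences and result table (verify with your CPC2 version).
        commands.append(["python", cpc2, "-i", str(fasta), "-o", str(out_path)])
    return commands
```

Each returned list can then be passed to `subprocess.run(cmd, check=True)`, which stops the loop as soon as one CPC2 run fails.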

Extracting CPC2 non-coding sequences

CPC2_output.txt contains the following features:

| ID | peptide_length | Fickett_score | isoelectric_point | ORF_integrity | coding_probability | coding_label |
|---|---|---|---|---|---|---|

I wrote a simple Python script to extract only the sequences that CPC2 labeled as noncoding, based solely on the coding_label column.
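The filtering logic can be sketched roughly as follows. This is an illustration, not the script itself. I am assuming the CPC2 table is tab-separated with a header row, that the ID is in the first column, and that the label column is called `coding_label` with the value `noncoding`. The column name may differ between CPC2 versions, so it is a parameter. The function names `noncoding_ids` and `filter_fasta` are hypothetical.

```python
import csv


def noncoding_ids(cpc2_table, label_column="coding_label", noncoding_value="noncoding"):
    """Yield the IDs of rows whose label column equals noncoding_value."""
    with open(cpc2_table, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Some CPC2 versions prefix the header line with '#'; strip it if present.
        header = [name.lstrip("#") for name in next(reader)]
        label_idx = header.index(label_column)  # raises ValueError if absent
        for row in reader:
            if row and row[label_idx] == noncoding_value:
                yield row[0]  # assumption: the ID is the first column


def filter_fasta(fasta_in, fasta_out, keep_ids):
    """Copy only the FASTA records whose ID is in keep_ids; return how many were kept."""
    keep = set(keep_ids)
    written = 0
    keeping = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # The record ID is the header text up to the first whitespace.
                fields = line[1:].split()
                keeping = bool(fields) and fields[0] in keep
                if keeping:
                    written += 1
            if keeping:
                dst.write(line)
    return written
```

A call such as `filter_fasta("transcriptome.fasta", "noncoding.fasta", noncoding_ids("CPC2_output.txt"))` (file names are placeholders) streams the FASTA file line by line, so memory use is driven by the ID set rather than by the sequences.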

Of the 16,268,762 initial sequences, CPC2 classified 11,178,089 (about 68.7%) as non-coding.