Estimating lncRNA origin - labbces/sugarcane_RNAome GitHub Wiki

Modern sugarcane cultivars are interspecific hybrids that result from a series of crosses among species of the Saccharum complex. To estimate the origin of the identified lncRNAs, i.e. which species of the Saccharum complex used in the breeding program contributed each one, genomic reads from S. barberi, S. officinarum, and S. spontaneum were downloaded from the SRA and used to assign a probable origin.

[!NOTE] The following genomic accessions were downloaded:

For S. barberi, paired reads from 991,287,643 fragments from SRR12929232, SRR12929240, and SRR12929242.

For S. officinarum, reads from 686,473,835 fragments from SRR15634581 and SRR7771851-SRR7771855.

For S. spontaneum, reads from 3,231,629,702 fragments from SRR7771987-SRR7771991, SRR5581514-SRR5581518, SRR7276904-SRR7276905, and SRR5481938.

The genomic reads from the three Saccharum species were quantified against the pan-RNAome transcripts with Salmon, using this automated pipeline (steps shown below).

[!NOTE] Then, using this R script, transcripts with count values equal to zero in all three species were removed. Next, the number of reads assigned to each transcript in each species was normalized by the total number of reads available for that species and by the transcript length, and expressed as FPKM (Fragments Per Kilobase of transcript per Million mapped reads).
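The filtering and normalization steps can be sketched as follows. This is a minimal illustration on made-up counts, not the linked script: the object names `counts`, `totalFragments`, and `fpkm` and the `Length` column are assumptions, and the per-species totals are the fragment counts from the accession list above.

```r
# Toy count table (made-up values): one row per transcript, one column per species.
counts <- data.frame(
  Length         = c(1000, 2000, 500, 1500),
  S._barberi     = c(10, 0, 5, 0),
  S._officinarum = c(200, 0, 0, 0),
  S._spontaneum  = c(15, 0, 300, 40),
  row.names      = c("tx1", "tx2", "tx3", "tx4")
)
species <- c("S._barberi", "S._officinarum", "S._spontaneum")

# Total fragments available per species, taken from the accession list above.
totalFragments <- c(S._barberi     = 991287643,
                    S._officinarum = 686473835,
                    S._spontaneum  = 3231629702)

# Drop transcripts with zero counts in all three species (tx2 here).
counts <- counts[rowSums(counts[, species]) > 0, ]

# FPKM = count / (transcript length in kb) / (total fragments in millions).
fpkm <- sapply(species, function(sp) {
  counts[[sp]] / (counts$Length / 1e3) / (totalFragments[[sp]] / 1e6)
})
rownames(fpkm) <- rownames(counts)

print(fpkm)
```

Dividing by the per-species total matters here because the three read sets differ in depth by a factor of almost five, so raw counts are not directly comparable between species.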

From the FPKM values and the log10 FPKM ratios between pairs of species, a probable origin can be assigned to each transcript. A log10 ratio of 1 corresponds to a tenfold difference in FPKM. Each transcript was classified according to the criteria below, which the script mentioned above applies sequentially, so a transcript keeps the first label it receives.
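As a hedged sketch of how the pairwise ratio columns used in the criteria could be derived, the example below computes them from made-up FPKM values. The `FPKM_*` column names are hypothetical; the `log10ratioFPKM_*` names and the direction of each ratio (first species divided by second) follow the rules below.

```r
# Toy FPKM values (made up) for four transcripts; FPKM_* names are hypothetical.
countsOrigin <- data.frame(
  FPKM_SBAR = c(1.0, 0.2, 5.0, 0.0),
  FPKM_SOFF = c(1.5, 0.1, 0.3, 0.0),
  FPKM_SSPO = c(0.8, 4.0, 0.2, 2.0),
  row.names = c("tx1", "tx2", "tx3", "tx4")
)

# Pairwise log10 ratios. A value >= 1 means at least a tenfold excess of the
# first species; a value <= -1 means at least a tenfold excess of the second.
countsOrigin$log10ratioFPKM_SSPO_vs_SOFF <-
  log10(countsOrigin$FPKM_SSPO / countsOrigin$FPKM_SOFF)
countsOrigin$log10ratioFPKM_SOFF_vs_SBAR <-
  log10(countsOrigin$FPKM_SOFF / countsOrigin$FPKM_SBAR)
countsOrigin$log10ratioFPKM_SSPO_vs_SBAR <-
  log10(countsOrigin$FPKM_SSPO / countsOrigin$FPKM_SBAR)

# Zero FPKM values give Inf, -Inf, or NaN (0/0) rather than an error; tx4 shows this.
print(countsOrigin)
```

In this sketch no pseudocount is added, so a species with zero signal produces infinite or undefined ratios. That is consistent with the separate zero-count rules in the criteria, but whether the original script handles zeros exactly this way is an assumption.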

# common
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) < 1 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'Common'

# SSPO (S. spontaneum)
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF >= 1 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR >= 1),]),'Origin']<-'SSPO'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR >= 1),]),'Origin']<-'SSPO'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF >= 1),]),'Origin']<-'SSPO'

# SOFF (S. officinarum)
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF <= -1 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR >= 1 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'SOFF'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR >= 1),]),'Origin']<-'SOFF'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF <= -1),]),'Origin']<-'SOFF'

# SBAR (S. barberi)
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) < 1 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR <= -1 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR <= -1),]),'Origin']<-'SBAR'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR <= -1),]),'Origin']<-'SBAR'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR <= -1),]),'Origin']<-'SBAR'

# common between SSPO and SBAR
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'CommonSSPO_SBAR'

# common between SOFF and SBAR
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1),]),'Origin']<-'CommonSOFF_SBAR'

# common between SSPO and SOFF
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) <= 1),]),'Origin']<-'CommonSSPO_SOFF'

# unknown: transcripts that matched none of the rules above
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin)),]),'Origin']<-'UNK'
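
To see how the sequential rules behave, including for transcripts with zero counts in one or two species, the sketch below applies the same conditions to a toy table. It is an illustrative restatement, not the original script: the values are made up, the short names (`so`, `ob`, `sb`, `zB`, `zO`, `zS`, `rules`) are hypothetical, and raw count ratios stand in for FPKM ratios under the stated assumption of equal lengths and depths.

```r
# Toy table (made-up values): per-species counts for seven transcripts.
countsOrigin <- data.frame(
  S._barberi     = c(10,  0, 50,  0,  0,  10,   1),
  S._officinarum = c(12,  1,  2,  0, 30, 200,  20),
  S._spontaneum  = c( 9, 80,  3, 20, 25,   9, 300),
  row.names      = paste0("tx", 1:7)
)

# Assumption for brevity: equal transcript lengths and sequencing depths,
# so count ratios stand in for FPKM ratios.
lr <- function(a, b) log10(countsOrigin[[a]] / countsOrigin[[b]])
so <- lr("S._spontaneum", "S._officinarum")   # log10ratioFPKM_SSPO_vs_SOFF
ob <- lr("S._officinarum", "S._barberi")      # log10ratioFPKM_SOFF_vs_SBAR
sb <- lr("S._spontaneum", "S._barberi")       # log10ratioFPKM_SSPO_vs_SBAR
zB <- countsOrigin$S._barberi == 0
zO <- countsOrigin$S._officinarum == 0
zS <- countsOrigin$S._spontaneum == 0

# The rules in the order given above; rules sharing a label are joined with |.
rules <- list(
  Common          = abs(so) < 1 & abs(ob) < 1 & abs(sb) < 1,
  SSPO            = (so >= 1 & abs(ob) < 1 & sb >= 1) | (zO & sb >= 1) | (zB & so >= 1),
  SOFF            = (so <= -1 & ob >= 1 & abs(sb) < 1) | (zS & ob >= 1) | (zB & so <= -1),
  SBAR            = (abs(so) < 1 & ob <= -1 & sb <= -1) | (zO & sb <= -1) | (zS & ob <= -1),
  CommonSSPO_SBAR = zO & abs(sb) < 1,
  CommonSOFF_SBAR = zS & abs(ob) < 1,
  CommonSSPO_SOFF = zB & abs(so) <= 1
)

origin <- rep(NA_character_, nrow(countsOrigin))
for (label in names(rules)) {
  # Only unclassified rows are labeled; which() skips NA/NaN conditions (0/0 ratios).
  hit <- which(is.na(origin) & rules[[label]])
  origin[hit] <- label
}
origin[is.na(origin)] <- "UNK"
countsOrigin$Origin <- origin

print(countsOrigin)
print(table(countsOrigin$Origin))
```

In this toy table, tx4 has reads only from S. spontaneum, so two of its ratios are infinite and one is undefined, and it is labeled SSPO through the zero-count rules rather than the three-ratio rule. tx7 differs by more than tenfold in every pairwise comparison without fitting a single-species pattern, so it falls through to UNK.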