Estimating lncRNA origin - labbces/sugarcane_RNAome GitHub Wiki
Modern sugarcane cultivars are interspecific hybrids resulting from a series of crosses involving several species of the Saccharum complex. To estimate which of the Saccharum species used in breeding programs each identified lncRNA most likely derives from, genomic reads from S. barberi, S. officinarum, and S. spontaneum were downloaded from the SRA and used to assign a probable origin to each transcript.
[!NOTE] The following genomic accessions were downloaded:
- For S. barberi, paired-end reads from 991,287,643 fragments (SRR12929232, SRR12929240, and SRR12929242).
- For S. officinarum, reads from 686,473,835 fragments (SRR15634581 and SRR7771851-SRR7771855).
- For S. spontaneum, reads from 3,231,629,702 fragments (SRR7771987-SRR7771991, SRR5581514-SRR5581518, SRR7276904-SRR7276905, and SRR5481938).
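Run ranges such as SRR7771851-SRR7771855 have to be expanded into individual accessions before they can be handed to a download tool. The R sketch below shows one way to do that; `expand_runs` is a hypothetical helper, not part of the pipeline, and it assumes both ends of a range share the same letter prefix and that the numeric part has no leading zeros.

```r
# Hypothetical helper (not part of the original pipeline): expand run ranges
# such as "SRR7771851-SRR7771855" into individual accessions.
# Assumes both ends of a range share the same letter prefix and that the
# numeric part has no leading zeros.
expand_runs <- function(ranges) {
  unlist(lapply(ranges, function(r) {
    ends <- strsplit(r, "-", fixed = TRUE)[[1]]
    if (length(ends) == 1) return(ends)              # single accession
    prefix <- sub("[0-9]+$", "", ends[1])            # e.g. "SRR"
    from <- as.integer(sub("^[A-Za-z]+", "", ends[1]))
    to <- as.integer(sub("^[A-Za-z]+", "", ends[2]))
    paste0(prefix, seq(from, to))
  }))
}

# S. officinarum runs listed above: one single run plus a range of five runs
officinarum_runs <- expand_runs(c("SRR15634581", "SRR7771851-SRR7771855"))
print(officinarum_runs)
```

The resulting character vector can then be written to a file, one accession per line, for whatever download step is used.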
The genomic reads from the three Saccharum species were mapped to the pan-RNAome transcripts and quantified with Salmon, using this automated pipeline (steps shown below).
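Salmon writes a tab-separated `quant.sf` table per run, with the columns Name, Length, EffectiveLength, TPM, and NumReads. Assuming one Salmon output per species, all quantified against the same pan-RNAome index, the per-species NumReads columns could be merged into a single count table as sketched below. The function name `read_salmon_counts` and the file paths are hypothetical, and whether the original script uses Length or EffectiveLength for normalization is not stated here.

```r
# Sketch, assuming one Salmon quant.sf file per species, all quantified
# against the same pan-RNAome index. Function name and paths are hypothetical.
read_salmon_counts <- function(quant_files) {
  # quant.sf is tab-separated: Name, Length, EffectiveLength, TPM, NumReads
  tables <- lapply(quant_files, read.delim, stringsAsFactors = FALSE)
  # Take transcript names and lengths from the first file
  out <- data.frame(Name = tables[[1]]$Name, Length = tables[[1]]$Length,
                    stringsAsFactors = FALSE)
  for (sp in names(quant_files)) {
    q <- tables[[sp]]
    # NumReads holds Salmon's estimated (possibly fractional) read counts
    out[[sp]] <- q$NumReads[match(out$Name, q$Name)]
  }
  out
}

# Hypothetical usage; the names match the count columns used by the rules below
# quant_files <- c(S._barberi = "salmon/S_barberi/quant.sf",
#                  S._officinarum = "salmon/S_officinarum/quant.sf",
#                  S._spontaneum = "salmon/S_spontaneum/quant.sf")
# counts <- read_salmon_counts(quant_files)
```

Matching on Name rather than relying on row order keeps the merge correct even if the files list transcripts in different orders.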
[!NOTE] Then, using this R script, the count table was filtered to remove transcripts with zero counts in all three species. Next, the number of reads assigned to each transcript in each species was normalized for the total number of reads available for that species and for the transcript length, giving FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values.
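The filtering and normalization steps can be sketched as follows. This is not the original R script: `fpkm_table` is a hypothetical function, and using the fragment totals from the accession note as library sizes is an assumption based on the phrase "total number of reads available for each species". For transcript i in species j, FPKM = count_ij × 10^9 / (length_i × N_j), where length_i is in base pairs and N_j is the library size.

```r
# Sketch of the filtering + FPKM step (not the original script).
# counts: data frame or matrix with one column per species (names as in lib_sizes)
# lengths: transcript lengths in bp, in the same row order as counts
# lib_sizes: named vector of per-species read totals (assumed normalizer)
fpkm_table <- function(counts, lengths, lib_sizes) {
  counts <- as.matrix(counts[, names(lib_sizes), drop = FALSE])
  # Drop transcripts with zero counts in every species
  keep <- rowSums(counts) > 0
  counts <- counts[keep, , drop = FALSE]
  lengths <- lengths[keep]
  # Divide each column by its library size, then each row by its length:
  # FPKM = count * 1e9 / (length * library size)
  sweep(counts, 2, lib_sizes, "/") * 1e9 / lengths
}

# Assumption: the fragment totals listed in the note above are the library sizes
lib_sizes <- c(S._barberi = 991287643,
               S._officinarum = 686473835,
               S._spontaneum = 3231629702)
```

Dividing the matrix by `lengths` relies on R recycling a vector down the columns, so each row is divided by its own transcript length.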
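Using the FPKM values and the log10 FPKM ratios between each pair of species, a probable origin can be assigned to each transcript. A log10 ratio of 1 (or -1) corresponds to a tenfold difference in FPKM between two species, so an absolute log10 ratio below 1 means the two species differ by less than tenfold. Each transcript was classified according to the criteria below, which the script mentioned above applies sequentially: every rule only considers transcripts whose Origin is still unassigned (NA), so a transcript keeps the label of the first rule it matches, and transcripts matching no rule are labelled UNK.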
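The pairwise ratios used by the rules could be computed as in this sketch. The ratio column names match those in the script, while the FPKM column names (FPKM_SSPO, FPKM_SOFF, FPKM_SBAR) and the function `add_log10_ratios` are hypothetical. It also shows how zeros behave in R: a zero in one species gives ratios of Inf or -Inf, which fail every `abs(...) < 1` condition, and zeros in both species of a pair give NaN, which compares as NA and is dropped by `which()`. Presumably this is why the script has extra rules conditioned on a species having zero counts.

```r
# Sketch: pairwise log10 FPKM ratios, named as in the classification script.
# The input column names FPKM_SSPO, FPKM_SOFF and FPKM_SBAR are hypothetical.
add_log10_ratios <- function(df) {
  # log10 ratio of 1 = tenfold higher in the first species, -1 = tenfold lower
  df$log10ratioFPKM_SSPO_vs_SOFF <- log10(df$FPKM_SSPO / df$FPKM_SOFF)
  df$log10ratioFPKM_SOFF_vs_SBAR <- log10(df$FPKM_SOFF / df$FPKM_SBAR)
  df$log10ratioFPKM_SSPO_vs_SBAR <- log10(df$FPKM_SSPO / df$FPKM_SBAR)
  # x / 0 gives Inf (log10 -> Inf), 0 / x gives 0 (log10 -> -Inf),
  # and 0 / 0 gives NaN, so these values must be handled by the rules
  df
}

toy <- data.frame(FPKM_SSPO = c(100, 5, 0),
                  FPKM_SOFF = c(1, 5, 0),
                  FPKM_SBAR = c(10, 5, 2))
print(add_log10_ratios(toy))
```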
```r
# Origin starts as NA; each rule only labels transcripts that are still NA,
# so the order of the rules matters. A log10 ratio of 1 is a tenfold FPKM difference.
countsOrigin$Origin <- NA_character_
# Common: less than tenfold FPKM difference between every pair of species
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) < 1 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'Common'
# SSPO: at least tenfold higher in S. spontaneum than in the other species;
# the last two rules cover zero counts in S. officinarum or S. barberi
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF >= 1 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR >= 1),]),'Origin']<-'SSPO'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR >= 1),]),'Origin']<-'SSPO'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF >= 1),]),'Origin']<-'SSPO'
# SOFF: at least tenfold higher in S. officinarum than in the other species;
# the last two rules cover zero counts in S. spontaneum or S. barberi
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF <= -1 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR >= 1 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'SOFF'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR >= 1),]),'Origin']<-'SOFF'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SOFF <= -1),]),'Origin']<-'SOFF'
# SBAR: at least tenfold higher in S. barberi than in the other species;
# the last two rules cover zero counts in S. officinarum or S. spontaneum
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) < 1 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR <= -1 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR <= -1),]),'Origin']<-'SBAR'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & countsOrigin$log10ratioFPKM_SSPO_vs_SBAR <= -1),]),'Origin']<-'SBAR'
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & countsOrigin$log10ratioFPKM_SOFF_vs_SBAR <= -1),]),'Origin']<-'SBAR'
# Common between SSPO and SBAR: zero counts in S. officinarum, < tenfold difference
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._officinarum == 0 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SBAR) < 1),]),'Origin']<-'CommonSSPO_SBAR'
# Common between SOFF and SBAR: zero counts in S. spontaneum, < tenfold difference
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._spontaneum == 0 & abs(countsOrigin$log10ratioFPKM_SOFF_vs_SBAR) < 1),]),'Origin']<-'CommonSOFF_SBAR'
# Common between SSPO and SOFF: zero counts in S. barberi, < tenfold difference
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin) & countsOrigin$S._barberi == 0 & abs(countsOrigin$log10ratioFPKM_SSPO_vs_SOFF) < 1),]),'Origin']<-'CommonSSPO_SOFF'
# Anything still unassigned is labelled UNK
countsOrigin[rownames(countsOrigin[which(is.na(countsOrigin$Origin)),]),'Origin']<-'UNK'
```