Integration of expression profile - GianlucaMattei/methyl.O GitHub Wiki

Expression Data Integration:

To evaluate how methylation affects gene activity, it is important to integrate expression profiles when they are available. methyl.O integrates expression data and computes both a correlation and a ranking score. The integration assigns a score to each gene based on its beta value and its log fold change. The larger the two values, the larger the magnitude of the score; the score is positive when the beta and the expression values are discordant and negative when they are concordant. Hyper-methylated sequences are known to repress gene expression, whereas hypo-methylated sequences cannot be directly associated with the upregulation of genes. Hyper-methylation is usually a long-term regulation involved in cell fate commitment, while hypo-methylated genes can still be expressed but are subject to several other regulatory mechanisms. For this reason methyl.O can also compute the correlation between the expression and the beta differences separately for hyper-methylated and hypo-methylated genes, as shown in the figure below. methyl.O can plot the correlations and assign to each methylated gene its expression.

Figure 3: Correlation plot between methylation and expression. The orange dashed line represents the linear regression of gene expression on hyper-methylated genes. The grey dashed line represents the linear regression of gene expression on hypo-methylated genes. Here an inverse correlation between methylation levels and expression can be observed in both cases.
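The idea of a signed score and of two separate correlations can be sketched in base R on toy data. This is only an illustration: the data, the column names and in particular the score formula are assumptions made for the example and are not the formula used by methyl.O.

```r
# Toy data: per-gene beta difference (methylation) and log fold change (expression).
genes <- data.frame(
  symbol    = c("G1", "G2", "G3", "G4", "G5", "G6", "G7"),
  beta.diff = c(0.6, 0.4, 0.5, -0.3, -0.5, -0.7, 0.35),
  logFC     = c(-2.0, -1.0, -1.4, 0.2, 0.9, 1.1, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical score (NOT the methyl.O formula): the magnitude grows with both
# values; the sign is positive when methylation and expression are discordant
# and negative when they are concordant (as for G7).
genes$score <- -genes$beta.diff * genes$logFC

# Split the genes by direction of the methylation change.
hyper <- genes[genes$beta.diff > 0, ]
hypo  <- genes[genes$beta.diff < 0, ]

# Pearson correlation computed separately for each group.
cor.hyper <- cor(hyper$beta.diff, hyper$logFC, method = "pearson")
cor.hypo  <- cor(hypo$beta.diff, hypo$logFC, method = "pearson")

# One linear fit per group, analogous to the two dashed lines in the figure.
fit.hyper <- lm(logFC ~ beta.diff, data = hyper)
fit.hypo  <- lm(logFC ~ beta.diff, data = hypo)

print(genes[order(-genes$score), ])
cat("hyper r =", cor.hyper, " hypo r =", cor.hypo, "\n")
```

Keeping the two groups apart means that a weak relationship among hypo-methylated genes does not dilute the correlation measured for hyper-methylated genes.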

R:

The function associateFeat2Exprs needs the annotated DMRs and the normalized expression profile to integrate the data. Its parameters let you select the methylated features considered to affect expression the most (active.features), indicate the columns of the expression file that hold the gene IDs (col.genes), the statistic to use (col.stat) and the log fold change (col.logFC), and set the thresholds for the statistic (stat.thr) and for the log fold change (logfc.thr). You can also set a threshold for the beta value (beta.thr), the metric used to filter DMRs (param.type), as explained in the Annotate Methylated Enhancers paragraph, and its threshold value (overlap.param.thr). The correlation is customizable through the type of correlation to compute and plot (plot.type) and the method used to compute the correlation coefficient (cor.type). By default the function expects gene symbols, but it can automatically convert other types of IDs to symbols (convert.genes); in that case you must indicate the type of IDs to convert from. The function can return a data.frame (return.table) in which the beta difference and the log fold change are associated with each gene; otherwise it returns a plot displaying the two correlations. Further parameters set the colors of the correlation lines (lmfit.col1, lmfit.col2), the color of the axis lines (line.col), the color palette (among hcl.pals()) reflecting the score values in the plot (pal), and whether to show the gene names next to each dot (show.text). For both the data.frame and the plot, the genes can be filtered (filter.by.genes), for instance to study how methylation affects the expression of the genes of a specific pathway.

library(methyl.O)
# Load the example expression profile
data(expressionSubset)
# annotatedDMRs holds the annotated DMRs produced in the annotation step
associateFeat2Exprs(annotatedDMRs, expressionSubset, active.features = c("promoters", "heads"),
    col.genes = 1, col.stat = 6, stat.thr = 0.05, col.logFC = 2, logfc.thr = 0, beta.thr = 0.3,
    plot.type = 'splitted', cor.type = 'pearson', return.table = FALSE)
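To make the column and threshold arguments more concrete, the following base R sketch applies the same kind of selection to a toy expression table. It is not the internal code of associateFeat2Exprs: the table layout, the column names and the exact comparison operators (strictly below stat.thr, absolute log fold change strictly above logfc.thr) are assumptions chosen for illustration.

```r
# Toy expression table laid out as in the call above:
# column 1 = gene symbols, column 2 = log fold change, column 6 = statistic.
# The column names and the intermediate columns are placeholders (assumption).
expr <- data.frame(
  gene      = c("A", "B", "C", "D"),
  logFC     = c(-1.8, 0.4, 2.1, -0.1),
  AveExpr   = c(7.2, 5.1, 6.3, 4.8),
  t         = c(-6.1, 1.2, 7.4, -0.3),
  P.Value   = c(1e-5, 0.2, 1e-6, 0.8),
  adj.P.Val = c(1e-4, 0.3, 1e-5, 0.9),
  stringsAsFactors = FALSE
)

# Column positions and thresholds, named after the function parameters.
col.genes <- 1
col.logFC <- 2
col.stat  <- 6
stat.thr  <- 0.05
logfc.thr <- 0

# Keep genes passing both thresholds (the comparison rules are assumed).
keep <- expr[[col.stat]] < stat.thr & abs(expr[[col.logFC]]) > logfc.thr
filtered <- expr[keep, c(col.genes, col.logFC, col.stat)]
print(filtered)
```

Referring to columns by position rather than by name is what allows expression tables produced by different tools to be used without renaming their columns.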

GUI:

The Methylation vs Expression tab integrates expression profiles with methylation data. The main page shows both the correlation plot and the data frame in which each annotated gene is associated with its beta value and log fold change. The upper part of the page lets you select the methylated gene regions to associate with the expression, while the left panel lets you upload the expression file. The following parameters are available:

| Command | Description |
| --- | --- |
| Type of Parameter to filter Overlapping Methylation | The metric used to filter the DMRs: 1) the percentage of the enhancer overlapped by the DMR, 2) the length, in bp, of the overlap between the enhancer and the DMR, or 3) the length, in bp, of the DMR. |
| Overlapping Methylation Value Threshold for Filtering | Threshold value for the selected parameter. |
| Statistic Threshold | Threshold value for the selected statistic. |
| LogFC Threshold | Threshold value for the expression log fold change. |
| Beta Diff Threshold | The beta threshold for each DMR to be considered in the annotation. |
| Filter by DB | Filters the genes from a specific database, resulting from enrichment analysis, to study how methylation can affect gene expression. |
| Select Correlation Type | Sets the method used to compute the correlation. |
| Select Path for filtering Genes | Filters the genes to study how methylation can affect the gene expression of a specific pathway resulting from enrichment analysis. |
| | Sets the column position in the expression file where the gene IDs are found. |
| Column Position of Used Statistics | Sets the column position in the expression file where the statistics (p.value or adj p.value) are found. |
| Column position of logFC | Sets the column position in the expression file where the log fold change values are found. |
| Select TRUE if Gene IDs are not Symbols | If selected, gene IDs are not official symbols. |
| Select Gene IDs Annotation Type to Translate | |
Table 7: Parameters for Methylation vs Expression tab
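The three metrics available for filtering overlapping methylation can be illustrated with a small base R calculation. The coordinates and variable names below are hypothetical and the sketch only shows what each quantity measures; it does not reproduce how methyl.O computes them.

```r
# Hypothetical coordinates on the same chromosome (1-based, inclusive).
enhancer <- c(start = 1000, end = 1499)  # 500 bp enhancer
dmr      <- c(start = 1300, end = 1899)  # 600 bp DMR

# Length of an interval in bp.
width <- function(x) unname(x["end"] - x["start"] + 1)

# Overlap between the enhancer and the DMR (0 if they do not intersect).
overlap.start  <- max(enhancer["start"], dmr["start"])
overlap.end    <- min(enhancer["end"], dmr["end"])
overlap.length <- max(0, overlap.end - overlap.start + 1)

# 1) percentage of the enhancer overlapped by the DMR
overlap.percentage <- 100 * overlap.length / width(enhancer)
# 2) length, in bp, of the overlap: overlap.length
# 3) length, in bp, of the DMR
dmr.length <- width(dmr)

cat("overlap:", overlap.length, "bp;",
    overlap.percentage, "% of the enhancer;",
    "DMR length:", dmr.length, "bp\n")
```

The threshold set in Overlapping Methylation Value Threshold for Filtering is then compared against whichever of these three quantities has been selected.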

Other settings to customize the plot are available through the red gear button above the plot. The option to compute split correlations for hyper-methylated and hypo-methylated genes is also found there.