Computing gene pair relationships with Pearson correlation - labbces/sugarcane_RNAome GitHub Wiki

Computing Pearson correlation

I used this script to compute the Pearson correlation for every gene pair in the filtered matrix, which contains the genes with the highest coefficient of variation (CV), as described here.

Only gene pairs with a Pearson correlation greater than 0.7 were retained.

To give correlations close to 1.0 more weight than those near 0.7, the retained values were shifted into the range 0.0 to 0.3 by subtracting 0.7 from the original correlation. For example, a correlation of 0.95 becomes 0.25 and a correlation of 0.72 becomes 0.02. This widens the relative gap between weak and strong retained correlations: lower correlations are more common, and without the shift they would have an impact on subsequent analyses comparable to that of the stronger ones. The transformation therefore emphasizes the stronger correlations and attenuates the weaker ones. The transformed Pearson correlations for all retained gene pairs were saved in ABC format (gene A, gene B, correlation).
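The filter-and-shift step can be illustrated with a minimal sketch. It is not the script linked above: the function names (`pearson`, `correlations_to_abc`, `write_abc`), the dictionary input, and the tab-separated output are assumptions made for illustration, and a pure-Python loop like this would be far too slow for a full expression matrix.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant expression profile: correlation is undefined.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlations_to_abc(expression, threshold=0.7):
    """Return (gene A, gene B, shifted correlation) for pairs above threshold.

    `expression` maps a gene ID to its expression values across samples
    (hypothetical input layout). The shifted weight is r - threshold,
    so retained values fall between 0.0 and 1.0 - threshold.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if r > threshold:  # NaN compares False, so undefined pairs are dropped
            edges.append((gene_a, gene_b, r - threshold))
    return edges


def write_abc(edges, path):
    """Write one 'geneA<TAB>geneB<TAB>weight' line per retained pair."""
    with open(path, "w") as handle:
        for gene_a, gene_b, weight in edges:
            handle.write(f"{gene_a}\t{gene_b}\t{weight:.6f}\n")


# Toy data (invented): geneA and geneB are perfectly correlated,
# geneC is anti-correlated with both and is therefore filtered out.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
toy_edges = correlations_to_abc(toy)
```

Each unordered pair is visited once, so the output lists every retained pair in a single direction only.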

After computing the Pearson correlations for the gene pairs, we identified co-expressed gene modules using the Markov Cluster algorithm (MCL).

Clustering was performed with a set of 10 inflation values, as described here.
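As a rough sketch of how one MCL run per inflation value could be driven from Python, the helper below builds the command line for the `mcl` program using its `--abc`, `-I` and `-o` options. The file names, the output naming scheme, and the inflation values in the usage line are placeholders, not the values used in this project; the actual set of 10 values is given on the linked page.

```python
import subprocess


def mcl_command(abc_path, inflation, out_prefix="out"):
    """Build the mcl command for one inflation value.

    --abc tells mcl the input is label format (gene A, gene B, weight),
    -I sets the inflation, -o names the output file.
    The output naming scheme here is a hypothetical convention.
    """
    out_path = f"{out_prefix}.I{inflation:.1f}.mcl"
    return ["mcl", abc_path, "--abc", "-I", str(inflation), "-o", out_path]


def run_mcl(abc_path, inflations, dry_run=True):
    """Build, and optionally execute, one mcl command per inflation value."""
    commands = [mcl_command(abc_path, value) for value in inflations]
    if not dry_run:
        for command in commands:
            # Requires the mcl executable to be installed and on PATH.
            subprocess.run(command, check=True)
    return commands


# Placeholder file name and inflation values, for illustration only.
planned = run_mcl("pearson_pairs.abc", [1.5, 2.0], dry_run=True)
```

With `dry_run=True` the function only returns the commands, which makes it easy to inspect them before launching the clustering.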