Analyzing MCL clusterings (effects of Inflation value) - labbces/sugarcane_RNAome GitHub Wiki
Analyzing MCL clusterings
MCL clustering was performed with this script, using 10 inflation values for each dataset: 1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3, 4.8, 5.3 and 5.8.
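As a rough illustration of the inflation sweep, the sketch below builds one `mcl` command line per inflation value. It is not the script referenced above. It assumes the network is a label-format edge list read with `--abc`; the file name `network.abc` and the output naming scheme are hypothetical.

```python
from pathlib import Path

# Inflation grid from the text: 1.3 to 5.8 in steps of 0.5 (10 values).
INFLATION_VALUES = [round(1.3 + 0.5 * i, 1) for i in range(10)]


def build_mcl_command(network: Path, inflation: float) -> list[str]:
    """Return one mcl invocation for a label-format (--abc) edge list."""
    tag = f"{int(round(inflation * 10)):02d}"  # e.g. 1.3 -> "13"
    # Hypothetical output naming; the original script may name files differently.
    output = Path(f"{network.stem}.I{tag}.clusters")
    return ["mcl", str(network), "--abc", "-I", str(inflation), "-o", str(output)]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for value in INFLATION_VALUES:
        print(" ".join(build_mcl_command(Path("network.abc"), value)))
```

Each printed line can be run in a shell, or passed to `subprocess.run` as a list, once the input path points at a real network file.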
Gene expression correlation yields a distinct network for each of the 3 datasets, and the resulting clusterings vary with the inflation level and in cluster size.
Note: These results do not indicate a single clustering as the 'best' option, as all clusterings appear to be at least acceptable. They do help in illustrating the relative advantages of each clustering.
I developed the following scripts to plot the Cluster Size Distribution and the Efficiency Peak.
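For readers unfamiliar with the cluster size distribution, here is a minimal sketch of how it could be tabulated before plotting. It is not one of the scripts mentioned above. It assumes the MCL output has one cluster per line with tab-separated gene labels (the layout produced in `--abc` mode); the function names and the toy clustering are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def cluster_sizes(lines: Iterable[str]) -> list[int]:
    """Size of each cluster: one cluster per non-empty line, members tab-separated."""
    stripped = (line.strip() for line in lines)
    return [len(line.split("\t")) for line in stripped if line]


def size_distribution(sizes: Iterable[int]) -> dict[int, int]:
    """Map cluster size -> number of clusters of that size, ordered by size."""
    return dict(sorted(Counter(sizes).items()))


# Toy clustering (not data from this page): one cluster of 4, one pair, two singletons.
toy_clustering = ["g1\tg2\tg3\tg4", "g5\tg6", "g7", "g8"]
sizes = cluster_sizes(toy_clustering)
distribution = size_distribution(sizes)
print(distribution)  # {1: 2, 2: 1, 4: 1}
```

For a real clustering, passing an open file handle to `cluster_sizes` works the same way, and the resulting dictionary can be fed to any plotting library as a bar chart, one per inflation value.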
Hoang2017
Almost all genes are part of three modules (size ~30k). As inflation increases, the network gradually breaks down.
All clusterings capture a large fraction of the edge mass (~70 percent) using only ~6 percent of the 'area'.
Correr2020
Almost all genes are part of four modules (size ~35k). As inflation increases, the network gradually breaks down.
All clusterings capture nearly all of the edge mass (~99 percent) using only ~10 percent of the 'area'.
Perlo2022
Almost all genes are part of a single connected component (size ~14k). As inflation increases, the network gradually breaks down, eventually forming clusters with 1 and 2 genes.
These data show that there is little variation in the cluster structure. The 1.3 clustering captures nearly all of the edge mass (89 percent) using only 6 percent of the 'area'. The 5.8 clustering captures 66 percent of the mass using 1.2 percent of the area.
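The 'mass' and 'area' figures quoted for each dataset can be illustrated with a small sketch. The definitions used here are an assumption based on my reading of the MCL `clm info` documentation: the mass fraction is the share of total edge weight whose two endpoints fall in the same cluster, and the area fraction is the sum of s(s-1) over all cluster sizes s, divided by N(N-1) for N nodes. The function name and the toy graph are hypothetical, and the numbers reported above may have been computed differently.

```python
def mass_and_area_fraction(
    edges: list[tuple[str, str, float]],
    clusters: list[set[str]],
) -> tuple[float, float]:
    """Return (mass fraction, area fraction) for a hard clustering.

    edges: undirected weighted edges, each listed once as (u, v, weight).
    clusters: disjoint node sets that together cover every node in edges.
    """
    membership = {node: idx for idx, cluster in enumerate(clusters) for node in cluster}
    total_weight = sum(weight for _, _, weight in edges)
    covered_weight = sum(
        weight for u, v, weight in edges if membership[u] == membership[v]
    )
    n_nodes = len(membership)
    # Ordered node pairs inside clusters, relative to all ordered node pairs.
    cluster_area = sum(len(c) * (len(c) - 1) for c in clusters)
    return covered_weight / total_weight, cluster_area / (n_nodes * (n_nodes - 1))


# Toy graph (not data from this page): two tight pairs joined by one weak edge.
toy_edges = [("a", "b", 3.0), ("c", "d", 1.0), ("b", "c", 1.0)]
toy_clusters = [{"a", "b"}, {"c", "d"}]
mass_fraction, area_fraction = mass_and_area_fraction(toy_edges, toy_clusters)
print(f"mass fraction = {mass_fraction:.2f}, area fraction = {area_fraction:.2f}")
```

In the toy case the clustering keeps 4 of 5 units of edge weight while covering 4 of 12 ordered node pairs. A high mass fraction at a low area fraction is the pattern described for the three datasets: most of the edge weight sits inside a comparatively small set of within-cluster pairs.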