02_heatmap Correlation Analysis - Yiwei666/08_computional-chemistry-learning-materials- GitHub Wiki
1. Predictive Modeling of the Hot Metal Sulfur Content in a Blast Furnace Based on Machine Learning
4.3.1. Correlation Analysis
The Pearson correlation coefficient is widely used to measure the degree of correlation between two variables; it is a linear correlation coefficient. The correlation coefficient ranges from –1 to 1, where a value greater than 0 represents a positive correlation and a value less than 0 represents a negative correlation. The strength of the correlation can be expressed by the absolute value of R, where |R| < 0.3 represents a weak correlation, 0.3 < |R| < 0.5 represents a low correlation, 0.5 < |R| < 0.8 represents a moderate correlation, and 0.8 < |R| < 1 represents a high correlation. The lighter the color of the heatmap, the higher the negative correlation, and the darker the color of the heatmap, the higher the positive correlation. As seen in Figure 5, the 45 parameters were highly correlated with one another, and the information redundancy between the parameters was also high (due to the limitation of drawing, the order of the 45 parameters in Figure 5 is the same as that in Table 1). In addition, coal consumption (CC), coal ratio (CLR), and sinter consumption (SC) were all positively correlated with hot metal sulfur content. This shows that an increase in sulfur-containing inputs, of which the main sources are sinter and coal, increased the sulfur content during the production process in the blast furnace, which in turn increased the sulfur content in the hot metal. The oxygen enrichment rate (OER) was negatively correlated with the sulfur content in the hot metal; this shows that when the oxygen concentration was increased appropriately, the sulfur content in the hot metal was reduced. An Xiu-we [25] found that the content of sulfur in the dripping iron of an oxygen blast furnace was much lower than that of a conventional blast furnace. This was due to the improved desulfurization capacity of the oxygen blast furnace slag, which reduced the sulfur content in the hot metal.
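The qualitative bands quoted above can be written down as a small lookup. This is only a sketch: the function names `correlation_strength` and `correlation_sign` are hypothetical, and because the excerpt uses strict inequalities and leaves |R| = 0.3, 0.5, and 0.8 unassigned, the code assumes each boundary value belongs to the higher band.

```python
def correlation_strength(r: float) -> str:
    """Map a Pearson coefficient R to the qualitative bands quoted in the excerpt."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson R must lie in [-1, 1]")
    a = abs(r)
    # Assumption: boundary values (0.3, 0.5, 0.8) fall into the higher band,
    # because the excerpt does not say where they belong.
    if a < 0.3:
        return "weak"
    if a < 0.5:
        return "low"
    if a < 0.8:
        return "moderate"
    return "high"


def correlation_sign(r: float) -> str:
    """Return the direction of the correlation."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


if __name__ == "__main__":
    # Illustrative values only, not taken from Figure 5.
    for r in (0.12, -0.45, 0.62, -0.91):
        print(f"R = {r:+.2f}: {correlation_strength(r)} {correlation_sign(r)}")
```

Other papers quoted on this page use different cut-offs (for example 0.4 or 0.80), so the thresholds are best treated as a convention rather than a rule.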
2. An explainable machine learning model to predict and elucidate the compressive behavior of high-performance concrete
4. Description of available data
The names of the variables in the dataset and their respective statistical descriptions are listed in Table 1. Five engineered variables listed in Table 2 are also included in the model. Henceforth, all the variables will be referred to by the IDs provided in Tables 1 and 2 instead of their long names. The feature vector x for the predictive model consists of C, BFS, FASH, W, SP, CA, FA, AGE, W/B, BFS/W, FASH/W, CA/B, and CA/FA, and the target y is HPCCS.
The statistical correlations between the features (x) and the target (y) are shown in Fig. 3. This shows that the engineered features W/B and CA/B have the strongest correlations (r) of −0.62 and −0.56, respectively, with the HPCCS, suggesting that HPCCS increases as the W/B and CA/B ratios decrease. C has a moderate correlation (r) of 0.5 with the HPCCS, indicating that the effect of C on the HPCCS is being impacted by other supplementary binders. In addition, SP and AGE show weak correlations (r) of 0.37 and 0.33, respectively, with the HPCCS.
Although statistical correlations are beneficial for preliminary data analysis, they do not capture the combinatorial non-linear dependencies between the multivariate x and y that are extremely important in ML-based studies. Such information is critical to strategically create novel HPCs based on different constraints such as material availability, time, cost, and compressive strength requirements for various civil and construction projects. Therefore, we have adopted a game-theoretic approach to understanding the profound non-linear dependencies and interactions between the features and the HPCCS based on the predictions from the ML models.
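A feature-to-target ranking of this kind can be sketched with NumPy. Everything numerical below is synthetic and serves only to make the example runnable; the binder definition B = C + BFS + FASH and the helper name `rank_by_target_correlation` are assumptions, not details confirmed by the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic mix quantities (illustrative ranges, not the paper's dataset).
C = rng.uniform(100, 500, n)
BFS = rng.uniform(0, 300, n)
FASH = rng.uniform(0, 200, n)
W = rng.uniform(140, 230, n)

# Assumption: the binder B is the sum of cement and supplementary binders.
B = C + BFS + FASH
features = {"C": C, "BFS": BFS, "FASH": FASH, "W": W, "W/B": W / B}

# Synthetic target that decreases with W/B, plus noise.
y = 80.0 - 60.0 * features["W/B"] + rng.normal(0.0, 2.0, n)


def rank_by_target_correlation(features, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {
        name: float(np.corrcoef(col, target)[0, 1])
        for name, col in features.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_by_target_correlation(features, y)

if __name__ == "__main__":
    for name, r in ranked:
        print(f"{name:5s} r = {r:+.2f}")
```

Because the synthetic target is built directly from W/B, that ratio ranks first with a negative r here; with real data the ranking is an empirical result, and, as the excerpt notes, it says nothing about non-linear dependencies.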
3. Research on predicting compressive strength of magnesium silicate hydrate cement based on machine learning
In the analysis of ML modeling, it is necessary to examine the linear relationship among various parameter indicators. If a significant linear relationship exists between the feature parameters, it has a substantial detrimental impact on the predictive accuracy of the ML model. The Pearson correlation coefficient (PCC) is widely used to assess the correlation between parametric indicators. The specific calculation method is shown in section 3.2. A PCC value approaching 1 or −1 indicates a stronger linear relationship between the variables.
Fig. 4 depicts the PCC values among the input variables as well as between the input variables and the output variable. The majority of PCC values between input variables were below 0.4, suggesting a lack of significant multicollinearity among the variables that could have influenced the predictive outcomes. Further analysis of the linear relationship between the input variables and the output variable revealed that the strongest correlation was between the compressive strength of MSCH and the curing age, with a PCC value of 0.47. In contrast, the correlations with the other feature parameters were weaker.
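The multicollinearity check described here amounts to looking at the off-diagonal entries of the input correlation matrix. A minimal sketch follows, assuming the inputs are the columns of a 2-D array; the function names are hypothetical and the toy data are not from the paper.

```python
import numpy as np


def offdiag_abs_pcc(X):
    """Absolute PCC for every distinct pair of columns in X."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # skip the diagonal and duplicates
    return np.abs(corr[upper])


def share_below(X, threshold=0.4):
    """Fraction of feature pairs whose |PCC| is below the threshold."""
    return float(np.mean(offdiag_abs_pcc(X) < threshold))


# Toy inputs: column b is an exact linear function of column a,
# column c is only weakly related to either.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = 2.0 * a + 1.0
c = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = np.column_stack([a, b, c])

if __name__ == "__main__":
    print(offdiag_abs_pcc(X))
    print(share_below(X, threshold=0.4))
```

In the toy matrix, two of the three pairs fall below 0.4, while the a-b pair is perfectly collinear and would be flagged.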
4. Carbon in the deep upper mantle and transition zone under reduced conditions: Insights from high-pressure experiments and machine learning models
A correlation matrix is useful for demonstrating absolute correlations between pairs of features in the dataset. For example, the heatmap of Pearson coefficient values shows that negative correlations are obtained between S and C and between Fe and Ni (Fig. 6). Alternatively, scatter plots provide an overall idea of the distributions of these features in the dataset. For example, most pairplots of the distributions of different features (along the diagonal in Supplementary Fig. S6) have left-skewed distributions, except temperature and Fe, which have nearly normal and right-skewed distributions, respectively.
5. Machine learning prediction of magnetic properties of Fe-based metallic glasses considering glass forming ability
The number of features was further reduced by checking the linear correlation of the original 15 features. To achieve that, the Pearson correlation coefficient (PCC) of any two features (X, Y), defined by Eq. (1), $\rho_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)] / (\sigma_X \sigma_Y)$, was estimated, where $E$ is the expectation, $\mu_X$ is the mean of X, $\mu_Y$ is the mean of Y, $\sigma_X$ is the standard deviation of X, and $\sigma_Y$ is the standard deviation of Y. The PCC value ranges from −1 to 1, and an absolute value closer to 1 indicates more linearly related variables. In this work, if the PCC absolute value of two features ($|\rho_{X,Y}|$) was higher than 0.80, one of the two features was discarded. Fig. 2 shows the correlation matrix generated by the PCC of each feature pair of the 15 features in the original feature list. The feature pairs of δ-cB, V-ρ, V-cB and -cFe presented strong correlations. Combined with the theoretical formulas in Table 1, it could be concluded that δ, V and were dominated by the contribution of B, B and Fe, respectively. Therefore, δ, V and were chosen to be dropped from the feature list. The following 12 reduced features were obtained: ΔTx, ρ, Tm, , χ, VEC, VEC', cFe, cCo, cNi, cB and cSi. The predictive performance of the ML models trained on the original 15 features and on the reduced 12 features is compared below.
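The Pearson definition and the 0.80 cut-off can be sketched as follows. One caveat: the excerpt chooses which member of a correlated pair to drop from the theoretical formulas in its Table 1, whereas this sketch simply keeps the feature that appears first in the list. That rule, the function names, and the placeholder feature names `f1` to `f3` are assumptions made for illustration.

```python
import numpy as np


def pearson(x, y):
    """PCC from its definition: E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / (x.std() * y.std()))


def drop_correlated(names, X, threshold=0.80):
    """Greedy filter: keep a feature only if its |PCC| with every
    already-kept feature does not exceed the threshold.

    Assumption: the earlier feature in `names` wins; a domain-informed
    choice could be substituted here.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(len(names)):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Toy data with placeholder feature names.
f1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
f2 = 2.0 * f1 + 1.0                      # perfectly correlated with f1
f3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = np.column_stack([f1, f2, f3])

if __name__ == "__main__":
    print(pearson(f1, f2))
    print(drop_correlated(["f1", "f2", "f3"], X))
```

Swapping the order of `names` (and of the columns) changes which feature of a correlated pair survives, which is why a physically motivated choice, as in the excerpt, is generally preferable to a purely positional one.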
6. Actinides in complex reactive media: A combined ab initio molecular dynamics and machine learning analytics study of transuranic ions in molten salts
3.4. Descriptor correlation matrix
We now analyze the correlation between structural properties and electronic properties. Fig. 4a and b show the heatmaps of the Pearson correlation coefficients (PCCs) for the features listed in Table 2, with data obtained from 120 configurations for each An/MS system (in total, 600 configurations of Ans in NaCl or FLiBe were considered). Here we chose to use structural features (descriptors) that are conventional and easily accessible. Fig. S4-10 show the distribution of some of these features in the two MSs.
Our analysis indicates that the number of f-electrons (F1) is indeed a significant feature for electronic properties. Fig. 4a and b show that F1 is most correlated with Ofp (F18) and is most anticorrelated with the Bader charge of Ans (F17). While the F1-F17 PCCs in NaCl and FLiBe are similar, ∼0.89, the F1-F18 PCC in NaCl is 0.78, which is sizably smaller than that in FLiBe, 0.91. This indicates that the correlation between the number of f electrons and the polarization of An-ligand bonds, and as a result their electrostatic bonding, may not be strongly affected by the host MS, but the correlation between the number of f electrons and the f-p orbital degeneracy, i.e., covalent bonding, is. Furthermore, the anticorrelation between F1 and the CN (F2) is stronger in the case of NaCl, in which the PCC is −0.29 compared to −0.1 in the case of FLiBe. Conversely, the PCC between F1 and the average An-ligand bond distance (F3) is −0.27 in NaCl, a weaker anticorrelation than the −0.40 in FLiBe. In addition, one can also see the role of the MS in the correlation between F1 and other properties; for example, F1 and the average ligand–ligand distance (F5) are correlated in NaCl but anticorrelated in FLiBe.
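A host-by-host comparison like this one can be sketched by computing one correlation matrix per system and listing the feature pairs whose PCC changes sign. The function name `sign_flips`, the `min_abs` filter, and the toy numbers are assumptions made for illustration; they are not the paper's data or procedure.

```python
import numpy as np


def sign_flips(names, X_a, X_b, min_abs=0.1):
    """List feature pairs that are correlated in one dataset and
    anticorrelated in the other.

    min_abs (an assumed filter) ignores pairs whose PCC is close to zero
    in either dataset, where the sign is not meaningful.
    """
    corr_a = np.corrcoef(X_a, rowvar=False)
    corr_b = np.corrcoef(X_b, rowvar=False)
    flips = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ra, rb = corr_a[i, j], corr_b[i, j]
            if ra * rb < 0 and min(abs(ra), abs(rb)) >= min_abs:
                flips.append((names[i], names[j], float(ra), float(rb)))
    return flips


# Toy data: two features observed in two hypothetical host systems.
X_host_a = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0]])
X_host_b = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])

flips = sign_flips(["F1", "F5"], X_host_a, X_host_b)

if __name__ == "__main__":
    for a, b, ra, rb in flips:
        print(f"{a}-{b}: {ra:+.2f} in host A, {rb:+.2f} in host B")
```

The same two matrices can also be subtracted element-wise to see where the host changes the magnitude of a correlation without flipping its sign, as with F1-F18 in the excerpt.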
Fig. 4a and b also reveal that F2 is strongly correlated with F3, the variance of An-ligand bond lengths (F4), the variance of the tetrahedral angle (F10), the average volume of An-ligand–ligand–ligand tetrahedra (F11), and the variance of the An-ligand–ligand–ligand tetrahedron volumes (F12). Consistent with Fig. 2c, a higher number of ligands in the solvation shell of an An leads to weaker An-ligand interatomic interactions, i.e., longer An-ligand bonds. Larger CNs also lead to higher diversity (larger variance) in the An-ligand distances, consistent with the first peaks in the RDFs, Fig. 2a and b. Likewise, F2 has the same effect on the volume and the shape of the solvation shell. This indicates that the solvation shell of each An becomes more flexible with higher CNs. Conversely, F1 is anticorrelated with the average ligand-An-ligand angle (F7), the average tetrahedral parameter (F9), the An-ligand–ligand–ligand solid angle (F13), and the local structure index (LSI – F15) [54]. We also found an anticorrelation between F2 and Ofp (F18) that is indicative of a weaker covalency character from f orbitals caused by the anionic exchange between the first solvation shell of each An and the rest of the system. Here, we also clearly see the correlation effect on the partial atomic charge of Ans: higher CNs lead to higher charges. Therefore, the charges in Fig. 1d and e can be understood as the result of a subtle interplay between the electron affinity and the coordination effect. This emphasizes the critical role of (classical) statistical mechanics in understanding electronic properties of Ans in liquid media, where the coordination is dynamic and not fixed.