TOF‐SIMS Data: Multivariate Statistical Analysis - mikee9265/SIMS-Wiki GitHub Wiki
TOF-SIMS data sets lend themselves to multivariate statistical analysis (MVA) techniques of various types (Henderson 2013; Tyler 2006, Tyler 2013). This is because, while there are hundreds of separate peaks in most spectra, many of the peaks covary in intensity, and there are far fewer truly independent features in the data. One way to think of this is to imagine a surface having a limited set of chemical compounds. Each compound will produce a spectrum, independent of any other compound present. The final spectrum will be the sum of the signals coming from each chemical species. Given an additional number of spectra of samples with different relative amounts of the set of chemical species from the first sample, it makes sense that it would be possible to use a mathematical method to determine the spectra for each of the compounds and how much of each compound is present at the surface of each sample. Chemometrics, which is really another term for the use of multivariate statistics to take complex data and to determine what components of the data apply to each of a series of independent chemical species, is thus directly applicable to TOF-SIMS results.
The problem with doing this simply is that matrix effect again. If you could mix chemical species and the relative intensities of the peaks in each species' spectrum stayed fixed, with all the peaks in that spectrum increasing or decreasing linearly with concentration, the chemometric approach to SIMS data would be perfect. The problem is that matrix effects cause the changes in peak intensities to sometimes vary nonlinearly with concentration, and even within the spectrum of one species, peaks may vary in intensity relative to one another depending on what else is around. It is for this reason that multivariate statistical analysis of TOF-SIMS data is often used more qualitatively than quantitatively, and even then the approach is not always successful. Still, given all the concern over matrix effects, it is remarkable how often the statistical approach does work, and how useful it is when it does. Further, you can tell when you are getting a good fit to the data and when you are not.
In most cases, you can successfully analyze the data without MVA, so you might wonder why go to such trouble when a direct approach would be sufficient. The answer is that MVA is increasingly easy to do once you have made it sufficiently far up the learning curve. Some methods are now incorporated into the instrument software, saving the step of having to export the data. As with any data analysis approach, the easier and faster the method, the more often it is worthwhile to use it.
As noted above, some multivariate statistical analyses can be done using the vendor-supplied software. Multiple platforms exist for more sophisticated and complete analysis packages that work well with TOF-SIMS data. The NESAC/BIO center in Seattle offers free software along with tutorials and examples (NESAC/BIO 2015). Commercially available software can be obtained from Eigenvector Research (“Eigenvector Research: Chemometrics Software, Consulting and Training” 2015). The Eigenvector software also comes with tutorials and detailed documentation. There are now many types of MVA that may be applied to TOF-SIMS data analysis. A discussion of four major categories of these techniques follows.
As noted earlier, some peaks in the spectrum will tend to vary together, and some independently. For example, the peaks belonging to one species (say, the substrate) may all go down while peaks representative of another species (say, an overlayer) may all go up in a series of spectra where there is an increasing concentration of the second species. The variations in the spectra can be between samples, or between areas of the same sample, in an image. The variations can be captured mathematically using principal component analysis (PCA). The technique linearly recombines the peaks in the spectrum to make new components. Mathematicians describe this as a rotation in multidimensional space, where each peak represents one of the original dimensions. The first component is a linear combination of the peaks that captures the most variation one can capture with a single linear combination of the peaks. The second, orthogonal to the first, is a second linear combination of the peaks designed to capture as much of the remaining variation as can be captured with a single linear combination of the peaks. The factors that together describe this linear combination are called loadings. Additional components are calculated until the analyst concludes that any remaining variation is noise. The amount of each component that contributes to the overall spectrum of a particular sample (or pixel) is called a score.
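As a concrete sketch, the substrate/overlayer scenario above can be run through PCA in a few lines with scikit-learn. The spectra, peak counts, and sample counts below are invented purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two hypothetical pure-component "spectra" over 50 peak channels:
# a substrate and an overlayer (values are invented, not real data).
substrate = rng.random(50)
overlayer = rng.random(50)

# Build 20 sample spectra with increasing overlayer coverage; the
# substrate signal is attenuated as the overlayer grows.
coverage = np.linspace(0.0, 1.0, 20)
X = np.outer(1.0 - coverage, substrate) + np.outer(coverage, overlayer)
X += rng.normal(scale=0.01, size=X.shape)  # a little counting noise

# Mean centering is handled internally by sklearn's PCA.
pca = PCA(n_components=3)
scores = pca.fit_transform(X)   # one score per sample per component
loadings = pca.components_      # one loading per peak per component

# Because the two species vary inversely with a single parameter
# (coverage), one component captures nearly all of the variance.
print(pca.explained_variance_ratio_)
```

Plotting `loadings[0]` against peak mass would show the positive-going and negative-going "spectra" described in the next paragraph.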
PCA does not produce components that describe an individual chemical species. This is because the signals for different species on the surface are rarely independent of each other. In the example above, the substrate is attenuated when the coverage of the overlayer increases. Most of the variations in this set of samples may be captured with one component, because the two species have signals that are inversely proportional to one another. A plot of the loadings for this component versus the mass of the peaks will look like two different spectra, one spectrum that is positive going, and one negative going. One will represent the substrate and, the other, the overlayer. Let’s say that, for our example, the peaks most representative of the overlayer have positive loadings and the peaks representing the substrate have negative loadings. For samples where there is almost no overlayer, the score for this component will be very negative. For samples where the substrate is largely attenuated and the signals for the overlayer are ascendant, the score will be very positive. In simple cases like this, one can quickly get used to the presence of positive and negative going loadings in the component.
It gets more complicated when you have more than two species varying in concentration. Imagine a sample with a substrate and submonolayer levels of species A and B. If the total coverage varies and the relative amounts of A and B vary, a number of PCA results may be obtained, and it is possible that peaks from multiple species will be in multiple components. You may get a component with positive A peaks and negative substrate peaks along with a component with positive B peaks and negative substrate peaks. This might occur if the variability in the sample set is dominated by the overall coverage of the various overlayers. If the variation is dominated by species B replacing species A as the dominant source of variation in the data, the major component could have positive going A peaks and negative B peaks, with a second component where a mixture of A and B peaks are positive and the substrate is negative. Real samples will often produce even more complicated components.
PCA components can take some getting used to, but with a little bit of experience, one gets used to their quirks, and the results of these analyses become more readily apparent. Practice with real data sets, starting with some that are well understood, is recommended. PCA can also be a good starting point for statistical analysis. In image analysis, it will certainly reveal where the variations in the image are by region. One can run PCA prior to performing one of the other methods described below. One advantage of PCA is that it requires no user input (except for preprocessing, see following text). The results are determined directly from the data, so the user does not need to fear how their choices may have produced spurious results.
An additional use of PCA is in reducing noise in the data. One fits the data as described above until the remaining components appear to be just noise. Using the components that contain the signal and not the noise, one reverses the calculation to reproduce the original results, but now without much of the noise.
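The denoising step is just the PCA transform run forward and then in reverse. A minimal sketch with synthetic data (two invented spectral sources plus added noise):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic data: two independent "pure spectra" (invented) mixed in
# varying amounts, plus noise on top.
sources = rng.random((2, 40))
amounts = rng.random((30, 2))
clean = amounts @ sources
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# Keep only the components judged to carry signal (here, 2), then
# reverse the transform to rebuild the data with most noise removed.
pca = PCA(n_components=2).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))

# The reconstruction sits closer to the clean data than the raw
# measurement does, because noise in the discarded dimensions is gone.
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_denoised, err_noisy)
```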
Partial least squares (PLS) is like PCA, but instead of looking for the linear combination of peaks that best describes the variability in the spectral data set, PLS draws out the variation in the data set that captures the variability in another variable. Take our first example with overlayer A and a substrate. An additional measurement (e.g., ellipsometry or X-ray reflectance [XRR]) may have been used to determine the thickness of the overlayer. This set of values for a series of samples becomes your Y vector; each value of Y is associated with a spectrum (a row in matrix X). PLS will capture the linear combination of spectral peaks that best describes the variation in overcoat thickness.
The terminology for PLS is different in that, instead of components, one has latent variables. One strives to use only as many latent variables as still describe signal and not noise. The latent variables tend to have less individual meaning. The combined regression factors (a combination of the loadings from the latent variables) allow you to take any future spectrum and predict the physical property Y, or to predict a spectrum from a value of Y. Of course, the model will only work well if the relationship between the spectrum and the Y values is linear, and if your original data is sufficient to really capture the variability one is likely to find in future samples. If the relationship is not linear, or if the original data set is limited, the model may give a qualitative understanding of how the spectral peaks relate to the physical property, but it will be less good at providing accurate predictions.
PLS analyses with Y values such as coverage, thickness, or the percent of one component are straightforward. One can try to do PLS analyses on less straightforward Y value types such as surface energy, adhesion strength, or protein affinity. In these cases, the regression values will point to the peaks that most strongly vary with the physical property in question, and may then give insight into the mechanism for the variations in these properties.
Partial Least Squares-Discriminant Analysis (PLS-DA) is a very useful variant of PLS. When one wants to understand what in the spectra co-varies with a population difference, this is likely the method of choice. Imagine a set of samples, some of which work for a particular application, and others which do not. One way of approaching this is to assign a variable Y, which takes a value of 1 if the sample “passes” your test and 0 if it “fails.” This is in essence what one does with PLS-DA, except that with PLS-DA, one can specify multiple population groupings.
Multivariate curve resolution (MCR) is less mathematically elegant and less well determined than PCA, but its intention is to actually separate out the spectra of the chemical components that vary from sample to sample in the data set. For TOF-SIMS data, one invariably sets the constraint that there should be no negative going peaks in the resulting components. This forces the components to come closer to being like the real component spectra underlying the mixed spectra found in the real data. The results are thus more directly interpreted. However, there is often little apparent reason for the order of the components in the analysis. Unlike for PCA, the components are not determined in such a way that they capture successively lower levels of variance in the data. Thus the most significant component can be far down the list.
MCR is most usefully applied to image data. The appearance of the image of a component provides a check on the data. Components that represent noise look that way in their images. Components capturing real variations show real segregation in the images by region. You know these variations are real because the information about the distances between the pixels is not used in the calculation. If all the pixels in a region have similar scores for a particular component, that component is certainly describing something real about that region of the image.
It is possible to further constrain MCR analysis to look for the result that maximizes the pixel contrast, or alternatively, to cause it to maximize the spectral contrast (Gallagher et al. 2004). These in a sense capture the range of solutions that MCR can produce (since unlike PCA, MCR does not necessarily produce unique solutions). If they are close, this is well and good. If they are different, they may tell different stories about the sample, each valid in its own way.
MCR in most cases does not produce components one should trust to accurately portray the exact relative intensities of peaks one would find in the spectra of pure components. One would not, for example, seek to use MCR to produce pure spectra for a standards database from a set of samples containing mixtures of materials. However, the components generally come close enough to pure spectra for them to be recognizable and for identifications of materials to be made. Therein lies the great utility of MCR.
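Dedicated MCR-ALS tools (e.g., the Eigenvector software mentioned above) are what one would normally use. As a rough stand-in, scikit-learn's non-negative matrix factorization illustrates the key point: with the non-negativity constraint, the recovered components look like spectra rather than the mixed-sign loadings of PCA. The "pure" spectra and concentrations below are invented:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)

# Two invented non-negative "pure component" spectra and mixing amounts.
pure = rng.random((2, 50))
conc = rng.random((80, 2))          # 80 pixels/samples, 2 components
X = conc @ pure + rng.normal(scale=0.005, size=(80, 50))
X = np.clip(X, 0.0, None)           # counts cannot be negative

# Non-negative factorization: X ~ scores @ components, no negative peaks.
nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
scores = nmf.fit_transform(X)
components = nmf.components_

# The constraint MCR imposes for TOF-SIMS: no negative going peaks.
print(components.min() >= 0.0)
print(components.shape)             # two spectrum-like components
```

As the text cautions, the recovered components are recognizable rather than exact: peak ratios in `components` will not perfectly match those of `pure`.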
Maximum autocorrelation factors (MAF), unlike the other methods described earlier, is purely for use with imaging. It is like PCA, except that the information about pixel location within the image is included in the analysis, and the components capture regional variance within the images. This has the advantage of often capturing the variation that is of greatest interest to the analyst. The main disadvantage is that the appearance of significant contrast in the score images no longer provides an independent validation that a component has significance. Fortunately, correlated noise disguised as a significant component is rare.
Like PCA, MAF will not capture individual chemical components, and the components will have both positive and negative loadings, as described earlier in the PCA section. The interpretation of the component loadings is thus similarly challenging. On the other hand, unlike for PCA, MAF requires no preprocessing, and thus it is in fact the simplest of the directly determined MVA techniques that one can use for image analysis.
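MAF is not part of the common Python statistics packages, but the underlying calculation is compact: estimate a noise covariance from neighboring-pixel differences and solve a generalized eigenproblem against the data covariance, so the leading factors vary smoothly across the image. A minimal sketch on an invented two-phase image (all sizes and "spectra" fabricated):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)

# Synthetic 32x32 image with 10 peak channels: the left half is one
# invented phase, the right half another, plus per-pixel noise.
h, w, c = 32, 32, 10
phase_a, phase_b = rng.random(c), rng.random(c)
img = np.empty((h, w, c))
img[:, : w // 2] = phase_a
img[:, w // 2 :] = phase_b
img += rng.normal(scale=0.1, size=img.shape)

X = img.reshape(-1, c)
Xc = X - X.mean(axis=0)

# Differences between neighboring pixels estimate the noise covariance;
# real chemical regions change slowly, noise changes pixel to pixel.
dx = (img[:, 1:] - img[:, :-1]).reshape(-1, c)
dy = (img[1:, :] - img[:-1, :]).reshape(-1, c)
cov_noise = (dx.T @ dx + dy.T @ dy) / (len(dx) + len(dy))
cov_data = Xc.T @ Xc / len(Xc)

# Generalized eigenproblem: smallest noise-to-data ratio first, i.e.
# the most spatially autocorrelated factor leads.
vals, vecs = eigh(cov_noise, cov_data)
maf_scores = Xc @ vecs              # factor images, one column per factor
first_factor_image = maf_scores[:, 0].reshape(h, w)
print(first_factor_image.shape)
```

In this toy case the first factor image splits cleanly at the left/right phase boundary, which is the "regional variance" behavior described above.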
Even before any preprocessing, one must deal with issues of dead time corrections and sample-to-sample intensity variations described earlier. The issues for MVA are similar to those for quantification, and the same rules apply.
With the exception of MAF, preprocessing is a required step for MVA. The ideal preprocessing method for any given analysis can be a matter of debate. The good news is that you can try one, and if you are dissatisfied with the results or want to see if you can do better trying something else, you can simply do that. There are quite a few different approaches. Three are mentioned here.
- Poisson scaling most reasonably adjusts the peak intensities for the relative degrees of noise in TOF-SIMS, where the data is typically acquired by pulse counting. Signal to noise generally increases as the square root of the signal intensity. Thus, when taking data, doubling the length of the analysis will improve the signal to noise by a factor of the square root of two. Dividing each peak intensity by the square root of its mean (Poisson scaling or square root mean scaling) effectively scales all the peaks to equal signal to noise. This only works if the data has been properly dead time corrected. Mistakes in dead time corrections can have significant effects on multivariate analyses (Tyler 2014). The correction is clearly the correct one to use for MCR analysis (Ohlhausen et al. 2004; Smentkowski et al. 2008; Windig, Keenan, and Wise 2008). It is arguably the method of choice, in fact, for all MVA methods when applied to TOF-SIMS analyses. Note, however, that when using the Orbitrap portion of the Hybrid SIMS, Poisson scaling is not appropriate, as the noise is not Poisson distributed (Gilmore et al. 2024).
- Mean centering is performed by subtracting the mean of a given peak’s distribution of intensities from the peak intensity in each spectrum. This makes the average of all the spectra zero. Mean centering has typically been the least preprocessing done for PCA and PLS analysis and their relatives. Without it, the first PCA component will describe the distance of the data from a mean of zero, which is essentially the total ion image. In most cases, this is not useful, although it is easy enough to ignore. Mean centering alone before MVA has the effect of accentuating the large peaks in the spectrum at the expense of the small. This can make interpretation seem easier, but one can easily miss the importance of significant small peaks as a result. For MCR, mean centering has no effect on the result.
- Auto scaling starts with mean centering. Next, each peak’s intensity is divided by the standard deviation of that peak’s intensity. This scaling has the effect of making small peaks as significant to the analysis as large peaks. Whether this is a good thing may vary from data set to data set, and there is much discussion on the point of auto scaling versus simple mean centering. Auto scaling is used in chemometrics when the different variables have different units (for example, if temperature, pressure, and concentration are all variables) and the choice of units would arbitrarily affect the MVA results. One can argue that, for example, counts of CH3+ are effectively a different unit from counts of C7H7+. Certainly, TOF-SIMS counts can vary over many orders of magnitude, and the small peaks may end up being as significant to the analysis as the large ones. Assuming that the Poisson correction accurately equalizes the noise from peak to peak, it may do a better job of evening the model’s search for variance in large and small peaks. Still, not too much energy should be wasted on the argument, since it is easy enough to try multiple preprocessing methods and see what works best. Again, for MCR, which is not driven by the process of capturing variance, auto scaling is not used.
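The three scalings above are one-liners in numpy. A sketch on a small matrix of invented pulse-counted intensities (rows are spectra, columns are peaks):

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented peak intensities: three peaks of very different magnitude,
# drawn as Poisson counts the way pulse-counted TOF-SIMS data would be.
X = rng.poisson(lam=[5, 50, 500], size=(20, 3)).astype(float)

# Poisson (square-root-mean) scaling: divide each peak by the square
# root of its mean, putting all peaks on a comparable signal-to-noise scale.
X_poisson = X / np.sqrt(X.mean(axis=0))

# Mean centering: subtract each peak's mean, so every column averages zero.
X_centered = X - X.mean(axis=0)

# Auto scaling: mean center, then divide by each peak's standard
# deviation, giving small peaks the same weight as large ones.
X_auto = X_centered / X.std(axis=0)

print(X_centered.mean(axis=0))   # ~0 for every peak
print(X_auto.std(axis=0))        # ~1 for every peak
```

Note that, as stated above, Poisson scaling presumes properly dead-time-corrected counting data; it is not appropriate for Orbitrap data.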