xxx Not a current SIMS Wiki Page 6. Analyzing the Data - mikee9265/SIMS-Wiki GitHub Wiki

In many cases the analyst knows exactly what to look for in the data. Peaks of interest have already been identified, added to a working peak list, and images were obtained even as the data was being acquired. Relative peak intensities from spectrum to spectrum or images within a single data set can be added to a report, and the analysis is complete.

Often, however, it is not possible to thoroughly plan the analysis at the outset. The effort is exploratory or unknowns are involved. In other cases, the signal-to-noise ratio for individual peaks is not sufficient for the task, and it proves better to group many peaks together after the analysis is complete. In these cases it is necessary to explore the data the instrument has produced more thoroughly, both directly and via statistical methods. All this is possible because the instrument stores the raw data: for every secondary ion, the raster position, the time of arrival at the detector relative to the primary pulse, and the point in the analysis at which the ion was detected.

6.1 Information Content in TOF-SIMS Data

Every secondary ion detected comes with the following:

The time it takes from the primary gun’s pulse to the arrival of an ion at the detector is saved. This is translated via a mass calibration into a mass to charge ratio (m/e), which in most cases is simply the mass of the detected secondary ion.
Plotting counts versus m/e places every secondary ion in the context of a spectrum, where each ion lies at some distance in m/e from the other peaks. The patterns of peaks in the spectrum can carry significance beyond the individual peak positions.
The position of the primary ion beam on the sample (x, y) within the raster is saved. The combination of raster positions for all of the ions in a given peak gives the ion image for that peak.
The time of the pulse that produced each secondary ion is saved. For static data, this information is not particularly useful. When the analysis exceeds the static limit, or when it is coupled with the use of a sputter gun, the time of the pulse becomes significant: every mass range can be monitored for intensity changes through the course of the analysis. When a sputter beam is used, these plots amount to depth profiles.
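The conversion from flight time to m/e can be illustrated with the simplest calibration model, t = t0 + k·√(m/e), where t0 is a time offset and k depends on the instrument. The sketch below is a minimal numpy illustration under that assumption; the reference ions, the constants, and the function names are hypothetical, and real instrument software may use additional correction terms.

```python
import numpy as np


def fit_calibration(times_ns, masses):
    """Least-squares fit of t = t0 + k * sqrt(m); returns (k, t0)."""
    k, t0 = np.polyfit(np.sqrt(masses), times_ns, 1)
    return k, t0


def time_to_mass(times_ns, k, t0):
    """Invert the calibration: m = ((t - t0) / k) ** 2."""
    return ((np.asarray(times_ns, dtype=float) - t0) / k) ** 2


# Hypothetical reference ions (exact masses, amu) with made-up flight times
ref_masses = np.array([15.0229, 27.0229, 41.0386])  # CH3+, C2H3+, C3H5+
true_k, true_t0 = 2500.0, 30.0                      # illustrative constants
ref_times = true_t0 + true_k * np.sqrt(ref_masses)

k, t0 = fit_calibration(ref_times, ref_masses)
unknown_time = true_t0 + true_k * np.sqrt(91.0542)  # e.g. C7H7+
print(time_to_mass([unknown_time], k, t0))
```

Fitting against √m rather than m keeps the model linear in its two parameters, so an ordinary polynomial fit can be used directly.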

Every peak in the spectrum or selected mass range can be used to produce an image. Every pixel in an image has an associated spectrum, and therefore spectra can be reconstructed from any selected area within an image. Patterns can be found in the spectra, and similarly, patterns may be evident in the images. Profiles of intensity versus analysis time can be made for any mass range, and similarly a profile for every mass is available for every pixel. When the sample is eroded during the analysis, this is the basis for 3D imaging.
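One way to picture this is as an event list with one row per detected ion. The sketch below assumes a simplified, hypothetical layout (pixel x and y, scan index, calibrated mass) filled with random data; it is not a vendor file format. An ion image, an ROI spectrum, and a profile versus analysis time are then just different ways of histogramming the same events.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ions, nx, ny, n_scans = 100_000, 64, 64, 50

# Hypothetical event list: one entry per detected secondary ion (random data)
x = rng.integers(0, nx, n_ions)            # raster position
y = rng.integers(0, ny, n_ions)
scan = rng.integers(0, n_scans, n_ions)    # when in the analysis it was detected
mass = rng.uniform(1.0, 200.0, n_ions)     # calibrated m/e


def ion_image(lo, hi):
    """Counts per pixel for ions with lo <= m/e < hi."""
    sel = (mass >= lo) & (mass < hi)
    img = np.zeros((ny, nx), dtype=np.int64)
    np.add.at(img, (y[sel], x[sel]), 1)
    return img


def roi_spectrum(mask, bin_edges):
    """Histogram of m/e for ions detected in pixels where mask[y, x] is True."""
    counts, _ = np.histogram(mass[mask[y, x]], bins=bin_edges)
    return counts


def profile(lo, hi):
    """Counts with lo <= m/e < hi in each scan (a depth profile when sputtering)."""
    sel = (mass >= lo) & (mass < hi)
    return np.bincount(scan[sel], minlength=n_scans)


img = ion_image(26.5, 27.5)
roi = np.zeros((ny, nx), dtype=bool)
roi[:32, :32] = True                       # a square ROI in one corner
spec = roi_spectrum(roi, np.arange(0.5, 201.0, 1.0))
prof = profile(26.5, 27.5)
print(img.sum(), spec.sum(), prof.shape)
```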

This wealth of data also allows the analyst to correct some anomalies in the data. Charging effects not completely eliminated via charge compensation may broaden peaks due to differential charging either from location to location or in depth. Mass calibration can be performed differently for different areas/depths to compensate for the effects differential charging can have on ion flight times, thus sharpening spectra otherwise broadened by these effects. Similarly, height differences and other topographical effects can be somewhat compensated in this way.

6.2 Retrospective Analysis

Retrospective analysis best begins with a check of the mass calibration. During acquisition, the calibration is often based on only a subset of the data, whereas at this stage the entire data set is available. This matters most when the data has been acquired in a mode that allows higher mass resolution. When the sample is not completely flat or level, or when the analysis area is large, recalibration may also be appropriate for region of interest (ROI) spectra. IonTof software also has an “advanced TOF correction” routine, which relies on the fact that, in principle, each pixel has an associated mass spectrum and could therefore have its own mass calibration. In practice there is often not enough data at each pixel for independent mass calibrations, but the routine bins as many pixels as needed to obtain localized mass calibration corrections.
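The idea of binning pixels to obtain localized corrections can be sketched as follows. This is not the IonTof algorithm: it is a minimal, offset-only illustration, assuming one known reference peak and square blocks of pixels, in which each block's flight times are shifted so that its reference-peak centroid matches the global one. The function name, block size, and simulated charging shift are all hypothetical.

```python
import numpy as np


def local_shift_correction(t, x, y, ref_window, block=8, min_counts=20):
    """Offset-only localized flight-time correction (hypothetical sketch).

    t, x, y    : per-ion flight time (ns) and pixel coordinates
    ref_window : (t_lo, t_hi) bracketing one known reference peak
    block      : pixels are grouped into block x block squares
    Blocks with fewer than min_counts reference ions are left unchanged.
    """
    t = np.asarray(t, dtype=float)
    in_ref = (t >= ref_window[0]) & (t < ref_window[1])
    global_centroid = t[in_ref].mean()
    bx, by = np.asarray(x) // block, np.asarray(y) // block
    corrected = t.copy()
    for kx, ky in set(zip(bx.tolist(), by.tolist())):
        in_block = (bx == kx) & (by == ky)
        ref_hits = in_block & in_ref
        if ref_hits.sum() >= min_counts:
            # Shift every ion in the block by the block's reference offset
            corrected[in_block] -= t[ref_hits].mean() - global_centroid
    return corrected


# Simulated reference peak at 1000 ns; the right half of the raster is
# assumed to be charged, delaying its ions by 2 ns
rng = np.random.default_rng(1)
n = 20_000
x = rng.integers(0, 64, n)
y = rng.integers(0, 64, n)
t = rng.normal(1000.0, 0.5, n) + np.where(x >= 32, 2.0, 0.0)

t_corr = local_shift_correction(t, x, y, ref_window=(995.0, 1006.0))
print(f"peak width (std) before: {t.std():.2f} ns, after: {t_corr.std():.2f} ns")
```

A real correction would typically apply a mass-dependent shift rather than a single time offset per block; the offset-only form just shows why binning enough pixels together makes a local correction possible.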

The acquisition of spectra and images from the raw data is a process that closely resembles the initial data acquisition. Obviously, it is not possible to change instrument settings such as primary ion species, mass range, and so on, at this stage. It is possible, however, to specify new mass intervals from within the full range to obtain new images and profiles. It is also possible to choose a subset of the area analyzed, a portion of a depth profile, or a volume from the 3D rendering of the data from which to obtain a spectrum.

Advances in software for some systems now allow changes to be made while an acquisition is in progress: adding a new peak/mass interval, adjusting an existing one, selecting an area as an ROI, grouping mass intervals, and so on. After the acquisition is complete, such changes can be made and the results viewed without "replaying" the data stream.

Data sets can be quite large. The time of flight of each secondary ion is recorded by placing each ion that arrives at the detector into a channel, typically 156 ps wide. Each ion is also associated with a pixel in an image, and the number of pixels can vary enormously (especially for large-area scans). For 3D data, the pixels are really voxels, distinct from each other in x, y, and z. Uncompressed, such data sets can span many gigabytes. The data is, however, sparse: many channels have zero counts, and even more pixels have zero counts in a given channel. Further, whether counts accumulate in a given channel rather than in its neighbor is often purely a matter of chance, because counts from the same secondary ion are spread across a number of channels. Similarly, the presence of counts in one pixel and not in a neighboring pixel may also be a matter of chance, because the pixel size is often significantly smaller than the primary ion beam spot.
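A back-of-envelope calculation shows why dense storage is impractical. Apart from the 156 ps channel width quoted above, every number below (cycle time, raster size, bytes per value, number of detected ions) is an assumption chosen only for illustration.

```python
# Back-of-envelope size of one dense (uncompressed) TOF-SIMS image data set.
channel_width_s = 156e-12   # channel width quoted in the text
cycle_time_s = 100e-6       # ASSUMED time-of-flight cycle (sets channel count)
pixels = 256 * 256          # ASSUMED raster size
bytes_per_cell = 4          # ASSUMED 32-bit counter per (pixel, channel) cell

channels = round(cycle_time_s / channel_width_s)
dense_bytes = channels * pixels * bytes_per_cell
print(f"{channels:,} channels x {pixels:,} pixels -> {dense_bytes / 1e9:.0f} GB dense")

# Sparse alternative: store only the detected ions as (pixel, channel) pairs
n_ions = 100_000_000        # ASSUMED number of detected secondary ions
bytes_per_ion = 8           # ASSUMED 4-byte pixel index + 4-byte channel
sparse_bytes = n_ions * bytes_per_ion
print(f"sparse event list: {sparse_bytes / 1e9:.1f} GB")
```

Under these assumptions the dense cube is roughly 168 GB for a single 2D data set, while an event list of 10⁸ ions needs under 1 GB, which is why sparse storage is the natural choice.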

It is a common procedure when analyzing these data sets to bin data. In the image, this is easy: neighboring pixels (usually in a square) are combined to reduce the number of pixels. In the spectra, the process is a little more complicated, because the relationship between channel number and mass is not linear. Nonetheless, after mass calibration it is common to rebin the data in fixed mass units (0.1 amu or, sacrificing mass accuracy, 1 amu bins). A more meaningful approach is to bin the spectral data into peaks, grouping together the counts that represent a single secondary ion type. This is the approach taken by many other mass spectrometry methods. The challenge for time of flight secondary ion mass spectrometry (TOF-SIMS) is not only that the channel scale is nonlinear relative to the mass scale, but also that peak intensities can vary over more than six orders of magnitude. It has been a struggle to make accurate and reliable algorithms that define peaks even as well as the human eye. The good news is that the instrument software does a reasonable job of this. The bad news is that after the software has defined a peak list/mass interval list for you, it is still a good idea to look through the spectrum and make sure the software has not left out anything you would consider significant. The effort is worthwhile in many cases, especially when samples contain unknowns or when subtle differences between surfaces are under study.
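The two simple binning operations can be sketched in numpy: summing k × k pixel blocks, and re-binning a calibrated spectrum into nominal-mass bins. A third helper integrates counts over a list of mass intervals, which is roughly what a peak list does. The function names are hypothetical, and rounding to the nearest integer mass breaks down at high mass, where mass defects can approach or exceed 0.5 amu.

```python
import numpy as np


def bin_image(img, k):
    """Sum k x k pixel blocks; rows/columns that do not fill a block are dropped."""
    ny, nx = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:ny, :nx].reshape(ny // k, k, nx // k, k).sum(axis=(1, 3))


def bin_unit_mass(masses, counts, max_mass):
    """Re-bin a calibrated spectrum into nominal-mass bins [n - 0.5, n + 0.5)."""
    nominal = np.floor(np.asarray(masses, dtype=float) + 0.5).astype(int)
    keep = (nominal >= 0) & (nominal <= max_mass)
    return np.bincount(nominal[keep], weights=np.asarray(counts, dtype=float)[keep],
                       minlength=max_mass + 1)


def integrate_intervals(masses, counts, intervals):
    """Total counts inside each (lo, hi) mass interval, as a peak list would."""
    masses, counts = np.asarray(masses, dtype=float), np.asarray(counts)
    return np.array([counts[(masses >= lo) & (masses < hi)].sum()
                     for lo, hi in intervals])


print(bin_image(np.arange(16).reshape(4, 4), 2))
masses, counts = [14.98, 15.02, 15.40, 27.02], [1, 2, 3, 4]
print(bin_unit_mass(masses, counts, 30)[[15, 27]])
print(integrate_intervals(masses, counts, [(14.9, 15.1), (26.9, 27.1)]))
```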

Given a peak list, one can quickly explore the data for clues. When the data consists of a series of spectra of relatively homogeneous samples or sample areas, an exploration of the differences between the spectra in the series may immediately reveal the answers sought. For inhomogeneous sample areas, the analyst turns to the images generated from each of the peaks in the list. When clearly different regions are revealed in the images, the identities of the peaks that map the different areas may be sufficient to answer the question posed. When the relative intensities of several peaks in each region are needed to identify the materials, one can obtain spectra for each of these areas. ROIs can be defined in the instrument software using a variety of techniques. Regions can be selected by manually drawing shapes on an ion image. Peak intensities can be used to specify regions in which the pixels have a defined range of intensities for a given peak or set of peaks. When the signals are intense, the raw intensities can sometimes be used; in other cases, image-smoothing filters should be applied before defining a region. In any case, once a region is defined, the spectrum for just that region can be readily obtained. As a practical matter, in most instances, the qualitative analysis of the data can stop here.
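Defining an ROI from smoothed peak intensities can be sketched as follows, assuming the peak-list data has already been arranged as an image cube of shape (ny, nx, n_peaks). The box filter, the threshold, and the simulated two-region sample are illustrative choices, not the instrument software's method.

```python
import numpy as np


def box_smooth(img, r=1):
    """Mean over a (2r+1) x (2r+1) window; edge pixels are replicated."""
    padded = np.pad(img.astype(float), r, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (2 * r + 1) ** 2


def roi_from_intensity(img, lo, hi=np.inf, r=1):
    """Boolean ROI of pixels whose smoothed intensity lies in [lo, hi)."""
    smoothed = box_smooth(img, r)
    return (smoothed >= lo) & (smoothed < hi)


def roi_spectrum(cube, mask):
    """Summed counts per peak over the ROI; cube has shape (ny, nx, n_peaks)."""
    return cube[mask].sum(axis=0)


# Simulated two-peak image: peak 0 is strong on the left half, peak 1 on the right
rng = np.random.default_rng(2)
ny, nx = 64, 64
left = np.arange(nx) < 32
cube = np.stack([rng.poisson(np.where(left, 5.0, 1.0), size=(ny, nx)),
                 rng.poisson(np.where(left, 1.0, 5.0), size=(ny, nx))], axis=-1)

mask = roi_from_intensity(cube[..., 0], lo=3.0)
print(mask.mean(), roi_spectrum(cube, mask))
```

Without the smoothing step, single Poisson-noisy pixels of only a few counts would scatter across the threshold; averaging neighbors first is what makes the region boundary clean.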

Quantitation can follow if you have standards of similar and known surface composition. The relationship between signal intensities and concentration may or may not be linear, but even nonlinear calibration curves can be used. As noted earlier, the standards can be samples that have been analyzed by other methods (e.g., Auger electron spectroscopy or X-ray photoelectron spectroscopy {XPS}). There are two prerequisites. First, the peaks you use must not be saturated, and the appropriate dead time corrections must be performed where needed. Simple dead time corrections are available in the instrument software. These may not be sufficient, however: where multiple peaks are present at the same nominal mass, or where a substantial metastable signal precedes a peak, the dead time corrections that come directly from the instrument are not currently adequate (Tyler 2014; Tyler and Peterson 2013). Peaks too intense to be dead time corrected should simply not be used in the analysis. One can reduce the need for dead time corrections by using a lower primary ion flux, which reduces the secondary ion count rate. Second, the absolute intensities must somehow be made comparable from spectrum to spectrum.
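For reference, the basic Poisson dead time correction commonly used for TOF-SIMS assumes at most one count per channel (or dead time window) per primary pulse; with C counts recorded over N pulses, the corrected intensity is −N·ln(1 − C/N). The sketch below implements only that basic form, which, as noted above, is not adequate for multiple peaks at one nominal mass or for strong metastable signals. Normalizing to total counts is shown as one common, but by no means the only, way to make intensities comparable between spectra; the function names and numbers are hypothetical.

```python
import numpy as np


def poisson_dead_time_correct(counts, n_pulses):
    """Basic Poisson dead time correction: I = -N * ln(1 - C / N).

    Assumes at most one count per channel (or dead time window) per primary
    pulse. Not adequate for overlapping peaks or strong metastable signals.
    """
    counts = np.asarray(counts, dtype=float)
    frac = counts / n_pulses
    if np.any(frac >= 1.0):
        raise ValueError("saturated: counts >= number of pulses; do not use this peak")
    return -n_pulses * np.log1p(-frac)


def normalize_total(spectra):
    """Scale each spectrum (row) so that its intensities sum to 1."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.sum(axis=1, keepdims=True)


n_pulses = 1_000_000                          # hypothetical number of primary pulses
counts = np.array([1_000, 100_000, 500_000])  # hypothetical recorded counts
corrected = poisson_dead_time_correct(counts, n_pulses)
print(np.round(corrected / counts, 3))        # correction factor per peak
print(normalize_total([[10.0, 30.0], [20.0, 20.0]]))
```

The correction factor is negligible at one count per thousand pulses but grows quickly as the count rate approaches one per pulse, which is why lowering the primary ion flux reduces the dependence on the correction.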

6.3 Multivariate Statistical Analysis and Image Processing

TOF-SIMS data sets lend themselves to multivariate statistical analysis (MVA) techniques of various types (Henderson 2013; Tyler 2006; Tyler 2013). This is because, while there are hundreds of separate peaks in most spectra, many of the peaks covary in intensity, and there are far fewer truly independent features in the data. One way to think of this is to imagine a surface with a limited set of chemical compounds. Each compound produces its own spectrum, independent of any other compound present, and the measured spectrum is the sum of the signals from each chemical species. Given several additional spectra of samples with different relative amounts of the same species, it should be possible to use a mathematical method to determine the spectrum of each compound and how much of each compound is present at the surface of each sample. Chemometrics, which is essentially the use of multivariate statistics to decompose complex data into the contributions of a series of independent chemical species, is thus directly applicable to TOF-SIMS results.
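The ideal linear picture described above, with no matrix effects, can be written as X = C·S, where each row of C holds the amounts of each compound in a sample and each row of S is a pure-compound spectrum. The small numpy sketch below uses made-up spectra and amounts to show the two consequences the paragraph relies on: the data matrix has only as many independent features as there are compounds, and known pure spectra let the amounts be recovered by least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n_peaks = 8
S = rng.uniform(0.0, 1.0, size=(2, n_peaks))  # hypothetical pure spectra (rows)
C = rng.uniform(0.0, 1.0, size=(6, 2))        # amount of each compound, 6 samples
X = C @ S                                     # ideal linear mixing, no matrix effects

# Eight peaks, but only two independent features
print(np.linalg.matrix_rank(X))               # 2

# With the pure spectra known, the amounts follow from least squares
C_est, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)
print(np.allclose(C_est.T, C))                # True
```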

The problem with this simple picture is, once again, the matrix effect. If you could mix chemical species while the relative intensities of the peaks in each separate spectrum stayed fixed, with all the peaks in that spectrum increasing or decreasing linearly with concentration, the chemometric approach to SIMS data would be perfect. In practice, matrix effects can make peak intensities vary nonlinearly with concentration, and even within the spectrum of one species, peaks may vary in intensity relative to one another depending on what else is nearby. For this reason, multivariate statistical analysis of TOF-SIMS data is often used more qualitatively than quantitatively, and even then the approach is not always successful. Still, given all the concern over matrix effects, it is remarkable how often the statistical approach does work, and how useful it is when it does. Further, you can tell when you are getting a good fit to the data and when you are not.

In most cases, you can successfully analyze the data without MVA, so you might wonder why one would go to such trouble when a direct approach would be sufficient. The answer is that MVA becomes increasingly easy once you are sufficiently far up the learning curve. Some methods are now incorporated into the instrument software, saving the step of exporting the data. As with any data analysis approach, the easier and faster the method, the more often it is worthwhile to use it.

As noted above, some multivariate statistical analyses can be done using the vendor-supplied software. More sophisticated and complete analysis packages that work well with TOF-SIMS data are available on multiple platforms. The NESAC/BIO center in Seattle offers free software along with tutorials and examples (NESAC/BIO 2015). Commercially available software can be obtained from Eigenvector Research (“Eigenvector Research: Chemometrics Software, Consulting and Training” 2015). The Eigenvector software also comes with tutorials and detailed documentation. There are now many types of MVA that may be applied to TOF-SIMS data analysis. A discussion of four major categories of these techniques follows.

6.3.1 Principal Component Analysis

As noted earlier, some peaks in the spectrum will tend to vary together, and some independently. For example, the peaks belonging to one species (say, the substrate) may all go down while peaks representative of another species (say, an overlayer) may all go up in a series of spectra where there is an increasing concentration of the second species. The variations in the spectra can be between samples, or between areas of the same sample, in an image. The variations can be captured mathematically using principal component analysis (PCA). The technique linearly recombines the peaks in the spectrum to make new components. Mathematicians describe this as a rotation in multidimensional space, where each peak represents one of the original dimensions. The first component is a linear combination of the peaks that captures the most variation one can capture with a single linear combination of the peaks. The second, orthogonal to the first, is a second linear combination of the peaks designed to capture as much of the remaining variation as can be captured with a single linear combination of the peaks. The factors that together describe this linear combination are called loadings. Additional components are calculated until the analyst concludes that any remaining variation is noise. The amount of each component that contributes to the overall spectrum of a particular sample (or pixel) is called a score.
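A minimal PCA can be computed from the singular value decomposition of the mean-centered data matrix (samples or pixels as rows, peaks as columns). In the sketch below, scores and loadings follow the definitions in the paragraph above, components come out ordered by captured variance, and any scaling such as Poisson scaling is assumed to have been applied beforehand. The function name and the random demo data are illustrative.

```python
import numpy as np


def pca(X, n_components):
    """PCA of a (samples x peaks) matrix via SVD of the mean-centered data.

    Returns scores (samples x k), loadings (peaks x k), and the fraction of
    total variance captured by each of the k components.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(4)
X = rng.poisson(50, size=(20, 10)).astype(float)  # hypothetical peak intensities
scores, loadings, explained = pca(X, 3)
print(np.round(explained, 3))
```

The loadings are orthonormal, and multiplying scores by the transposed loadings rebuilds the centered data; keeping all components reproduces it exactly.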

PCA generally does not produce components that describe individual chemical species, because the signals for different species on the surface are rarely independent of each other. In the example above, the substrate signal is attenuated as the coverage of the overlayer increases. Most of the variation in this set of samples may be captured with one component, because the signals of the two species are anticorrelated: as one rises, the other falls. A plot of the loadings for this component versus the mass of the peaks will look like two different spectra, one positive going and one negative going; one represents the substrate and the other the overlayer. Suppose that the peaks most representative of the overlayer have positive loadings and the peaks representing the substrate have negative loadings. For samples with almost no overlayer, the score for this component will be strongly negative. For samples where the substrate is largely attenuated and the overlayer signals dominate, the score will be strongly positive. In simple cases like this, one quickly gets used to the presence of positive and negative going loadings in the same component.
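The substrate/overlayer example can be simulated to see the sign pattern. The sketch below assumes two hypothetical three-peak spectra on separate peaks and a coverage that attenuates the substrate in proportion to the overlayer. Because the sign of an SVD component is arbitrary, it is flipped so that the overlayer loadings are positive.

```python
import numpy as np

rng = np.random.default_rng(5)
substrate = np.array([10.0, 8.0, 6.0, 0.0, 0.0, 0.0])  # hypothetical, peaks 0-2
overlayer = np.array([0.0, 0.0, 0.0, 9.0, 7.0, 5.0])   # hypothetical, peaks 3-5
coverage = np.linspace(0.0, 1.0, 11)

# The overlayer signal grows while the substrate is attenuated by the same fraction
X = np.outer(1 - coverage, substrate) + np.outer(coverage, overlayer)
X += rng.normal(0.0, 0.05, X.shape)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loading = Vt[0]
if loading[3:].sum() < 0:      # SVD signs are arbitrary: make overlayer positive
    loading = -loading
score = Xc @ loading           # one score per sample

print(np.round(loading, 2))    # substrate peaks negative, overlayer peaks positive
print(np.round(score, 1))      # from very negative (bare) to very positive (covered)
print(round(s[0] ** 2 / np.sum(s ** 2), 3))  # fraction of variance in component 1
```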

It gets more complicated when more than two species vary in concentration. Imagine a sample with a substrate and submonolayer levels of species A and B. If the total coverage varies and the relative amounts of A and B vary, a number of PCA results may be obtained, and peaks from multiple species may appear in multiple components. If the variability in the sample set is dominated by the overall coverage of the overlayers, you may get one component with positive A peaks and negative substrate peaks along with another with positive B peaks and negative substrate peaks. If instead the variation is dominated by species B replacing species A, the major component could have positive A peaks and negative B peaks, with a second component in which a mixture of A and B peaks is positive and the substrate is negative. Real samples will often produce even more complicated components.

PCA components can take some getting used to, but with a little experience one learns their quirks, and the meaning of the results becomes more readily apparent. Practice with real data sets, starting with some that are well understood, is recommended. PCA is also a good starting point for statistical analysis. In image analysis, it will reveal where the variations in the image lie by region. One can run PCA before performing one of the other methods described below. One advantage of PCA is that it requires no user input (except for preprocessing; see the following text). The results are determined directly from the data, so the user need not worry that their own choices have produced spurious results.

An additional use of PCA is in reducing noise in the data. One fits the data as described above until the remaining components appear to be just noise. Using only the components that contain signal, one reverses the calculation to reconstruct the original data, now without much of the noise.
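The reconstruction step amounts to keeping only the first k components. A minimal sketch, assuming a hypothetical data set built from a single varying component plus Gaussian noise (for real counting data one would typically apply Poisson scaling first and undo it after reconstruction):

```python
import numpy as np


def pca_denoise(X, k):
    """Rebuild X from its first k principal components (the mean is restored)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]


rng = np.random.default_rng(6)
# Hypothetical data: 50 spectra of 40 peaks varying along one direction, plus noise
clean = np.outer(np.linspace(0.0, 1.0, 50), rng.uniform(1.0, 5.0, 40)) + 2.0
noisy = clean + rng.normal(0.0, 0.3, clean.shape)
denoised = pca_denoise(noisy, k=1)

err_noisy = np.abs(noisy - clean).mean()
err_denoised = np.abs(denoised - clean).mean()
print(f"mean abs error: noisy {err_noisy:.3f}, denoised {err_denoised:.3f}")
```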

6.3.2 Partial Least Squares

Partial least squares (PLS) is like PCA, but instead of looking for the linear combination of peaks that best describes the variability in the spectral data set, PLS draws out the variation in the spectra that best captures the variability in another variable. Take our first example of an overlayer on a substrate. An additional measurement (e.g., ellipsometry or X-ray reflectance {XRR}) may have been used to determine the thickness of the overlayer. This set of values for a series of samples becomes your Y vector; each value of Y is associated with a spectrum (a row in matrix X). PLS will capture the linear combination of spectral peaks that best describes the variation in overlayer thickness.

The terminology for PLS is different: instead of components, one has latent variables. One strives to use as many latent variables as still describe signal rather than noise. Individual latent variables tend to have less meaning on their own. The combined regression factors (a combination of the loadings from the latent variables) allow you to take any future spectrum and predict the physical property Y, or to predict a spectrum from a value of Y. Of course, the model will only work well if the relationship between the spectrum and the Y values is linear, and if your original data is sufficient to capture the variability one is likely to find in future samples. If the relationship is not linear, or if the original data set is limited, the model may still give a qualitative understanding of how the spectral peaks relate to the physical property, but it will be less reliable at providing accurate predictions.
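The sketch below is a textbook PLS1 (single Y variable) fit using the NIPALS algorithm, producing a combined regression vector B = W(PᵀW)⁻¹q from the weights W, X-loadings P, and Y-loadings q of the latent variables. The simulated thickness values and spectra are hypothetical, and are deliberately made linear in thickness, since that is the condition under which the model predicts well.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 by NIPALS. Returns (B, x_mean, y_mean) such that
    y_hat = (X_new - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)          # weight vector for this latent variable
        t = Xd @ w                      # scores
        tt = t @ t
        p = Xd.T @ t / tt               # X loadings
        qa = yd @ t / tt                # y loading
        Xd = Xd - np.outer(t, p)        # deflate X
        yd = yd - qa * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # combined regression vector
    return B, x_mean, y_mean


def pls1_predict(X_new, B, x_mean, y_mean):
    return (X_new - x_mean) @ B + y_mean


rng = np.random.default_rng(7)
thickness = rng.uniform(0.5, 3.0, 30)   # hypothetical reference values (nm)
substrate = rng.uniform(1.0, 5.0, 12)   # hypothetical pure spectra, 12 peaks
overlayer = rng.uniform(1.0, 5.0, 12)
frac = thickness / 4.0                  # ASSUMED linear signal response
X = np.outer(1 - frac, substrate) + np.outer(frac, overlayer)
X += rng.normal(0.0, 0.02, X.shape)

B, x_mean, y_mean = pls1_fit(X, thickness, n_lv=2)
pred = pls1_predict(X, B, x_mean, y_mean)
r2 = 1 - np.sum((pred - thickness) ** 2) / np.sum((thickness - thickness.mean()) ** 2)
print(f"R^2 on the training set: {r2:.3f}")
```

A training-set R² says little about future samples; in practice the number of latent variables is usually chosen by cross-validation.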

PLS analyses with Y values such as coverage, thickness, or the percent of one component are straightforward. One can try to do PLS analyses on less straightforward Y value types such as surface energy, adhesion strength, or protein affinity. In these cases, the regression values will point to the peaks that most strongly vary with the physical property in question, and may then give insight into the mechanism for the variations in these properties.

Partial Least Squares-Discriminant Analysis (PLS-DA) is a very useful variant of PLS. When one wants to understand what in the spectra co-varies with a population difference, this is likely the method of choice. Imagine a set of samples, some of which work for a particular application, and others which do not. One way of approaching this is to assign a variable Y, which takes a value of 1 if the sample “passes” your test and 0 if it “fails.” This is in essence what one does with PLS-DA, except that with PLS-DA, one can specify multiple population groupings.
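A two-class PLS-DA can be sketched with the 0/1 coding described above and a single latent variable; predictions of 0.5 or higher are assigned to the "pass" group. For more than two groups, a common approach is one indicator column per group with a PLS model that handles multiple Y columns, which this minimal sketch does not include. The data and names are hypothetical.

```python
import numpy as np


def plsda_one_lv(X, labels):
    """Two-class PLS-DA with one latent variable (minimal sketch).

    labels: 1 for "pass", 0 for "fail". Returns a predict function and the
    weight vector, whose large positive/negative entries flag the peaks that
    separate the two groups.
    """
    y = np.asarray(labels, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w                       # scores on the latent variable
    q = (yc @ t) / (t @ t)           # regression of y on the scores

    def predict(X_new):
        y_hat = ((X_new - x_mean) @ w) * q + y_mean
        return (y_hat >= 0.5).astype(int)

    return predict, w


rng = np.random.default_rng(8)
labels = np.repeat([1, 0], 20)                 # 20 "pass", 20 "fail" samples
X = rng.normal(10.0, 1.0, size=(40, 8))        # hypothetical peak intensities
X[labels == 1, 2] += 3.0                       # peak 2 higher on passing samples
X[labels == 0, 5] += 3.0                       # peak 5 higher on failing samples

predict, w = plsda_one_lv(X, labels)
print("training accuracy:", (predict(X) == labels).mean())
print("weights:", np.round(w, 2))
```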

6.3.3 Multivariate Curve Resolution

Less mathematically elegant and less well determined, multivariate curve resolution (MCR) aims to actually separate out the spectra of the chemical components that vary from sample to sample in the data set. For TOF-SIMS data, one invariably sets the constraint that there should be no negative going peaks in the resulting components. This forces the components to more closely resemble the real spectra of the constituents of the mixtures found in the data, so the results are more directly interpreted. However, there is often little apparent reason for the order of the components in the analysis. Unlike PCA, MCR does not determine components so that they capture successively lower levels of variance in the data. Thus the most significant component can be far down the list.
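A minimal MCR by alternating least squares (MCR-ALS) is sketched below, with non-negativity imposed by clipping after each least-squares step. Clipping is a simplification of true non-negative least squares, and initialization from randomly chosen pixel spectra is an assumption. The simulated two-component data include some pure pixels of each component, which helps make the solution unique up to scaling and order.

```python
import numpy as np


def mcr_als(X, n_comp, n_iter=200, seed=0):
    """Minimal MCR-ALS: X (pixels x peaks) ~ C @ S.T with C >= 0 and S >= 0.

    Non-negativity is imposed by clipping after each unconstrained
    least-squares step, a simplification of true non-negative least squares.
    """
    rng = np.random.default_rng(seed)
    # Initial spectra: randomly chosen pixel spectra (an assumption)
    S = X[rng.choice(X.shape[0], n_comp, replace=False)].T
    for _ in range(n_iter):
        C = np.clip(np.linalg.lstsq(S, X.T, rcond=None)[0].T, 0.0, None)
        S = np.clip(np.linalg.lstsq(C, X, rcond=None)[0].T, 0.0, None)
    return C, S


rng = np.random.default_rng(11)
S_true = np.array([[5.0, 3.0, 1.0, 0.0, 0.0, 0.0],    # hypothetical component 1
                   [0.0, 0.0, 1.0, 4.0, 2.0, 6.0]]).T  # hypothetical component 2
C_true = rng.uniform(0.0, 1.0, size=(100, 2))
C_true[:20, 1] = 0.0        # some pixels contain only component 1
C_true[20:40, 0] = 0.0      # and some only component 2
X = C_true @ S_true.T + rng.normal(0.0, 0.02, size=(100, 6))

C, S = mcr_als(X, n_comp=2)
rel_resid = np.linalg.norm(X - C @ S.T) / np.linalg.norm(X)
print(f"relative residual: {rel_resid:.3f}")
print(np.round(S / S.max(axis=0), 2))   # recovered spectra, scaled to max = 1
```

Note that the order of the recovered components is arbitrary, consistent with the remark above that MCR components are not ranked by variance.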

MCR is most usefully applied to image data. The appearance of the image of a component provides a check on the data. Components that represent noise look that way in their images. Components capturing real variations show real segregation in the images by region. You know these variations are real because the information about the distances between the pixels is not used in the calculation. If all the pixels in a region have similar scores for a particular component, that component is certainly describing something real about that region of the image.
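In practice, image MVA works on an unfolded matrix with one row per pixel, and the scores are folded back into images afterward. The sketch below shows the reshaping with a toy cube and, using PCA as an example, checks that shuffling the pixel order leaves the result unchanged, which is why spatial structure in a score image is independent evidence of a real feature.

```python
import numpy as np

ny, nx, n_peaks = 4, 5, 3
cube = np.arange(ny * nx * n_peaks, dtype=float).reshape(ny, nx, n_peaks)

# Unfold: one row per pixel, one column per peak; pixel positions are discarded
X = cube.reshape(ny * nx, n_peaks)

# ... PCA, MCR, etc. would produce per-pixel scores of shape (ny * nx, k) ...
scores = X[:, :2]                      # stand-in for real component scores

# Refold: each score column becomes a score image
score_images = scores.reshape(ny, nx, -1)
print(score_images.shape)              # (4, 5, 2)

# Shuffling the pixel order does not change, e.g., the PCA singular values
perm = np.random.default_rng(0).permutation(X.shape[0])
s_orig = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
s_perm = np.linalg.svd(X[perm] - X[perm].mean(axis=0), compute_uv=False)
print(np.allclose(s_orig, s_perm))     # True
```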

It is possible to further constrain MCR analysis to find the solution that maximizes the pixel contrast or, alternatively, the spectral contrast (Gallagher et al. 2004). In a sense, these two capture the range of solutions that MCR can produce (since, unlike PCA, MCR does not necessarily produce unique solutions). If they are close, this is well and good. If they differ, they may tell different stories about the sample, each valid in its own way.

MCR in most cases does not produce components one should trust to accurately portray the exact relative intensities of peaks one would find in the spectra of pure components. One would not, for example, seek to use MCR to produce pure spectra for a standards database from a set of samples containing mixtures of materials. However, the components generally come close enough to pure spectra for them to be recognizable and for identifications of materials to be made. Therein lies the great utility of MCR.

6.3.4 Maximum Auto-Correlation Factors

Maximum autocorrelation factors (MAF), unlike the other methods described earlier, is purely for use with imaging. It is like PCA, except that the information about pixel location within the image is included in the analysis, and the components capture regional variance within the images. This has the advantage of often capturing the variation that is of greatest interest to the analyst. The main disadvantage is that the appearance of significant contrast in the score images no longer provides an independent validation that a component has significance. Fortunately, correlated noise disguised as a significant component is rare.

Like PCA, MAF will not capture individual chemical components, and the components will have both positive and negative loadings, as described earlier in the PCA section. The interpretation of the component loadings is thus similarly challenging. On the other hand, unlike for PCA, MAF requires no preprocessing, and thus it is in fact the simplest of the directly determined MVA techniques that one can use for image analysis.
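MAF can be posed as a generalized eigenvalue problem: find weights w that minimize the variance of differences between neighboring pixels relative to the total variance, Σ_Δ w = λ Σ w, where the autocorrelation of each factor is 1 − λ/2. The numpy sketch below, a minimal illustration under those standard definitions, pools horizontal and vertical one-pixel shifts. Because the scores are unchanged (up to sign) when individual peaks are rescaled, it also illustrates why MAF does not need a scaling step. The simulated cube is hypothetical.

```python
import numpy as np


def maf(cube):
    """Maximum autocorrelation factors of an (ny, nx, n_peaks) image cube.

    Returns score images ordered from most to least spatially autocorrelated,
    and each factor's estimated autocorrelation (1 - lambda / 2).
    """
    ny, nx, p = cube.shape
    X = cube.reshape(-1, p)
    X = X - X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    # Differences between horizontally and vertically adjacent pixels
    d = np.vstack([(cube[:, 1:] - cube[:, :-1]).reshape(-1, p),
                   (cube[1:, :] - cube[:-1, :]).reshape(-1, p)])
    sigma_d = np.cov(d, rowvar=False)
    # Solve sigma_d w = lam * sigma w by whitening with sigma
    evals, evecs = np.linalg.eigh(sigma)
    whiten = evecs / np.sqrt(evals)
    lam, v = np.linalg.eigh(whiten.T @ sigma_d @ whiten)  # ascending lam
    w = whiten @ v
    return (X @ w).reshape(ny, nx, p), 1.0 - lam / 2.0


rng = np.random.default_rng(9)
ny, nx, p = 40, 40, 5
region = np.tile((np.arange(nx) < 20).astype(float), (ny, 1))  # left half = 1
cube = rng.normal(0.0, 1.0, size=(ny, nx, p))   # independent pixel noise
cube[..., 0] += 6.0 * region                    # hypothetical regional signal
cube[..., 1] -= 4.0 * region

scores, autocorr = maf(cube)
print(np.round(autocorr, 2))
print(abs(np.corrcoef(scores[..., 0].ravel(), region.ravel())[0, 1]))
```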

6.3.5 Data Preprocessing

Even before any preprocessing, one must deal with issues of dead time corrections and sample-to-sample intensity variations described earlier. The issues for MVA are similar to those for quantification, and the same rules apply.

With the exception of MAF, preprocessing is a required step for MVA analysis. The ideal preprocessing method for any given analysis can be a matter of debate. The good news is that you can try one, and if you are dissatisfied with the results or want to see if you can do better trying something else, you can simply do that. There are quite a few different approaches. Three are mentioned here.

Poisson scaling most reasonably adjusts the peak intensities for the relative degrees of noise in TOF-SIMS, where the data is typically acquired by pulse counting. Signal to noise generally increases as the square root of the signal intensity. Thus when taking data, doubling the length of the analysis improves the signal to noise by a factor of the square root of two. Dividing each peak's intensities by the square root of that peak's mean (Poisson scaling, or square root mean scaling) effectively scales all the peaks to equal signal to noise. This only works if the data has been properly dead time corrected; mistakes in dead time corrections can have significant effects on multivariate analyses (Tyler 2014). Poisson scaling is clearly the appropriate choice for MCR analysis (Ohlhausen et al. 2004; Smentkowski et al. 2008; Windig, Keenan, and Wise 2008), and it is arguably the method of choice for all MVA methods applied to TOF-SIMS data. Note, however, that Poisson scaling is not appropriate for data from the Orbitrap portion of the Hybrid SIMS, as that noise is not Poisson distributed (Gilmore et al. 2024).
Mean centering is performed by subtracting the mean of each peak's intensities across all spectra from that peak's intensity in each spectrum, so that every peak has a mean of zero. Mean centering has typically been the minimum preprocessing done for PCA, PLS, and their relatives. Without it, the first PCA component describes the distance of the data from zero, which is essentially the total ion image. In most cases this is not useful, although it is easy enough to ignore. Mean centering alone before MVA has the effect of accentuating the large peaks in the spectrum at the expense of the small ones. This can make interpretation seem easier, but one can easily miss the importance of significant small peaks as a result. For MCR, mean centering is not used.
Auto scaling starts with mean centering. Next, each peak's intensity is divided by the standard deviation of that peak's intensity. Scaling by the standard deviation makes small peaks as significant to the analysis as large peaks. Whether this is a good thing may vary from data set to data set, and there is much discussion of auto scaling versus simple mean centering. In chemometrics, auto scaling is used when the different variables have different units, for example, when temperature, pressure, and concentration are all variables and the choice of units would otherwise arbitrarily affect the MVA results. One can argue that, for example, counts of CH₃⁺ are effectively a different unit from counts of C₇H₇⁺. Certainly, TOF-SIMS counts can vary over many orders of magnitude, and with auto scaling the small peaks may end up being as significant to the analysis as the large ones. Assuming that the Poisson correction accurately equalizes the noise from peak to peak, it may do a better job of evening out the model's search for variance across large and small peaks. Still, not too much energy should be spent on the argument, since it is easy enough to try multiple approaches and see what works best. Again, for MCR, which is not driven by capturing variance, auto scaling is not used.
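The three scalings can be written in a few lines of numpy, with spectra (or pixels) as rows and peaks as columns. The demo uses simulated Poisson counts for one weak and one strong peak to show what Poisson scaling does: both columns end up with a standard deviation close to 1, i.e. comparable noise. Function names are illustrative, and peaks with a zero mean would need to be removed before Poisson scaling.

```python
import numpy as np


def poisson_scale(X):
    """Divide each peak (column) by the square root of its mean intensity."""
    return X / np.sqrt(X.mean(axis=0))


def mean_center(X):
    """Subtract each peak's mean so every column averages to zero."""
    return X - X.mean(axis=0)


def auto_scale(X):
    """Mean center, then divide each peak by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


rng = np.random.default_rng(12)
# Hypothetical counts for a weak (mean 10) and a strong (mean 1000) peak
X = rng.poisson([10.0, 1000.0], size=(500, 2)).astype(float)
print("raw std:     ", np.round(X.std(axis=0), 2))
print("Poisson std: ", np.round(poisson_scale(X).std(axis=0), 2))
print("auto std:    ", np.round(auto_scale(X).std(axis=0, ddof=1), 2))
```

In this noise-only demo Poisson scaling and auto scaling give similar results; they diverge when peaks also carry real variation, because auto scaling divides by the total spread (signal plus noise) while Poisson scaling divides by the expected noise only.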

6.3.6 Image Processing

The world of image manipulation and analysis is a wide one (Russ 2015), and more of it is applicable to TOF-SIMS images than is in common use. Features can be counted and sized. Borders between regions can be defined. Noise can be reduced.

Sophisticated techniques can be used to define regions of interest (ROIs). Intensity histograms can reveal multimodal distributions among pixels, which in turn can be used to define ROIs. Edges can be sharpened. Repeat distances in sample patterns can be quantified. In addition, there is a great deal of prior art in representing nonoptical images for best presentation.
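As one example of using an intensity histogram to define an ROI, Otsu's method picks the threshold that best splits a bimodal histogram into two classes (it maximizes the between-class variance). The sketch below is a plain numpy version applied to simulated intensities; it is one of many possible thresholding techniques, not a method prescribed here.

```python
import numpy as np


def otsu_threshold(img, n_bins=256):
    """Otsu's method: the threshold maximizing between-class variance of the
    pixel-intensity histogram (assumes a roughly bimodal histogram)."""
    hist, edges = np.histogram(img.ravel(), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                     # pixels in bins 0..i (class 0)
    w1 = w0[-1] - w0                         # pixels above bin i (class 1)
    cum_sum = np.cumsum(hist * centers)
    mu0 = cum_sum / np.maximum(w0, 1)
    mu1 = (cum_sum[-1] - cum_sum) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between) + 1]     # class 1 is img >= threshold


rng = np.random.default_rng(13)
# Hypothetical smoothed intensities: 75% "background" pixels, 25% "feature"
img = np.concatenate([rng.normal(2.0, 1.0, 3000),
                      rng.normal(10.0, 1.0, 1000)]).reshape(40, 100)
t = otsu_threshold(img)
mask = img >= t
print(f"threshold {t:.2f}, ROI fraction {mask.mean():.3f}")
```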

6.4 A Data Analysis Example

While no one example can span the varied paths that the interpretation of a TOF-SIMS data set can take, this one is a good demonstration of a basic approach beginning with the initial assessment of the results and ending with MVA.

The sample was created during the assessment of a new material for potential use in disk drive construction. The part, primarily a polyoxymethylene (POM) polymer, was placed onto a magnetic recording disk and treated at elevated temperature and humidity for a set time. The part was removed, and the disk examined optically. The disk had a light haze and was submitted for TOF-SIMS analysis. The SIMS results readily showed the presence of transferred materials.

6.4.1 The Initial Assessment of the Data

The positive ion spectrum of the haze, shown in the figure below, is dominated by the pattern of peaks (familiar to a TOF-SIMS analyst in the disk drive industry) consistent with the perfluoropolyether (PFPE) lubricant generally found on magnetic recording disks. This familiar pattern is augmented by obvious added high mass ions. The most prominent of these include a trio of peaks with 28 amu mass differences centered near 283 amu, and a prominent lone cluster of peaks with the largest at a nominal mass of 425 amu.

Figure: Log scale positive ion TOF-SIMS spectrum of a haze formed on a magnetic disk surface upon exposure to a part constructed of polyoxymethylene (POM).

The 28 amu spacing suggests chain-length differences of two carbons (CH₂CH₂), so the three peaks are homologues. The 425 amu peak likely belongs to a separate compound: if the lower mass trio were fragments of the 425 species, the 425 peak would also have to have neighbors with 28 amu mass differences. All of these peaks could, however, be fragments of a much larger molecule whose molecular ion was not detected.
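Searching a peak list for homologous series can be automated. The sketch below, using only the Python standard library, lists peak pairs whose mass difference matches a chosen spacing. With nominal masses, the trio centered near 283 amu (assumed here to be 255, 283, and 311) pairs up while 425 does not, mirroring the reasoning above. With accurate masses, C₂H₄ (28.0313 amu) can be distinguished from CO (27.9949 amu), which shares the same nominal spacing; the accurate-mass values in the demo are hypothetical.

```python
from itertools import combinations

C2H4 = 28.0313  # monoisotopic mass of C2H4 (amu)


def homologue_pairs(peaks, spacing=C2H4, tol=0.01):
    """Return (lighter, heavier) peak pairs whose difference matches `spacing`."""
    return [(a, b) for a, b in combinations(sorted(peaks), 2)
            if abs((b - a) - spacing) <= tol]


# Nominal masses: the trio near 283 amu plus the lone 425 amu peak
print(homologue_pairs([255, 283, 311, 425], spacing=28, tol=0.5))
# -> [(255, 283), (283, 311)]

# Accurate masses (hypothetical values): a CO spacing is not mistaken for C2H4
print(homologue_pairs([100.0000, 127.9949, 128.0313]))
# -> [(100.0, 128.0313)]
```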

Figure: Positive ion images from the TOF-SIMS analysis of the sample described above (scale bar is 100 microns). (a) Image formed from ions belonging to the disk lubricant, (b) image formed from the peaks at nominal masses 283 and 311 amu, and (c) image formed from the peak at nominal mass 425 amu.

The figure above shows the image of the combined peaks associated with the disk lubricant (for this sample, this material is part of the substrate, essentially forming the background to the analysis), a combined image of the homologue peaks at 283 and 311 amu, and finally the image of the peak at 425 amu. The images clearly show that the 425 amu peak comes from a different compound than the 283 and 311 amu peaks. In essence, imaging laterally inhomogeneous samples provides some separation in SIMS, the kind of separation that other mass spectrometric techniques achieve with various forms of chromatography.

6.4.2 Using ROIs

Given the images, it is possible to revisit the raw data and extract spectra from selected regions of interest (ROIs). This is typically done by selecting the pixels with high intensities for a peak of interest. Here, the first ROI can be defined as the areas where the 283 and 311 amu peaks are most intense, and the second as the pixels with intense 425 amu signals. The resulting spectra are shown below. ROI selection can also be more sophisticated, for example, picking pixels with high intensities for the 283 and 311 amu peaks but low intensities for the lubricant. Such methods can produce cleaner spectra for one species or another.
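Both kinds of ROI selection described above can be sketched on a pixel-by-peak count matrix. In the minimal Python example below, a pixel enters the ROI if its summed counts for the chosen peaks are in the top 10% of the image, optionally restricted to pixels at or below the median for peaks to exclude (here the lubricant). The quantile thresholds, the function names `roi_mask` and `roi_spectrum`, and the synthetic three-component data are all assumptions for illustration.

```python
import numpy as np

def roi_mask(X, peaks, include, exclude=(), q_high=0.9, q_low=0.5):
    """Pixels whose summed `include` counts reach the q_high quantile, optionally
    limited to pixels at or below the q_low quantile of the summed `exclude` counts."""
    col = {p: i for i, p in enumerate(peaks)}
    inc = X[:, [col[p] for p in include]].sum(axis=1)
    mask = inc >= np.quantile(inc, q_high)
    if exclude:
        exc = X[:, [col[p] for p in exclude]].sum(axis=1)
        mask &= exc <= np.quantile(exc, q_low)
    return mask

def roi_spectrum(X, mask):
    """Sum the per-pixel spectra (rows of X) inside the ROI."""
    return X[mask].sum(axis=0)

# Hypothetical data: 4096 pixels x 4 peaks (lubricant, 283, 311, 425 amu)
peaks = ["lube", "283", "311", "425"]
pure = np.array([[50, 1, 1, 1],     # lubricant
                 [5, 30, 40, 2],    # ester
                 [5, 2, 2, 40]])    # 425 amu unknown
abund = np.zeros((4096, 3))
abund[:, 0] = 1.0                   # lubricant everywhere...
abund[:1000] = [0.3, 1.0, 0.0]      # ...thinner under the ester haze
abund[1000:2000] = [0.3, 0.0, 1.0]  # ...and under the 425 amu species
rng = np.random.default_rng(1)
X = rng.poisson(abund @ pure)

ester_roi = roi_mask(X, peaks, include=["283", "311"], exclude=["lube"])
print(ester_roi.sum(), roi_spectrum(X, ester_roi))
```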

Figure: Positive ion spectra from selected ROIs from the TOF-SIMS analysis of the sample described above. (a) Spectrum from the ROI containing intense 283 and 311 amu peaks, and (b) spectrum from the ROI containing an intense 425 amu peak.

Note that the resulting spectra are not “pure”: there is a bit of the 425 amu component in the 283/311 amu spectrum, and vice versa. Nonetheless, they give a fair sense of what the pure spectra would look like, enough to start searching spectral databases for matches. Indeed, the spectrum of the 283/311 amu material bears a strong resemblance to a library spectrum of ethylene glycol monostearate, which has a major peak at 311 amu, although the library spectrum also has a protonated molecular ion (M+1) peak at 329 amu that is missing from the unknown’s spectrum. The near match suggests a related compound.

This is where the use of multiple techniques pays off. Nuclear magnetic resonance (NMR) analysis of an extract of this part shows the presence of ethylene glycol di-esters. A close inspection of the spectrum of the 283/311 amu-rich ROI then reveals very weak peaks at the masses of the ethylene glycol di-esters of stearic and palmitic acids. The material is therefore clearly identified. Even the mechanism for the formation of the 283 and 311 amu ions can be readily understood, as shown below.
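The masses to look for can be computed from the molecular formulas. The sketch below sums standard monoisotopic atomic masses to get the neutral mass M and the protonated ion [M+H]+ for ethylene glycol monostearate (C₂₀H₄₀O₃) and for the dipalmitate (C₃₄H₆₆O₄) and distearate (C₃₈H₇₄O₄) di-esters. The function name `mono_mass` is hypothetical, and which ion form the di-esters actually produce in this spectrum is not stated in the text, so both M and [M+H]+ are printed.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646

def mono_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

compounds = {
    "ethylene glycol monostearate": {"C": 20, "H": 40, "O": 3},
    "ethylene glycol dipalmitate": {"C": 34, "H": 66, "O": 4},
    "ethylene glycol distearate": {"C": 38, "H": 74, "O": 4},
}
for name, formula in compounds.items():
    m = mono_mass(formula)
    print(f"{name}: M = {m:.3f}, [M+H]+ = {m + PROTON:.3f}")
# first line: ethylene glycol monostearate: M = 328.298, [M+H]+ = 329.305
```

The monostearate [M+H]+ lands at nominal 329 amu, the M+1 peak that the library spectrum shows and the unknown lacks.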

Figure: Mechanism for the formation of the 311 amu ion from the di-ester of ethylene glycol.

Inspection of the spectrum of the ROI with the most intense 425 amu signals (spectrum (b) in the figure above) shows peaks belonging to that unknown, including those at 155.152, 271.224, and 425.341 amu, and a weak peak at 595.546 amu. The mass accuracy at the lowest of these allows us to identify its empirical formula as C₁₀H₁₉. The degree of mass excess shows that the ions at 271 and 425 amu are relatively more unsaturated or more cyclic than those at 155 and 595 amu. Another way to obtain this list of peaks is to create an exhaustive peak list from the total spectrum, then inspect the image of each peak and identify those that map similarly.
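The mass excess comparison above can be made explicit by dividing each peak's mass excess (measured mass minus nominal mass) by its nominal mass. Carbon is defined as exactly 12 u while hydrogen is 1.00783 u, so ions with more hydrogen per carbon carry more mass excess per unit mass, and each ring or double bond (two fewer hydrogens) lowers it. Oxygen (15.9949 u) lowers it as well, so the comparison assumes similar heteroatom content. The sketch below uses the four masses from the text; the function name `relative_mass_excess` and the floor-based nominal mass are assumptions that hold only when the mass excess lies between 0 and 1 u.

```python
import math

def relative_mass_excess(mz):
    """Mass excess per unit nominal mass, in mDa per Da.

    Assumes the mass excess is between 0 and 1 u, so the nominal mass is floor(mz);
    true for the hydrogen-rich organic ions here, but not in general.
    """
    nominal = math.floor(mz)
    return 1000.0 * (mz - nominal) / nominal

for mz in [155.152, 271.224, 425.341, 595.546]:
    print(f"{mz:.3f} u: {relative_mass_excess(mz):.3f} mDa/Da")
# 155.152 u: 0.981 mDa/Da
# 271.224 u: 0.827 mDa/Da
# 425.341 u: 0.802 mDa/Da
# 595.546 u: 0.918 mDa/Da
```

The 271 and 425 amu ions have the lowest values, consistent with their being more unsaturated or more cyclic than the 155 and 595 amu ions.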

We have enough information to recognize the material if we ever see it again, but the lack of a match in a spectral database makes it difficult to identify the compound. This 425 amu species remains an unknown. This kind of incomplete result from the analysis of a data set is unfortunately common. This is the kind of situation where MS/MS is most useful.

Checking which species map similarly, and which peaks also appear in the spectrum of pixels with intense signals for a peak of interest, are both simple ways to determine which peaks covary with which other peaks in the data set. Because MVA is easy to apply to a SIMS data set, it can be faster in some cases to start with MVA. Alternatively, MVA can be applied, for completeness, to a data set that has already yielded much to direct methods.

6.4.3 Using MVA

The figure below shows the results of the MCR analysis of these data. To obtain these results, a complete peak list was first defined from the spectrum. A routine that parsed the raw data using this peak list then created the data matrix of peak intensities, with each row representing a pixel and each column a peak. The data had been acquired at secondary ion currents low enough that dead time effects were not significant. Before the MCR analysis, the data were scaled by division by the square root of the mean (a Poisson noise correction). Each image in the figure shows the scores for one component, that is, how much of that component is present in each pixel. Intensities are displayed on a thermal color scale in which black represents zero intensity and dark red the highest. The loadings show the component “spectra”.
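The parsing step that turns the raw data into a pixel-by-peak matrix can be sketched as follows. Real raw-data formats are vendor specific; this minimal example assumes the raw events have already been reduced to three arrays (raster x, raster y, and calibrated m/z for every detected ion) and that each peak in the peak list is given as a hypothetical (low, high) m/z window. The function name `events_to_matrix` is illustrative only.

```python
import numpy as np

def events_to_matrix(x, y, mz, peak_windows, shape):
    """Count ions per pixel inside each peak's m/z window.

    x, y         : raster position of each detected ion (integers)
    mz           : calibrated m/z of each ion
    peak_windows : list of (low, high) m/z limits, one per peak in the peak list
    shape        : (ny, nx) raster size
    Returns an (ny*nx, n_peaks) count matrix: one row per pixel, one column per peak.
    """
    ny, nx = shape
    pixel = np.asarray(y, dtype=np.int64) * nx + np.asarray(x, dtype=np.int64)
    mz = np.asarray(mz, dtype=float)
    X = np.zeros((ny * nx, len(peak_windows)), dtype=np.int64)
    for j, (lo, hi) in enumerate(peak_windows):
        in_peak = (mz >= lo) & (mz < hi)
        X[:, j] = np.bincount(pixel[in_peak], minlength=ny * nx)
    return X

# Four hypothetical ion events on a 2 x 2 raster
x = [0, 1, 1, 0]
y = [0, 0, 1, 1]
mz = [283.26, 311.29, 283.27, 425.34]
windows = [(283.1, 283.4), (311.1, 311.5), (425.2, 425.5)]
X = events_to_matrix(x, y, mz, windows, shape=(2, 2))
print(X)
```

Ions falling outside every window are simply not counted, so the completeness of the peak list determines how much of the raw data reaches the MVA.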

Figure: Score images and loading plots for five components from the MCR analysis of the TOF-SIMS data from the sample described above. The data were binned from their original 256 × 256 size down to 64 × 64 for computational speed and were preprocessed via division by the square root of the mean.
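The binning and scaling named in the caption can be sketched in a few lines of NumPy. Binning 256 × 256 to 64 × 64 sums each 4 × 4 block of pixels; because a sum of Poisson counts is itself Poisson distributed, the binned data remain suitable for Poisson scaling. Here the scaling is assumed to divide each peak column by the square root of its mean over all pixels; some implementations also scale the pixel rows. The function names and the random test cube are hypothetical.

```python
import numpy as np

def bin_cube(cube, factor):
    """Sum factor x factor pixel blocks: (ny, nx, n_peaks) -> (ny//factor, nx//factor, n_peaks)."""
    ny, nx, n_peaks = cube.shape
    if ny % factor or nx % factor:
        raise ValueError("image size must be divisible by the binning factor")
    return cube.reshape(ny // factor, factor, nx // factor, factor, n_peaks).sum(axis=(1, 3))

def poisson_scale(X):
    """Divide each peak column of a (pixels x peaks) matrix by the square root of its mean."""
    X = np.asarray(X, dtype=float)
    scale = np.sqrt(X.mean(axis=0))
    scale[scale == 0] = 1.0  # leave all-zero peaks unscaled
    return X / scale, scale

# Hypothetical 256 x 256 data set with 3 peaks, binned 4 x 4 down to 64 x 64
rng = np.random.default_rng(0)
cube = rng.poisson(1.0, size=(256, 256, 3))
binned = bin_cube(cube, 4)
X = binned.reshape(-1, binned.shape[-1])  # 4096 pixels x 3 peaks
Xs, scale = poisson_scale(X)
print(binned.shape, X.shape)
```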

Components 1 and 3 show the ethylene glycol di-esters. Interestingly, the di-ester homologue mixture is not homogeneous: the relative amounts of the different ester chain lengths vary from location to location on this sample. As noted above, additional constraints can be applied to the analysis to seek the most spectrally pure of the possible solutions, but that was not done in this case.

Component 4 is the unknown 425 amu compound. As noted before, the loadings resemble a spectrum, but their relative intensities accentuate the most distinctive peaks of this compound rather than giving an accurate rendition of its pure spectrum. This analysis also reveals peaks associated with the 425 amu component that were easily overlooked before, such as the ion at 81 amu (C₆H₉⁺).

Component 5 is the disk lubricant. The MCR analysis cleanly separates it from the other components. The ROI spectra had more lubricant in them, since they were created by looking for regions with the highest intensity for specific peak combinations rather than by looking for regions with the least lubricant signal. One of the strengths of MCR is its ability to produce components similar to pure spectra for compounds that are not isolated anywhere on the sample.
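To make the idea behind MCR concrete, the sketch below is a minimal multivariate curve resolution by alternating least squares (MCR-ALS): it factors the pixel-by-peak matrix D into non-negative scores C and loadings S with D ≈ C Sᵀ. It is not the software or constraint set used for the figure. The function name `mcr_als`, the random initialization, the iteration count, and the imposition of non-negativity by clipping each least-squares solution at zero (simpler than true non-negative least squares) are all assumptions. The toy data mix three "pure" spectra in every pixel; note that such a factorization is only determined up to scaling and, in general, up to rotations that keep both factors non-negative, so the loadings are not guaranteed to equal the pure spectra.

```python
import numpy as np

def mcr_als(D, n_comp, n_iter=500, seed=0):
    """Minimal MCR-ALS: factor D (pixels x peaks) as C @ S.T with C, S >= 0.

    Non-negativity is imposed by clipping each least-squares solution at zero.
    """
    rng = np.random.default_rng(seed)
    S = rng.random((D.shape[1], n_comp))  # initial loadings: peaks x components
    for _ in range(n_iter):
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)  # scores given loadings
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)    # loadings given scores
        S /= np.maximum(np.linalg.norm(S, axis=0), 1e-12)               # unit-length loadings
    C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)
    return C, S

# Hypothetical noise-free data: 500 pixels, 6 peaks, 3 components mixed everywhere
rng = np.random.default_rng(2)
pure = np.array([[10, 8, 0, 0, 0, 1],   # "lubricant"
                 [0, 1, 9, 7, 0, 0],    # "ester"
                 [0, 0, 0, 1, 8, 6]],   # "unknown"
                dtype=float)
abund = rng.random((500, 3))
D = abund @ pure
C, S = mcr_als(D, 3)
resid = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
print(f"relative residual: {resid:.2e}")
```

On real data, D would be the binned, Poisson-scaled matrix, and each column of C reshaped to 64 × 64 gives one score image.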

Finally, Component 2 maps feature edges. It appears to capture an effect of topography on the SIMS data: ions scattered off the edges of the raised haze features and subsequently sputtering the nearby disk surface are the likely cause of this interesting component. Note that this result, although an “artifact” of sorts, is real data from the sample that carries information about sample topography; it is not noise. It is also a finding that “ordinary” analysis of the data would be unlikely to reveal.

References

Eigenvector Research. 2015. “Eigenvector Research: Chemometrics Software, Consulting and Training.” www.eigenvector.com/ (accessed October 12, 2015).

Gallagher, N.B., J.M. Shaver, E.B. Martin, J. Morris, B.M. Wise, and W. Windig. 2004. “Curve Resolution for Multivariate Images with Applications to TOF-SIMS and Raman.” Chemometrics and Intelligent Laboratory Systems 73, no. 1 (8th Scandinavian Symposium on Chemometrics, SSC8, Mariehamn, Aland, Finland, 14–18 June 2003), pp. 105–17. doi:10.1016/j.chemolab.2004.04.003

Gilmore, I., M. Keenan, G. Trindade, A. Pirkl, C. Newell, Y. Jin, K. Aizikov, et al. 2024. “Orbitrap Noise Structure and Method for Noise-Unbiased Multivariate Analysis.” Preprint, February 19, 2024. doi:10.21203/rs.3.rs-3911895/v1

Henderson, A. 2013. “Multivariate Analysis of SIMS Spectra.” In ToF-SIMS: Materials Analysis by Mass Spectrometry, 2nd ed., pp. 449–84. Chichester, West Sussex, UK: IM Publications LLP.

NESAC/BIO. 2015. “Multivariate Surface Analysis Homepage.” www.nb.uw.edu/mvsa/multivariate-surface-analysis-homepage (accessed October 12, 2015).

Ohlhausen, J.A.T., M.R. Keenan, P.G. Kotula, and D.E. Peebles. 2004. “Multivariate Statistical Analysis of Time-of-Flight Secondary Ion Mass Spectrometry Images Using AXSIA.” Applied Surface Science 231–232, pp. 230–34. doi:10.1016/j.apsusc.2004.03.020

Russ, J.C. 2015. The Image Processing Handbook. 7th ed. CRC Press.

Smentkowski, V.S., S.G. Ostrowski, F. Kollmer, A. Schnieders, M.R. Keenan, J.A. Ohlhausen, and P.G. Kotula. 2008. “Multivariate Statistical Analysis of Non-Mass-Selected ToF-SIMS Data.” Surface and Interface Analysis 40, no. 8, pp. 1176–82. doi:10.1002/sia.2862

Tyler, B.J. 2006. “Multivariate Statistical Image Processing for Molecular Specific Imaging in Organic and Bio-Systems.” Applied Surface Science 252, no. 19, pp. 6875–82. doi:10.1016/j.apsusc.2006.02.160

Tyler, B.J. 2013. “TOF-SIMS Image Analysis.” In ToF-SIMS: Materials Analysis by Mass Spectrometry, 2nd ed., pp. 485–502. Chichester, West Sussex, UK: IM Publications LLP.

Tyler, B.J. 2014. “The Accuracy and Precision of the Advanced Poisson Dead-Time Correction and Its Importance for Multivariate Analysis of High Mass Resolution ToF-SIMS Data.” Surface and Interface Analysis 46, no. 9, pp. 581–90. doi:10.1002/sia.5543

Tyler, B.J., and R.E. Peterson. 2013. “Dead-Time Correction for Time-of-Flight Secondary-Ion Mass Spectral Images: A Critical Issue in Multivariate Image Analysis.” Surface and Interface Analysis 45, no. 1, pp. 475–78. doi:10.1002/sia.5106

Windig, W., M.R. Keenan, and B.M. Wise. 2008. “The Effects of Pre-Processing of Image Data on Self-Modeling Image Analysis.” Journal of Chemometrics 22, no. 9, pp. 500–509. doi:10.1002/cem.1164
