Relationship Between Decoding and Conventional ERP Analyses - ucdavis/erplab GitHub Wiki

To understand the nature of decoding, it is helpful to compare decoding accuracy with the conventional approach of measuring the difference in voltage between the ERPs for two classes. This is illustrated in Figure 7, which shows the grand average difference wave at three electrode sites, constructed by subtracting the ERP elicited by ID 1 from the ERP elicited by ID 2. The shading shows the standard error of the mean (SEM) at each time point, which reflects the variability across participants.

You can see that the difference wave was more negative than -1 µV at the Cz electrode site just before 200 ms, suggesting that the brain differentiated between the two face identities during this time period. Is this a real effect or just noise? In a traditional analysis, you could answer this question by computing a one-sample t test at each time point (i.e., comparing the difference between conditions at each time point to zero). The t value is simply the mean across participants divided by the SEM (where the SEM is the standard deviation across participants divided by the square root of the number of participants). So, you can estimate the t value “by eye” at each time point from the data shown in Figure 7 by dividing the mean voltage at a given time point by the SEM at that time point. The t value would be approximately -2 just before 200 ms at the Cz electrode site because the magnitude of the mean is approximately twice as great as the SEM. (Note that the shaded region shows ±1 SEM, so you should look at the SEM on one side of the mean when performing the division.)
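If you would rather see the arithmetic than estimate it by eye, here is a minimal sketch of the calculation for a single time point. The five difference voltages are hypothetical values chosen to give a mean of -1 µV; they are not the actual data from Figure 7. The function name `one_sample_t` is ours, not part of any toolbox.

```python
import math
import statistics

def one_sample_t(values, null_value=0.0):
    """Return (mean, sem, t) for a one-sample t test against null_value."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample SD across participants (n - 1 denominator)
    sem = sd / math.sqrt(n)            # standard error of the mean
    t = (mean - null_value) / sem      # t = (mean - null) / SEM
    return mean, sem, t

# Hypothetical ID 2 minus ID 1 voltages (in µV) at Cz for 5 participants,
# all taken from the same time point. These are made-up illustration values.
diff_uv = [-2.2, -0.1, -1.5, 0.3, -1.5]

mean, sem, t = one_sample_t(diff_uv)
print(f"mean = {mean:.2f} µV, SEM = {sem:.2f} µV, t = {t:.2f}")
# The mean is about twice the SEM in magnitude, so t is roughly -2.
```

In a real analysis you would repeat this at every time point, using one difference value per participant at each point.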

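[Figure 7: Grand average difference waves (ID 2 minus ID 1) at three electrode sites; shading shows ±1 SEM]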

With only 5 participants (4 degrees of freedom), we would need a t value of less than -2.776 (or greater than +2.776) to achieve statistical significance in a two-tailed test with an alpha of .05. In addition, we would need to perform a correction for multiple comparisons if we looked at several time points. This effect is pretty far away from being significant, and it’s unlikely that we would find a significant difference between ID 1 and ID 2 even if we looked at the full sample of 16 participants. We can easily find a significant difference in the ERPs between faces and other physically different stimulus categories such as cars, but the difference between two individual faces is usually too small (relative to the variability across participants) to see in conventional ERP analyses. That’s why we use decoding!

One major source of variability in voltage across participants at a given electrode site is that everyone’s brain is folded up differently, so the scalp distribution of a given ERP effect can differ widely across participants. For example, the difference between ID 1 and ID 2 in Figure 7 might be largest at Cz in one participant and largest at Fz in another participant. If you’ve spent much time looking at single-participant ERPs, you’ll have seen this for yourself. Because the t value is computed by dividing the mean by the SEM, this variability in scalp distribution across participants makes it difficult for a true difference between classes to be statistically significant. One common solution to this problem is to average the data across a cluster of electrode sites. However, for any given participant, the mean across the cluster of sites can never be larger in magnitude than the voltage at that participant’s “best” electrode site, and it will usually be smaller.
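The cost of averaging across a cluster can be illustrated with a few made-up numbers. In the sketch below, each of three hypothetical participants has the largest ID 2 minus ID 1 difference at a different site. The voltages are invented for illustration and do not come from the dataset.

```python
import statistics

# Hypothetical difference voltages (µV) for three participants at three sites.
# Each participant's effect is largest at a different electrode.
diff_uv = {
    "P1": {"Fz": -0.2, "Cz": -1.6, "Pz": -0.4},
    "P2": {"Fz": -1.4, "Cz": -0.3, "Pz": -0.1},
    "P3": {"Fz": -0.5, "Cz": -0.6, "Pz": -1.5},
}

summary = {}
for participant, sites in diff_uv.items():
    # "Best" site = the site with the largest absolute difference
    best_site = max(sites, key=lambda s: abs(sites[s]))
    cluster_mean = statistics.fmean(sites.values())
    summary[participant] = (best_site, sites[best_site], cluster_mean)
    print(f"{participant}: best site {best_site} = {sites[best_site]:.2f} µV, "
          f"cluster mean = {cluster_mean:.2f} µV")
```

For every participant, the cluster mean is smaller in magnitude than the value at that participant's best site, which is the price you pay for making the measure less dependent on any one electrode.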

To summarize, conventional ERP analyses ask whether the difference in voltage between the two classes (when averaged across participants) is large relative to the variability across participants. Anything that produces variability across participants—including differences in brain folding that produce differences in scalp distribution—will make it less likely that a real difference in brain activity between the two classes is statistically significant.

To perform a decoding analysis of these same data from ID 1 and ID 2, we would perform the decoding separately for each subject, and then we would use a t test to ask whether the decoding accuracy at each time point is significantly greater than chance. For example, Figure 2 shows the mean decoding accuracy across our 5 participants at each time point, and the shading is the standard error of this mean. To compute the t value for a given time point, we take the mean decoding accuracy minus chance, and we divide this by the standard error. It doesn’t matter if different subjects have different scalp distributions; all that matters is the decoding accuracy. This tends to minimize the standard error, giving us bigger t values than we would get in a conventional analysis.

If you look at Figure 2 (which I’ve repeated here), you’ll see that the difference between the mean decoding accuracy and chance is approximately 0.2 at the peak of the decoding accuracy, with a standard error of approximately 0.05. This would give us a t value of approximately 4, which is approximately twice as great as the t value from the conventional ERP amplitude analysis. As a result, decoding is typically more sensitive than the traditional ERP amplitude approach (i.e., decoding gives us greater statistical power).
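The same calculation can be written out for decoding accuracy. This is a minimal sketch for a single time point, using five hypothetical accuracy values chosen so that the mean is 0.2 above chance and the standard error is 0.05. The chance level of 0.5 is an assumption for a two-class problem; use 1 divided by the number of classes for your own design.

```python
import math
import statistics

CHANCE = 0.5  # assumed chance level for two classes (1 / number of classes)

# Hypothetical peak decoding accuracies (proportion correct) for 5 participants.
# These are made-up values, not the data plotted in Figure 2.
accuracy = [0.55, 0.65, 0.70, 0.75, 0.85]

mean_acc = statistics.fmean(accuracy)
sem = statistics.stdev(accuracy) / math.sqrt(len(accuracy))
t = (mean_acc - CHANCE) / sem  # one-sample t test against chance

print(f"mean - chance = {mean_acc - CHANCE:.2f}, SEM = {sem:.2f}, t = {t:.2f}")
```

With these values, the difference from chance is 0.2 and the standard error is 0.05, so the t value is 4, which exceeds the critical value of 2.776 for 5 participants.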

In a statistical analysis of decoding accuracy, the t value is driven by both the mean accuracy across participants and the variability in accuracy across participants. It is therefore important to understand the factors that determine mean decoding accuracy and the variability in decoding accuracy. As we discussed earlier, the decoding accuracy for a given participant depends on how different the voltage patterns for different classes are and the amount of trial-to-trial variability in this pattern. Thus, trial-to-trial variability directly reduces decoding accuracy and statistical power. In conventional analyses, trial-to-trial variability impacts the variability across subjects but does not impact the expected value of the mean across subjects. Consequently, trial-to-trial variability is more detrimental to statistical power in decoding analyses than in conventional amplitude analyses. Moreover, differences between participants in the amount of trial-to-trial variability will cause differences between participants in decoding accuracy, thereby decreasing statistical power. By contrast, differences in trial-to-trial variability across participants do not have such a direct impact on statistical power in conventional amplitude analyses.
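A small simulation can make this point concrete. The sketch below simulates one participant with a fixed true difference of 1 µV between two classes at a single electrode, and it varies only the trial-to-trial noise. The decoder is a simple nearest-class-mean rule trained on half of the trials and tested on the other half. It is a stand-in chosen for brevity, not the classifier or cross-validation scheme used by the toolbox, and all of the parameter values are assumptions.

```python
import random
import statistics

def simulate(noise_sd, true_diff=1.0, n_trials=4000, seed=0):
    """Simulate one participant; return (decoding accuracy, averaged-ERP difference)."""
    rng = random.Random(seed)
    # Single-trial voltages (µV) for the two classes at one electrode
    a = [rng.gauss(0.0, noise_sd) for _ in range(n_trials)]
    b = [rng.gauss(true_diff, noise_sd) for _ in range(n_trials)]

    # Train on the first half: the decision boundary is midway between class means
    half = n_trials // 2
    mean_a = statistics.fmean(a[:half])
    mean_b = statistics.fmean(b[:half])
    boundary = (mean_a + mean_b) / 2
    b_is_higher = mean_b > mean_a

    # Test on the second half: count trials assigned to the correct class
    correct = 0
    for x in a[half:]:
        predicted_b = (x > boundary) == b_is_higher
        correct += not predicted_b
    for x in b[half:]:
        predicted_b = (x > boundary) == b_is_higher
        correct += predicted_b
    accuracy = correct / (2 * (n_trials - half))

    # Conventional measure: difference between the averaged waveforms
    erp_diff = statistics.fmean(b) - statistics.fmean(a)
    return accuracy, erp_diff

acc_low, diff_low = simulate(noise_sd=1.0)
acc_high, diff_high = simulate(noise_sd=4.0)
print(f"low noise:  accuracy = {acc_low:.3f}, ERP difference = {diff_low:.2f} µV")
print(f"high noise: accuracy = {acc_high:.3f}, ERP difference = {diff_high:.2f} µV")
```

When the noise is increased, the decoding accuracy drops toward chance, whereas the averaged-ERP difference becomes less precise but still scatters around the true value of 1 µV.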

The key to statistical power in decoding analyses is therefore to minimize trial-to-trial variability. Of course, it is also important to have large overall differences between the classes, but that is also true in conventional amplitude analyses. The difference is that trial-to-trial variability has a larger impact on statistical power for decoding analyses than for traditional amplitude analyses. Our lab therefore puts extra effort into recording clean data for our decoding analyses. For example, we try to obtain electrode impedances of less than 20 kΩ in our decoding experiments, whereas we allow impedances of up to 50 kΩ in traditional experiments.