G Coefficients - G-String-Legacy/GS_MV GitHub Wiki

A good test has to measure what it claims to measure. Performance tests provide mean test scores for candidates $\overline X_{i} =\frac{1}{30} \sum \limits_{j=0}^{29}X_{i,j}$, but claim to rate their ability. The more the test scores of candidates reflect their abilities, the more generalizable is the test.

The test results of record are the mean scores of each student. But desired ranking of students depends on their ability. So the indicator of test quality (G Coefficient) is the ratio of the variance of student ability effect to the variance of their mean scores. In other words: "how much of the test score variance is explained by the ability effect variance?". Mathematically, that looks like this:

$$ \huge {\text{G Coefficient} = \frac{\sigma^2_s}{\sigma^2(\overline{X_i})} \qquad \text{(Equation 2)}} $$

More generally speaking, the literature recognizes two types of G Coefficients: Eρ2, and Φ, depending on how the student score variance has been defined. Unfortunately we don't get around looking more carefully at 'variance', and how it is calculated.

Let us consider a set of N random numbers yk, where k ranges from 0 to N-1. As we have seen, its arithmetic mean is calculated as:

$$\large \overline y =\frac{1}{N} \sum \limits_{k=0}^{N-1}y_{k} \qquad \text{(Equation 3)}$$ while its harmonic mean results from:

$$\large \tilde y=\frac{N}{\sum \limits_{k=0}^{N-1} \frac{1}{y_k}} \qquad \text{(Equation 4)}$$ and its variance:

$$\large \sigma^2 (y) = \frac{1}{N-1} \sum \limits_{k=0}^{N-1}(y_{k}-\overline y)^{2} \qquad \text{(Equation 5)}$$

It turns out, however, that the mean $\overline y$ no longer has a fixed value, it now has its own, admittedly smaller variance (variance of the mean):

$$\large \sigma^{2} (\overline y)=\frac{\sigma^2 (y)}{N} \qquad \text{(Equation 6)}$$

Let us now get back to the output of urGenova, after it has processed the data entered, we are looking at the variance components for the two facets - student, question, and their interaction:

Variance Components

We are left with figuring out, how to express $\sigma^2(\overline{X_i})$ in terms of the calculated variance components. We must keep in mind that we are interested in the variance of the facet of differentiation in relationship to the total score variance, i.e. averaged over all facets of generalization. This leads us to the following expression:

$$ \huge \text{G Coefficient} = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\text{}. . . .)} \qquad \text{(Equation 7)} $$

where σ2(τ); stands for the sum of all variance components that contain the facet of differentiation, but not any facets of generalization (Brennan's Rule I.).

Depending on the type of G Coefficient we want to calculate, formula (7) becomes:

$$ \large \text{Generalization coefficient: } \huge\qquad E\rho^2 = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\delta)} \qquad \text{(Equation 7a)} $$

or

$$ \large \text{Index of dependability: } \huge\qquad \Phi = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\Delta)} \qquad \text{(Equation 7b)} $$

where σ2(Δ) is the sum of all variance components, except for σ2(τ) itself, divided by the size product of the respective facets of generalization (Brennan's Rule II.), and σ2(δ) the sum of all variance components that contains the facet of differentiation and at least one facet of generalization, divided by the sample size product of such. (Brennan's Rule III.)

Both Brennan's Rule II, and Brennan's Rule III require summing variances of means for facets of generalization. According to equation 6 this requires division of the sample variance by the total number of sample items. For set variances of single facet sets, this is simply the sample size of this facet. For purely crossed facets, it becomes the product of the facet sample sizes. But in the case of nested facets, there are multiple sample sizes, depending on the value of the nesting facet. In this case it becomes necessary to first calculate a mean of sample sizes for the nested facet. But it is no longer the arithmetic mean, the harmonic mean is required.

As a general convention, Eρ2 should have at least a value of 0.80 for 'High Stakes Exams'.

Next

⚠️ **GitHub.com Fallback** ⚠️