
Automated Written Expression CBM (aWE-CBM) Model 1

General Description

Total Words Written (TWW) scores are generated directly from the GAMET word count score. Words Spelled Correctly (WSC) scores are generated by subtracting the GAMET misspelling score from the GAMET word count score.
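
As a minimal sketch of these two derivations in R (assuming the GAMET output has already been read into a data frame; the column names `word_count` and `misspelling` are illustrative placeholders, not necessarily the exact field names in a GAMET results file):

```r
# Hypothetical GAMET output for two writing samples; column names
# are placeholders for illustration, not the exact GAMET field names.
gamet <- data.frame(word_count = c(52, 87), misspelling = c(4, 9))

gamet$TWW <- gamet$word_count                      # Total Words Written
gamet$WSC <- gamet$word_count - gamet$misspelling  # Words Spelled Correctly
gamet
```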

Correct Word Sequences (CWS) and Correct Minus Incorrect Word Sequences (CIWS) scores are based on ensemble models originally trained to predict CBM scores on 7-minute narrative writing samples ("I once had a magic pencil and ...") from students in the fall, winter, and spring of Grades 2-5 (Mercer et al., 2019).

More details on the sample are available in Mercer et al. (2019).

Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42, 117-128. https://doi.org/10.1177/0731948718803296

The CWS and CIWS models are detailed below (from Mercer et al., 2021).

Mercer, S. H., Cannon, J. E., Squires, B., Guo, Y., & Pinco, E. (2021). Accuracy of automated written expression curriculum-based measurement scoring. Canadian Journal of School Psychology, 36, 304-317. https://doi.org/10.1177/0829573520987753

Correct Word Sequences Model

| Metric | Overall | GBM | SVM | ENET | MARS |
|---|---:|---:|---:|---:|---:|
| Word Count | 75.48 | 86.79 | 67.10 | 77.17 | 77.84 |
| Spelling | 14.26 | 0.62 | 0.00 | 21.41 | 22.05 |
| %Spelling | 8.78 | 12.28 | 27.95 | 0.40 | 0.11 |
| Grammar | 0.85 | 0.05 | 2.77 | 0.11 | 0.00 |
| %Grammar | 0.01 | 0.06 | 0.01 | 0.00 | 0.00 |
| Duplication | 0.04 | 0.12 | 0.12 | 0.00 | 0.00 |
| Typography | 0.38 | 0.08 | 1.33 | 0.00 | 0.00 |
| White Space | 0.20 | 0.00 | 0.71 | 0.92 | 0.00 |

Note. The weightings sum to 100; thus, they can be viewed as the percentage contribution of each metric to the predicted scores. Overall = the ensemble model of all algorithms, GBM = stochastic gradient boosted regression trees, SVM = support vector machines (radial kernel), ENET = elastic net regression, MARS = bagged multivariate adaptive regression splines. The following regression equation was used to weight the algorithms in the CWS ensemble model: .162 + .074 * GBM + .281 * SVM + .001 * ENET + .642 * MARS.
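
To make the weighting concrete, the following sketch applies the CWS ensemble equation to hypothetical component predictions. The four input values are invented for illustration; in writeAlizer these predictions come from the trained GBM, SVM, ENET, and MARS models.

```r
# Hypothetical predictions from the four component algorithms for one sample
gbm  <- 42.0  # stochastic gradient boosted regression trees
svm  <- 45.5  # support vector machine (radial kernel)
enet <- 40.8  # elastic net regression
mars <- 44.2  # bagged multivariate adaptive regression splines

# CWS ensemble: regression-weighted combination from Mercer et al. (2021)
cws <- 0.162 + 0.074 * gbm + 0.281 * svm + 0.001 * enet + 0.642 * mars
cws  # ~44.47 for these invented inputs
```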

Correct Minus Incorrect Word Sequences Model

| Metric | Overall | GBM | SVM | ENET | MARS |
|---|---:|---:|---:|---:|---:|
| Word Count | 55.60 | 55.76 | 47.57 | 61.43 | 61.35 |
| Spelling | 19.25 | 1.48 | 6.57 | 35.80 | 35.04 |
| %Spelling | 22.31 | 41.99 | 42.74 | 0.00 | 0.00 |
| Grammar | 0.82 | 0.00 | 1.69 | 0.00 | 0.62 |
| %Grammar | 0.04 | 0.23 | 0.00 | 0.00 | 0.00 |
| Duplication | 0.28 | 0.10 | 0.76 | 0.00 | 0.00 |
| Typography | 1.37 | 0.41 | 0.07 | 1.55 | 2.97 |
| White Space | 0.34 | 0.04 | 0.60 | 1.22 | 0.00 |

Note. The weightings sum to 100; thus, they can be viewed as the percentage contribution of each metric to the predicted scores. Overall = the ensemble model of all algorithms, GBM = stochastic gradient boosted regression trees, SVM = support vector machines (radial kernel), ENET = elastic net regression, MARS = bagged multivariate adaptive regression splines. The following equation was used for the CIWS model: -.170 + .180 * GBM + .346 * SVM + .100 * ENET + .375 * MARS.
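
The same illustration for the CIWS ensemble, reusing the hypothetical component predictions from the CWS sketch (again, the input values are invented; writeAlizer produces them internally):

```r
# Same hypothetical component predictions as in the CWS sketch above
gbm  <- 42.0
svm  <- 45.5
enet <- 40.8
mars <- 44.2

# CIWS ensemble: note the larger ENET weight relative to the CWS model
ciws <- -0.170 + 0.180 * gbm + 0.346 * svm + 0.100 * enet + 0.375 * mars
ciws  # ~43.79 for these invented inputs
```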