ReaderBench Model 2c Variable Importance - shmercer/writeAlizer GitHub Wiki

Ensemble Weightings and Metric Importance

ReaderBench Model 2c

This model used ReaderBench scores from 7-min narrative writing samples (prompt: "I once had a magic pencil and ...") from 124 students in the spring of Grades 2-5 (Mercer et al., 2019) to predict holistic writing quality on the samples (Elo ratings calculated from paired comparisons).

Highly correlated ReaderBench metrics (|r| > .90) were excluded during pre-processing (see the section on Scoring Model Development for more details).
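To illustrate the kind of filter described above, here is a minimal sketch that drops a metric when its absolute Pearson correlation with an already-kept metric exceeds .90. It is not the pre-processing code used for this model: the function names, the toy data, and the rule of keeping whichever metric of a pair comes first are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_highly_correlated(columns, cutoff=0.90):
    """Return the metric names to keep, in their original order.

    A metric is kept only if |r| <= cutoff against every metric kept so far.
    Keeping the first metric of a correlated pair is an assumption of this sketch.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) <= cutoff for other in kept):
            kept.append(name)
    return kept


# Hypothetical metric values for five writing samples (not real ReaderBench output).
toy_metrics = {
    "metric_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metric_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost a multiple of metric_a
    "metric_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to metric_a
}

kept_metrics = drop_highly_correlated(toy_metrics)
print(kept_metrics)
```

In this toy example metric_b is dropped because it is nearly collinear with metric_a, while metric_c is retained.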

Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42, 117-128. https://doi.org/10.1177/0731948718803296

Algorithm Weightings in Ensemble

Abbreviations:

  • all = ensemble model
  • pls = partial least squares regression
  • rf = random forest regression
  • mars = bagged multivariate adaptive regression splines
  • gbm = stochastic gradient boosted trees
  • svm = support vector machines
  • cube = cubist regression

The table below presents the intercept and the linear weighting of each algorithm's predictions in the ensemble model.

Intercept pls rf mars gbm svm cube
-7.3027 0.2354 0.1868 0.1595 0.1816 0.2191 0.0704
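As a sketch of how these weightings could be applied, assume the ensemble prediction is the intercept plus the weighted sum of the six algorithms' predictions, which is what "linear weightings" suggests. The function name and the example base predictions below are hypothetical; only the intercept and weights come from the table.

```python
# Intercept and weights copied from the table above.
INTERCEPT = -7.3027
WEIGHTS = {
    "pls": 0.2354,
    "rf": 0.1868,
    "mars": 0.1595,
    "gbm": 0.1816,
    "svm": 0.2191,
    "cube": 0.0704,
}


def ensemble_predict(base_predictions):
    """Combine per-algorithm predictions into one ensemble score.

    base_predictions maps each algorithm abbreviation to that algorithm's
    predicted quality score for a single writing sample.
    """
    missing = set(WEIGHTS) - set(base_predictions)
    if missing:
        raise ValueError(f"missing predictions for: {sorted(missing)}")
    return INTERCEPT + sum(WEIGHTS[name] * base_predictions[name] for name in WEIGHTS)


# Hypothetical base-model predictions for one sample (illustrative values only).
example = {"pls": 100.0, "rf": 100.0, "mars": 100.0, "gbm": 100.0, "svm": 100.0, "cube": 100.0}
print(round(ensemble_predict(example), 4))
```

Note that the six weights sum to 1.0528 rather than exactly 1, so the negative intercept partly offsets the slight upscaling of the combined predictions.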

Metric Importance in Each Algorithm and Ensemble

Each column sums to 100 (so values can be interpreted as % contribution to the model).
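One simple way to obtain a column that sums to 100 is to divide each raw importance value by the column total and multiply by 100. The sketch below shows that rescaling; the function name and the raw values are hypothetical, and the scaling actually used to build the table may differ.

```python
def to_percent(raw_importance):
    """Rescale non-negative importance values so that they sum to 100."""
    total = sum(raw_importance.values())
    if total == 0:
        raise ValueError("all importance values are zero; cannot rescale")
    return {name: 100.0 * value / total for name, value in raw_importance.items()}


# Hypothetical raw importance scores for three metrics (not taken from the model).
raw = {"metric_a": 2.0, "metric_b": 6.0, "metric_c": 0.0}
percent = to_percent(raw)
print(percent)
```

A metric with a raw importance of 0 stays at 0 after rescaling, which matches how unused metrics appear in a column.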

Metric overall pls rf mars gbm svm cube
Content.words 11.99 4.55 5.81 30.16 21.71 4.24 11.11
WdEnt 7.28 4.3 5.74 0 21.09 4.12 12.09
AvgDepsSen_compound 3.97 2.07 1.98 13.22 2.22 1.52 6.82
AvgWdLen 3.87 2.64 2.65 7.11 4.85 2.04 7.02
LxcDiv 3.77 4.06 4.13 0 7.72 3.59 0.78
AvgChainSpan 3.36 3.09 2.64 5.1 4.15 2.66 2.34
TCorefChainBigSpan 2.64 2.23 1.48 10.59 0.43 0.93 0
Sentences 2.37 3.33 2.15 0 2.63 2.24 4.87
AvgDepsSen_mark 2.21 0.38 1.17 10.59 0.08 1.45 0
AvgDepsSen_dobj 2 0.81 0.96 8.72 0.14 1.13 0.97
AvgSenAdjCoh_LSA 1.95 2.68 1.87 0 3.17 2.26 0
AvgCorefChain 1.94 2.2 1 5.1 0.28 1.28 2.73
WdDiffWdStem 1.92 2.4 1.86 0 2.95 2.09 1.56
LexChainMaxSp 1.82 3.13 2.35 0 1.28 2.01 0.97
WdLettStdDev 1.79 3 1.66 0 1.64 2.28 0.97
TCorefChainDoc 1.62 3.23 1.85 0 0.17 1.92 2.14
CharEnt 1.59 2.56 0.9 0 0.29 2.1 5.46
WdSylCnt 1.53 2.45 1.7 0 1.52 1.55 1.36
FrqRhythmId 1.47 2.67 1.7 0 1.03 1.59 0.97
AvgDepsSen_punct 1.36 1.82 1.57 0 0.72 1.83 2.53
AvgAOEDoc_InverseLinearRegressionSlope 1.32 1.31 0.73 4.26 0.24 0.89 0.39
RdbltyDaleChall 1.25 1.81 1.27 0 1.04 1.02 3.51
AvgAOADoc_Shock 1.2 2.2 1.24 0 0.69 1.8 0
LangRhythmCoeff 1.06 1.61 1.41 0 1.03 1.24 0.19
LexChainAvgSpan 1.05 1.94 1.36 0 0.16 1.66 0
SenAsson 1.05 1.63 0.58 3.07 0.02 0.56 0
AvgVoice 1 2.62 0.58 0 0 1.36 0.39
AvgNounSen 0.97 1.09 1.59 0 0.47 1.06 2.14
WdDiffLemmaStem 0.94 1.65 1.01 0 0.36 1.32 0.78
TActCorefChainWd 0.94 0.93 0.74 0 1.05 0.71 4.09
AvgAOADoc_Cortese 0.93 1.16 0.9 0 0.56 1.89 0.39
AvgAOASen_Bristol 0.92 0.35 1.28 2.08 1.15 0.34 0.39
SenStdDevWd 0.92 1.6 0.97 0 0.09 1.78 0
AvgDepsSen_xcomp 0.83 0.41 1.61 0 1.42 0.99 0
AvgAdjectiveSen 0.83 1.24 1 0 0.18 1.21 1.36
AvgDepsSen_nmod 0.81 0.16 1.23 0 0.38 1.25 3.51
AvgAOADoc_Kuperman 0.8 0.7 1.18 0 0.65 1.44 0.39
AvgDepsSen_amod 0.79 1.11 0.91 0 0.1 1.33 1.36
AvgDepsSen_ccomp 0.78 1.06 1.41 0 0.27 1.18 0
AvgAOASen_Kuperman 0.78 0.78 0.83 0 0.41 0.99 2.73
AvgNmdEntSen 0.78 0.93 1 0 1.05 1.05 0
AvgAOESen_IndexPolynomialFitAboveThreshold.0.3. 0.76 0.58 1.12 0 0.4 0.84 2.73
AvgConnSen_simple_subordinators 0.74 0.25 0.86 0 1.12 1.52 0.39
AvgPronounSen 0.72 1.09 1.13 0 0.02 0.99 0.97
AvgAOASen_Shock 0.69 0.48 1.51 0 0.21 1.32 0
AvgConnSen_reason_and_purpose 0.68 0.16 1.31 0 0.82 1.29 0
AvgAOASen_Cortese 0.66 1.25 0.45 0 0.24 1.22 0
AvgAOESen_InverseLinearRegressionSlope 0.66 0.99 1.02 0 0.31 0.68 0.97
AvgAOEDoc_InflectionPointPolynomial 0.65 0.64 0.36 0 0.37 0.61 3.7
AvgConnSen_addition 0.65 0.88 0.88 0 0.12 1.21 0.39
AvgConnSen_order 0.64 0.44 0.48 0 0.92 1.41 0
AvgInferenceDistChain 0.64 0.8 0.91 0 0.7 0.83 0
WdPolysemyCnt 0.62 0.27 0.93 0 0.37 1.61 0
AvgAOEDoc_IndexPolynomialFitAboveThreshold.0.3. 0.61 0.83 0.71 0 0.07 1.07 0.97
AvgRhythmUnits 0.61 0.3 1.24 0 0.32 1.3 0
AvgDepsSen_aux 0.57 0.03 1.26 0 0.38 1.32 0
SynSoph 0.57 0.59 0.85 0 0.08 1.02 0.97
AvgDepsSen_cop 0.55 0.87 0.48 0 0.05 1.25 0
AvgRhythmUnitStreesSyll 0.52 0.76 1.17 0 0.14 0.56 0
AvgDepsSen_advmod 0.48 0.33 0.6 0 0.2 1.26 0
AvgDepsSen_det 0.48 0.22 1.04 0 0.45 0.68 0.39
AggPronSen_third_person 0.47 0.86 0.8 0 0.08 0.58 0
AvgAOADoc_Bristol 0.45 0.36 0.71 0 0.12 1.02 0.19
AvgDepsSen_acl 0.44 1.28 0.29 0 0.16 0.36 0
AvgAOADoc_Bird 0.44 0.38 0.84 0 0.13 0.89 0
WdAvgDpthHypernymTree 0.43 0.79 0.71 0 0.06 0.54 0
RdbltyFlesch 0.43 0.42 1.35 0 0.03 0.44 0
AvgDepsSen_dep 0.42 0.68 0.6 0 0.02 0.75 0
AggPronSen_indefinite 0.41 0.34 0.51 0 0.14 1.05 0
AvgConnSen_semi_coordinators 0.39 0 1.01 0 0.13 0.92 0
AvgDepsSen_mwe 0.38 0.6 1.17 0 0.1 0.07 0
AvgDepsSen_advcl 0.38 0.06 0.44 0 0.01 1.4 0
AvgDepsSen_neg 0.37 0.51 0.97 0 0.4 0.05 0
WdPathCntHypernymTree 0.36 0.89 0.33 0 0.19 0.35 0
AvgAOESen_IndexAboveThreshold.0.3. 0.35 0.27 0 0 0.41 1.05 0
AvgAOASen_Bird 0.33 0.04 0.75 0 0.59 0.42 0
LxcSoph 0.31 0.02 0.61 0 0.16 0.18 1.95
AvgConnSen_oppositions 0.26 0.11 0.98 0 0.36 0 0
LangRhythmDiameter 0.24 0.3 0.84 0 0.12 0.03 0
AvgConnSen_temporal_connectors 0.17 0.23 0.58 0 0.09 0.01 0
LangRhythmId 0.09 0.22 0.23 0 0.02 0.01 0
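The overall column appears to be consistent with a weighted average of the six per-algorithm columns, using the ensemble weightings and dividing by their sum. This is an observation from the numbers on this page, not a documented definition. The sketch below recomputes the overall value for the first three rows of the table; the function name is hypothetical.

```python
# Ensemble weightings copied from the weighting table (the intercept is not used here).
WEIGHTS = {
    "pls": 0.2354,
    "rf": 0.1868,
    "mars": 0.1595,
    "gbm": 0.1816,
    "svm": 0.2191,
    "cube": 0.0704,
}

# First three rows of the importance table: (reported overall, per-algorithm values).
ROWS = {
    "Content.words": (
        11.99,
        {"pls": 4.55, "rf": 5.81, "mars": 30.16, "gbm": 21.71, "svm": 4.24, "cube": 11.11},
    ),
    "WdEnt": (
        7.28,
        {"pls": 4.3, "rf": 5.74, "mars": 0.0, "gbm": 21.09, "svm": 4.12, "cube": 12.09},
    ),
    "AvgDepsSen_compound": (
        3.97,
        {"pls": 2.07, "rf": 1.98, "mars": 13.22, "gbm": 2.22, "svm": 1.52, "cube": 6.82},
    ),
}


def weighted_overall(per_algorithm, weights=WEIGHTS):
    """Weighted average of per-algorithm importances, normalized by the weight sum."""
    total_weight = sum(weights.values())
    return sum(weights[name] * per_algorithm[name] for name in weights) / total_weight


for metric, (reported, per_algorithm) in ROWS.items():
    recomputed = weighted_overall(per_algorithm)
    print(f"{metric}: reported {reported:.2f}, recomputed {recomputed:.2f}")
```

For these three rows the recomputed values agree with the reported overall values to two decimal places. Small differences on other rows would be expected because the table entries are already rounded.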