ReaderBench Model 2a Variable Importance - shmercer/writeAlizer GitHub Wiki

Ensemble Weightings and Metric Importance

ReaderBench Model 2a

This model used ReaderBench scores from 7-minute narrative writing samples (prompt: "I once had a magic pencil and ...") written by 136 students in the fall of Grades 2-5 (Mercer et al., 2019) to predict the holistic writing quality of the samples (Elo ratings calculated from paired comparisons).
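For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update after one paired comparison between two writing samples. It is only an illustration of the general technique: the starting rating (1500), the K factor (32), and the 400-point scale are conventional defaults assumed here, not values reported for this study, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that sample A is judged better than sample B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return updated (rating_a, rating_b) after one paired comparison.

    k and the default scale are assumed conventional values, not the study's.
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Two samples start with equal (hypothetical) ratings; a rater prefers sample A.
new_a, new_b = update_elo(1500.0, 1500.0, a_won=True)
print(new_a, new_b)
```

Repeating this update over many paired comparisons yields one rating per sample, which then serves as the quality score that the model predicts.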

Highly correlated ReaderBench metrics (|r| > .90) were excluded during pre-processing (see the section on Scoring Model Development for more details).
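As a rough illustration of this kind of filter, the sketch below keeps a metric only if its absolute Pearson correlation with every already-kept metric is at most .90. The greedy keep-the-first rule, the function names, and the toy data are all assumptions made for the example; the actual pre-processing procedure is described in the Scoring Model Development section and may choose which metric to drop differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_highly_correlated(columns, cutoff=0.90):
    """Return the names of metrics to keep, scanning in insertion order.

    A metric is kept only if |r| <= cutoff against every metric kept so far.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical metric names (not real ReaderBench output).
toy = {
    "metric_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metric_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost a multiple of metric_a
    "metric_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to metric_a
}
kept = drop_highly_correlated(toy)
print(kept)
```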

Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42, 117-128. https://doi.org/10.1177/0731948718803296

Algorithm Weightings in Ensemble

Abbreviations:

  • all = ensemble model
  • pls = partial least squares regression
  • rf = random forest regression
  • mars = bagged multivariate adaptive regression splines
  • svm = support vector machines
  • cube = cubist regression

The table below presents the intercept and the linear weight of each algorithm in the ensemble model.

Intercept pls rf mars svm cube
-4.338 0.2371 0.1755 0.1780 0.2234 0.2532
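To make the role of these weights concrete, here is a minimal sketch that combines the five base-model predictions for one writing sample, assuming the ensemble is a plain linear combination (intercept plus weighted sum) as the table suggests. The function name and the example predictions are hypothetical; only the intercept and weights come from the table.

```python
# Intercept and linear weights copied from the table above.
INTERCEPT = -4.338
WEIGHTS = {"pls": 0.2371, "rf": 0.1755, "mars": 0.1780, "svm": 0.2234, "cube": 0.2532}


def ensemble_prediction(base_predictions):
    """Return intercept + sum(weight * prediction) over the five base models."""
    missing = set(WEIGHTS) - set(base_predictions)
    if missing:
        raise KeyError(f"missing base-model predictions: {sorted(missing)}")
    return INTERCEPT + sum(WEIGHTS[name] * base_predictions[name] for name in WEIGHTS)


# Hypothetical base-model predictions for a single sample (illustrative values only).
example = {"pls": 101.0, "rf": 98.5, "mars": 103.2, "svm": 99.7, "cube": 100.4}
print(round(ensemble_prediction(example), 2))
```

Note that the weights sum to about 1.067 rather than exactly 1, which is expected when the combination is fitted as an unconstrained linear model with an intercept.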

Metric Importance in Each Algorithm and Ensemble

Values in each column sum to 100 (up to rounding), so each value can be interpreted as a metric's percentage contribution to that model.

Metric overall pls rf mars svm cube
WdEnt 20.53 4.67 10.12 73.84 5.16 18.67
AvgDepsSen_dep 4.65 1.23 0.88 16.82 0.88 5.25
Content.words 4.59 4.32 4.77 0 4.68 7.87
Words 3.72 4.44 4.67 0 4.67 4.17
LxcDiv 3.1 4.08 3.29 0 4.06 3.4
AvgAOASen_Shock 2.77 1.45 1.16 9.34 1.39 1.7
TCorefChainDoc 2.62 2.98 0.81 0 2.06 5.86
AvgChainSpan 2.59 3.27 3.83 0 3.1 2.47
WdDiffWdStem 2.46 2.73 3.07 0 2.24 3.7
SynSoph 2.12 1.71 0.92 0 1.82 5.09
AvgDepsSen_punct 2.03 2.48 1.68 0 1.74 3.55
TActCorefChainWd 1.93 1.6 1.91 0 1.47 4.01
WdDiffLemmaStem 1.66 1.52 0.72 0 2.44 2.93
RdbltyFlesch 1.55 0.77 1.22 0 1.09 4.01
WdLettStdDev 1.44 2.35 1.51 0 2.13 0.93
AvgAOESen_InverseAverage 1.37 1.44 1.22 0 1.09 2.62
Sentences 1.3 2.84 1.77 0 1.82 0
AvgWdLen 1.27 2.65 1.57 0 2.02 0
LexChainMaxSp 1.26 2.89 1.19 0 2.02 0
AvgAOADoc_Shock 1.26 2.36 1.89 0 1.68 0.31
AvgAOADoc_Kuperman 1.25 0.72 1.01 0 1.3 2.78
WdSylCnt 1.15 1.57 1.83 0 1.51 0.77
CharEnt 1.14 2.65 0.96 0 1.85 0
LexChainAvgSpan 1.12 2.18 1.5 0 1.86 0
AvgDepsSen_advcl 1.07 0.93 0.85 0 1.35 1.85
AvgAOASen_Kuperman 1.04 1.23 1.48 0 1.46 0.93
AvgCorefChain 1 1.86 0.85 0 0.9 1.08
WdAvgDpthHypernymTree 1 1.14 0.87 0 0.97 1.7
SenStdDevWd 0.98 1.96 1.43 0 1.49 0
TCorefChainBigSpan 0.95 2.16 1.44 0 1.13 0
AvgAOADoc_Bristol 0.94 1.75 1.03 0 1.1 0.62
LxcSoph 0.92 1.64 1.2 0 0.85 0.77
AvgAdverbSen 0.88 0.89 1.38 0 1.46 0.62
RdbltyDaleChall 0.87 1.75 1.63 0 1 0
AvgSenAdjCoh_LDA 0.82 1.97 0.64 0 1.33 0
AvgRhythmUnits 0.82 1.12 1.13 0 1.15 0.62
FrqRhythmId 0.8 1.69 1.07 0 1.18 0
AvgAOADoc_Bird 0.78 0.95 0.3 0 1.43 0.93
AvgVoice 0.78 2.01 0.76 0 0.99 0
AvgAOADoc_Cortese 0.77 0.69 1.3 0 1.57 0.31
WdPathCntHypernymTree 0.71 1.45 0.84 0 1.17 0
AvgConnSen_simple_subordinators 0.7 0.51 2.49 0 0.82 0
AvgAOASen_Bristol 0.68 0.66 0.71 0 1.29 0.62
AvgRhythmUnitStreesSyll 0.63 0.08 0.91 0 0.81 1.23
AvgInferenceDistChain 0.62 1.39 0.34 0 1.2 0
AggPronSen_indefinite 0.62 0.45 0.63 0 1.31 0.62
AvgAOASen_Bird 0.6 1.13 0.37 0 1.37 0
AvgDepsSen_compound 0.6 0.72 0.5 0 0.48 1.08
WdPolysemyCnt 0.58 0 1.09 0 1.9 0
AvgDepsSen_ccomp 0.57 0.09 1.32 0 0.9 0.62
AvgAOASen_Cortese 0.55 1.15 0.3 0 1.17 0
AvgDepsSen_cop 0.54 0.24 0.58 0 0.97 0.77
AvgPronounSen 0.54 0.12 0.93 0 0.48 1.08
AvgNmdEntSen 0.52 0.24 1.12 0 1.33 0
AvgNounSen 0.52 0.24 0.15 0 0.18 1.7
AvgDepsSen_nmod 0.48 0.7 0.69 0 1 0
AvgDepsSen_aux 0.48 0.24 0.92 0 1.31 0
AvgConnSen_addition 0.48 1.1 0.6 0 0.66 0
AvgDepsSen_dobj 0.48 0.23 1.51 0 0.16 0.62
AvgAOEDoc_InverseLinearRegressionSlope 0.44 0.4 0.8 0 0.68 0.31
AvgDepsSen_mark 0.41 0.43 0.95 0 0.73 0
AvgConnSen_temporal_connectors 0.41 0.32 0.64 0 1.11 0
AvgDepsSen_det 0.4 0.18 0.4 0 0.72 0.62
AvgConnSen_semi_coordinators 0.38 0.8 0.15 0 0.16 0.62
AvgConnSen_order 0.36 0.31 1.74 0 0.03 0
AggPronSen_third_person 0.36 0.57 0.91 0 0.41 0
LangRhythmDiameter 0.35 0.57 0.79 0 0.08 0.31
SenAsson 0.35 0.8 0.83 0 0.16 0
AvgAOEDoc_IndexAboveThreshold.0.3. 0.33 0.03 0.43 0 0.87 0.31
AvgDepsSen_amod 0.29 0.33 0.98 0 0.27 0
AvgAdjectiveSen 0.28 0.1 1.28 0 0.21 0
AvgConnSen_oppositions 0.27 0.54 0.82 0 0.07 0
AvgDepsSen_xcomp 0.24 0.01 0.13 0 1.04 0
AvgAOEDoc_IndexPolynomialFitAboveThreshold.0.3. 0.21 0.12 0.1 0 0.78 0
LangRhythmId 0.19 0.47 0.45 0 0.05 0
AvgDepsSen_neg 0.18 0.03 1.05 0 0 0
AvgDepsSen_mwe 0.17 0.38 0.47 0 0.04 0
LangRhythmCoeff 0.16 0 0.22 0 0.61 0
AvgDepsSen_acl 0.06 0.25 0 0 0.02 0
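The overall column appears to be consistent with a weighted average of the five per-algorithm importances, using the ensemble weights from the Algorithm Weightings table rescaled to sum to 1. This is an observation from the published numbers, not a documented formula, so the sketch below should be read as a consistency check under that assumption. The function name is hypothetical; the rows are copied from the table above.

```python
# Ensemble weights from the Algorithm Weightings table (intercept not needed here).
WEIGHTS = {"pls": 0.2371, "rf": 0.1755, "mars": 0.1780, "svm": 0.2234, "cube": 0.2532}

# A few rows copied from the importance table: published overall value, then per-algorithm values.
ROWS = {
    "WdEnt": (20.53, {"pls": 4.67, "rf": 10.12, "mars": 73.84, "svm": 5.16, "cube": 18.67}),
    "AvgDepsSen_dep": (4.65, {"pls": 1.23, "rf": 0.88, "mars": 16.82, "svm": 0.88, "cube": 5.25}),
    "Content.words": (4.59, {"pls": 4.32, "rf": 4.77, "mars": 0.0, "svm": 4.68, "cube": 7.87}),
}


def overall_importance(per_algorithm):
    """Weighted average of per-algorithm importances, with weights rescaled to sum to 1."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * per_algorithm[name] for name in WEIGHTS)
    return weighted_sum / total_weight


for metric, (published, per_algorithm) in ROWS.items():
    computed = overall_importance(per_algorithm)
    print(f"{metric}: computed {computed:.2f}, published {published:.2f}")
```

Under this reading, a metric such as WdEnt ranks first overall mainly because it dominates the mars model, even though its share in pls and svm is close to that of several other metrics.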