ReaderBench Model 2a Variable Importance - shmercer/writeAlizer GitHub Wiki

Ensemble Weightings and Metric Importance

ReaderBench Model 2a

This model used ReaderBench scores from 7-minute narrative writing samples (prompt: "I once had a magic pencil and ...") written by 136 students in the fall of Grades 2-5 (Mercer et al., 2019) to predict holistic writing quality on those samples (Elo ratings calculated from paired comparisons).

Highly correlated ReaderBench metrics (|r| > .90) were excluded during pre-processing (see the section on Scoring Model Development for more details).
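As a rough illustration of this kind of filter, the sketch below drops any metric whose absolute Pearson correlation with an already-kept metric exceeds .90. It is a minimal sketch under stated assumptions, not the procedure used to build this model: the source does not say which member of a correlated pair was removed, so keeping the first metric encountered is an assumption, and the function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def drop_highly_correlated(columns, threshold=0.90):
    """Return metric names to keep, scanning in insertion order.

    A metric is kept only if |r| with every previously kept metric is at
    most `threshold` (assumption: the first metric of a pair is retained).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values for three metrics across five writing samples.
toy = {
    "Words": [50, 80, 120, 95, 60],
    "Content.words": [26, 41, 61, 49, 30],  # nearly proportional to Words
    "WdEnt": [3.1, 3.0, 3.6, 3.2, 3.5],
}
kept = drop_highly_correlated(toy)
```

With these toy values, the second column is almost a rescaled copy of the first, so it is the one removed. The metrics actually retained for Model 2a are the ones listed in the importance table on this page.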

Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42, 117-128. https://doi.org/10.1177/0731948718803296

Algorithm Weightings in Ensemble

Abbreviations:

all = ensemble model (labeled "overall" in the metric importance table)
pls = partial least squares regression
rf = random forest regression
mars = bagged multivariate adaptive regression splines
svm = support vector machines
cube = cubist regression

The table below presents the linear weighting of each algorithm in the ensemble model.

Intercept	pls	rf	mars	svm	cube
-4.338	0.2371	0.1755	0.1780	0.2234	0.2532
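To make the weightings concrete, the sketch below combines five per-algorithm predictions into one ensemble score, assuming the ensemble is a plain linear combination (intercept plus weighted sum). The weights are copied from the table above. The function name `ensemble_predict` and the example predictions are hypothetical and are not part of writeAlizer.

```python
# Intercept and weights copied from the table above.
INTERCEPT = -4.338
WEIGHTS = {
    "pls": 0.2371,
    "rf": 0.1755,
    "mars": 0.1780,
    "svm": 0.2234,
    "cube": 0.2532,
}


def ensemble_predict(sub_predictions):
    """Combine per-algorithm predictions into a single ensemble score.

    Assumes a linear stack: intercept + sum(weight * prediction).
    """
    missing = set(WEIGHTS) - set(sub_predictions)
    if missing:
        raise ValueError(f"missing predictions for: {sorted(missing)}")
    return INTERCEPT + sum(WEIGHTS[name] * sub_predictions[name] for name in WEIGHTS)


# Hypothetical per-algorithm predictions for one sample (illustrative only).
example = {"pls": 60.0, "rf": 58.0, "mars": 62.0, "svm": 59.0, "cube": 61.0}
score = ensemble_predict(example)
```

The five weights sum to 1.0672 rather than 1, so the negative intercept offsets part of that excess.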

Metric Importance in Each Algorithm and Ensemble

Each column sums to 100, so values can be interpreted as each metric's percentage contribution to that model.

Metric	overall	pls	rf	mars	svm	cube
WdEnt	20.53	4.67	10.12	73.84	5.16	18.67
AvgDepsSen_dep	4.65	1.23	0.88	16.82	0.88	5.25
Content.words	4.59	4.32	4.77	0	4.68	7.87
Words	3.72	4.44	4.67	0	4.67	4.17
LxcDiv	3.1	4.08	3.29	0	4.06	3.4
AvgAOASen_Shock	2.77	1.45	1.16	9.34	1.39	1.7
TCorefChainDoc	2.62	2.98	0.81	0	2.06	5.86
AvgChainSpan	2.59	3.27	3.83	0	3.1	2.47
WdDiffWdStem	2.46	2.73	3.07	0	2.24	3.7
SynSoph	2.12	1.71	0.92	0	1.82	5.09
AvgDepsSen_punct	2.03	2.48	1.68	0	1.74	3.55
TActCorefChainWd	1.93	1.6	1.91	0	1.47	4.01
WdDiffLemmaStem	1.66	1.52	0.72	0	2.44	2.93
RdbltyFlesch	1.55	0.77	1.22	0	1.09	4.01
WdLettStdDev	1.44	2.35	1.51	0	2.13	0.93
AvgAOESen_InverseAverage	1.37	1.44	1.22	0	1.09	2.62
Sentences	1.3	2.84	1.77	0	1.82	0
AvgWdLen	1.27	2.65	1.57	0	2.02	0
LexChainMaxSp	1.26	2.89	1.19	0	2.02	0
AvgAOADoc_Shock	1.26	2.36	1.89	0	1.68	0.31
AvgAOADoc_Kuperman	1.25	0.72	1.01	0	1.3	2.78
WdSylCnt	1.15	1.57	1.83	0	1.51	0.77
CharEnt	1.14	2.65	0.96	0	1.85	0
LexChainAvgSpan	1.12	2.18	1.5	0	1.86	0
AvgDepsSen_advcl	1.07	0.93	0.85	0	1.35	1.85
AvgAOASen_Kuperman	1.04	1.23	1.48	0	1.46	0.93
AvgCorefChain	1	1.86	0.85	0	0.9	1.08
WdAvgDpthHypernymTree	1	1.14	0.87	0	0.97	1.7
SenStdDevWd	0.98	1.96	1.43	0	1.49	0
TCorefChainBigSpan	0.95	2.16	1.44	0	1.13	0
AvgAOADoc_Bristol	0.94	1.75	1.03	0	1.1	0.62
LxcSoph	0.92	1.64	1.2	0	0.85	0.77
AvgAdverbSen	0.88	0.89	1.38	0	1.46	0.62
RdbltyDaleChall	0.87	1.75	1.63	0	1	0
AvgSenAdjCoh_LDA	0.82	1.97	0.64	0	1.33	0
AvgRhythmUnits	0.82	1.12	1.13	0	1.15	0.62
FrqRhythmId	0.8	1.69	1.07	0	1.18	0
AvgAOADoc_Bird	0.78	0.95	0.3	0	1.43	0.93
AvgVoice	0.78	2.01	0.76	0	0.99	0
AvgAOADoc_Cortese	0.77	0.69	1.3	0	1.57	0.31
WdPathCntHypernymTree	0.71	1.45	0.84	0	1.17	0
AvgConnSen_simple_subordinators	0.7	0.51	2.49	0	0.82	0
AvgAOASen_Bristol	0.68	0.66	0.71	0	1.29	0.62
AvgRhythmUnitStreesSyll	0.63	0.08	0.91	0	0.81	1.23
AvgInferenceDistChain	0.62	1.39	0.34	0	1.2	0
AggPronSen_indefinite	0.62	0.45	0.63	0	1.31	0.62
AvgAOASen_Bird	0.6	1.13	0.37	0	1.37	0
AvgDepsSen_compound	0.6	0.72	0.5	0	0.48	1.08
WdPolysemyCnt	0.58	0	1.09	0	1.9	0
AvgDepsSen_ccomp	0.57	0.09	1.32	0	0.9	0.62
AvgAOASen_Cortese	0.55	1.15	0.3	0	1.17	0
AvgDepsSen_cop	0.54	0.24	0.58	0	0.97	0.77
AvgPronounSen	0.54	0.12	0.93	0	0.48	1.08
AvgNmdEntSen	0.52	0.24	1.12	0	1.33	0
AvgNounSen	0.52	0.24	0.15	0	0.18	1.7
AvgDepsSen_nmod	0.48	0.7	0.69	0	1	0
AvgDepsSen_aux	0.48	0.24	0.92	0	1.31	0
AvgConnSen_addition	0.48	1.1	0.6	0	0.66	0
AvgDepsSen_dobj	0.48	0.23	1.51	0	0.16	0.62
AvgAOEDoc_InverseLinearRegressionSlope	0.44	0.4	0.8	0	0.68	0.31
AvgDepsSen_mark	0.41	0.43	0.95	0	0.73	0
AvgConnSen_temporal_connectors	0.41	0.32	0.64	0	1.11	0
AvgDepsSen_det	0.4	0.18	0.4	0	0.72	0.62
AvgConnSen_semi_coordinators	0.38	0.8	0.15	0	0.16	0.62
AvgConnSen_order	0.36	0.31	1.74	0	0.03	0
AggPronSen_third_person	0.36	0.57	0.91	0	0.41	0
LangRhythmDiameter	0.35	0.57	0.79	0	0.08	0.31
SenAsson	0.35	0.8	0.83	0	0.16	0
AvgAOEDoc_IndexAboveThreshold.0.3.	0.33	0.03	0.43	0	0.87	0.31
AvgDepsSen_amod	0.29	0.33	0.98	0	0.27	0
AvgAdjectiveSen	0.28	0.1	1.28	0	0.21	0
AvgConnSen_oppositions	0.27	0.54	0.82	0	0.07	0
AvgDepsSen_xcomp	0.24	0.01	0.13	0	1.04	0
AvgAOEDoc_IndexPolynomialFitAboveThreshold.0.3.	0.21	0.12	0.1	0	0.78	0
LangRhythmId	0.19	0.47	0.45	0	0.05	0
AvgDepsSen_neg	0.18	0.03	1.05	0	0	0
AvgDepsSen_mwe	0.17	0.38	0.47	0	0.04	0
LangRhythmCoeff	0.16	0	0.22	0	0.61	0
AvgDepsSen_acl	0.06	0.25	0	0	0.02	0
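The "overall" column appears to be consistent with a weighted average of the five per-algorithm importances, using the ensemble weights normalized to sum to 1. This relationship is inferred from the numbers on this page and is not stated by the source, so treat it as an assumption. The sketch below checks it for three rows copied from the table; the function name `weighted_overall` is hypothetical.

```python
# Ensemble weights copied from the algorithm weightings table on this page.
WEIGHTS = {
    "pls": 0.2371,
    "rf": 0.1755,
    "mars": 0.1780,
    "svm": 0.2234,
    "cube": 0.2532,
}

# Three rows copied from the metric importance table.
ROWS = {
    "WdEnt": {"overall": 20.53, "pls": 4.67, "rf": 10.12,
              "mars": 73.84, "svm": 5.16, "cube": 18.67},
    "AvgDepsSen_dep": {"overall": 4.65, "pls": 1.23, "rf": 0.88,
                       "mars": 16.82, "svm": 0.88, "cube": 5.25},
    "Content.words": {"overall": 4.59, "pls": 4.32, "rf": 4.77,
                      "mars": 0.0, "svm": 4.68, "cube": 7.87},
}


def weighted_overall(row):
    """Weight-normalized average of a metric's per-algorithm importances."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * row[name] for name in WEIGHTS) / total_weight


# Compare the recomputed value with the published "overall" value per row.
for metric, row in ROWS.items():
    print(f"{metric}: recomputed {weighted_overall(row):.2f}, table {row['overall']:.2f}")
```

Under this reading, a metric such as WdEnt ranks high overall largely because of its dominant share in the mars model, even though its share in pls and svm is modest.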