Self-paced reading - theDebbister/cognitiveNLP-dataCollection GitHub Wiki

SPR datasets for NLP

Self-paced reading (SPR) is a task in which participants read a passage word by word or phrase by phrase, pressing a button to display the next word or phrase. The time between button presses indicates the processing difficulty at each word or phrase.
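As a rough illustration of how reading times follow from button presses, the sketch below turns a list of press timestamps into per-word reading times. The function name `reading_times`, its parameters, and the timestamps are hypothetical and invented for this example; none of the datasets listed here is assumed to store its data in this form.

```python
from typing import List, Tuple


def reading_times(
    words: List[str],
    press_times_ms: List[float],
    onset_ms: float = 0.0,
) -> List[Tuple[str, float]]:
    """Pair each word with the time it stayed on screen.

    press_times_ms[i] is the (hypothetical) timestamp of the button press
    that dismissed words[i]; onset_ms is when the first word appeared.
    """
    if len(words) != len(press_times_ms):
        raise ValueError("need exactly one button press per word")

    result = []
    shown_at = onset_ms
    for word, pressed_at in zip(words, press_times_ms):
        # Reading time = press time minus the moment the word was displayed.
        result.append((word, pressed_at - shown_at))
        # The same press reveals the next word.
        shown_at = pressed_at
    return result


# Made-up timestamps for a single trial, in milliseconds.
words = ["The", "horse", "raced", "past", "the", "barn", "fell"]
presses = [310.0, 650.0, 1010.0, 1330.0, 1620.0, 1990.0, 2710.0]

for word, rt in reading_times(words, presses):
    print(f"{word:>6}  {rt:6.1f} ms")
```

In this invented trial the final word has the longest reading time, which is the kind of local slowdown that SPR studies interpret as increased processing difficulty.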

This list contains SPR datasets in the following languages:

English
Dutch

English

Thematic Fit Sentences

Stimulus: 120 sentences with local NP/S coordination ambiguity
Subjects: 96
Data: https://osf.io/npzc7/
Reference: Frank & Hoeks (2019)

Natural Stories Corpus

Stimulus: 10 stories (English texts edited to contain many low-frequency syntactic constructions)
Subjects: 181 in total, 19 of whom read all stories
Provided features: reading times
Data: https://github.com/languageMIT/naturalstories
Reference: Futrell et al. (2017)

UCL Corpus

Stimulus: 361 sentences
Subjects: 117
Data: https://link.springer.com/article/10.3758/s13428-012-0313-y#SupplementaryMaterial
Reference: Frank et al. (2013)

Also contains eye-tracking data.

Scalar Inferences

Stimulus: 48 target sentences and 144 filler sentences
Subjects: 28 native speakers
Data: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0063943#s5
Reference: Politzer-Ahles & Fiorentino (2013)

Dutch

Sentences of Double‐Embedded Relative Clauses

Stimulus: 16 target sentences and 56 filler sentences
Subjects: 24
Data: https://github.com/vasishth/StanJAGSexamples/tree/master/FrankEtAlCogSci2015
Reference: Frank et al. (2015) - Experiment 1

This dataset also contains reading times of German and Dutch native speakers reading English sentences.