Electroencephalography (EEG) - norahollenstein/cognitiveNLP-dataCollection GitHub Wiki

EEG Datasets for NLP

EEG is the physiological method of choice for recording the brain's electrical activity through electrodes placed on the scalp. The signal reflects the synchronized activity of thousands of neurons. EEG offers excellent temporal resolution, on the order of milliseconds, so cortical activity can be tracked well below the one-second scale. This makes it possible to segment the signal at the word level.
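As a rough illustration of word-level segmentation, the sketch below slices a continuous recording into one segment per word. It assumes the EEG is available as a NumPy array of shape (channels, samples) and that word onset and offset times in seconds are known, for example from eye-tracking fixations or audio alignment. The function name `segment_by_word`, the sampling rate, and the synthetic data are illustrative assumptions and do not come from any dataset listed here.

```python
import numpy as np


def segment_by_word(eeg, sfreq, word_onsets_s, word_offsets_s):
    """Slice a (channels x samples) EEG array into one segment per word.

    Hypothetical helper: onsets and offsets are given in seconds and are
    converted to sample indices using the sampling rate `sfreq` (Hz).
    """
    segments = []
    for onset, offset in zip(word_onsets_s, word_offsets_s):
        start = int(round(onset * sfreq))
        stop = int(round(offset * sfreq))
        segments.append(eeg[:, start:stop])
    return segments


# Synthetic stand-in for a real recording: 4 channels, 2 seconds at 500 Hz.
rng = np.random.default_rng(0)
sfreq = 500.0
eeg = rng.standard_normal((4, 1000))

# Assumed word boundaries in seconds (three words).
onsets = [0.0, 0.4, 0.9]
offsets = [0.4, 0.9, 1.5]

segments = segment_by_word(eeg, sfreq, onsets, offsets)

# One simple word-level feature: mean amplitude per channel for each word.
word_means = np.array([seg.mean(axis=1) for seg in segments])
```

Each entry in `segments` keeps all channels and only the samples recorded while the corresponding word was being read or heard, so per-word features can be aligned with word-level NLP representations.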

This list contains data sources in the following languages: Dutch, English, and German.


Dutch

RaCCooNS (Radboud Coregistration Corpus of Narrative Sentences)

Also includes eye-tracking data.
Stimulus: 200 Dutch sentences from the SONAR-500 Dutch corpus (book section)
Subjects: 34
Provided features: Raw EEG merged with the eye-tracking data, and preprocessed EEG both before and after ICA-based ocular artifact correction
Data: https://data.ru.nl/collections/ru/cls/eeg_et_sentence_reading_dsc_556?0
Reference: Frank & Aumeistere (2023)

English

Alice in Wonderland Corpus

Stimulus: Audio - listening to Alice in Wonderland (Chapter 1); 2129 words in 84 sentences
Subjects: 52
Data: https://deepblue.lib.umich.edu/data/concern/data_sets/bg257f92t?locale=en
Reference: Brennan & Hale (2019)

fMRI data have also been recorded for the same stimulus.

Zurich Cognitive Language Processing Corpus (ZuCo)

Stimulus: Text - natural sentences
Subjects: 12
Provided features: Frequency band features (alpha, beta, gamma, theta)
Data: https://osf.io/q3zws/
Reference: Hollenstein et al. (2018)

Simultaneous eye-tracking and EEG recordings.
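To give a sense of what frequency band features are, the sketch below computes the mean spectral power of a single-channel signal in four bands using a plain FFT periodogram. The band boundaries in `BANDS` are commonly used conventions and are an assumption here; they are not necessarily the definitions used in ZuCo, and the function name `band_power` is hypothetical.

```python
import numpy as np

# Assumed, conventional band limits in Hz (not taken from the ZuCo release).
BANDS = {
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 49.0),
}


def band_power(signal, sfreq, bands=BANDS):
    """Return the mean periodogram power of a 1-D signal in each band."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].mean())
        for name, (lo, hi) in bands.items()
    }


# Synthetic check: a pure 10 Hz oscillation should load on the alpha band.
sfreq = 500.0
t = np.arange(1000) / sfreq  # 2 seconds of samples
signal = np.sin(2 * np.pi * 10.0 * t)

powers = band_power(signal, sfreq)
strongest = max(powers, key=powers.get)
```

Applied to a word-level EEG segment rather than a synthetic sine wave, the same computation yields one value per band and channel, which is the general shape of a band-power feature vector.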

Natural Speech Corpus

Stimulus: Speech - listening to audiobook
Data: https://datadryad.org/resource/doi:10.5061/dryad.070jc
Reference: Broderick et al. (2018)

N400 Corpus

Stimulus: Text - short congruent and incongruent sentences
Data: https://datadryad.org/resource/doi:10.5061/dryad.070jc
Reference: Broderick et al. (2018)

UCL Corpus

Stimulus: Text - word-by-word reading of 205 sentences
Subjects: 24
Data: https://ars.els-cdn.com/content/image/1-s2.0-S0093934X15001182-mmc1.zip
Reference: Frank & Willems (2017)

German

Multimodal Duolingo Bio-Signal Dataset

Stimulus: German language lessons using the web-based Duolingo
Subjects: 22 participants (either native English speakers or fluent in English)
Data: https://figshare.com/s/688e387fbfdc000f4e90
Reference: Notaro et al. (2018)

This dataset also contains eye-tracking and mouse-movement metrics.