Keystroke metrics - norahollenstein/cognitiveNLP-dataCollection GitHub Wiki

Keystroke datasets for NLP

Keystroke data are behavioral biometrics recorded during text production and have been used extensively in psycholinguistic and writing research to gain insights into cognitive processing. Keystroke dynamics capture a user's typing patterns, such as how long each key is held down and the pauses between consecutive keys, and have been reported to correlate with eye movements.
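Two commonly used keystroke metrics are dwell time (how long a key is held, key-up minus key-down) and flight time (the gap between releasing one key and pressing the next). The sketch below computes both from a list of timestamped key events. The `KeyEvent` record and the `keystroke_metrics` function are hypothetical; the datasets listed on this page each use their own file formats, so their raw logs would first need to be converted into this shape.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    # Hypothetical event record, not the format of any dataset listed here.
    key: str      # key identifier, e.g. "h"
    kind: str     # "down" or "up"
    t_ms: float   # timestamp in milliseconds


def keystroke_metrics(events):
    """Return (dwell_times, flight_times) in milliseconds.

    dwell time:  key-up minus key-down for the same key press
    flight time: next key-down minus previous key-up (up-down latency);
                 negative when keys overlap (the next key is pressed
                 before the previous one is released)
    """
    held = {}      # key -> key-down timestamp of the press still in progress
    presses = []   # completed presses as (key, down_t, up_t)
    for ev in sorted(events, key=lambda e: e.t_ms):
        if ev.kind == "down":
            held[ev.key] = ev.t_ms
        elif ev.kind == "up" and ev.key in held:
            presses.append((ev.key, held.pop(ev.key), ev.t_ms))
    # Order presses by when they started so flight times follow typing order.
    presses.sort(key=lambda p: p[1])
    dwell = [up - down for _, down, up in presses]
    flight = [presses[i + 1][1] - presses[i][2] for i in range(len(presses) - 1)]
    return dwell, flight


if __name__ == "__main__":
    typed = [
        KeyEvent("h", "down", 0), KeyEvent("h", "up", 90),
        KeyEvent("i", "down", 150), KeyEvent("i", "up", 230),
    ]
    print(keystroke_metrics(typed))
```

For the two key presses in the usage example, this returns dwell times of 90 ms and 80 ms and a single flight time of 60 ms. Real logs may also contain auto-repeated key-down events or key-ups without a matching key-down; this sketch simply keeps the latest key-down and ignores unmatched key-ups.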

This collection contains datasets in the following languages:

English

Observations from HALIE

A dataset of human users interacting with language models (LMs) on interactive tasks, including social dialogue, question answering, and crossword puzzles.

Stimulus: Social dialogue, answering multiple-choice questions, and solving crossword puzzles
Participants: 189 crowd workers for social dialogue, 304 for crossword solving, and 342 for question answering
Data: https://github.com/stanford-crfm/halie
Reference: Lee et al. 2023

CoAuthor Dataset

A human-AI collaborative dataset that captures interactions between writers and GPT-3 language model instances.

Stimulus: Keystrokes recorded while writing creative and argumentative texts in response to prompts, with suggestions from GPT-3 language models
Participants: 63
Data: https://coauthor.stanford.edu/
Reference: Lee et al. 2022

Scrolling Interactions to Predict Readability

Stimulus: Advanced and elementary texts from the OneStopEnglish corpus
Participants: 598 (native English speakers and L2 speakers of English)
Data: https://github.com/siangooding/readability_scroll
Reference: Gooding et al. 2021

University at Buffalo Keystroke Dynamics

Stimulus: Keystrokes and mouse coordination recorded during transcription and free-text typing
Participants: 157
Data: https://www.buffalo.edu/cubs/research/datasets.html#title_429116244
Reference: Sun et al. 2016

Clarkson University Keystroke Dataset

Stimulus: Password input, free-text answers to questions, and transcription
Participants: 39
Data: https://citer.clarkson.edu/research-resources/biometric-dataset-collections-2/clarkson-university-keystroke-dataset/
Reference: Vural et al. 2014

Stewart Keystroke and Stylometry Dataset

Stimulus: Free-text input, averaging 966 words per participant
Participants: 40
Data: https://bitbucket.org/biometrics/dataset-stewart-keystroke/wiki/browse/
Reference: Stewart et al. 2011

Romanian

Politehnica University Timisoara Keystroke Dataset

Stimulus: Free text
Participants: 80
Data: https://sites.google.com/view/cataliniapa/timisoara-kd-data-set
Reference: Iapa & Cretu 2021