Mega‐Grant ARFA Spelling Scoring - LeoLedesma237/LeoWebsite GitHub Wiki

Overview

This script scores the spelling portion of the ARFA. It is organized into four parts:

  • Part 1: Loading in the ARFA Spelling raw data
  • Part 2: Scoring the data
  • Part 3: Calculating Z-score performance
  • Part 4: Saving the scored data and poor performers

There is also a Quality Control section, where data inspection confirms that the code in these chunks is functioning as intended.

ARFA Spelling

The spelling portion of the ARFA consists of 22 items: the first 12 are single words and the remaining 10 are sentences. Each item is presented verbally by an audio recording and may be repeated if the participant requests it. Each item has the following columns (seven error variables plus a repetition flag):

  • Spelling_#_Mistake_Number: Number of spelling mistakes in a single word (items 1-12) or the sum of spelling mistakes across all words in a sentence (items 13-22).
  • Spelling_#_transposition: Number of word transpositions.
  • Spelling_#_word_omission: Spelling of the omitted word.
  • Spelling_#_missed_letter_N: Number of letters in the omitted word(s). If several words were omitted, the sum of letters across all omitted words.
  • Spelling_#_word_adding: Spelling of the added word.
  • Spelling_#_word_omission_count: Number of omitted words.
  • Spelling_#_word_adding_count: Number of added words.
  • Spelling_1_Rep: yes/no. Indicates whether the stimulus word or sentence was repeated once during the assessment.

For the purposes of our analysis, we are only interested in the first column (Spelling_#_Mistake_Number), which gives the number of spelling mistakes in a single word or the sum of spelling mistakes across the words of a sentence.

When put into ChatGPT, the items roughly translate into English as:

Part 1: Loading in the data

This dataset contains participants' spellings of 22 Russian words and sentences. Each item was scored on 7 different types of errors that could have occurred (I am not familiar with them). Additionally, each item has a variable indicating whether the tester had to repeat themselves.

library(readxl)
library(tidyverse)

# Set the working directory
setwd("~/MegaGrant/ARFA Spelling Mistakes")

# load in the ARFA data that is unscored
ARFA.spelling <- read_excel("ARFA_Spelling_mistakes.xlsx")
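Before scoring, it can help to confirm that all 22 mistake-number columns actually made it into the loaded data. The base-R sketch below assumes the SP<item>.1 naming described in Part 2; `check_item_columns()` and the toy data frame are hypothetical and not part of the original script.

```r
# Hypothetical helper: return any expected mistake-number columns that are missing
check_item_columns <- function(df, n_items = 22) {
  expected <- paste0("SP", seq_len(n_items), ".1")  # SP1.1, SP2.1, ..., SP22.1
  setdiff(expected, names(df))
}

# Toy stand-in for ARFA.spelling: 2 participants, all 22 mistake columns set to 0
toy <- as.data.frame(matrix(0, nrow = 2, ncol = 22,
                            dimnames = list(NULL, paste0("SP", 1:22, ".1"))))

check_item_columns(toy)         #> character(0)  (nothing missing)
check_item_columns(toy[, -5])   #> [1] "SP5.1"
```

On the real data, the call would be `check_item_columns(ARFA.spelling)`; an empty result means every item can be scored.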

Part 2: Scoring Spelling Errors

We are only interested in the mistake-number column, which gives the total number of spelling errors for an item. In the dataset, this column is named with the item number followed by .1 (e.g., SP1.1). Each item's spelling error score is capped at 2: if someone made 6 spelling errors on an item, that item is scored as 2; if 1 error was made, it is scored as 1; and if there were no spelling mistakes, it is scored as 0. Missing values (NA) are recoded as 1 error before capping. After all items are scored, their sum gives the total number of spelling errors for each participant. Lastly, these scores are added back to the original dataset for quality control purposes.
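The capping rule can also be written as a single vectorized step. This is a minimal base-R sketch, assuming raw mistake counts are non-negative whole numbers; `cap_spelling_error()` is a hypothetical helper that mirrors the loop below by recoding NA to 1 before capping at 2.

```r
# Hypothetical helper: recode NA to 1 (as the scoring loop does), then cap each count at 2
cap_spelling_error <- function(x) {
  x <- as.numeric(x)
  x[is.na(x)] <- 1
  pmin(x, 2)
}

raw <- c(0, 1, 6, NA, 2)
cap_spelling_error(raw)
#> [1] 0 1 2 1 2
```

Unlike the `case_when()` version in the loop, `pmin()` passes non-integer or negative values through instead of turning them into NA, so the two only agree for whole-number, non-negative counts.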

# Items Spelling Error list
Items.Spelling.Error <- list()

# Extract the variables that will be used for scoring
Items <- paste("SP",1:22,".1",sep = "")


for(ii in 1:22) {

  # Extract all scoring variables for that item
  current.spelling.errors.df <- ARFA.spelling %>%
    select(starts_with(Items[ii]))
  
  # Convert everything in this dataset into a numeric value
  current.spelling.errors.num.df <- data.frame(sapply(current.spelling.errors.df, as.numeric))
  
  # Convert any NA's into 1's
  current.spelling.errors.num.df[is.na(current.spelling.errors.num.df)] <- 1
  
  # Rename the variable
  names(current.spelling.errors.num.df) <- c("Spelling_Error")
  
  # Get the Spelling Error for the item
  current.spelling.errors.num.df <- current.spelling.errors.num.df %>%
    mutate(Spelling_Error = case_when(
      Spelling_Error == 0 ~ 0,
      Spelling_Error == 1 ~ 1,
      Spelling_Error >= 2 ~ 2
    ))
  
  
  # Save this into a list
  Items.Spelling.Error[[ii]] <- current.spelling.errors.num.df$Spelling_Error

}

# Bind these scores into one dataset
Items.Spelling.Error.Binded <- data.frame(do.call(cbind, Items.Spelling.Error))

# Rename them
names(Items.Spelling.Error.Binded) <- paste("Item",1:22,"_SpellingError",sep="")

# Calculate the Row.Sum for a composite score
Items.Spelling.Error.Binded$Total_SpellingError <- rowSums(Items.Spelling.Error.Binded)


# Reintroduce this information into the original dataset
ARFA.spelling.scored <- cbind(ARFA.spelling,
                              Items.Spelling.Error.Binded) %>%
  as_tibble()
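For readers who prefer to avoid the loop, the same item scores and total can be computed in one pass over all 22 columns. The base-R sketch below runs on a hypothetical three-participant toy dataset (not study data) and assumes the mistake columns are named exactly SP1.1 through SP22.1; it follows the loop's rules of recoding NA to 1 and capping each item at 2.

```r
item_cols <- paste0("SP", 1:22, ".1")

# Toy raw data (hypothetical): A makes no mistakes, B makes 5 per item, C makes 1 per item
toy <- data.frame(ID = c("A", "B", "C"))
for (col in item_cols) toy[[col]] <- c(0, 5, 1)
toy$SP3.1[2] <- NA  # one missing value for participant B

# Score every item column at once: NA -> 1, then cap at 2
scores <- as.matrix(toy[item_cols])
scores[is.na(scores)] <- 1
scores[scores > 2] <- 2
colnames(scores) <- paste0("Item", 1:22, "_SpellingError")

# Attach item scores and the composite total to the raw data
scored <- cbind(toy, as.data.frame(scores))
scored$Total_SpellingError <- rowSums(scores)
scored$Total_SpellingError
#> [1]  0 43 22
```

Participant B's total is 21 capped items at 2 plus the NA item recoded to 1, giving 43.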

Part 3: Calculating Spelling Error Z-scores

The scale() function was used to calculate Z-scores for total spelling error. This can also be done manually by subtracting the mean of all scores from each score and then dividing by the standard deviation of those scores. The result is a set of values with a mean of 0 and a standard deviation of 1. Because higher totals mean more errors, scores two or more standard deviations above the mean could be considered problematic.
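The manual calculation can be checked against `scale()` directly. This is a minimal base-R sketch on made-up totals (not study data); both approaches use the sample standard deviation (denominator n - 1), which is what `sd()` and `scale()` compute.

```r
totals <- c(2, 3, 2, 4, 3, 2, 3, 2, 3, 20)  # hypothetical Total_SpellingError values

# Manual z-scores: subtract the mean from each score, then divide by the SD
z_manual <- (totals - mean(totals)) / sd(totals)

# scale() returns a one-column matrix; c() flattens it, as in the script
z_scale <- c(scale(totals))

all.equal(z_manual, z_scale)
#> [1] TRUE
```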

A concern was that poor performers were not taking the task seriously. Therefore, the subset of participants whose total spelling errors were 2 or more SD above the mean was sent to colleagues for inspection. They concluded that the spelling errors were legitimate.

# Introduce Z scores into the dataset
ARFA.spelling.scored$Total_SpellingError_Z <- c(scale(ARFA.spelling.scored$Total_SpellingError))

# Keep participants whose total errors were 2 or more SD above the mean
ARFA.spelling.scored.2SD <- ARFA.spelling.scored %>%
  filter(Total_SpellingError_Z >= 2)
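Only the upper tail matters for this filter, since a high z-score means more errors than average. The toy base-R illustration below uses hypothetical IDs and totals to show which rows a `>= 2` cut keeps. With five or fewer participants no score can reach z >= 2 under the sample SD (the maximum is (n - 1)/sqrt(n)), so the example uses ten.

```r
scored <- data.frame(
  ID = 1:10,
  Total_SpellingError = c(2, 3, 2, 4, 3, 2, 3, 2, 3, 20)  # hypothetical totals
)
scored$Total_SpellingError_Z <- c(scale(scored$Total_SpellingError))

# Keep participants whose total errors are 2 or more SD above the mean
flagged <- scored[scored$Total_SpellingError_Z >= 2, ]
flagged$ID
#> [1] 10
```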

Part 4: Saving the data

# Set the working directory to save our data
setwd("~/Masters Project/cleaned_predictor_covariates")

# Save the full scored dataset
write_csv(ARFA.spelling.scored, file = "ARFA.Spelling.Errors.Scored.csv")

# Save the subset of poor performers (Z >= 2); write.xlsx() comes from openxlsx, not readxl
openxlsx::write.xlsx(ARFA.spelling.scored.2SD, file = "ARFA.Spelling.Errors.Scored.2SD.xlsx")

Quality Control

Run these lines of code in the Console to check for potential errors. If the script ran successfully, the first and last checks should return 0, and the setdiff() check should return an empty vector (e.g., character(0)).

Checking for duplicate IDs

sum(duplicated(ARFA.spelling.scored$ID))

Checking that output IDs match the initial dataset

setdiff(ARFA.spelling.scored$ID, ARFA.spelling$ID)

Checking for NAs in the total spelling error

sum(is.na(ARFA.spelling.scored$Total_SpellingError))
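The three console checks can also be bundled into one function that returns TRUE only when every check passes. This is a base-R sketch; `run_qc()` is a hypothetical helper, and the toy data frames stand in for ARFA.spelling and ARFA.spelling.scored.

```r
# Hypothetical helper bundling the three Quality Control checks
run_qc <- function(raw, scored) {
  checks <- c(
    no_duplicate_ids = sum(duplicated(scored$ID)) == 0,
    ids_match_input  = length(setdiff(scored$ID, raw$ID)) == 0,
    no_missing_total = sum(is.na(scored$Total_SpellingError)) == 0
  )
  print(checks)  # show which individual check failed, if any
  all(checks)
}

raw    <- data.frame(ID = c("A", "B", "C"))
scored <- data.frame(ID = c("A", "B", "C"), Total_SpellingError = c(0, 43, 22))
run_qc(raw, scored)   # all three checks TRUE, returns TRUE

bad <- scored
bad$Total_SpellingError[2] <- NA
run_qc(raw, bad)      # no_missing_total is FALSE, returns FALSE
```

On the real data, the call would be `run_qc(ARFA.spelling, ARFA.spelling.scored)`.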