Mega‐Grant deriving the final sample size for the ARFA spelling and rsEEG analysis


When writing the methods section, you need to start by reporting the number of participants recruited into the study (with the mean age and the number of men and women), and then describe the exclusion criteria applied before the analysis. Exclusion criteria can include poor task performance, missing data, equipment malfunction, bad EEG recordings, etc. This wiki page documents those steps for the 'ARFA spelling and rsEEG analysis'.

The following document is broken down into these sections (this is for IDs):

  • Initial Sample Size
  • Head self-report injury exclusion (also Epilepsy)
  • Removing Missing Data
  • Had an 'IQ' below 70
  • Extremely poor spellers
  • Issues with the EEG recordings
  • Did not survive EEG cleaning
  • Descriptives of final sample size for analysis
  • Saving the final dataset

Initial Sample Size

We will be using participants from the 'S 1' registry since they have both behavioral and neurophysiological data. We also want to report the mean age and the number of females and males in this sample.

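# Load in packages
library(tidyverse) # dplyr (filter, select, mutate), tidyr (drop_na), tibble (view)
library(readxl)    # read_excel
library(openxlsx)  # write.xlsx (assumed to come from openxlsx)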

# Set working directory
setwd("Y:/STUDY 1")

# Load in the master data file
Master.dataset <- read_excel("MegaGrant_TBL_Database_Newest_MCh with duration.xlsx")

# Load in all of the files from S1
S1 <- Master.dataset %>%
  filter(`S1 reg-list` == "+")


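# View participants with a missing age
S1[is.na(S1$Age), ] %>% view()

# Mean and SD of age
mean(S1$Age, na.rm = TRUE)
sd(S1$Age, na.rm = TRUE)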

table(S1$Sex, useNA = "ifany")

Head self-report injury exclusion

Participants completed a medical questionnaire that included questions about head injury. Those who answered yes are excluded from the analysis; the code below also flags self-reported epilepsy and autism. This removes 37 participants from the analysis.

# Load in the medical history questionnaire
setwd("Y:/STUDY 1/All Hand and Med Organized/Usable Med Excels")
Medical <- read_excel("TBL_WHOQOL_BRIEF_Medical_S1_S3_DM_06.xlsx",  sheet = "med s1")

# Select the variables of interest
Medical2 <- select(Medical, ID, HeadTrauma, Health2epilepsy, Health2autism)

# Create an exclusion variable
Medical2 <- Medical2 %>%
  mutate(Med.Excluded = case_when(
    HeadTrauma == "Y" | Health2epilepsy == "Y" | Health2autism == "Y" ~ "Y",
    TRUE ~ "N"
  ))

Medical.Excluded.IDs <- Medical2$ID[Medical2$Med.Excluded == "Y"]

# Remove the medical excluded IDs
S1 <- S1 %>%
  filter(!(ID %in% Medical.Excluded.IDs))
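
How unanswered questionnaire items are treated affects who gets flagged. The following is a minimal base-R sketch on invented data (the IDs and answers are hypothetical, and only two of the questionnaire columns are shown). It uses `%in%`, which returns FALSE for NA, so a blank answer never triggers exclusion on its own.

```r
# Toy questionnaire data; IDs and answers are invented for illustration
med <- data.frame(
  ID = c(101, 102, 103, 104),
  HeadTrauma = c("Y", "N", NA, "N"),
  Health2epilepsy = c("N", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so unanswered items do not trigger exclusion
med$Med.Excluded <- ifelse(
  med$HeadTrauma %in% "Y" | med$Health2epilepsy %in% "Y", "Y", "N"
)

# IDs flagged for exclusion
excluded.ids <- med$ID[med$Med.Excluded == "Y"]
print(excluded.ids)
```

Participant 103, who left the head trauma item blank, is kept in this sketch; whether blank answers should instead be followed up is a study-level decision.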

Removing Missing Data

Participants who are missing data need to be removed from the analysis. Which data are required depends on the analysis being performed. For this analysis, participants must have demographics (age and sex), resting-state EEG, and behavioral data. We identify them using the master Excel sheet. This step should remove 51 IDs. The required variables are:

  • Age
  • Sex
  • ARFA: Spelling subtests
  • CFIT
  • rsEEG
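# Select the data-availability columns from the master Excel sheet
# (assumes Master.dataset, loaded above, is the sheet that tracks available data)
Master.Excel2 <- select(Master.dataset, ID, Age, Sex, ARFA, CFIT, RAW)

# Count the missing values per participant
# (assumes unavailable data appear as NA in these columns)
Master.Excel2$Missing.Data <- rowSums(is.na(select(Master.Excel2, Age, Sex, ARFA, CFIT, RAW)))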

Missing.Data.IDs <- Master.Excel2$ID[Master.Excel2$Missing.Data != 0]

# Remove the IDs missing data
S1 <- S1 %>%
  filter(!(ID %in% Missing.Data.IDs))
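
The idea of counting missing fields per participant can be illustrated with a small base-R sketch. The data below are invented, and the sketch assumes that unavailable data are stored as NA; if the real sheet marks them differently (for example with a "-"), the test inside `rowSums()` would need to change accordingly.

```r
# Toy availability sheet; values are invented for illustration
avail <- data.frame(
  ID  = c(1, 2, 3, 4),
  Age = c(21, NA, 30, 25),
  Sex = c("F", "M", NA, "M"),
  RAW = c("+", "+", NA, "+"),
  stringsAsFactors = FALSE
)

# Columns that every participant must have for this analysis
required <- c("Age", "Sex", "RAW")

# Number of required fields that are NA for each participant
avail$Missing.Data <- rowSums(is.na(avail[, required]))

# IDs with at least one missing field
missing.ids <- avail$ID[avail$Missing.Data != 0]
print(avail[, c("ID", "Missing.Data")])
```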

Had an 'IQ' below 70

We start by removing subjects with partial CFIT data, since a raw score cannot be calculated for them (N = 9; in the future these scores could be interpolated). We cannot derive an exact IQ score, so instead we remove those who scored 2 or more SD below the mean raw score on the IQ test (N = 17).

# Very low IQ
setwd("Y:/STUDY 1/All Behavioral Data Organized/Usable CFIT Excels")

CFIT <- read_excel("CFIT.xlsx")

# Select the variables of interest
CFIT2 <- select(CFIT, ID, Sub1Sum:Sub4Sum)

# Drop rows that have NA for any of the columns
CFIT2 <- drop_na(CFIT2)

# Calculate a raw score of correct responses
CFIT2 <- mutate(CFIT2, Raw.Scores = Sub1Sum + Sub2Sum + Sub3Sum + Sub4Sum)

# Create a variable to flag those who scored more than 2 SD below the mean
CFIT2$mean.Raw.Scores <- mean(CFIT2$Raw.Scores)
CFIT2$SD.Raw.Scores <- sd(CFIT2$Raw.Scores)

CFIT2 <- mutate(CFIT2, Excluded = ifelse(Raw.Scores < mean.Raw.Scores - 2 * SD.Raw.Scores, "Y", "N"))

Low.IQ.IDs <- CFIT2$ID[CFIT2$Excluded == "Y"]

# Remove the IDs with low IQs
S1 <- S1 %>%
  filter(!(ID %in% Low.IQ.IDs))
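
As a worked illustration of the 2 SD rule, here is a minimal base-R sketch on ten invented raw scores (not real CFIT data). The cutoff is the sample mean minus two sample standard deviations, and only scores strictly below it are flagged.

```r
# Ten invented raw scores; the first one is deliberately very low
raw <- c(10, 28, 30, 31, 29, 32, 30, 27, 33, 30)

# Cutoff: mean minus 2 SD (sd() uses the n - 1 denominator)
cutoff <- mean(raw) - 2 * sd(raw)

# Flag scores that fall strictly below the cutoff
flag <- ifelse(raw < cutoff, "Y", "N")

print(round(cutoff, 2))
print(table(flag))
```

Note that a very low score also inflates the SD that defines the cutoff, so in small samples this rule can be more lenient than it first appears.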

Extremely poor spellers

We had zero extremely poor performers (more than 3 SD from the mean). The responses of those who were 2 SD or more from the mean were visually examined by Russian speakers and judged to be valid, i.e., these participants were taking the task seriously.

# Extremely poor spellers
setwd("~/Masters Project/cleaned_predictor_covariates")

Spelling.Errors <- read.csv("ARFA.Spelling.Errors.Scored.csv")

sum(Spelling.Errors$Total_SpellingError_Z < -3)

Issues with the EEG recordings

rsEEG data were recorded while a 'task' instructed participants when to open or close their eyes. These instructions were sent to the recording as trigger codes that mark when each condition started or ended (see Mega‐Grant Separating Eyes‐Open and Eyes‐Closed EEG (EXCEL)). Some recordings were started late in BrainVision Recorder, so their trigger codes are missing and the data are not usable. These IDs are identified here (N = 78).

# Had Issues with Marker information (aka do not have both rsEEG conditions)
setwd("Y:/STUDY 1/All EEG Files Organized/Preprocessed_RAW")

Both.Conditions <- read_excel("EEG Raw File Names.xlsx")

# Create an ID variable
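# (assumes the file-name column is called FileName, as in the cleaned-file sheets used later)
Both.Conditions$ID <- as.numeric(gsub("\\D", "", Both.Conditions$FileName))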

# Keep only the IDs that have both trigger codes
S1 <- S1 %>%
  filter(ID %in% Both.Conditions$ID)
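
Deriving an ID by deleting every non-digit character works only when the ID is the sole number in the file name. The base-R sketch below uses hypothetical file names (the real naming scheme is not shown on this page) to compare that approach with extracting just the first run of digits.

```r
# Hypothetical file names; the third one contains a second number
files <- c("MG_0123_rest.vhdr", "MG_0456_rest.vhdr", "MG_0789_rest2.vhdr")

# Approach 1: delete every non-digit, then convert to numeric
ids.all.digits <- as.numeric(gsub("\\D", "", files))
print(ids.all.digits)  # 123 456 7892 (the trailing "2" is glued onto the ID)

# Approach 2: keep only the first run of digits in each name
ids.first.run <- as.numeric(regmatches(files, regexpr("[0-9]+", files)))
print(ids.first.run)   # 123 456 789
```

If every file name contains exactly one number, both approaches give the same IDs, so a quick comparison of the two is an inexpensive sanity check.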

Did not survive EEG cleaning

This applies to both the eyes-open and eyes-closed resting-state EEG. So far, every recording has had at least 80% of its data remaining after segmentation rejection. However, this step also removes IDs that failed any of the cleaning preprocessing steps (N = 71).

# Did not survive EEG cleaning (Both Eyes Open and Eyes Closed)
setwd("Y:/STUDY 1/All EEG Files Organized/Preprocessed_RAW")

eyes.closed <- read_excel("EEG Raw File Names3 (ready for FFT).xlsx", sheet = "Cleaned.Closed")
eyes.open <- read_excel("EEG Raw File Names3 (ready for FFT).xlsx", sheet = "Cleaned.Open")

# Create an ID variable
eyes.closed$ID <- as.numeric(gsub("\\D", "", eyes.closed$FileName))
eyes.open$ID <- as.numeric(gsub("\\D", "", eyes.open$FileName))

# Keep only IDs that have at least 80% of their data remaining after cleaning
eyes.closed2 <- filter(eyes.closed, PercentRemaining >= 80)
eyes.open2 <- filter(eyes.open, PercentRemaining >= 80)

# Keep only variables that are shared between both eyes.open and eyes.closed conditions
eyes.closed2.df <- data.frame(ID = eyes.closed2$ID,
                              eyes.closed = "+") %>% tibble()

eyes.open2.df <- data.frame(ID = eyes.open2$ID,
                            eyes.open = "+") %>% tibble()

# Shared IDs by both conditions (inner_join keeps only IDs present in both tables)
Both.Conditions.Survived <- eyes.closed2.df %>%
  inner_join(eyes.open2.df, by = "ID")

# Keep these IDs that have both recordings
S1 <- S1 %>%
  filter(ID %in% Both.Conditions.Survived$ID)
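
Which IDs count as having both recordings depends on the type of join: a left join keeps every ID from the left-hand table, whereas an inner join keeps only the IDs found in both tables. The base-R sketch below shows the difference on invented IDs, using `merge()` as a stand-in for the dplyr joins.

```r
# Invented IDs: 1 survived only eyes-closed, 4 survived only eyes-open
closed <- data.frame(ID = c(1, 2, 3), eyes.closed = "+")
open   <- data.frame(ID = c(2, 3, 4), eyes.open = "+")

# Left join: every eyes-closed ID is kept, even without an eyes-open match
left.ids <- merge(closed, open, by = "ID", all.x = TRUE)$ID

# Inner join: only IDs present in both tables are kept
inner.ids <- merge(closed, open, by = "ID")$ID

# intersect() gives the same shared IDs without building a joined table
both.ids <- intersect(closed$ID, open$ID)

print(left.ids)   # 1 2 3
print(inner.ids)  # 2 3
print(both.ids)   # 2 3
```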

Descriptives of Final Sample Size for Analysis

# Get the descriptives of the new final sample size
nrow(S1)

mean(S1$Age, na.rm = TRUE)
sd(S1$Age, na.rm = TRUE)

table(S1$Sex, useNA = "ifany")
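
Because the methods section reports how many participants each criterion removed, it can help to log the sample size after every exclusion step. The sketch below is one possible way to do that in base R; `log.n` is a hypothetical helper that is not part of the original script, and the toy data and step labels are invented.

```r
# Hypothetical helper: append the current sample size to an attrition log
log.n <- function(log, step, data) {
  rbind(log, data.frame(Step = step, N = nrow(data), stringsAsFactors = FALSE))
}

# Toy sample of ten participants
toy <- data.frame(ID = 1:10)
attrition <- data.frame(Step = character(0), N = integer(0), stringsAsFactors = FALSE)
attrition <- log.n(attrition, "Initial sample", toy)

# Step 1: drop two IDs (stands in for the medical exclusion)
toy <- toy[!(toy$ID %in% c(2, 5)), , drop = FALSE]
attrition <- log.n(attrition, "Medical exclusion", toy)

# Step 2: keep only IDs on an allow-list (stands in for the EEG checks)
toy <- toy[toy$ID %in% c(1, 3, 4, 6, 7), , drop = FALSE]
attrition <- log.n(attrition, "Both EEG conditions", toy)

# Participants removed at each step
attrition$Removed <- c(0, -diff(attrition$N))
print(attrition)
```

In the real script, the same call would follow each `filter()` on S1, and the Removed column could then be compared against the N values quoted in each section.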

Saving the Final Dataset

# Set working directory to save the dataset
setwd("C:/Users/lledesma.TIMES/Documents/Masters Project")

# Select variables of interest
# (Final.Sample is a placeholder name for the selected dataset)
Final.Sample <- select(S1, ID, `S1 reg-list`, Age, Sex, Group, ARFA, CFIT, RAW, Comments, Decision)

# Save S1
write.xlsx(list(N = Final.Sample), file = "Final.Sample.Size.xlsx")
