Mega‐Grant Separating Eyes‐Open and Eyes‐Closed EEG (EXCEL) - LeoLedesma237/LeoWebsite GitHub Wiki

Overview

The original BrainVision Recordings for resting-state EEG contained both eyes-open and eyes-closed recordings. These data need to be separated from each other. The goal of this script is to do just that.

Part 1: Initial Recording Problems

How many recordings are missing one of the three files needed to open them in EEGLAB?
How many recordings need to be bound (merged) with another file?

Recordings with either problem will not go through the automated preprocessing pipeline. To count how many recordings have all three file types, the following script matches file names across the .eeg, .vmrk, and .vhdr files. We then create a variable that indicates whether a recording has the complete set of files. Next, we use the case_when function to flag any recordings that will need binding. The recordings that have all 3 files and do not need binding move on to the next section.

We can now use the table function to make sense of this data.

We have 629 rsEEG recordings that contain both eyes-open and eyes-closed conditions. Of these, 605 can be divided into eyes-closed and eyes-open conditions without issues. Four recordings will need to be merged before they can be divided by condition. Lastly, 20 recordings are missing at least one file, so they cannot be opened and their data are lost.

Status	n
Needs merging	4
Missing at least 1 file	20
Contains all 3 files	605
All Recordings	629
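
As a minimal sketch of how a tally like the one above could be produced with the table function, the example below uses a small made-up file table. The file names, the `_2` suffix rule, and the status labels are assumptions for illustration only, not the real data.

```r
# Toy stand-in for the joined file table; NA marks a missing file type.
# All file names here are hypothetical.
toy.files <- data.frame(
  file.name = c("11111_RAW", "22222_RAW", "33333_RAW_2", "44444_RAW"),
  all.eeg   = c("11111_RAW.eeg", "22222_RAW.eeg", "33333_RAW_2.eeg", NA),
  all.vmrk  = c("11111_RAW.vmrk", "22222_RAW.vmrk", "33333_RAW_2.vmrk", "44444_RAW.vmrk"),
  all.vhdr  = c("11111_RAW.vhdr", "22222_RAW.vhdr", "33333_RAW_2.vhdr", "44444_RAW.vhdr"),
  stringsAsFactors = FALSE
)

# A row with any NA is missing at least one of the three files
missing.any <- rowSums(is.na(toy.files)) > 0

# ASSUMPTION: a trailing "_2" marks a recording that needs merging
needs.merge <- grepl("_2$", toy.files$file.name)

toy.files$status <- ifelse(needs.merge, "Needs merging",
                    ifelse(missing.any, "Missing at least 1 file",
                           "Contains all 3 files"))

# Count recordings per status
status.counts <- table(toy.files$status)
print(status.counts)
```

The nested ifelse checks the merge flag first, so a recording that both needs merging and is missing a file would be counted under "Needs merging".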

Combine all file types into one data frame

# Load in packages
library(tidyverse)
library(openxlsx)

# Set working directory
setwd("Y:/STUDY 1/All EEG Files Organized/RAW")

# Load in all .eeg, .vmrk, .vhdr files
# (pattern is a regular expression, so escape the dot and anchor the extension)
all.eeg <- list.files(pattern = "\\.eeg$")
all.vmrk <- list.files(pattern = "\\.vmrk$")
all.vhdr <- list.files(pattern = "\\.vhdr$")

# Transform each into a dataframe and add the file name without its extension
all.eeg.df <- tibble(all.eeg) %>% mutate(file.name = gsub("\\..*", "", all.eeg ))
all.vmrk.df <- tibble(all.vmrk) %>% mutate(file.name = gsub("\\..*", "", all.vmrk ))
all.vhdr.df <- tibble(all.vhdr) %>% mutate(file.name = gsub("\\..*", "", all.vhdr ))
 
# Combine them all
all.files <- all.eeg.df %>%
  full_join(all.vmrk.df, by = "file.name") %>%
  full_join(all.vhdr.df, by = "file.name") %>%
  select(file.name, everything())
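
Because list.files() treats pattern as a regular expression, an unescaped dot matches any character and an unanchored pattern can match in the middle of a name. The sketch below compares a loose pattern with an escaped, anchored one on made-up file names (all names are hypothetical); grepl is used here only so the example runs without touching the file system.

```r
# Hypothetical file names: one real recording, one backup copy, one unrelated file
toy.names <- c("11111_RAW.eeg", "11111_RAW.eeg.bak", "skeeg.txt")

# Loose pattern: "." matches any character, and the match can occur anywhere
loose <- grepl(".eeg", toy.names)

# Strict pattern: literal dot, and "eeg" must end the file name
strict <- grepl("\\.eeg$", toy.names)

print(data.frame(toy.names, loose, strict))
```

Only the first name passes the strict pattern, while all three pass the loose one.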

Identify files that need binding

# Create a variable indicating all files are present
all.files <- mutate(all.files, complete.files = ifelse(rowSums(is.na(all.files)) > 0, "Missing.Files", "Completed.Files"))

# Need binding? Flag file names that look like a second part of a recording
all.files <- all.files %>%
  mutate(need.binding = case_when(
    grepl(pattern = "W2", file.name) ~ "Need.Binding",
    grepl(pattern = "_2", file.name) ~ "Need.Binding",
    grepl(pattern = "\\(2\\)", file.name) ~ "Need.Binding",
    TRUE ~ "No.Binding.Needed"
  ))

Filter data that does not need binding and has all three file types

# Create a dataset of complete recordings that do not need binding
all.files <- mutate(all.files, marker.check = case_when(
                    complete.files == "Completed.Files" & need.binding == "No.Binding.Needed" ~ "Yes",
                    TRUE ~ "No"
))


complete.no.binding <- filter(all.files, marker.check == "Yes")

Part 2: Marker Problems

This next step applies to the 605 recordings that have all 3 files.

Number of markers

Using the pop_squeezevents(EEG) function from ERPLAB, we see that there are three stimulus markers. From visual inspection it seems that 'S 1' represents when the eyes-closed condition begins. 'S 2' is when the participant is instructed to open their eyes (beginning the eyes open condition) and 'S 3' is when this condition is over.

Visual Inspection of the Markers

Below is the visualization of a resting-state EEG recording. Notice that after the 'S 1' marker there is an increase in power in occipital electrodes, which likely represents the posterior dominant alpha rhythm. Additionally, power in these regions decreases when 'S 2' begins. Both observations are evidence for 'S 1' representing eyes-closed and 'S 2' representing eyes-open.

Figure panels: Eyes Closed | Eyes Opened

Problematic Files

A good number of files do not have these markers in the data, which makes the data difficult to separate into the two conditions. Below is an example of one such file when using the pop_squeezevents(EEG) function. It is missing 'S 1', so we do not know for certain when the eyes-closed condition begins.

A more comprehensive view is available below. Here we see three 'RAW' files that contain more markers than we would expect. On closer inspection, files 82511_RAW, 26187_RAW, and 94766_RAW have excessive markers that might indicate another task, so these files will be archived.

Figure panels: Frequency of Markers | 82511_RAW | 94766_RAW

We also have recordings with the opposite problem. There are 50 recordings that contain fewer than 3 markers, which may mean they are problematic. In theory, we only care about those that are missing 'S 1' or 'S 2'. Thus, after removing the files with excessive markers from above, we can transform the data into wide format to see which files are missing which markers. The file 47255_RAW has two 'S 1' markers, which causes problems when reshaping, so it is removed from the analyses below.

We also have several files (n=47) that are missing at least one of the three stimulus markers.

Load the .vmrk files

# Load in the .vmrk files of completed datasets
vmrk.files <- complete.no.binding$all.vmrk

# Load them each one by one
marker.list <- list()

for(ii in seq_along(vmrk.files)) {

  # Skip the first 11 lines of the file (the .vmrk header)
  data <- read.csv(vmrk.files[[ii]], skip = 11)
  names(data) <- c("rem", "marker", "latency", "rem2", "rem3")
  
  # Files without any marker rows get a single NA row so their ID is kept
  if(nrow(data) == 0) {
    empty.data <- data.frame(rem = NA,
                             marker = NA,
                             latency = NA,
                             rem2 = NA,
                             rem3 = NA)
    
    data <- rbind(data, empty.data)
  }
  data$ID <- sub("\\..*", "", vmrk.files[[ii]])
  marker.list[[ii]] <- select(data, ID, marker, latency)

}

# Combine all into one
marker.df <- tibble(do.call(rbind, marker.list))
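
To make the structure of the marker rows concrete, here is a minimal base R sketch that parses a few made-up lines in the BrainVision .vmrk layout (`Mk<number>=<type>,<description>,<position>,<size>,<channel>`). The marker labels, latencies, and the helper name parse.marker.line are assumptions for illustration, not values from the real recordings.

```r
# Hypothetical marker lines in the .vmrk layout
toy.lines <- c(
  "Mk1=New Segment,,1,1,0,20190101120000000000",
  "Mk2=Stimulus,S  1,5000,1,0",
  "Mk3=Stimulus,S  2,95000,1,0",
  "Mk4=Stimulus,S  3,185000,1,0"
)

# Hypothetical helper: split one marker line into type, label, and latency
parse.marker.line <- function(line) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  data.frame(
    type    = sub("^Mk[0-9]+=", "", fields[1]),  # drop the "Mk<n>=" prefix
    marker  = fields[2],                         # e.g. "S  1"
    latency = as.numeric(fields[3]),             # position in samples
    stringsAsFactors = FALSE
  )
}

# Parse every line and keep only the stimulus markers
toy.markers <- do.call(rbind, lapply(toy.lines, parse.marker.line))
stim.markers <- toy.markers[toy.markers$type == "Stimulus", ]
print(stim.markers)
```

Filtering on the marker type is one way to drop non-stimulus rows such as the "New Segment" entry before counting markers.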

Data Cleaning

# Remove the 'R  1' marker (filter() also drops rows where marker is NA)
marker.df <- filter(marker.df, marker != "R  1")

# Drop the latency column, which is not needed to check marker presence
marker.df2 <- select(marker.df, -latency)

# What markers are available in the data?
table(marker.df2$marker)

# Remove rows with odd markers or that have blanks for markers
marker.df2 <- marker.df2 %>%
  filter(marker != "" & marker != "S 16")

# Remove problematic file name for having two S 1 in the data
marker.df2 <- filter(marker.df2, ID != "47255_RAW")

# Create an indicator variable (1 = marker present) to fill the wide table
marker.df2$value <- 1

Identifying files with all markers present


# Transform the data into wide format 
marker.df2.wide <- 
  pivot_wider(marker.df2, names_from = marker, values_from = value)

# Change the names of the dataset
# (assumes the marker columns come out in the order S 1, S 2, S 3)
names(marker.df2.wide) <- c("file.name", "S1", "S2", "S3")

# How many are missing markers?
marker.df2.wide <- mutate(marker.df2.wide, complete.markers = ifelse(rowSums(is.na(marker.df2.wide)) > 0, "No", "Completed.Markers"))
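
The same check can be sketched in base R with a cross-tabulation of file by marker, which also exposes duplicated markers like the two 'S 1' entries mentioned earlier. The IDs and marker rows below are made up for illustration.

```r
# Hypothetical long-format marker data: one row per marker per file
toy.markers <- data.frame(
  ID     = c("11111_RAW", "11111_RAW", "11111_RAW",
             "22222_RAW", "22222_RAW",
             "33333_RAW", "33333_RAW", "33333_RAW", "33333_RAW"),
  marker = c("S  1", "S  2", "S  3",
             "S  2", "S  3",
             "S  1", "S  1", "S  2", "S  3"),
  stringsAsFactors = FALSE
)

expected <- c("S  1", "S  2", "S  3")

# Count each expected marker per file; the factor levels keep a column
# for every expected marker even if no file contains it
counts <- unclass(table(ID = toy.markers$ID,
                        marker = factor(toy.markers$marker, levels = expected)))

# Complete = every expected marker appears at least once
complete.markers <- rowSums(counts > 0) == length(expected)

# Duplicated = some marker appears more than once in the same file
duplicated.markers <- rowSums(counts > 1) > 0

print(data.frame(complete.markers, duplicated.markers))
```

Here 22222_RAW is flagged as incomplete because it has no 'S  1', and 33333_RAW is flagged for a duplicated marker.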

Part 3: Creating a Comprehensive Dataset

Only the files that meet the following criteria will be preprocessed through an automated process.

Have all 3 files present (.eeg, .vhdr, .vmrk)
Do not need to be bound (merged) to another recording
Have all 3 required stimulus markers ('S 1', 'S 2', 'S 3')

However, we can also identify files that have at least 'S 2'. A different script will be required to divide them into eyes-open and eyes-closed conditions, because it cannot rely on 'S 1' or 'S 3' to tell us when a condition begins or ends. Instead, we will use the 3 minutes before and the 3 minutes after 'S 2'.
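
A minimal sketch of that windowing idea follows. It assumes a hypothetical sampling rate of 500 Hz and made-up latencies, and the function name s2.window is invented for illustration. The real rate should be read from the .vhdr file, and clamping the windows to the recording bounds is an added assumption rather than something the original pipeline is known to do.

```r
# ASSUMPTION: hypothetical sampling rate; read the real value from the .vhdr file
sampling.rate.hz <- 500
window.sec <- 3 * 60  # 3 minutes on each side of 'S 2'

# Hypothetical helper: sample ranges for each condition around the 'S 2' latency
s2.window <- function(s2.latency, n.samples,
                      srate = sampling.rate.hz, win = window.sec) {
  win.samples <- win * srate
  data.frame(
    condition = c("eyes.closed", "eyes.open"),
    # Eyes-closed ends right before 'S 2'; eyes-open starts at 'S 2'
    start = c(max(1, s2.latency - win.samples), s2.latency),
    end   = c(s2.latency - 1, min(n.samples, s2.latency + win.samples - 1)),
    stringsAsFactors = FALSE
  )
}

# Made-up example: 'S 2' at sample 95000 in a 190000-sample recording
windows <- s2.window(s2.latency = 95000, n.samples = 190000)
print(windows)
```

If 'S 2' sits closer than 3 minutes to either end of the recording, the clamped window is shorter than 3 minutes, which may be worth flagging before preprocessing.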

Final Status	n
Needs merging	4
Missing at least 1 file	20
Missing 'S 2'	26
Can Interpolate Markers	46
Contains Markers	533
All Recordings	629

By using this option, we can keep 46 additional EEG files that have 'S 2' but are missing another marker. Thus the maximum number of rs-EEG files we could potentially have is 533 + 46 = 579.
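
As a quick arithmetic check, the counts from the Final Status table can be summed in R. The vector below simply re-enters the table values by hand.

```r
# Counts copied from the Final Status table
final.status <- c("Needs merging"           = 4,
                  "Missing at least 1 file" = 20,
                  "Missing 'S 2'"           = 26,
                  "Can Interpolate Markers" = 46,
                  "Contains Markers"        = 533)

# The categories should add up to all recordings
total.recordings <- sum(final.status)  # 629

# Files usable with either the standard or the 'S 2'-only script
max.usable <- unname(final.status["Contains Markers"] +
                     final.status["Can Interpolate Markers"])  # 579

print(c(total = total.recordings, usable = max.usable))
```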

# Join the marker information onto the full file table,
# keeping the files that did not make the cut for a comprehensive dataset
all.files2 <- all.files %>%
  full_join(marker.df2.wide, by = "file.name")

# Data cleaning with the complete.markers variable
all.files2 <- mutate(all.files2, complete.markers = ifelse(is.na(complete.markers), "-", complete.markers))

# Create a Status variable, which is the take-away descriptor for the row
# (case_when uses the first condition that matches, so the order matters)
all.files2 <- all.files2 %>%
  mutate(Status = case_when(
    need.binding == "Need.Binding" ~ "Need.Binding",
    complete.files == "Missing.Files" ~ "Missing.Files",
    is.na(S2) ~ "Missing 'S 2'",
    complete.markers == "Completed.Markers" ~ "Completed.Markers",
    complete.markers == "No" & S2 == 1 ~ "Interpolate.Markers",
    # Not reached: rows with a missing S2 are already labelled above
    complete.markers == "No" & is.na(S2) ~ "Cant.Interpolate.Markers",
    TRUE ~ "No"
  ))

# Check which labels need.binding actually contains
unique(all.files2$need.binding)

# Print a table to understand this better
data.frame(table(all.files2$Status))

Saving the comprehensive excel file

# Save this information
setwd("Y:/STUDY 1/All EEG Files Organized/Preprocessed_RAW")

# Save this data for preprocessing
all.files2.completed.markers <- filter(all.files2, Status == "Completed.Markers")
all.files2.interpolated.markers <- filter(all.files2, Status == "Interpolate.Markers")

write.xlsx(list(Complete.Markers = all.files2.completed.markers,
                Has.S2.Marker = all.files2.interpolated.markers),
           file = "EEG Raw File Names.xlsx")