ONR GnG EEG Preprocessing Pipeline - LeoLedesma237/LeoWebsite GitHub Wiki

Last updated on February 28th, 2024.

Overview

The following scripts do the following to the GnG EEG data:

  1. does some standard general cleaning (not include segmentation rejection)
  2. runs ICA and uses MARA to reject artifact components
  3. renames marker names

Directories

In order to efficiently automated several lines of code to multiple EEG data files (as in one by one), it is best to first set the directory where data will be pulled from and the directory where the transformed EEG file will be saved. Below I have the directories mentioned. Additionally, I also included some objects that can quickly change the parameters of the cleaning code as seen fit.

I have written several scripts and each have a version of this code below. In the interest of writing a succinct report, the code below will be shown only once.

% Load in EEGLAB
cd("C:\Users\lledesma.TIMES\Documents\MATLAB\eeglab2022.0")
eeglab

% 1. Set the folder path to record the names of all the files there
folder = 'M:\EEG_XDFs_Copy_2\0_EEG_DATA_WITH_MARKERS'; % replace with the path to your folder

% 2. Set Line Noise Criterion
LineNoiseCriterion = 7;

% 3. Set down sampling rate (Higher the number the less length needed for
% godd ICA)
Sample_Rate = 500;

% 4. Set the folder path that you want the EEG data saved in
save_pathway = 'M:\EEG_XDFs_Copy_2\1_Preprocessed EEG data - needs ICA';

% 5. Set the folder path that you want the CSV reports to be saved
save_pathway_csv = 'M:\EEG_XDFs_Copy_2\CSVs_Interpolation number';

Beginning where last finished

When writing scripts, it is important to note that each time a script incorporates a for loop, it will rerun the manipulations on each EEG file from scratch, even if that certain file was already transformed. This is very inefficient. Thus, this code of block has two functions. The first is to check if the currently loaded EEG file has already been transformed by checking to see if it is presented in the saved EEG file after transformation directory. The next step is to check for an EEG files that have unidentified problems that cause the code to break. These EEG files are added in manually. Each EEG transformation script includes a version of the code shown below.

% % % % % % Part 1: Reading in all the files in specified folder % % % % %
files = dir(folder);

% Create a for loop that keeps only real files present from the folder
AllFileNames = {};
for i = 1:length(files)
    if files(i).isdir == 0 % check if the file is not a directory
        AllFileNames{end+1} = files(i).name;
    end
end

%Use the startsWith function to find the location of where the .eeg files
eegIndx = endsWith(AllFileNames, '.set');

%Use the location of the .eeg files to extract them and save only them
eegFiles = AllFileNames(eegIndx);


% Remove redundancies
% Only unprocessed files will be ran by the for loop below
Already_Processed_Files = dir(save_pathway);

% Create a for loop that keeps only real files present from the folder
All_Processed_Files = {};
for i = 1:length(Already_Processed_Files)
    if Already_Processed_Files(i).isdir == 0 % check if the file is not a directory
        All_Processed_Files{end+1} = Already_Processed_Files(i).name;
    end
end


% Removes the already processed files from the vector that will be input into the for loop 
for i = 1:length(eegFiles)
    for j = 1:length(All_Processed_Files)
        if strcmp(eegFiles{i}, All_Processed_Files{j})
            eegFiles{i} = [];
            break
        end
    end
end


% Remove empty cells from the vector
eegFiles = eegFiles(~cellfun('isempty',eegFiles));

% Problematic EEG files
filesToRemove = {'example 1', 
                 'example 2'};


% Removes the remove files from the rsEEG files vector
for i = 1:length(eegFiles)
    for j = 1:length(filesToRemove)
        if strcmp(eegFiles{i}, filesToRemove{j})
            eegFiles{i} = [];
            break
        end
    end
end

% Remove empty cells from rsEEG files vector
eegFiles = eegFiles(~cellfun('isempty',eegFiles));

EEG General Cleaning

The following for loop does the following tasks:

  1. adds channels using the dpfit function
  2. removes unnecessary channels such as AUX1-3
  3. filters the EEG data from 1 - 30 Hz
  4. changes the sampling rate to 500 Hz
  5. re-references the data to TP9 and TP10
  6. uses the pop_clean_rawdata() function to identify problematic channels
  7. interpolates problematic channels using a spherical method
  8. creates and saves information about the number of channels interpolated as a CSV
  9. saves this cleaned EEG file
% % % % % % REMAINING CODE IS AUTOMATIC % % % % % % % % 
% % % Part 2: Preprocessing all EEG files in the specified directoy % % %

for ii = 1:length(eegFiles)
    Current_eegFile = eegFiles{ii}; %MUST BE SQUIGGLY LINE FOR SEGMENTATION TO WORK!!!!
    Pathway = [folder '\'];
    
    fullpath = strcat(Pathway,Current_eegFile);
    
    %Import data - change the name of the ID
    EEG = pop_fileio(fullpath, 'dataformat','auto');
    [ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 0,'setname','ID#_Imported','gui','off');
    
    %Add channels
    EEG=pop_chanedit(EEG, 'lookup','C:\\Users\\lledesma.TIMES\\Documents\\MATLAB\\eeglab2022.0\\plugins\\dipfit\\standard_BEM\\elec\\standard_1005.elc');
    [ALLEEG EEG] = eeg_store(ALLEEG, EEG, CURRENTSET);
    
    %Aux and Sample Counter removal
    EEG = pop_select( EEG, 'nochannel',{'AUX_1','AUX_2','AUX_3','SampleCounter'});
    
    %TRACKING: starting channel number and how long the EEG recording is
    StartingChannels = EEG.nbchan;
    
    %Filter the data 1 - 30 Hz
    EEG = pop_eegfiltnew(EEG, 'locutoff',1,'hicutoff',30);
    [ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 3,'setname','ID#_Filtered','gui','off');
    
    
    %Keep it at 500 Hz; the higher the sampling rate, the better the ICA
    EEG = pop_resample( EEG, Sample_Rate);
    [ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 6,'setname','ID#_resampled','gui','off');
    
    % Rereference to TP9 and TP10
    EEG = pop_reref( EEG, [10 21] );
    [ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 3,'gui','off'); 
    
    %Checks for channels that need interpolation
    EEG1 = pop_clean_rawdata(EEG, 'FlatlineCriterion',5,'ChannelCriterion',0.8,'LineNoiseCriterion',LineNoiseCriterion,'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian');
    
    
    % The following if and else statement is IDs that have/do not have channels
    % that need interpolation. If they need interpolation they go into the if
    % statement. If not then they are directed to the code in the else.
    
    if length(fieldnames(EEG1.etc)) > 1
        
        %Returns the names of the channels that need to be interpolated
        Bad_Channels = find(EEG1.etc.clean_channel_mask == 0);
    
        %TRACKING: Number of channels that were interpolated
        Num_Interpolation = length(Bad_Channels);
    
        % Inteporlate the channels based on the object that we saved
        EEG = pop_interp(EEG, Bad_Channels', 'spherical');
    
    
    else
    
        %TRACKING: Number of channels that were NOT interpolated
        Num_Interpolation = 0;
    
    end
    
    
    % Save the data so it can be ready to use for ICA testing
    % First create like a data frame on MATLAB which is lowkey a pain in the
    % ass. This will tell us useful information about the data file, especially
    % to get us the proper PCA number
    
    RowNum = ii;
    VariableName = ["SubjectID" "RowNum" "StartingChannelNum" "InterpolationNum"];
    VariableValuesNum = [RowNum StartingChannels Num_Interpolation];
    VariableValuesNum2 = num2cell(VariableValuesNum);
    VariableValues = horzcat(Current_eegFile, VariableValuesNum2);
    DataLog = vertcat(VariableName,VariableValues);
    
    % This sets the working directory where the CSV files will be saved
    cd(save_pathway_csv);
    
    % Set the name for the CSV file that will be saved
    DataLogName = strcat(Current_eegFile,'.csv');
    
    % Saving the CSV file
    writematrix(DataLog, DataLogName);
    
    % Saving the EEG data
    Save_FileName = Current_eegFile
    EEG = pop_saveset(EEG, 'filename',Save_FileName,'filepath',save_pathway);

end

Running ICA

Now that our EEG data from above is pretty clean- we need to add our next cleaning procedure known as Independent Component Analysis. This is a complex procedure that takes the EEG signals and is able to use some fancy math to identify components that correspond to the electrical signals seen in the data. For example, it can identify all of the activity in the original signal that is coming strictly from eye or muscle movements, and other artifacts. Thus, ICA is primarily used for the identification of these components that are associated with noise and flags and all the other components that are brain based. However, these components are not removed at this stage.

This process takes a very long time. Additionally, we ran ICA using Runica and incorporated the correct PCA number for each subject.

for ii = 1:length(eegFiles)
    Current_eegFile = eegFiles{ii} %MUST BE SQUIGGLY LINE FOR SEGMENTATION TO WORK!!!!
    Pathway = [folder '\'];
   
    % Load in EXISTING EEG files
    EEG = pop_loadset('filename',Current_eegFile,'filepath',Pathway);
    
    % Load in excel file of interest
    Excel_fileName = [Current_eegFile(1:end-3), 'set'];
    Load_Excel_file = [Excelfiles_folder '\' Excel_fileName '.csv'];
    
    Excel_variables = xlsread(Load_Excel_file);
    Starting_Channels = Excel_variables(2);
    Interpolation_Num = Excel_variables(3);
    
    PCA_number = Starting_Channels - Interpolation_Num - 2 % The minus two represents re-referencing to TP9 and TP10
    
    % Resample the data to 250 to make ICA faster
    EEG = pop_resample( EEG, 250);
    [ALLEEG, EEG, CURRENTSET] = eeg_store( ALLEEG, EEG, 0 );

    % Run ICA
    EEG = pop_runica(EEG, 'icatype', 'runica', 'extended',1,'pca',PCA_number,'interrupt','on');
    
    % Saving the EEG data with ICA ran
    Save_FileName = strrep(Current_eegFile, '.set', '');
    EEG = pop_saveset(EEG, 'filename',Save_FileName,'filepath',save_pathway);

end

Using MARA

MARA is the code used to identify the noise related components from the ICA and remove them.

for ii = 1:length(eegFiles)
    Current_eegFile = eegFiles{ii} %MUST BE SQUIGGLY LINE FOR SEGMENTATION TO WORK!!!!
    Pathway = [folder '\'];
    
    % Load in EEG files
    EEG = pop_loadset('filename',Current_eegFile,'filepath',Pathway);

    % Make Mara work
    [ALLEEG,EEG,CURRENTSET] = processMARA(ALLEEG,EEG,CURRENTSET)

    % Show the components made from ICA
    %pop_selectcomps(EEG, [1:length(EEG.icaweights(:,1))] );

    % Find components to reject
    Rejected_Component = find(EEG.reject.gcompreject == 1);
    RejectedComponentNum = length(Rejected_Component);
      
    % Reject flagged components
    EEG = pop_subcomp(EEG, Rejected_Component, 0);

    % Save the number of components removed
    VariableName = ["SubjectID" "Components Removed"];
    VariableValuesNum = [RejectedComponentNum];
    VariableValuesNum2 = num2cell(VariableValuesNum);
    VariableValues = horzcat(Current_eegFile, VariableValuesNum2);
    DataLog = vertcat(VariableName,VariableValues);

    % This sets the working directory where the CSV files will be saved
    cd(save_pathway_csv);

    % Set the name for the CSV file that will be saved
    DataLogName = strcat(Current_eegFile,'.csv');
    
    % Saving the CSV file
    writematrix(DataLog, DataLogName);

    % Saving the EEG data
    Save_FileName = Current_eegFile
    EEG = pop_saveset(EEG, 'filename',Save_FileName,'filepath',save_pathway);

end

Renaming the markers

A huge challenge in cleaning ONR GnG data is the marker names chosen (events) to describe the presented stimulus. There are hundreds of them. Below shows a few of the marker names that show when using the pop_squeezevents(EEG) command. (Disclaimer: This function only works if you have ERPLAB installed).

original marker names

Thus, some scripts were written to change the names of these markers to something that makes more sense.

The code below is very complex. It is first extracting all of the different marker names in an EEG file. Then it is isolating a subset of these markers that correspond to stimuli presentation (as opposed to including response markers). Then it creates a cell to differentiate between a stimulus marker and a block condition, since the names of these components in the marker overlap- thus we don't want to accidentally include a response marker because it contains 'G1' since this both represents a stimulus marker (Go trial) and a blockname condition (Colors and Shape condition; Don't ask me, I didn't decide on these marker names...).

Part 1: It then takes the following markers that contain the following numbers (1, 5, 8, 12) and converts them into Go trials and the other numbers (2, 3, 4, 6, 7, 9, 10, 11) and converts them into NoGo trials. For more information on this, please visit the following link: GnG marker codes

Part 2: It uses the information from the blockname portion of the stimuli marker to identifies the block conditions that we are interested in, which are five (does not including their practice version). The 'Unknown' category was added for quality control.

  1. Colors and Shapes
  2. Colors and Shapes fixed ISI
  3. Colors Shape Arrow
  4. Colors Shape Reverse Arrow
  5. Saber Color Matching
  6. Unknown

Part 3: Using a section of the response marker name to identify if the trial was done correctly or not.

for ii = 1:length(eegFiles)

    Current_eegFile = eegFiles{ii} %MUST BE SQUIGGLY LINE FOR SEGMENTATION TO WORK!!!!
    Pathway = [folder '\'];
    
    fullpath = strcat(Pathway,Current_eegFile);
    
    % Load in EXISTING EEG files
    EEG = pop_loadset('filename',Current_eegFile,'filepath',Pathway);

    % Save all markers into a variable
    % Wrote extra code to confirm this is the case
    All_marker_names = cell(size(EEG.event));
    
    for i = 1:numel(EEG.event)
        All_marker_names{i} = EEG.event(i).type;
    end


    % % % % % % % PART 1: OBTAINING STIMULI PRESENTING MARKERS % % % % % %
    % Extract only the markers that start with S
    % Define substrings to check for
    stimuli_markers = {'S0', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9'};

    % Initialize a cell array to store values containing the specified substrings
    extracted_stimuli_markers = {};

    % Check each string for the specified substrings
    for i = 1:numel(All_marker_names)
        for j = 1:numel(stimuli_markers)
            if contains(All_marker_names{i}, stimuli_markers{j})
                % If the string contains the substring, add it to the selectedValues
                extracted_stimuli_markers = [extracted_stimuli_markers, All_marker_names{i}];
                break; % Break out of the inner loop once a match is found
            end
        end
    end

    % % % % % % % PART 2: OBTAINING BLOCK AND ITEM IDS % % % % % % % %
    % Initialize matrix to store strings, block IDs, and item IDs
    infoMatrix = cell(numel(extracted_stimuli_markers), 3);
    
    % Define regular expressions for extracting block ID and item ID
    pattern = 'G(\d+)G(\d+)';
    
    % Extract original string, block ID, and item ID from each string
    for i = 1:numel(extracted_stimuli_markers)
        % Store the original string
        infoMatrix{i, 1} = extracted_stimuli_markers{i};
        
        % Extract block ID and item ID
        match = regexp(extracted_stimuli_markers{i}, pattern, 'tokens');
        if ~isempty(match)
            infoMatrix{i, 2} = str2double(match{1}{1}); % Block ID
            infoMatrix{i, 3} = str2double(match{1}{2}); % Item ID
        end
    end
    
    
    % % % % % % % PART 3: ASSIGNING GO and NO-GO TRIALS % % % % % % % %
    % Extract the third column that has information on item ids
    % Use this to identify Go trials and No-Go trials
    column3 = cell2mat(infoMatrix(:, 3));
    
    % Initialize a new column for trial type
    trialType = cell(size(column3));
    
    % Assign trial types based on conditions
    goTrials = [1, 5, 8, 12];
    noGoTrials = [2, 3, 4, 6, 7, 9, 10, 11];
    
    % Determine trial types
    for i = 1:length(column3)
        if ismember(column3(i), goTrials)
            trialType{i} = 'Go trial';
        elseif ismember(column3(i), noGoTrials)
            trialType{i} = 'No-Go trial';
        else
            trialType{i} = 'Unknown trial type';  % Handle other cases if needed
        end
    end
    
    % Add the new column to the original cell array
    yourCellArray = [infoMatrix, trialType];
    

    % % % % % % % PART 4: KEEPING APPROPRIATE BLOCK IDS % % % % % % % %
    % We only care about 5 conditions, thus we need to index for those and
    % save them before epoching the data. 

    % Extract values from column 2
    column2 = cell2mat(yourCellArray(:, 2));
    
    % Define the values to keep in column 2
    validValues = [1, 2, 3, 4, 5];
    
    % Use logical indexing to filter rows based on column 2 values
    filteredCellArray = yourCellArray(ismember(column2, validValues), :);


    % % % % % % % PART 5: RENAMING STIMULI EVENT MARKERS IN EEGLAB % % % % % % % %
    % now that we know which event markers are the most meaningful for our
    % analysis, we need to rename them in EEGLAB for easier extractions
    % when wanting to epoch the data. 
    
    % % % % % % STIMULI EVENT MARKERS

    % Extract values from column 2 and column 4
    column2 = cell2mat(filteredCellArray(:, 2));
    column4 = filteredCellArray(:, 4);
    
    % Initialize a new column
    newMarkers = cell(size(column2));
    
    % Define mappings based on values in column 2 and column 4
    for i = 1:length(column2)
        % Map values based on column 2
        switch column2(i)
            case 1
                newMarkers{i} = 'Colors_and_Shape';
            case 2
                newMarkers{i} = 'Colors_and_Shape_fixed_ISI';
            case 3
                newMarkers{i} = 'Colors_Shape_Arrow';
            case 4
                newMarkers{i} = 'Colors_Shape_Reverse_Arrow';
            case 5
                newMarkers{i} = 'Saber_Color_Matching';
            otherwise
                newMarkers{i} = 'Unknown';
        end
        
        % Append _Go_trial or _No_Go_trial based on column 4
        if strcmp(column4{i}, 'Go trial')
            newMarkers{i} = [newMarkers{i}, '_Go_trial'];
        elseif strcmp(column4{i}, 'No-Go trial')
            newMarkers{i} = [newMarkers{i}, '_No_Go_trial'];
        end
    end
    
    % Add the new column to the original cell array
    newMarkersCellArray = [filteredCellArray, newMarkers];
    
    % Loop through EEG.event.type and find matches in yourCellArray
    for i =  1:numel(EEG.event)
        % Find the index where the values match in column 1 of yourCellArray
        index = find(strcmp(newMarkersCellArray(:, 1), EEG.event(i).type));
        
        % Check if there is a match
        if ~isempty(index)
            % Assign the corresponding value from column 5
            EEG.event(i).type = newMarkersCellArray{index, 5};
        else
            % If no match, use the original value from EEG.event.type
            EEG.event(i).type = EEG.event(i).type;
        end
    end

    % % % % % % % PART 6: RENAMING RESPONSE EVENT MARKERS IN EEGLAB % % % % % % % %
    % This will require extracting response makers that we did not do in
    % earlier steps, in addition, all we will do is rename them into 
    % 'correct' or 'incorrect' values. 
    
    % % % % % % RESPONSE EVENT MARKERS
    response_markers = {'R0','R1'};
    
    % Initialize a cell array to store values containing the specified substrings
    extracted_response_markers = {};

    % Check each string for the specified substrings
    for i = 1:numel(All_marker_names)
        for j = 1:numel(response_markers)
            if contains(All_marker_names{i}, response_markers{j})
                % If the string contains the substring, add it to the selectedValues
                extracted_response_markers = [extracted_response_markers, All_marker_names{i}];
                break; % Break out of the inner loop once a match is found
            end
        end
    end

    % Extract the first two letters from column 1
    firstTwoLetters = cellfun(@(x) x(1:2), extracted_response_markers, 'UniformOutput', false);

    % Initialize a new column
    Response_correct_label = cell(size(firstTwoLetters));
    
    % Define mappings based on values in firstTwoLetters
    for i = 1:length(firstTwoLetters)
        % Map values based on firstTwoLetters
        switch firstTwoLetters{i}
            case 'R0'
                Response_correct_label{i} = 'Incorrect_Response';
            case 'R1'
                Response_correct_label{i} = 'Correct_Response';
        end
    end

    
    responseMarkerMatrix = cell(numel(extracted_response_markers), 2);
    
    % Extract original string, block ID, and item ID from each string
    for i = 1:numel(extracted_response_markers)
        
        % Store the original string
        responseMarkerMatrix{i, 1} = extracted_response_markers{i};
        responseMarkerMatrix{i, 2} = Response_correct_label{i};
    end

    % Loop through EEG.event.type and find matches in yourCellArray
    for i =  1:numel(EEG.event)
        % Find the index where the values match in column 1 of yourCellArray
        index = find(strcmp(responseMarkerMatrix(:, 1), EEG.event(i).type));
        
        % Check if there is a match
        if ~isempty(index)
            % Assign the corresponding value from column 5
            EEG.event(i).type = responseMarkerMatrix{index,2};
        else
            % If no match, use the original value from EEG.event.type
            EEG.event(i).type = EEG.event(i).type;
        end
    end

    conditionsAndTrials = {'Colors_and_Shape_Go_trial',
                           'Colors_and_Shape_fixed_ISI_Go_trial',
                           'Colors_Shape_Arrow_Go_trial',
                           'Colors_Shape_Reverse_Arrow_Go_trial',
                           'Saber_Color_Matching_Go_trial',
                           'Colors_and_Shape_No_Go_trial',
                           'Colors_and_Shape_fixed_ISI_No_Go_trial',
                           'Colors_Shape_Arrow_No_Go_trial',
                           'Colors_Shape_Reverse_Arrow_No_Go_trial',
                           'Saber_Color_Matching_No_Go_trial'};
    
    responseMarkers = {'Incorrect_Response',
                       'Correct_Response'};
    
    % Using curly braces for cell arrays
    eventTypesToKeep = vertcat(conditionsAndTrials, responseMarkers);

    % Find the indices of events with the specified types
    indicesToKeep = ismember({EEG.event.type}, eventTypesToKeep);
    
    % Keep only the events with the specified types
    EEG.event = EEG.event(indicesToKeep);

    % Extract all markers 
      All_marker_names = cell(size(EEG.event));
    
    for i = 1:numel(EEG.event)
        All_marker_names{i} = EEG.event(i).type;
    end

    % Quality control check one - what is the count of trial types we have
    % and responses?
    % Counting occurrences of each unique string
    uniqueStrings = unique(All_marker_names);
    counts = countcats(categorical(All_marker_names));

    % Combining into a table
    resultTable = table(uniqueStrings', counts', 'VariableNames', {'UniqueStrings', 'Counts'});

    % Displaying the result table
    disp(resultTable);

    % Set the save directory to save the fil
    cd(save_pathway_csv)

    % Set the name to save the file
    Current_eegName = Current_eegFile(1:end-4);
    csvFilename = [Current_eegName '_Markers_cleaned1.csv'];

    % Save the file
    writetable(resultTable, csvFilename);

    % % % PART 7: SAVE THE EEG DATA % % % %
    % We finally made it to the end game
    Save_FileName = Current_eegFile
    EEG = pop_saveset(EEG, 'filename',Save_FileName,'filepath',save_pathway);
    
end

Next Step: Creating ERPs

Creating GnG ERPs script