Detailed instructions - murphyp7/meg-preproc GitHub Wiki

Protocol for preprocessing data for ICA, identifying and removing bad components, and segmenting data for trial-based analysis. This should be run on individual MEG recordings. Note: all code is written to work with Fieldtrip version 20160221 on Matlab version 2015b; if working with later versions some adjustments will almost certainly be required.

1. Identify artifacts (preprocSurpriseD_artifacts.m)

Load event information for the entire recording and identify start/end points of individual blocks.
Begin for loop through each block, and within this loop carry out the following steps:

Load the following data within the block bounds +/- 10 seconds: MEG sensors, head localization channels, Eyelink channels, and relevant EEG channels (e.g. scalp, EOG and EKG for surprise study). Not loading additional channels will save memory/time.
Downsample to 400Hz.
Filter out 50Hz line noise.
Identify times of head motion, blink, saccade, jump, muscle and car/other artifacts, and store these in their own Nx2 matrices (column 1 = sample of artifact onset; column 2 = sample of artifact end). NB: the sample numbers in these matrices should be relative to block segment onset (since after downsampling, data.sampleinfo gets reset and reference to native recording time is lost) – this will be the standard time reference for all analyses.
Save these artifact matrices – an example of which is Artifacts_CMC-1_01_Block_1.mat. (We don’t save the data here to save harddrive space.)

The goal of this step is solely to identify and save times of various kind of artifact. A .png file (example: Overview_CMC-1_01_Block_1.png) will also be saved for each block which plots channel-wise power spectra and various types of identified artifact, and so can be used to get an initial impression of data quality (needs some work in uploaded code).

2. Run ICA on preprocessed data (ICA_SurpriseD.m)

As above, load event information for entire recording to identify block start/end points, and enter into for loop through blocks that does the following:

Load data within block bounds +/-10 seconds and downsample to 400Hz (this will straight away bring these data in line with the time stamps of the artifact matrices saved previously).
Filter out 50Hz line noise.
High-pass filter the data at a cutoff of 1Hz.
Load previously created artifact matrices for this block (which, remember, contain sample numbers relative to the start of this block’s data segment).
Remove this block’s head movement, muscle, jump and car/other artifacts from the data and create new blink/saccade artifact matrices that take these periods of discarded data into account (this can be a little bit tricky and care needs to be taken that it’s done correctly).
Concatenate both the cleaned data and the new blink/saccade matrices across blocks.

Save the across-block concatenated data and new realigned artifact matrices together in a .mat file. Note that these will be used solely for the purpose of identifying ICA components that we want to remove – we only save them here so that we don’t need to go through the initial preprocessing steps yet another time.
Subject only the concatenated MEG data to ICA, and save the resulting ‘comp’ structure.

The goal of this step is to save ICA weights and data/artifact matrices that will make it easy to identify ICA components that we want to remove.

3. Identify ICA components for removal (identify_ICA_comps.m)

For each of blink, saccade and EKG artifacts, do the following:

Load the data and artifact matrices saved from Step 2 above.
Segment the data around artifacts of interest (for blinks/saccades, the timings are given by the Nx2 matrices calculated previously; for heart beats they are given by calling ft_artifact_ecg.m)
Load the ICA component structure saved from Step 2 above and compute component time-series by projecting the ICA weights onto the artifact-aligned data.
Average each ICA component and the relevant artifactual channel (vEOG for blinks, Y-gaze for saccades, EKG for heart beat) across trials, resulting in a single artifact-aligned average time-course for each component and for the relevant artifactual channel.
Compute the coherence between the trial-averaged artifactual channel and each ICA component – this coherence metric is primarily what we use to hone in on potentially artifactual components.
Plot the time-courses and topographies (i.e. weights) of the components with the top 25 coherence values, manually identify those that are likely artifactual, and store the numbers of these artifactual components in a separate script.

The goal of this step is to identify ICA components that are likely artifactual, and save their identities so we can later remove them from the data. Note: see uploaded ICA_blink.pdf and ICA_ecg.pdf figures for examples of blink (ICs 7 and maybe 40) and heart-beat (ICs 37 and 35) components, respectively; see also http://www.fieldtriptoolbox.org/example/use_independent_component_analysis_ica_to_remove_eog_artifacts/ and https://mne.tools/dev/auto_tutorials/preprocessing/plot_40_artifact_correction_ica.html#sphx-glr-auto-tutorials-preprocessing-plot-40-artifact-correction-ica-py for other examples to guide your search.

4. Prepare data for trial-based analysis (prep_trialAnalysis.m)

Load original event information for entire recording and this time, use a custom trial definition script to construct the actual trial definition matrix (i.e. each row will specify the timing of each trial). This matrix should also include other pieces of relevant information for trial identification, including session/block/trial numbers, and features of trial itself (e.g. response time, choice, etc.); these will be important for later analysis steps (especially the column corresponding to the block number of each trial, which will be used in step 4c-vi below). Example trial definition script: trialbasedfun_surpriseD.m.
Also construct a separate matrix containing the start/end points of each block +/-10 seconds (i.e. the bounds that we use for loading data in previous processing steps).
Loop through each block and do the following:

Load continuous data within bounds of the block.
Filter out 50Hz line noise.
High-pass filter using cutoff of 0.1Hz (note lower cutoff than that employed previously, which will facilitate analysis of slow ERFs if desired).
Downsample filtered block data to 400Hz.
Load ICA weights and IDs of artifactual components constructed in steps 2 and 3 previously, and remove artifactual components from downsampled data.
Segment cleaned data into trials using previously-constructed trial matrix. Note that this is a challenge, because our data have been downsampled and aligned relative to block start, but our trial matrix contains sample numbers at the native MEG sampling rate and relative to recording start. So, you will need to do the following: i) Convert any sample values contained in the trial matrix (at least the first three columns, but possibly more depending on what you choose to include in your trial definition script) to the new sampling rate by the following (where a sample value is x, and sampling rate is fs): xnew = round(xold/fsold*fsnew). [Note that rounding is important so that the resulting sample numbers are whole numbers]. ii) Isolate the rows in the trial matrix (i.e. trials) that specifically account for trials in the current block. A new matrix should be created that contains these rows with sample numbers now aligned to block start. To do so, go back to the block matrix constructed in step 4b, convert the block start sample for this block to the new sampling rate using the above equation, and then subtract this value from all sample numbers in the new trial matrix specifically for this block. The result will be exactly what we want: a trial matrix containing only trials for the current block, with sample numbers that are relative to the start of that block segment. iii) Use this new block-specific trial matrix to segment this block’s data into trials, by feeding it in as input to the ft_redefinetrial function.
Load all artifact matrices (blink, saccade, jump, muscle) for this block estimated during Step 1 above, and remove any trials where the time span of the trial overlaps with whatever kinds of artifact we want to be sure are not affecting the data (head movement, jump and muscle for sure; blink and saccade if we want to be extra careful, which we do especially for stimulus decoding analyses).
Store clean trials for subsequent concatenation across blocks.

Concatenate all clean data across blocks and save.

The goal of this step is to clean the data of any artifactual ICA components and residual artifacts, and segment the cleaned data into trial epochs which can then be analysed.