fMRI process pipeline: HCPD_dataset - CuiLabCIBR/utils GitHub Wiki

Step1: HCPD data structure

The data structure of HCPD(HCP Development) is similar to HCP-Adults dataset, however, they use PA(AP) phase encoding direction instead of RL(LR) in HCPD data. The reason why they choose AP(PA) phase encoding direction was descibed in the paper Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects, if you are interest about it, please see 3.2. BOLD imaging section. Let's see an example subject of HCPD dataset:

For each subject, there are four folders for resting fMRI data, two folders are AP phase encoded and the others are PA phase encoded. REST1 refers to that those data was scanned in first day, similarly, the REST2 was scanned in second day. Those nii files and cifti files was prerprocessed by HCP minimal process pipeline. In summary, the file that we got have been removed spatial distortions, realigned volumes to compensate for subject motion, registered to the structural, normalized to MNI space, masked by the data with the final brain mask and projected into the CIFTI grayordinate standard space(91k space). You should notice that 'wm_mean_signal.txt','whole_mean_signal' and 'csf_mean_signal.txt' was caculated by customer(for this instance, me). And the 'confounds_timeseries.tsv' is also caculated by me. If you did not know why we should get those files, please read fMRI process pipeline : HCP style image dataset HCP_ALL_Family.

For task fMRI, HCPD have three types tasks, they are Guessing(GUESSING), Go/NoGo (CARIT), and Emotion（EMOTION)
.

The details of task is described in The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. In summary, the acquisition parameters for tfMRI and rfMRI are identical, and we also download the minimal process task fMRI data (same as resting fMRI data).

Step2: Convert HCPD data into BIDS

Considering for the standard process of fMRI data, we use the fMRIPrep and XCP-ABCD tools to process the fMRI data of HCP. However, we need convert the HCP style dataset into the BIDS structure becasue of the input restriction of XCP-ABCD. That is the reason why we look through the HCP style datasets firstly. The BIDS standard structure aims to organise and describe neuroimaging data in a uniform way to simplify data sharing through the scientific community. Let's convert the HCPD dataset into BIDS structure directly(If you don't familiar with BIDS, please read fMRI process pipeline: fMRI preprocess steps and tools and fMRI process pipeline : HCP style image dataset HCP_ALL_Family ).

Here, we use python to do all steps. First, we can check the file status by using os and pandas.

data_recommnd = '/HCPD/rfMRI_PrepRecommand_Package_1188937/fmriresults01/' #this directory store the hcp processed data,mainly for motion txt 
data_unpre = '/HCPD/rfMRI_PrepUnclean/' #this directory store the hcp minimal processed data,mainly for nii file and cii file 
data_re_list = os.listdir(data_recommnd)
da_unpre_list = os.listdir(data_unpre)
data_all_have = list(set(da_unpre_list).intersection(set(data_re_list))) # get common subject name list
runs = ('rfMRI_REST1_AP','rfMRI_REST1_PA','rfMRI_REST2_AP','rfMRI_REST2_PA')
time = 1
for name in data_all_have:
    data_sub_re = os.path.join(data_recommnd,name,'MNINonLinear/Results')
    data_sub_unpre= os.path.join(data_unpre,name,'MNINonLinear/Results')
    for run in runs:
        data_sub_re_run = os.path.join(data_sub_re,run)
        data_sub_unpre_run = os.path.join(data_sub_unpre,run)
        motion_txt = os.path.join(data_sub_re_run,'Movement_Regressors.txt')
        rms_txt = os.path.join(data_sub_re_run,'Movement_RelativeRMS.txt')
        nii_file = os.path.join(data_sub_unpre_run,run+'.nii.gz')
        cii_file = os.path.join(data_sub_unpre_run,run+'_Atlas.dtseries.nii')
        msm_file = os.path.join(data_sub_unpre_run,run+'_Atlas_MSMAll.dtseries.nii')
        all_file_list = {'subname':name,
                        'runs':run,
                        'motion_txt':motion_txt,
                        'nii_file':nii_file,
                        'cii_file':cii_file,
                        'msm_file':msm_file,
                        'rms_txt':rms_txt}
        
        if time==1:
            all_file_df = pd.DataFrame(all_file_list,index=[0])
            all_file_df = all_file_df.assign(motion_txt_exists = lambda dataframe: dataframe['motion_txt'].map(
                lambda motion_txt: True if os.path.isfile(motion_txt) else False)).assign(
                nii_exists = lambda dataframe: dataframe['nii_file'].map(
                lambda nii_file: True if os.path.isfile(nii_file) else False)).assign(
                cii_exists = lambda dataframe: dataframe['cii_file'].map(
                lambda cii_file: True if os.path.isfile(cii_file) else False)).assign(
                msm_exists = lambda dataframe: dataframe['msm_file'].map(
                lambda msm_file: True if os.path.isfile(msm_file) else False)).assign(
                rms_exists = lambda dataframe: dataframe['rms_txt'].map(
                lambda rms_txt: True if os.path.isfile(rms_txt) else False))            
        else:
            tmp_file_df = pd.DataFrame(all_file_list,index=[0])
            tmp_file_df = tmp_file_df.assign(motion_txt_exists = lambda dataframe: dataframe['motion_txt'].map(
                lambda motion_txt: True if os.path.isfile(motion_txt) else False)).assign(
                nii_exists = lambda dataframe: dataframe['nii_file'].map(
                lambda nii_file: True if os.path.isfile(nii_file) else False)).assign(
                cii_exists = lambda dataframe: dataframe['cii_file'].map(
                lambda cii_file: True if os.path.isfile(cii_file) else False)).assign(
                msm_exists = lambda dataframe: dataframe['msm_file'].map(
                lambda msm_file: True if os.path.isfile(msm_file) else False)).assign(
                rms_exists = lambda dataframe: dataframe['rms_txt'].map(
                lambda rms_txt: True if os.path.isfile(rms_txt) else False))   
            all_file_df = all_file_df.append(tmp_file_df,ignore_index=True)
        time = time+1
all_file_df.to_csv('HCP_D_all_sub_file_summary.csv')

And then, we can use the rms_txt to control the motion quality of resting fMRI data by such rules:

(1) mean FD exceeded 1.5 times the inter-quartile range of the distribution
(2) Individual scans were also excluded if more than 25% of frames exceeded 0.2mm frame-wise displacement(Jekinson FD).

all_file_df['rms_mean'] = ''
all_file_df['rms_valid'] = ''
all_file_df['rms_above_25'] = '' #25 actually means 0.25mm
all_file_df['rms_above_25_valid'] = ''
for (num_row,j) in zip(np.arange(0,all_file_df.shape[0]),trange(all_file_df.shape[0])):
    if all_file_df.loc[num_row,['rms_exists']].values[0]==True:
        file_name = all_file_df.loc[num_row,['rms_txt']].values[0]
        data_rms = pd.read_csv(file_name)
        mean_rms = np.nanmean(data_rms.to_numpy())
        valid = True if mean_rms > 0.2 else False
        ab_25 = sum(data_rms.to_numpy()>0.2)/len(data_rms.to_numpy())
        valid_ab_25 = True if ab_25 > 0.25 else False
        #ab_25 = sum(data_rms.to_numpy()>0.25)
        #valid_ab_25 = True if ab_25 > 20 else False
        all_file_df.loc[num_row,['rms_mean']] = mean_rms
        all_file_df.loc[num_row,['rms_valid']] = valid
        all_file_df.loc[num_row,['rms_above_02']] = ab_25
        all_file_df.loc[num_row,['rms_above_02_valid']] = valid_ab_25

The rules of motion control based on those two papers, Dynamic expression of brain functional systems disclosed by fine-scale analysis of edge time series and Edge-centric functional network representations of human cerebral cortex reveal overlapping system-level architecture. Those paper are high impact papers and the quality control are also accepted by experts in neuro-image field.

Next, we will move on caculating the 'wm_mean_signal.txt','whole_mean_signal' and the 'csf_mean_signal' for each subject. Those mask files used in here was calculated in fMRI process pipeline : HCP style image dataset HCP_ALL_Family.

slurm = '/HCPD/code/post_process.slurm'
wm_mask = '/HCPD/code/WM_csf_mask/final_csf_10_mask.nii.gz'
csf_mask = '/HCPD/WM_csf_mask/final_WM_10_mask.nii.gz'
whole_brain = '/HCPD/WM_csf_mask/brainmask_fs.2.nii.gz'
no_nii_sub = []
for num_row in np.arange(0,all_file_df.shape[0]):
    nii_file = all_file_df.loc[num_row,['nii_file']].values[0]
    nii_path = os.path.join(data_unpre,all_file_df.loc[num_row,['subname']].values[0],'MNINonLinear/Results',
                                 all_file_df.loc[num_row,['runs']].values[0])
    subname = all_file_df.loc[num_row,['subname']].values[0]
    runs = all_file_df.loc[num_row,['runs']].values[0]
    slurm_txt = ('#!/bin/bash'+'\n'+'#SBATCH --nodes=1'+'\n'+'#SBATCH -p lab_fat'+'\n'+
                            '#SBATCH --cpus-per-task=1'+'\n'+'#SBATCH -e '+subname + runs+'.error'+'\n'+
                             '#SBATCH -o '+subname+ runs +'.out'+'\n'+'module load fsl')
    if all_file_df.loc[num_row,['nii_exists']].values==True:
        if os.path.exists(nii_path  + '/wm_mean_signal.txt') & os.path.exists(nii_path  + '/csf_mean_signal.txt') & os.path.exists(nii_path  + '/whole_mean_signal.txt'):
            continue
        #calculate the mean signal of nii file for wm csf and whole brain
        else:           
            cmd1 = 'fslmeants -i '+ nii_file +' -o '+ nii_path  + '/wm_mean_signal.txt'+' -m '+ wm_mask
            cmd2 = 'fslmeants -i '+ nii_file +' -o '+ nii_path  + '/csf_mean_signal.txt'+' -m '+ csf_mask
            cmd3 = 'fslmeants -i '+ nii_file +' -o '+ nii_path  + '/whole_mean_signal.txt'+' -m '+ whole_brain            
            with open(slurm,"w") as file:
                    file.write(slurm_txt+"\n"+cmd1 +"\n" +cmd2+"\n"+cmd3+"\n")
            !sbatch /GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/code/post_process.slurm
    else:
        no_nii_sub.append(all_file_df.loc[num_row,['subname']].values[0] + '_'+ all_file_df.loc[num_row,['runs']].values[0])
    print(subname +'has already done')

When we have those nuisance parameters for physiological noise, we can use such function to get the '*confounds_timeseries.tsv' for XCP-ABCD post denosing process:

def get_confound_tsv(motion_file,csf_file,wm_file,whole_file,rmsd_file):
    reg=motion_file
    reg=reg.iloc[:,0:6]
    reg.columns=['trans_x','trans_y','trans_z','rot_x','rot_y','rot_z']
    # convert rotx to degree 
    reg['rot_x']=reg['rot_x']*np.pi/180 # the unit of orignal rot value is degree, so we need to convert is into mm 
    reg['rot_y']=reg['rot_y']*np.pi/180
    reg['rot_z']=reg['rot_z']*np.pi/180
    reg_brian=pd.DataFrame({'global_signal':whole_file,'white_matter':wm_file,'csf':csf_file,'rmsd':rmsd_file })
    regressors =  pd.concat([reg, reg_brian], axis=1)
    return regressors

based on this function, we can calculate the '*confounds_timeseries.tsv' for each subject by follow commands:

len_sub = []
task_sub = []
for num_row in np.arange(0,all_file_df.shape[0]):
    file_path = os.path.join(data_unpre,all_file_df.loc[num_row,['subname']].values[0],'MNINonLinear/Results',
                                 all_file_df.loc[num_row,['runs']].values[0])
    if os.path.exists(os.path.join(file_path,'Movement_Regressors.txt'))==True:
        motion_file = pd.read_csv(os.path.join(file_path,'Movement_Regressors.txt'),header=None,delimiter=r"\s+")
        csf_file = np.loadtxt(os.path.join(file_path,'csf_mean_signal.txt'))
        wm_file = np.loadtxt(os.path.join(file_path,'wm_mean_signal.txt'))
        whole_file = np.loadtxt(os.path.join(file_path,'whole_mean_signal.txt'))
        rmsd_file = np.loadtxt(os.path.join(file_path,'Movement_AbsoluteRMS.txt'))
        if len(csf_file) == len(wm_file) == len(whole_file) ==len(rmsd_file):
            subname = all_file_df.loc[num_row,['subname']].values[0][0:10]
            run = all_file_df.loc[num_row,['runs']].values[0]
            regressors_table = get_confound_tsv(motion_file,csf_file,wm_file,whole_file,rmsd_file)
            file_name = file_path +'/sub-'+subname + '_task-'+run[6:11]+'_acq-'+ run[12:]+'_desc-confounds_timeseries.tsv'
            if os.path.exists(file_name)==False:
                regressors_table.to_csv(file_name,index=False,sep= '\t')
        else:
            len_sub.append(subname)
            task_sub.append(runs)
            print(task_path+direct+'length not equal')
    print(subname+' has done')
lack_file = {'subject':len_sub,'task':task_sub}
lack_file_table = pd.DataFrame(lack_file)

Now, we have already get the '*confounds_timeseries.tsv', we can move on to convert the nii file and cii file into BIDS style. I use below script to do it:

anat_file_fake = '/deratives/fmriprep/sub-MSC01/anat/sub-MSC01_from-MNI152NLin6Asym_to-T1w_mode-image_xfm.h5'
target_path_or = '/HCPD/processing/fmriprep/'
for (num_row,j) in zip(np.arange(0,all_file_df.shape[0]),trange(all_file_df.shape[0])):
    file_path = os.path.join(data_unpre,all_file_df.loc[num_row,['subname']].values[0],'MNINonLinear/Results',
                                 all_file_df.loc[num_row,['runs']].values[0])
    subname = all_file_df.loc[num_row,['subname']].values[0][0:10]
    run = all_file_df.loc[num_row,['runs']].values[0]
    task_path = os.path.join(data_unpre,all_file_df.loc[num_row,['subname']].values[0],'MNINonLinear/Results',
                                 all_file_df.loc[num_row,['runs']].values[0])
    target_path = target_path_or
    #get files that we need
    cii_file = all_file_df.loc[num_row,['cii_file']].values[0]
    nii_file = all_file_df.loc[num_row,['nii_file']].values[0]
    motion_regress = file_path +'/sub-'+subname + '_task-'+run[6:11]+'_acq-'+ run[12:]+'_desc-confounds_timeseries.tsv'
    if os.path.exists(cii_file) & os.path.exists(nii_file) & os.path.exists(motion_regress):
        #set new names for fmriprep
        nii_name = 'sub-'+subname+'_task-'+run[6:11]+'_acq-'+run[12:]+'_space-MNI152NLin2009cAsym_desc-preproc_bold'
        cii_name = 'sub-'+subname+'_task-'+run[6:11]+'_acq-'+run[12:]+'_space-fsLR_den-91k_bold'
        tsv_name = 'sub-'+subname+'_task-'+run[6:11]+'_acq-'+run[12:]+'_desc-confounds_timeseries'
        mni2t1 = 'sub-'+subname+'_from-MNI152NLin2009cAsym_to-T1w_mode-image_xfm.h5'
        t12mni = 'sub-'+subname+'_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5'
        #create the fmriprep file path
        fmriprep_path = os.path.join(target_path,'sub-'+ subname,'func')
        ant_path = os.path.join(target_path,'sub-'+ subname,'anat')            
        if os.path.exists(fmriprep_path)== False:
            os.makedirs(fmriprep_path)
            os.makedirs(ant_path)
        shutil.copyfile(nii_file,fmriprep_path+'/'+nii_name+'.nii.gz')
        shutil.copyfile(cii_file,fmriprep_path+'/'+cii_name+'.dtseries.nii')
        shutil.copyfile(motion_regress,fmriprep_path+'/'+tsv_name+'.tsv')
        shutil.copyfile(anat_file_fake,ant_path+'/'+mni2t1)
        shutil.copyfile(anat_file_fake,ant_path+'/'+t12mni)
        nii_cmd = 'jo -p RepetitionTime=0.72 SkullStripped=false '+ 'TaskName='+run[6:11]+' > '+fmriprep_path+'/'+nii_name+'.json'
        os.system(nii_cmd)
        c_json = 'jo -p RepetitionTime=0.72 SkullStripped=false '+'TaskName='+run[6:11]+' > '+ fmriprep_path+'/'+cii_name+'.json'
        os.system(c_json)
        cii_cmd = ('jo -p RepetitionTime=0.72 '+'grayordinates=91k '+ 'space="HCP grayordinates" ' +  
            'surface=fsLR '+ 'surface_density=32k '+'volume=MNI152NLin6Asym '+'TaskName='+run[6:11]+' > '+ 
        fmriprep_path+'/'+ cii_name+'.dtseries.json')
        os.system(cii_cmd)
        tsv_cmd = 'jo -p LR="1 2 3"' + ' > '+fmriprep_path +'/'+tsv_name+'.json'
        os.system(tsv_cmd)

you need to notice the **anat_file_fake ** just a fake file which dose not use in post-process, it is required by xcp-abcd for checking. Finally, we can use XCP-ABCD to process HCPD dataset:

slurm = '/HCPD/code/post_process.slurm'
target_path_or = '/HCPD/processing/fmriprep/'
output = '/HCPD/processing/out'
workdir = '/HCPD/processing/work'
sublist = os.listdir(target_path_or)[0:1]
for (subname,j) in zip(sublist,trange(len(sublist))):
    target_path = target_path_or
    fmriprep_path = os.path.join(target_path, subname)
    slurm_txt = ('#!/bin/bash'+'\n'+'#SBATCH --nodes=1'+'\n'+'#SBATCH -p lab_fat'+'\n'+
                            '#SBATCH --cpus-per-task=20'+'\n'+'#SBATCH -e '+subname +'.error'+'\n'+
                             '#SBATCH -o '+subname +'.out''\n'+'module load singularity')
    first_part_command = ('singularity run --cleanenv -B '+ fmriprep_path +':/fmriprep/'+subname+
            ' -B '+ output + ':/out'+
            ' -B /home/.cache:/home/xcp_abcd/.cache'+
            ' -B '+workdir+':/work')
    second_part_command = (' /app/xcp_abcd.sif /fmriprep /out participant -w /work --participant_label '+
                           subname + ' --cifti -p 36P --despike --lower-bpf 0.01 --upper-bpf 0.08 --smoothing 6 --verbose --resource-monitor')
    real_comand = first_part_command + second_part_command
    with open(slurm,"w") as file:
        file.write(slurm_txt+"\n"+real_comand +"\n")
    output_sub = os.path.join(output,'xcp_abcd','sub-'+subname,'func')
    if os.path.exists(output_sub)==False:
        #shutil.rmtree(os.path.join(output,'xcp_abcd','sub-'+subname))
        #shutil.rmtree(os.path.join(workdir,'xcpabcd_wf','single_bold_'+subname+'_wf'))
        !sbatch /HCPD/code/post_process.slurm

Step3: HCPD task fMRI activation regressor

Somtimes, we need to regress task effects from the BOLD timeseries for “pseudo-resting state” timeseriesis calculation, which can be performed in 3 steps:

1. Create a task event timing array. 
2. Convolve task events with a spm style hemodynamic response function (HRF)
3. Regress out the effects of task via a general linear model implemented with xcp_d.

1. Create a task event timing array.

In general, we have the design matrix of task event time, such as follow matrix, you can get it from the eprime or psychopy or whatever was used to play your task for subject.

Values in this colum should same the length of the TR as your fMRI data. Values in this column should equal either 0 or 1; each volume during which the task event was being tested should = 1, all others should = 0.

2. Convolve task events with a spm style hemodynamic response function (HRF)

And then, we can convolve above task time column with a spm HRF, a spm hrf would be like this:

from nipy.modalities.fmri.hemodynamic_models import spm_hrf
import matplotlib.pyplot as plt
trLength=0.72
spm_hrfTS = spm_hrf(trLength, oversampling=1)
plt.plot(np.arange(0,len(spm_hrfTS)), spm_hrfTS, color='red', marker='o', linestyle='dashed',linewidth=2, markersize=12)

you will notice that the hrf shape would be affected by length of TR. Now we can use below code to convolve the task time column with HRF:

def loadTaskTiming(timefile, trLength):
    stimMat = timefile
    # Convolve taskstim regressors based on SPM canonical HRF (likely period of task-induced activity)
    stimMat = timefile
    taskStims_HRF = np.zeros(stimMat.shape)
    spm_hrfTS = spm_hrf(trLength, oversampling=1)
    tmpconvolve = np.convolve(stimMat, spm_hrfTS)
    taskLength = len(stimMat)
    taskStims_HRF = tmpconvolve[:taskLength]  # Make sure to cut off at the end of the run
    taskRegressors = taskStims_HRF.copy()
    return taskRegressors
data = pd.read_csv('/GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/code/taskfmri/CARIT_timedesign.csv',sep=',',header=None)#task time point as same as above picture
time_matrix=np.array(data.to_numpy()).flatten() 
data_regressors = loadTaskTiming(time_matrix.T, 0.8)# use a function to convolve the task time design matrix
regressors_table.to_csv('/GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/processed_data/rsfMRI/fmriprep_task_no_MSM/AP/CARIT/sub-HCD0001305/func/sub-HCD0001305_task-CARIT_acq-AP_desc-custom_timeseries.tsv')#you should save the regressor arrary into the right subject BIDS folder(func) and names as BIDS style for convnient using through xcp_d

and you can visualize the time block design and the bold response according to this time block design by using such code:

plt.figure(figsize=(20, 6))
plt.plot(np.arange(0,len(data_regressors)), data_regressors, color='blue', marker='o', linestyle='dashed',linewidth=1, markersize=5)
plt.xticks([x for x in range(max(np.arange(0,len(data_regressors))) + 20) if x % 20 == 0]) 
plt.show
plt.plot(np.arange(0,len(time_matrix.T)), time_matrix.T, color='red', marker='o', linestyle='dashed',linewidth=1, markersize=5)
plt.xticks([x for x in range(max(np.arange(0,len(time_matrix.T))) + 20) if x % 20 == 0]) 
plt.show

3.Regress out the effects of task via a general linear model implemented with xcp_d.

Now, we can use the xcp_d to regress the task activation information from task bold data use follow scripts:

singularity exec --cleanenv -B /GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/processed_data/rsfMRI/fmriprep_task_no_MSM/AP/CARIT/:/fmriprep  \
-B /GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/processed_data/rsfMRI/xcp_abcd_task/AP/CARIT:/out \
-B /home/cuizaixu_lab/wuguowei/.cache:/home/xcp_abcd/.cache \ 
-B /GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/processed_data/rsfMRI/work/AP/CARIT:/work \
$IMG xcp_abcd /fmriprep /out participant -w /work --cifti --participant-label HCD0001305 -p 36P -f 10 --despike --lower-bpf 0.01 \
--upper-bpf 0.08 --smoothing 6 --verbose --resource-monitor -c /fmriprep/sub-HCD0001305/func

you should note, the regressor tsv file was stored in /GPFS/cuizaixu_lab_permanent/Public_Data/HCPD/processed_data/rsfMRI/fmriprep_task_no_MSM/AP/CARIT/sub-HCD0001305/func and it has been mounting into the directory of xcp_d container: /fmriprep.