FreeSurfer v5.3 QA on CFN

Reprocessing FS on CFN with v5.3

Motivation:

As part of moving CONTE data to CFN, the subjects were reprocessed on CFN using FreeSurfer v5.3, the same process and version that was run on monstrum.

Scripts Overview:

  • /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • also calls:
      • /data/joy/BBL/projects/conteReproc2017/freesurfer/aparc.stats.meanthickness.totalarea.sh
      • /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_number_calculation.sh
      • /data/joy/BBL/projects/conteReproc2017/freesurfer/flag_outliers.R
      • /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_qa.R
      • /data/joy/BBL/projects/conteReproc2017/freesurfer/sum_flags_auto_qa.R

Runs the quality assurance pipeline for FreeSurfer 5.3 on CFN and outputs several csv's with subject-specific QA data. The code is broken down into eight main sections:

  • Create subcortical volume segmentation csv - aseg stats
  • Create mean QA data charts (thickness and surface area charts)
  • Create parcellation csv's - aparc stats
  • Create CNR and Euler Numbers csv's
  • Flag subjects based on whether they are an outlier (>2 SD) on at least one of the following measures:
    • Mean thickness
    • Total surface area
    • Cortical volume
    • Subcortical gray matter
    • Cortical white matter
    • CNR
    • Euler number
    • SNR
    • ROI raw cortical thickness
    • ROI laterality thickness difference
  • Flag (gray/csf flag, gray/white flag, euler number flag, number outliers rois thickness flag, total outliers)
  • Create SNR csv
  • Create summary of automatic flags csv

Output:

The csv that contains the final QA is:

  • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv

The final include/exclude decision is based on a sum of the binary flags for the following measures:

  • meanthickness_outlier
  • totalarea_outlier
  • SubCortGrayVol_outlier
  • CortexVol_outlier
  • CorticalWhiteMatterVol_outlier
  • snr_outlier
  • noutliers.thickness.rois_outlier
  • noutliers.lat.thickness.rois_outlier
  • Gray/CSF outlier
  • Gray/White outlier
  • Euler number outlier
  • failed_freesurfer

If a subject has a 1 in any of the flag columns above, the subject should be flagged and excluded from FreeSurfer analyses; this is recorded in the "fsFlag" column of the final QA output file.
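For example, a minimal R sketch (assuming the summary file has already been written as above; bblid, scanid, total_outliers, and fsFlag are the columns produced by sum_flags_auto_qa.R) of pulling out the subjects that pass the automatic QA:

    # read the automatic QA summary written by the pipeline
    qa <- read.csv("/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv")

    # subjects flagged for exclusion (one or more automatic flags set)
    excluded <- qa[which(qa$fsFlag == 1), c("bblid", "scanid", "total_outliers")]

    # subjects that pass automatic QA
    included <- qa[which(qa$fsFlag == 0), ]
    cat(nrow(included), "subjects pass automatic QA;", nrow(excluded), "flagged for exclusion\n")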

All output from the QA pipeline:

  • Aseg stats
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats/aseg.stats.volume.csv
  • Mean thickness and total area
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv
  • Left hemisphere thickness
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.thickness.csv
  • Right hemisphere thickness
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.thickness.csv
  • Left hemisphere volume
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.volume.csv
  • Right hemisphere volume
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.volume.csv
  • Left hemisphere area
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.area.csv
  • Right hemisphere area
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.area.csv
  • Euler number data
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv
  • CNR data
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv
  • CNR csv (subject specific)
    • /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_cnr.txt
  • Left hemisphere Euler number csv (subject specific)
    • /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_lh_euler.txt
  • Right hemisphere Euler number csv (subject specific)
    • /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_rh_euler.txt
  • Overall data (mean thickness, cnr, snr, total area, cortical and subcortical volume, and outliers)
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/all.flags.n[subjnum].csv
  • CNR and euler number flags csv
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr_euler_flags_n[subjnum].csv
  • SNR csv
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/snr.txt
  • Summary csv of automatic QA flags
    • /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv

QA Pipeline:

Wrapper Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh

Requires:

  • subject list in the form of a text file (slist): one subject per line in the form bblid/datexscanid (see the illustrative example below)

  • Full path to the directory where the output csv's should be written
    export output_dir=/data/joy/BBL/studies/conte/subjectData/freesurfer/stats

  • FreeSurfer- and subject-specific environment variables to be set

      export SUBJECTS_DIR=/data/joy/BBL/studies/conte/processedData/freesurfer  
      export QA_TOOLS=/data/joy/BBL/applications/QAtools_v1.1/  
      export FREESURFER_HOME=/share/apps/freesurfer/5.3.0/  
      export PATH=$FREESURFER_HOME/bin/:$PATH  
    
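For illustration only (the bblid and datexscanid values below are made up), the slist file simply lists one subject per line as bblid/datexscanid, matching the subject directory names under SUBJECTS_DIR:

    12345/20170101x6789
    67890/20170215x4321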

Create subcortical volume segmentation csv - aseg stats

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh

Steps:

  1. Checks whether the aseg.stats output directory exists, and if it doesn't, makes the appropriate directory (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats)

     if [ ! -e "$output_dir/aseg.stats" ]; then    
       mkdir -p $output_dir/aseg.stats  
     fi    
    
  2. It then runs asegstats2table which takes each subject's subcortical stats file created by recon-all and outputs the data into a table in which each row is a subject and each column is a segmentation volume (in mm3). If a subject is missing their aseg file then that subject is skipped (using the --skip argument to asegstats2table).

    asegstats2table --subjectsfile=$slist -t $output_dir/aseg.stats/aseg.stats.volume.csv -m volume --skip

  3. The output table is /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats/aseg.stats.volume.csv

Create mean QA data charts (thickness and surface area charts)

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/aparc.stats.meanthickness.totalarea.sh

Requires:

  • Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • subject list (slist)
    • FreeSurfer subjects directory (SUBJECTS_DIR)
    • full path of directory where csv should be output (OUTPUT_DIR)

Steps:

  1. Sets variables based on those passed by wrapper script

     export OUTPUT_DIR=$2      
     export SUBJECTS_DIR=$3    
     slist=$1    
    
  2. Checks whether the aparc.stats output directory exists, and if it doesn't, makes the appropriate directory (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats)

     if [ ! -e $OUTPUT_DIR/aparc.stats ]; then    
         mkdir $OUTPUT_DIR/aparc.stats     
     fi    
    
  3. Creates and outputs headers to the aggregate file /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv

    echo bblid,scanid,rh.meanthickness,rh.totalarea,lh.meanthickness,lh.totalarea > $OUTPUT_DIR/aparc.stats/bilateral.meanthickness.totalarea.csv

  4. Loops through subjects and pulls bblid and scanid from slist

     for i in $(cat $slist); do    
        bblid=$(echo $i | cut -d"/" -f1)    
        scanid=$(echo $i | cut -d"/" -f2)    
    
  5. Checks for the subject's right and left aparc.stats files

  6. If a file doesn't exist, it prints "no rh/lh.aparc.stats file for bblid/datexscanid" and sets the corresponding values to NA

  7. If the files do exist, it greps for "MeanThickness" and "SurfArea" in each subject's aparc stats file created by recon-all

     (Steps 5-7)    
                                               
     ### RH MEAN THICKNESS AND TOTAL AREA ###  
     ########################################  
             subdir=$SUBJECTS_DIR/$i    
             if [ ! -e "$subdir/stats/rh.aparc.stats" ]; then    
               echo "no rh.aparc.stats file for" $i    
               rmt="NA"    
               rta="NA"    
             else    
               string=`grep MeanThickness,  $subdir/stats/rh.aparc.stats`     
               rmt=`echo $string | cut -d "," -f 4`    
               string=`grep SurfArea,  $subdir/stats/rh.aparc.stats`     
               rta=`echo $string | cut -d "," -f 4`    
             fi    
                                                        
     ### LH MEAN THICKNESS AND TOTAL AREA ###    
     ########################################   
             if [ ! -e "$subdir/stats/lh.aparc.stats" ]; then    
               echo "no lh.aparc.stats file for" $i    
               lmt="NA"    
               lta="NA"    
             else    
               string=`grep MeanThickness,  $subdir/stats/lh.aparc.stats`     
               lmt=`echo $string | cut -d "," -f 4`    
               string=`grep SurfArea,  $subdir/stats/lh.aparc.stats`     
               lta=`echo $string | cut -d "," -f 4`    
             fi    
    
  8. It outputs this data into the table /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv

    echo $bblid,$scanid,$rmt,$rta,$lmt,$lta >> $OUTPUT_DIR/aparc.stats/bilateral.meanthickness.totalarea.csv

Create parcellation csv's - aparc stats

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh

Steps:

  1. Run aparcstats2table on each hemisphere/measure combination (left and right hemispheres for volume, thickness, and area). Subjects without aparc stats files are skipped using the --skip flag.

  2. Outputs this parcellation data into six tables in /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/ with the naming convention [lh/rh].aparc.stats.[volume/thickness/area].csv

     (Steps 1-2)    
                                       
     aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.thickness.csv -m thickness --skip  
     aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.thickness.csv -m thickness --skip  
     aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.volume.csv -m volume --skip  
     aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.volume.csv -m volume --skip  
     aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.area.csv --skip  
     aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.area.csv --skip          
    

Create CNR and Euler Numbers csv's

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_number_calculation.sh

Requires:

  • Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • subject list (slist) (/data/joy/BBL/studies/conte/subjectData/design3FullSubjectList.txt)
    • FreeSurfer subjects directory (SUBJECTS_DIR)
    • full path to directory where files should be output (OUTPUT_DIR)
  • FreeSurfer-specific environment variables to be set (FREESURFER_HOME and QA_TOOLS)

Steps:

  1. Sets variables based on those passed by wrapper script

      #set freesurfer specific variables  
      export SUBJECTS_DIR=$2  
      export OUTPUT_DIR=$3  
      export QA_TOOLS=/data/joy/BBL/applications/QAtools_v1.1/  
      export FREESURFER_HOME=/share/apps/freesurfer/5.3.0/  
      export PATH=$FREESURFER_HOME/bin/:$PATH  
                 
      #set subject list, directory to output aggregated files to, and the filenames of those aggregate files  
      slist=$1  
      outdir=$OUTPUT_DIR/cnr  
                 
      file=$outdir/cnr_buckner.csv  
      euler_file=$outdir/euler_number.csv  
    
  2. Creates and outputs headers to the aggregate files /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv and /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv

      echo "bblid,scanid,total,graycsflh,graycsfrh,graywhitelh,graywhiterh" > $file      
      echo "bblid,scanid,left_euler,right_euler" > $euler_file    
    
  3. Loops through subjects, pulls bblid and scanid from slist, and creates variables for the surf and mri directories

     for i in $(cat $slist);do    
       bblid=$(echo $i | cut -d"/" -f1)    
       scanid=$(echo $i | cut -d"/" -f2)    
       echo "working on subject" $i    
       surf=`ls -d $SUBJECTS_DIR/$i/surf`      
       mri=`ls -d $SUBJECTS_DIR/$i/mri`     
    
  4. Runs mri_cnr on the orig.mgz file to calculate CNR and outputs all measures into each subject's specific CNR file (/data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_cnr.txt)

    mri_cnr $surf $mri/orig.mgz > $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"

  5. Create variables for total cnr, gray/csf for left and right hemispheres and gray/white for left and right hemispheres

    #create variables for total cnr, gray/csf for left and right hemispheres and gray/white for left and right hemispheres (total and cnr variables are grepping the information from the subject specific file, the variables then need to be cut in order to get just the number)       
                                   
    total=`grep "total CNR" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"`    
    total2=`echo $total |cut -f 4 -d " "`    
    cnr=`grep "gray/white CNR" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"`    
    graycsflh=`echo $cnr | cut -d "," -f 2 | cut -d "=" -f 2 | cut -d " " -f 2`    
    graycsfrh=`echo $cnr | cut -d "," -f 3 | cut -d "=" -f 2 | cut -d " " -f 2`    
    graywhitelh=`echo $cnr | cut -d "," -f 1 | cut -d "=" -f 2 | cut -d " " -f 2`    
    graywhiterh=`echo $cnr | cut -d "," -f 2 | cut -d "=" -f 3 | cut -d " " -f 2`      
    
  6. Outputs CNR data into an appended table (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv)

    echo $bblid,$scanid,$total2,$graycsflh,$graycsfrh,$graywhitelh,$graywhiterh >> $file

  7. Runs mris_euler_number on the lh.orig.nofix and rh.orig.nofix surfaces to calculate Euler numbers for the left and right hemispheres and outputs the results into subject-specific Euler files (/data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_lh_euler.txt and /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/stats/[bblid]_[datexscanid]_rh_euler.txt)

    script -c "mris_euler_number $surf/lh.orig.nofix" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_lh_euler.txt"    
    script -c "mris_euler_number $surf/rh.orig.nofix" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_rh_euler.txt"    
    
  8. Creates variables for left and right euler numbers by grepping from the lh/rh_euler.txt files

    left=`grep ">" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_lh_euler.txt" | cut -d ">" -f 1 | cut -d "=" -f 4 | cut -d " " -f 2`    
    right=`grep ">" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_rh_euler.txt" | cut -d ">" -f 1 | cut -d "=" -f 4 | cut -d " " -f 2`  
    
  9. Outputs Euler number data into an appended table (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv)

    echo $bblid,$scanid,$left,$right >> $euler_file

Notes:

  • mris_euler_number must be run with script -c preceding it so that its terminal output can be captured in a file
  • The methods for this script were obtained from Chalavi et al., BMC Medical Imaging, 2012. doi:10.1186/1471-2342-12-27. http://www.biomedcentral.com/1471-2342/12/27

Flag subjects based on whether they are an outlier (>2 SD) in any of the following fields

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/flag_outliers.R

Requires:

  • Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • output directory (output_dir)

Steps:

  1. Create directory variables, standard deviation threshold variable, and read in arguments passed by QA.sh script

      ### ARGS ###  
      ############  
      subjects.dir<-commandArgs(TRUE)[1]  
      sdthresh<-2  
                  
      ### DIRS ###  
      ############  
      stats.dir<-file.path(subjects.dir)  
      aparc.dir<-file.path(stats.dir, 'aparc.stats')  
      aseg.dir<-file.path(stats.dir, 'aseg.stats')  
      area.dir<-file.path(stats.dir, 'aparc.stats/area')  
      curvature.dir<-file.path(stats.dir, 'aparc.stats/curvature')  
    
  2. Create variables for files

     mean.file<-file.path(aparc.dir, 'bilateral.meanthickness.totalarea.csv')  
     cnr.file<-file.path(stats.dir, 'cnr/cnr_buckner.csv')  
     snr.file<-file.path(stats.dir, 'cnr/snr.txt')  
     aseg.volume.file<-file.path(aseg.dir, 'aseg.stats.volume.csv')  
     lh.thickness.file<-file.path(aparc.dir, 'lh.aparc.stats.thickness.csv')  
     rh.thickness.file<-file.path(aparc.dir, 'rh.aparc.stats.thickness.csv')  
    
  3. Read in mean, cnr, snr, and aseg data and merge into a file called full

    mean.data<-read.csv(mean.file, strip.white=TRUE)  
    mean.data$meanthickness<-rowMeans(mean.data[, c('rh.meanthickness', 'lh.meanthickness')])  
    mean.data$totalarea<-rowSums(mean.data[, c('rh.totalarea', 'lh.totalarea')])  
    mean.data<-mean.data[,!(grepl('lh', names(mean.data)) | grepl('rh', names(mean.data)))]  
    cnr.data<-read.csv(cnr.file, strip.white=TRUE, header=FALSE)  
    full<-mean.data  
    full$cnr<- cnr.data$V3[match(full$scanid,cnr.data$V2)]  
    full$cnr<- as.numeric(as.character(full$cnr))  
               
    snr.data<-try(read.table(snr.file, strip.white=TRUE, header=FALSE, col.names=c('subject', 'snr')))  
    if(is.data.frame(snr.data)){  
        snr.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(snr.data$subject), split="/")), 2, as.character)  
    	snr.data<-snr.data[,-1]  
        full$snr<- snr.data$snr[match(full$scanid,snr.data$scanid)]  
    }  
       
    aseg.volume.data<-read.table(aseg.volume.file, strip.white=TRUE, header=TRUE)  
    aseg.volume.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(aseg.volume.data$Measure.volume), split="/")), 2, as.character)  
    aseg.volume.data<-aseg.volume.data[,c("bblid", "scanid", "SubCortGrayVol", "CortexVol", "CorticalWhiteMatterVol")]  
    full$SubCortGrayVol<- aseg.volume.data$SubCortGrayVol[match(full$scanid,aseg.volume.data$scanid)]  
    full$CortexVol<- aseg.volume.data$CortexVol[match(full$scanid,aseg.volume.data$scanid)]  
    full$CorticalWhiteMatterVol<- aseg.volume.data$CorticalWhiteMatterVol[match(full$scanid,aseg.volume.data$scanid)]  
    
  4. Read in thickness data and merge left and right thickness data into one file

     thickness.data<-read.table(lh.thickness.file, header=TRUE, strip.white=TRUE)  
     rh.thickness.data<-read.table(rh.thickness.file, header=TRUE, strip.white=TRUE)  
     thickness.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(thickness.data$lh.aparc.thickness), split="/")), 2, as.character)  
     rh.thickness.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(rh.thickness.data$rh.aparc.thickness), split="/")), 2, as.character)  
     rh.thickness.data<-rh.thickness.data[,-1]  
     thickness.data<-thickness.data[,-1]  
     thickness.data<-merge(thickness.data, rh.thickness.data, all=TRUE,by=c("scanid","bblid"))  
     rm('rh.thickness.data')  
    
  5. Create the data frames used to calculate the standard deviation cut-offs and get the names of the left and right hemisphere ROIs (the cut-offs themselves are computed from subset.data, which is defined elsewhere in the script)

     lh.names<-grep('lh', names(thickness.data), value=TRUE)  
     rh.names<-sub('lh', 'rh', lh.names)  
        
     tmp_lh<-data.frame(matrix(NA, nrow=(nrow(thickness.data)),ncol=length(lh.names)+1))  
     tmp_lh[1]<- thickness.data$scanid  
     colnames(tmp_lh)[1]<-"scanid"  
     colnames(tmp_lh)[2:ncol(tmp_lh)]<- lh.names  
        
     tmp_rh<-data.frame(matrix(NA, nrow=(nrow(thickness.data)),ncol=length(rh.names)+1))  
     tmp_rh[1]<- thickness.data$scanid  
     colnames(tmp_rh)[1]<-"scanid"  
     colnames(tmp_rh)[2:ncol(tmp_rh)]<- rh.names  
    
  6. Calculate the 2 SD cut-off for each left and right ROI thickness and determine whether each subject in the full dataset is an outlier based on the threshold set above (sdthresh)

     for (i in lh.names){  
          
      sd_thresh<-(sdthresh*(sd(subset.data[,i])))  
        
      sd_above_value<- mean(subset.data[,i])+sd_thresh    
      sd_below_value<- mean(subset.data[,i])-sd_thresh    
                                                                                
      tmp_lh[i]<- "0"    
      tmp_lh[i][thickness.data[i]>sd_above_value]<- "1"  
      tmp_lh[i][thickness.data[i]<sd_below_value]<- "1"  
           
     }    
     for (i in rh.names){  
       
      sd_thresh<-(sdthresh*(sd(subset.data[,i])))  
       
      sd_above_value<- mean(subset.data[,i])+sd_thresh  
      sd_below_value<- mean(subset.data[,i])-sd_thresh  
        
      tmp_rh[i]<- "0"  
      tmp_rh[i][thickness.data[i]>sd_above_value]<- "1"  
      tmp_rh[i][thickness.data[i]<sd_below_value]<- "1"  
        
     }  
       
     tmp<- cbind(tmp_lh,tmp_rh[2:ncol(tmp_rh)])  
     tmp2<-data.frame(sapply(tmp[2:ncol(tmp)], function(x) as.numeric(as.character(x))))  
     tmp2<- cbind(tmp[1],tmp2)  
                       
     #get number of thickness ROIs (sum of 1's just calculated for each subject)  
     #count number of outlying regions for each subject  
     thickness.data$noutliers.thickness.rois<-rowSums(tmp2[2:ncol(tmp2)])  
    
  7. Calculate the 2 SD cut-off for each ROI's laterality index, computed as (lh - rh) / (lh + rh), and determine whether each subject in the full dataset is an outlier based on the threshold set above (sdthresh)

      tmp_laterality<-data.frame(matrix(NA, nrow=(nrow(thickness.data)),ncol=length(lh.names)+1))  
      tmp_laterality[1]<- thickness.data$scanid  
      colnames(tmp_laterality)[1]<-"scanid"  
      colnames(tmp_laterality)[2:ncol(tmp_laterality)]<- lh.names  
        
      for (z in seq(1, length(lh.names))){  
         i <- lh.names[z]  
             
         r_name<- paste("rh",substring(i,4,10000),sep="_")  
           
         sd_above_value<-(mean((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name]))+(sdthresh*(sd((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name])))))  
         sd_below_value<-(mean((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name]))-(sdthresh*(sd((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name])))))  
         
         tmp_laterality[,z+1]<- "0"  
         tmp_laterality[,z+1][which((thickness.data[,i] - thickness.data[,r_name])/(thickness.data[,i] + thickness.data[,r_name])>sd_above_value)]<- "1"  
         tmp_laterality[,z+1][which((thickness.data[,i] - thickness.data[,r_name])/(thickness.data[,i] + thickness.data[,r_name])<sd_below_value)]<- "1"  
         
         tmp_laterality[,z+1]<- as.numeric(tmp_laterality[,z+1])  
         
      }  
                              
       thickness.data$noutliers.lat.thickness.rois<-rowSums(tmp_laterality[2:ncol(tmp_laterality)])  
    
  8. Flag the mean/global measures (mean thickness, total surface area, cortical volume, subcortical gray matter, cortical white matter, CNR, and SNR) based on the > 2 SD threshold

      thickness.data.mean<-full  
                    
      mean_names<- c('meanthickness', 'totalarea', "SubCortGrayVol", "CortexVol", "CorticalWhiteMatterVol", "cnr","snr")  
         
      tmp_mean<-data.frame(matrix(NA, nrow=(nrow(full)),ncol=length(mean_names)+1))  
      tmp_mean[1]<- full$scanid  
      colnames(tmp_mean)[1]<-"scanid"   
      colnames(tmp_mean)[2:ncol(tmp_mean)]<- mean_names  
        
      #then calculate the 2SD cut off for each mean and calculate if each subject in the full dataset is a SD outlier based on the threshold you set (sdthresh)   
           
      for (i in mean_names){  
            
         sd_thresh<-(sdthresh*(sd(thickness.data.mean[,i])))  
             
         sd_above_value<- mean(thickness.data.mean[,i])+sd_thresh  
         sd_below_value<- mean(thickness.data.mean[,i])-sd_thresh  
                     
         tmp_mean[i]<- "0"  
         tmp_mean[i][full[i]>sd_above_value]<- "1"  
         tmp_mean[i][full[i]<sd_below_value]<- "1"  
                     
      }  
      colnames(tmp_mean)[2:ncol(tmp_mean)]<- c(paste(mean_names, 'outlier', sep="_"))  
    
  9. Merge the thickness outliers into the full data file

    thickness.data<-thickness.data[,c('bblid', 'scanid', 'noutliers.thickness.rois', 'noutliers.lat.thickness.rois')]  
    full$noutliers.thickness.rois<- thickness.data$noutliers.thickness.rois[match(full$scanid,thickness.data$scanid)]  
    full$noutliers.lat.thickness.rois<- thickness.data$noutliers.lat.thickness.rois[match(full$scanid,thickness.data$scanid)]  
    full$meanthickness_outlier<- tmp_mean$meanthickness_outlier[match(full$scanid,tmp_mean$scanid)]  
    full$totalarea_outlier<- tmp_mean$totalarea_outlier[match(full$scanid,tmp_mean$scanid)]  
    full$SubCortGrayVol_outlier<- tmp_mean$SubCortGrayVol_outlier[match(full$scanid,tmp_mean$scanid)]  
    full$CortexVol_outlier<- tmp_mean$CortexVol_outlier[match(full$scanid,tmp_mean$scanid)]   
    full$CorticalWhiteMatterVol_outlier<- tmp_mean$CorticalWhiteMatterVol_outlier[match(full$scanid,tmp_mean$scanid)]  
    full$cnr_outlier<- tmp_mean$cnr_outlier[match(full$scanid,tmp_mean$scanid)]  
    full$snr_outlier<- tmp_mean$snr_outlier[match(full$scanid,tmp_mean$scanid)]  
    
  10. Get the names of the flag columns

      flags<-names(full)[which(!names(full) %in% c('bblid', 'scanid'))]  

  11. Write the outlier file to /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/all.flags.n[subjnum].csv

     noutliers.flags<-grep('noutlier', names(full), value=T)
     full[,paste(noutliers.flags, 'outlier', sep="_")]<-as.numeric(scale(full[,noutliers.flags])>sdthresh)
     write.csv(full, file.path(stats.dir, paste('all.flags.n' , nrow(full),'.csv', sep='')), quote=FALSE, row.names=FALSE)
     cat('wrote file to', file.path(stats.dir, paste('all.flags.n' , nrow(full),'.csv', sep='')), '\n')
    

Notes:

  • For the ROI-based measures we first count the number of outlier ROIs for each subject, then flag subjects whose ROI-outlier count is itself an outlier across subjects (see the toy sketch below).
  • The measures that are flagged are based on comments here: http://saturn/wiki/index.php/QA
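To make the two-stage logic concrete, here is a toy R sketch (the matrix of per-ROI flags below is made up for illustration; it is not the pipeline code itself, but it mirrors the counting in step 6 and the scale()-based flagging in step 11 above):

    # toy data: one row per subject, one column per ROI; 1 = that ROI's
    # thickness was more than 2 SD from the sample mean (as in step 6)
    set.seed(1)
    roi.flags <- matrix(rbinom(5 * 10, 1, 0.1), nrow = 5,
                        dimnames = list(paste0("subj", 1:5), paste0("roi", 1:10)))

    # stage 1: count outlier ROIs per subject
    noutliers <- rowSums(roi.flags)

    # stage 2: flag subjects whose count is itself > 2 SD above the mean count
    sdthresh <- 2
    noutliers.outlier <- as.numeric(scale(noutliers) > sdthresh)
    data.frame(noutliers, noutliers.outlier)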

Flag (gray/csf flag, gray/white flag, euler number flag, number outliers rois thickness flag, total outliers)

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_qa.R

Requires:

  • Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • full path to directory where csv's should be output (output_dir)

Steps:

  1. Sets variables based on those passed by the wrapper script and reads in the cnr and euler number csv's

     output.dir<-commandArgs(TRUE)[1]  
     cnr_data<- read.csv(paste(output.dir,"/cnr/cnr_buckner.csv",sep=""))  
     euler_data<- read.csv(paste(output.dir,"/cnr/euler_number.csv",sep=""))  
    
  2. Merge the files together into one data frame (subjects that failed FreeSurfer processing end up excluded when rows with missing Euler data are removed in step 5)

    #merge the files together by datexscanid  
    data<- cnr_data  
    data$left_euler<- euler_data$left_euler[match(data$scanid,euler_data$scanid)]  
    data$right_euler<- euler_data$right_euler[match(data$scanid,euler_data$scanid)]  
    
  3. create a data frame with mean values for gray/csf cnr, gray/white cnr, and euler numbers across hemispheres

    #create a dataframe which will get the flags based on euler and cnr calculations
    flags<- data

    #get mean values for gray/csf cnr, gray/white cnr and euler numbers (average across hemispheres)
    flags$mean_euler<-(flags$left_euler+flags$right_euler)/2
    flags$mean_graycsf_cnr<- (flags$graycsflh+flags$graycsfrh)/2
    flags$mean_graywhite_cnr<- (flags$graywhitelh+flags$graywhiterh)/2

    #subset data frame to only IDs and averages
    flags<- flags[,c(1,2,10:12)]

  4. create variables to be used as cutoff values for cnr and euler number flagging qa (mean - 2 SD)

    graycsf_cutoff<- mean(flags$mean_graycsf_cnr-(2*sd(flags$mean_graycsf_cnr,na.rm=T)),na.rm=T)  
    graywhite_cutoff<- mean(flags$mean_graywhite_cnr-(2*sd(flags$mean_graywhite_cnr,na.rm=T)),na.rm=T)  
    euler_cutoff<- mean(flags$mean_euler-(2*sd(flags$mean_euler,na.rm=T)),na.rm=T)  
    
  5. flag subjects for each measure based on mean - 2 SD cutoff (binary flagged [1=yes, 0=no] column)

    #create a binary flag column (1=yes, 0=no) for average cnr and euler numbers (<2 SD =1, >2 SD=0)
    flags$graycsf_flag<- NA
    flags$graywhite_flag<- NA
    flags$euler_flag<- NA

    #remove any subjects with no data
    flags<- flags[! is.na(flags$mean_euler),]

    for (i in 1:nrow(flags)){
      if (flags$mean_graycsf_cnr[i]<= graycsf_cutoff){
        flags$graycsf_flag[i]<- 1
      } else if (flags$mean_graycsf_cnr[i]>graycsf_cutoff){
        flags$graycsf_flag[i]<- 0
      }

      if (flags$mean_graywhite_cnr[i]<=graywhite_cutoff){
        flags$graywhite_flag[i]<- 1
      } else if (flags$mean_graywhite_cnr[i]>graywhite_cutoff){
        flags$graywhite_flag[i]<- 0
      }

      if (flags$mean_euler[i]<=euler_cutoff){
        flags$euler_flag[i]<- 1
      } else if (flags$mean_euler[i]>euler_cutoff){
        flags$euler_flag[i]<- 0
      }
    }

  6. create a total_outliers column (the sum of the flags across measures for each subject) and a binary flagged column, which is 1 if the subject has at least one flagged measure and 0 otherwise

    #subset data frame to only IDs and flags
    flags<- flags[,c(1,2,6:8)]

    #create a total outliers column which gets the number of total outliers and a column which gets a binary flag 1=yes, 0=no
    flags$total_outliers<- NA
    x<- cbind(flags$graycsf_flag,flags$graywhite_flag,flags$euler_flag)
    flags$total_outliers<- apply(x,1,sum)
    flags$flagged<- flags$total_outliers
    flags$flagged[flags$flagged>0]<- 1

    nsubj<- nrow(flags)

  7. write out flagged csv

    write.csv(flags,paste("/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr_euler_flags_n",nsubj,".csv",sep=""))

Create SNR csv

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh

Steps:

  1. loop through subjects (slist) and run the QA_TOOLS recon_checker, skipping the aseg, status, and outputFOF checks (-nocheck-aseg, -nocheck-status, -nocheck-outputFOF) and snapshot generation (-no-snaps), and output this information into a temporary file

    for i in $(cat $slist); do    
    $QA_TOOLS/recon_checker -s $i -nocheck-aseg -nocheck-status -nocheck-outputFOF -no-snaps  
    done > temp.txt    
    
  2. grep for the subject ID (bblid/datexscanid) from the "wm-anat-snr results" lines and output it to temp2.txt

    grep "wm-anat-snr results" temp.txt | cut -d"(" -f2 | cut -d")" -f1 >temp2.txt

  3. loop over the line numbers of the "wm-anat-snr results" lines in temp.txt, pull the SNR value that appears two lines below each, and output the values to temp3.txt

    for i in $(cat -n temp.txt | grep "wm-anat-snr results" | cut -f1); do    
    echo $(sed -n "$(echo $i +2 | bc)p" temp.txt | cut -f1)    
    done > temp3.txt   
    
  4. paste each subject's ID and SNR value together into the snr.txt file (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/snr.txt)

    paste temp2.txt temp3.txt > $output_dir/cnr/snr.txt

  5. remove temporary files

    rm -f temp*.txt

Create automatic QA flag summary csv

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh

Steps:

  1. create variables pointing to the automatic QA flags csv and the cnr/euler number flags csv (note that the subject counts are hard-coded in the filenames)

    auto_flag=$output_dir/all.flags.n103.csv  
    euler_flag=$output_dir/cnr_euler_flags_n102.csv    
    

Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/sum_flags_auto_qa.R

Requires:

  • Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
    • full path to directory where csv's should be output (output_dir)
    • csv of automatic flags from QA (auto_flag)
    • csv of automatic cnr/euler number flags from QA (euler_flag)

Steps:

  1. Sets variables based on those passed by the wrapper script and reads in the automatic QA csv's

      output.dir<-commandArgs(TRUE)[1]  
      auto_flag<-read.csv(commandArgs(TRUE)[2])  
      euler_flag<-read.csv(commandArgs(TRUE)[3])  
    
  2. Creates a data frame containing all of the columns used for flagging (for review), and merges in the euler and cnr flags

      flags<- auto_flag[,c(1:2,12:16,18:20)]  
            
      #merge in the euler number columns (gray/white cnr, gray/csf cnr, euler number)  
      flags$graycsf_flag<- euler_flag$graycsf_flag[match(flags$scanid,euler_flag$scanid)]  
      flags$graywhite_flag<- euler_flag$graywhite_flag[match(flags$scanid,euler_flag$scanid)]  
      flags$euler_flag<- euler_flag$euler_flag[match(flags$scanid,euler_flag$scanid)]  
    
  3. Creates a summary column (total_outliers) that sums the binary flags used for QA flagging

      flags$total_outliers<- rowSums(flags[,3:13])  

  4. Creates a column (fsFlag) that gets a binary 1 (flagged) or 0 (not flagged) depending on whether the subject has one or more total outliers as defined above

     flags$fsFlag<- "NA"  
     flags$fsFlag[which(flags$total_outliers==0)]<- "0"  
     flags$fsFlag[which(flags$total_outliers>0)]<- "1"  
    
  5. Output the summary QA file

     write.csv(flags, file.path(output.dir, "auto.qa.summary.flags.csv"), quote=FALSE, row.names=FALSE)  
     cat('wrote file to', file.path(output.dir, "auto.qa.summary.flags.csv"), '\n')  
    