FreeSurfer v5.3 QA on CFN - PennBBL/conte GitHub Wiki
When the CONTE data were moved to CFN, the subjects were reprocessed there using FreeSurfer v5.3, the same pipeline and version that was run on monstrum.
- /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh, which also calls:
  - /data/joy/BBL/projects/conteReproc2017/freesurfer/aparc.stats.meanthickness.totalarea.sh
  - /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_number_calculation.sh
  - /data/joy/BBL/projects/conteReproc2017/freesurfer/flag_outliers.R
  - /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_qa.R
  - /data/joy/BBL/projects/conteReproc2017/freesurfer/sum_flags_auto_qa.R
Runs the quality assurance pipeline for FreeSurfer 5.3 on CFN and outputs several csv's with subject-specific QA data. The code is broken down into the following main sections:
- Create subcortical volume segmentation csv - aseg stats
- Create mean QA data charts (thickness and surface area charts)
- Create parcellation csv's - aparc stats
- Create CNR and Euler number csv's
- Flag subjects based on whether they are an outlier (>2 SD) on at least one of the following measures (see the sketch after this list):
  - Mean thickness
  - Total surface area
  - Cortical volume
  - Subcortical gray matter
  - Cortical white matter
  - CNR
  - Euler number
  - SNR
  - ROI: raw cortical thickness
  - ROI: laterality thickness difference
- Flag (gray/csf flag, gray/white flag, euler number flag, number outliers rois thickness flag, total outliers)
- Create SNR csv
- Create summary of automatic flags csv
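With the exception of the CNR and Euler number flags, which only flag unusually low values (mean - 2 SD), all of these flags use the same mean ± 2 SD rule. Below is a minimal sketch of that rule; `flag_outlier` is a hypothetical helper for illustration, and the actual per-measure logic lives in flag_outliers.R and cnr_euler_qa.R, documented later on this page.

```r
# Hypothetical helper illustrating the mean +/- 2 SD outlier rule; the actual
# per-measure logic lives in flag_outliers.R and cnr_euler_qa.R (documented below).
flag_outlier <- function(x, sdthresh = 2) {
  upper <- mean(x, na.rm = TRUE) + sdthresh * sd(x, na.rm = TRUE)
  lower <- mean(x, na.rm = TRUE) - sdthresh * sd(x, na.rm = TRUE)
  as.integer(x > upper | x < lower)  # 1 = outlier, 0 = within the threshold
}

# Example (assuming a data frame 'full' with a meanthickness column, as in flag_outliers.R):
# full$meanthickness_outlier <- flag_outlier(full$meanthickness)
```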
The csv that contains the final QA is:
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv
The final include/exclude decision is based on a sum of the binary flags for the following measures:
- meanthickness_outlier
- totalarea_outlier
- SubCortGrayVol_outlier
- CortexVol_outlier
- CorticalWhiteMatterVol_outlier
- snr_outlier
- noutliers.thickness.rois_outlier
- noutliers.lat.thickness.rois_outlier
- Gray/CSF outlier
- Gray/White outlier
- Euler number outlier
- failed_freesurfer
If there is a 1 in any of the aforementioned flag columns for a subject, the subject should be flagged and excluded from FreeSurfer analyses; this is noted in the "fsFlag" column of the final QA output file.
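As an illustration of how the final file might be consumed downstream, here is a minimal sketch that is not part of the pipeline itself. It assumes the documented output path and the bblid, scanid, and fsFlag columns; the `include`/`exclude` names are purely illustrative.

```r
# Minimal downstream-usage sketch (assumed, not part of the pipeline): read the final
# summary csv and split subjects on the fsFlag column documented above.
qa <- read.csv("/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv")

# fsFlag == 1 means the subject hit at least one automatic QA flag and should be excluded
include <- qa[which(qa$fsFlag == 0), c("bblid", "scanid")]
exclude <- qa[which(qa$fsFlag == 1), c("bblid", "scanid")]
cat("including", nrow(include), "subjects; excluding", nrow(exclude), "subjects\n")
```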
All output from the QA pipeline:
- Aseg stats
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats/aseg.stats.volume.csv
- Mean thickness and total area
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv
- Left hemisphere thickness
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.thickness.csv
- Right hemisphere thickness
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.thickness.csv
- Left hemisphere volume
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.volume.csv
- Right hemisphere volume
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.volume.csv
- Left hemisphere area
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/lh.aparc.stats.area.csv
- Right hemisphere area
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/rh.aparc.stats.area.csv
- Euler number data
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv
- CNR data
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv
- CNR csv (subject-specific)
- /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_cnr.txt
- Left hemisphere Euler number csv (subject-specific)
- /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_lh_euler.txt
- Right hemisphere Euler number csv (subject-specific)
- /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_rh_euler.txt
- Overall data (mean thickness, cnr, snr, total area, cortical and subcortical volume, and outliers)
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/all.flags.n[subjnum].csv
- CNR and euler number flags csv
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr_euler_flags_n[subjnum].csv
- SNR csv
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/snr.txt
- Summary csv of automatic QA flags
- /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/auto.qa.summary.flags.csv
Wrapper Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
Requires:
- Subject list in the form of a text file (slist); a list of bblid/datexscanid entries
  - Example: bblid/datexscanid
- Full path to the directory where the output csv's should be written

```bash
export output_dir=/data/joy/BBL/studies/conte/subjectData/freesurfer/stats
```

- FreeSurfer-specific environment variables and subject-specific variables to be set

```bash
export SUBJECTS_DIR=/data/joy/BBL/studies/conte/processedData/freesurfer
export QA_TOOLS=/data/joy/BBL/applications/QAtools_v1.1/
export FREESURFER_HOME=/share/apps/freesurfer/5.3.0/
export PATH=$FREESURFER_HOME/bin/:$PATH
```
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
Steps:
- Checks whether the aseg.stats output directory exists, and if it doesn't, creates it (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats)

```bash
if [ ! -e "$output_dir/aseg.stats" ]; then
    mkdir -p $output_dir/aseg.stats
fi
```

- It then runs asegstats2table, which takes each subject's subcortical stats file created by recon-all and outputs the data into a table in which each row is a subject and each column is a segmentation volume (in mm3). If a subject is missing their aseg file, that subject is skipped (via the --skip argument to asegstats2table).

```bash
asegstats2table --subjectsfile=$slist -t $output_dir/aseg.stats/aseg.stats.volume.csv -m volume --skip
```

- The output table is /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aseg.stats/aseg.stats.volume.csv
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/aparc.stats.meanthickness.totalarea.sh
Requires:
- Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
- subject list (slist)
- FreeSurfer subjects directory containing the subject-specific (bblid/datexscanid) output (SUBJECTS_DIR)
- full path of directory where csv should be output (OUTPUT_DIR)
Steps:
- Sets variables based on those passed by the wrapper script

```bash
export OUTPUT_DIR=$2
export SUBJECTS_DIR=$3
slist=$1
```

- Checks whether the aparc.stats output directory exists, and if it doesn't, creates it (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats)

```bash
if [ ! -e $OUTPUT_DIR/aparc.stats ]; then
    mkdir $OUTPUT_DIR/aparc.stats
fi
```

- Creates and writes headers to the aggregate file /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv

```bash
echo bblid,scanid,rh.meanthickness,rh.totalarea,lh.meanthickness,lh.totalarea > $OUTPUT_DIR/aparc.stats/bilateral.meanthickness.totalarea.csv
```

- Loops through subjects and pulls bblid and scanid from slist

```bash
for i in $(cat $slist); do
    bblid=$(echo $i | cut -d"/" -f1)
    scanid=$(echo $i | cut -d"/" -f2)
```

- Checks for the subject's right and left aparc.stats files
- If those files don't exist, it prints "no rh/lh.aparc.stats file for bblid/datexscanid"
- If the files do exist, it greps for "MeanThickness" and "SurfArea" in each subject's aparc.stats file created by recon-all

The code for steps 5-7:

```bash
### RH MEAN THICKNESS AND TOTAL AREA ###
########################################
subdir=$SUBJECTS_DIR/$i
if [ ! -e "$subdir/stats/rh.aparc.stats" ]; then
    echo "no rh.aparc.stats file for" $i
    rmt="NA"
    rta="NA"
else
    string=`grep MeanThickness, $subdir/stats/rh.aparc.stats`
    rmt=`echo $string | cut -d "," -f 4`
    string=`grep SurfArea, $subdir/stats/rh.aparc.stats`
    rta=`echo $string | cut -d "," -f 4`
fi

### LH MEAN THICKNESS AND TOTAL AREA ###
########################################
if [ ! -e "$subdir/stats/lh.aparc.stats" ]; then
    echo "no lh.aparc.stats file for" $i
    lmt="NA"
    lta="NA"
else
    string=`grep MeanThickness, $subdir/stats/lh.aparc.stats`
    lmt=`echo $string | cut -d "," -f 4`
    string=`grep SurfArea, $subdir/stats/lh.aparc.stats`
    lta=`echo $string | cut -d "," -f 4`
fi
```

- It appends this data to the table /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/bilateral.meanthickness.totalarea.csv

```bash
echo $bblid,$scanid,$rmt,$rta,$lmt,$lta >> $OUTPUT_DIR/aparc.stats/bilateral.meanthickness.totalarea.csv
```
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
Steps:
- Runs aparcstats2table for each hemisphere/measure combination (left and right hemispheres for volume, thickness, and area). Subjects without aparc.stats files are skipped via the --skip flag
- Outputs this parcellation data into 6 tables in /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/aparc.stats/ with the naming convention [hemi lh/rh].aparc.stats.[measurement volume/thickness/area].csv

The code for steps 1-2:

```bash
aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.thickness.csv -m thickness --skip
aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.thickness.csv -m thickness --skip
aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.volume.csv -m volume --skip
aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.volume.csv -m volume --skip
aparcstats2table --hemi lh --subjectsfile=$slist -t $output_dir/aparc.stats/lh.aparc.stats.area.csv --skip
aparcstats2table --hemi rh --subjectsfile=$slist -t $output_dir/aparc.stats/rh.aparc.stats.area.csv --skip
```
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_number_calculation.sh
Requires:
- Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
- subject list (slist) (/data/joy/BBL/studies/conte/subjectData/design3FullSubjectList.txt)
- FreeSurfer subjects directory containing the subject-specific (bblid/datexscanid) output (SUBJECTS_DIR)
- full path to the directory where files should be output (OUTPUT_DIR)
- FreeSurfer-specific environment variables to be set (FREESURFER_HOME and QA_TOOLS)
Steps:
- Sets variables based on those passed by the wrapper script

```bash
#set freesurfer specific variables
export SUBJECTS_DIR=$2
export OUTPUT_DIR=$3
export QA_TOOLS=/data/joy/BBL/applications/QAtools_v1.1/
export FREESURFER_HOME=/share/apps/freesurfer/5.3.0/
export PATH=$FREESURFER_HOME/bin/:$PATH

#set subject list, directory to output aggregated files to, and the filenames of those aggregate files
slist=$1
outdir=$OUTPUT_DIR/cnr
file=$outdir/cnr_buckner.csv
euler_file=$outdir/euler_number.csv
```

- Creates and writes headers to the aggregate files /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv and /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv

```bash
echo "bblid,scanid,total,graycsflh,graycsfrh,graywhitelh,graywhiterh" > $file
echo "bblid,scanid,left_euler,right_euler" > $euler_file
```

- Loops through subjects, pulls bblid and scanid from slist, and creates variables for the surf and mri directories

```bash
for i in $(cat $slist); do
    bblid=$(echo $i | cut -d"/" -f1)
    scanid=$(echo $i | cut -d"/" -f2)
    echo "working on subject" $i
    surf=`ls -d $SUBJECTS_DIR/$i/surf`
    mri=`ls -d $SUBJECTS_DIR/$i/mri`
```

- Runs mri_cnr on the orig.mgz file to calculate CNR and outputs all measures into the subject-specific CNR file (/data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_cnr.txt)

```bash
mri_cnr $surf $mri/orig.mgz > $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"
```

- Creates variables for total CNR, gray/CSF CNR for the left and right hemispheres, and gray/white CNR for the left and right hemispheres

```bash
#create variables for total cnr, gray/csf for left and right hemispheres and gray/white for left and right hemispheres
#(total and cnr variables are grepping the information from the subject specific file; the variables then need to be cut in order to get just the number)
total=`grep "total CNR" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"`
total2=`echo $total | cut -f 4 -d " "`
cnr=`grep "gray/white CNR" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_cnr.txt"`
graycsflh=`echo $cnr | cut -d "," -f 2 | cut -d "=" -f 2 | cut -d " " -f 2`
graycsfrh=`echo $cnr | cut -d "," -f 3 | cut -d "=" -f 2 | cut -d " " -f 2`
graywhitelh=`echo $cnr | cut -d "," -f 1 | cut -d "=" -f 2 | cut -d " " -f 2`
graywhiterh=`echo $cnr | cut -d "," -f 2 | cut -d "=" -f 3 | cut -d " " -f 2`
```

- Appends the CNR data to the aggregate table (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/cnr_buckner.csv)

```bash
echo $bblid,$scanid,$total2,$graycsflh,$graycsfrh,$graywhitelh,$graywhiterh >> $file
```

- Runs mris_euler_number on the lh.orig.nofix and rh.orig.nofix files to calculate Euler numbers for the left and right hemispheres and outputs the results into subject-specific Euler files (/data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_lh_euler.txt and /data/joy/BBL/studies/conte/processedData/freesurfer/[bblid]/[datexscanid]/[bblid]_[datexscanid]_rh_euler.txt)

```bash
script -c "mris_euler_number $surf/lh.orig.nofix" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_lh_euler.txt"
script -c "mris_euler_number $surf/rh.orig.nofix" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_rh_euler.txt"
```

- Creates variables for the left and right Euler numbers by grepping the lh/rh_euler.txt files

```bash
left=`grep ">" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_lh_euler.txt" | cut -d ">" -f 1 | cut -d "=" -f 4 | cut -d " " -f 2`
right=`grep ">" $SUBJECTS_DIR/$i/stats/$bblid"_"$scanid"_rh_euler.txt" | cut -d ">" -f 1 | cut -d "=" -f 4 | cut -d " " -f 2`
```

- Appends the Euler number data to the aggregate table (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/euler_number.csv)

```bash
echo $bblid,$scanid,$left,$right >> $euler_file
```
Notes:
- mris_euler_number must be run with `script -c` preceding it so that its output can be captured in a file
- The methods for this script were obtained from Chalavi et al., BMC Medical Imaging, 2012. doi:10.1186/1471-2342-12-27. http://www.biomedcentral.com/1471-2342/12/27
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/flag_outliers.R
Requires:
- Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
- output directory (output_dir)
Steps:
- Create directory variables and the standard deviation threshold variable, and read in the arguments passed by the QA.sh script

```r
### ARGS ###
############
subjects.dir<-commandArgs(TRUE)[1]
sdthresh<-2

### DIRS ###
############
stats.dir<-file.path(subjects.dir)
aparc.dir<-file.path(stats.dir, 'aparc.stats')
aseg.dir<-file.path(stats.dir, 'aseg.stats')
area.dir<-file.path(stats.dir, 'aparc.stats/area')
curvature.dir<-file.path(stats.dir, 'aparc.stats/curvature')
```

- Create variables for the input files

```r
mean.file<-file.path(aparc.dir, 'bilateral.meanthickness.totalarea.csv')
cnr.file<-file.path(stats.dir, 'cnr/cnr_buckner.csv')
snr.file<-file.path(stats.dir, 'cnr/snr.txt')
aseg.volume.file<-file.path(aseg.dir, 'aseg.stats.volume.csv')
lh.thickness.file<-file.path(aparc.dir, 'lh.aparc.stats.thickness.csv')
rh.thickness.file<-file.path(aparc.dir, 'rh.aparc.stats.thickness.csv')
```
- Read in the mean, cnr, snr, and aseg data and merge them into a data frame called full

```r
mean.data<-read.csv(mean.file, strip.white=TRUE)
mean.data$meanthickness<-rowMeans(mean.data[, c('rh.meanthickness', 'lh.meanthickness')])
mean.data$totalarea<-rowSums(mean.data[, c('rh.totalarea', 'lh.totalarea')])
mean.data<-mean.data[,!(grepl('lh', names(mean.data)) | grepl('rh', names(mean.data)))]

cnr.data<-read.csv(cnr.file, strip.white=TRUE, header=FALSE)
full<-mean.data
full$cnr<- cnr.data$V3[match(full$scanid,cnr.data$V2)]
full$cnr<- as.numeric(as.character(full$cnr))

snr.data<-try(read.table(snr.file, strip.white=TRUE, header=FALSE, col.names=c('subject', 'snr')))
if(is.data.frame(snr.data)){
  snr.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(snr.data$subject), split="/")), 2, as.character)
  snr.data<-snr.data[,-1]
  full$snr<- snr.data$snr[match(full$scanid,snr.data$scanid)]
}

aseg.volume.data<-read.table(aseg.volume.file, strip.white=TRUE, header=TRUE)
aseg.volume.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(aseg.volume.data$Measure.volume), split="/")), 2, as.character)
aseg.volume.data<-aseg.volume.data[,c("bblid", "scanid", "SubCortGrayVol", "CortexVol", "CorticalWhiteMatterVol")]
full$SubCortGrayVol<- aseg.volume.data$SubCortGrayVol[match(full$scanid,aseg.volume.data$scanid)]
full$CortexVol<- aseg.volume.data$CortexVol[match(full$scanid,aseg.volume.data$scanid)]
full$CorticalWhiteMatterVol<- aseg.volume.data$CorticalWhiteMatterVol[match(full$scanid,aseg.volume.data$scanid)]
```

- Read in the thickness data and merge the left and right thickness data into one data frame

```r
thickness.data<-read.table(lh.thickness.file, header=TRUE, strip.white=TRUE)
rh.thickness.data<-read.table(rh.thickness.file, header=TRUE, strip.white=TRUE)
thickness.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(thickness.data$lh.aparc.thickness), split="/")), 2, as.character)
rh.thickness.data[,c('bblid', 'scanid')]<-apply(do.call(rbind, strsplit(as.character(rh.thickness.data$rh.aparc.thickness), split="/")), 2, as.character)
rh.thickness.data<-rh.thickness.data[,-1]
thickness.data<-thickness.data[,-1]
thickness.data<-merge(thickness.data, rh.thickness.data, all=TRUE, by=c("scanid","bblid"))
rm('rh.thickness.data')
```
- Create data to calculate the standard deviation cut-offs from and get the names of the left and right hemisphere ROIs

```r
lh.names<-grep('lh', names(thickness.data), value=TRUE)
rh.names<-sub('lh', 'rh', lh.names)
tmp_lh<-data.frame(matrix(NA, nrow=(nrow(thickness.data)), ncol=length(lh.names)+1))
tmp_lh[1]<- thickness.data$scanid
colnames(tmp_lh)[1]<-"scanid"
colnames(tmp_lh)[2:ncol(tmp_lh)]<- lh.names
tmp_rh<-data.frame(matrix(NA, nrow=(nrow(thickness.data)), ncol=length(rh.names)+1))
tmp_rh[1]<- thickness.data$scanid
colnames(tmp_rh)[1]<-"scanid"
colnames(tmp_rh)[2:ncol(tmp_rh)]<- rh.names
```

- Calculate the 2 SD cut-off for each left and right ROI thickness and calculate whether each subject in the full dataset is a SD outlier based on the threshold you set (sdthresh)

```r
for (i in lh.names){
  sd_thresh<-(sdthresh*(sd(subset.data[,i])))
  sd_above_value<- mean(subset.data[,i])+sd_thresh
  sd_below_value<- mean(subset.data[,i])-sd_thresh
  tmp_lh[i]<- "0"
  tmp_lh[i][thickness.data[i]>sd_above_value]<- "1"
  tmp_lh[i][thickness.data[i]<sd_below_value]<- "1"
}
for (i in rh.names){
  sd_thresh<-(sdthresh*(sd(subset.data[,i])))
  sd_above_value<- mean(subset.data[,i])+sd_thresh
  sd_below_value<- mean(subset.data[,i])-sd_thresh
  tmp_rh[i]<- "0"
  tmp_rh[i][thickness.data[i]>sd_above_value]<- "1"
  tmp_rh[i][thickness.data[i]<sd_below_value]<- "1"
}
tmp<- cbind(tmp_lh, tmp_rh[2:ncol(tmp_rh)])
tmp2<-data.frame(sapply(tmp[2:ncol(tmp)], function(x) as.numeric(as.character(x))))
tmp2<- cbind(tmp[1], tmp2)

#get number of thickness ROIs (sum of 1's just calculated for each subject)
#count number of outlying regions for each subject
thickness.data$noutliers.thickness.rois<-rowSums(tmp2[2:ncol(tmp2)])
```
- Calculate the 2 SD cut-off for each ROI laterality index, (lh - rh)/(lh + rh), and calculate whether each subject in the full dataset is a SD outlier based on the threshold you set (sdthresh)

```r
tmp_laterality<-data.frame(matrix(NA, nrow=(nrow(thickness.data)), ncol=length(lh.names)+1))
tmp_laterality[1]<- thickness.data$scanid
colnames(tmp_laterality)[1]<-"scanid"
colnames(tmp_laterality)[2:ncol(tmp_laterality)]<- lh.names
for (z in seq(1, length(lh.names))){
  i <- lh.names[z]
  r_name<- paste("rh", substring(i,4,10000), sep="_")
  sd_above_value<-(mean((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name]))+(sdthresh*(sd((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name])))))
  sd_below_value<-(mean((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name]))-(sdthresh*(sd((subset.data[,i] - subset.data[,r_name])/(subset.data[,i] + subset.data[,r_name])))))
  tmp_laterality[,z+1]<- "0"
  tmp_laterality[,z+1][which((thickness.data[,i] - thickness.data[,r_name])/(thickness.data[,i] + thickness.data[,r_name])>sd_above_value)]<- "1"
  tmp_laterality[,z+1][which((thickness.data[,i] - thickness.data[,r_name])/(thickness.data[,i] + thickness.data[,r_name])<sd_below_value)]<- "1"
  tmp_laterality[,z+1]<- as.numeric(tmp_laterality[,z+1])
}
thickness.data$noutliers.lat.thickness.rois<-rowSums(tmp_laterality[2:ncol(tmp_laterality)])
```

- Flag the mean measures (mean thickness, total surface area, cortical volume, subcortical gray matter, cortical white matter, as well as CNR and SNR) based on > 2 SD

```r
thickness.data.mean<-full
mean_names<- c('meanthickness', 'totalarea', "SubCortGrayVol", "CortexVol", "CorticalWhiteMatterVol", "cnr", "snr")
tmp_mean<-data.frame(matrix(NA, nrow=(nrow(full)), ncol=length(mean_names)+1))
tmp_mean[1]<- full$scanid
colnames(tmp_mean)[1]<-"scanid"
colnames(tmp_mean)[2:ncol(tmp_mean)]<- mean_names

#then calculate the 2SD cut off for each mean and calculate if each subject in the full dataset is a SD outlier based on the threshold you set (sdthresh)
for (i in mean_names){
  sd_thresh<-(sdthresh*(sd(thickness.data.mean[,i])))
  sd_above_value<- mean(thickness.data.mean[,i])+sd_thresh
  sd_below_value<- mean(thickness.data.mean[,i])-sd_thresh
  tmp_mean[i]<- "0"
  tmp_mean[i][full[i]>sd_above_value]<- "1"
  tmp_mean[i][full[i]<sd_below_value]<- "1"
}
colnames(tmp_mean)[2:ncol(tmp_mean)]<- c(paste(mean_names, 'outlier', sep="_"))
```
- Merge the thickness outliers and the mean-measure outlier flags into the full data frame

```r
thickness.data<-thickness.data[,c('bblid', 'scanid', 'noutliers.thickness.rois', 'noutliers.lat.thickness.rois')]
full$noutliers.thickness.rois<- thickness.data$noutliers.thickness.rois[match(full$scanid,thickness.data$scanid)]
full$noutliers.lat.thickness.rois<- thickness.data$noutliers.lat.thickness.rois[match(full$scanid,thickness.data$scanid)]
full$meanthickness_outlier<- tmp_mean$meanthickness_outlier[match(full$scanid,tmp_mean$scanid)]
full$totalarea_outlier<- tmp_mean$totalarea_outlier[match(full$scanid,tmp_mean$scanid)]
full$SubCortGrayVol_outlier<- tmp_mean$SubCortGrayVol_outlier[match(full$scanid,tmp_mean$scanid)]
full$CortexVol_outlier<- tmp_mean$CortexVol_outlier[match(full$scanid,tmp_mean$scanid)]
full$CorticalWhiteMatterVol_outlier<- tmp_mean$CorticalWhiteMatterVol_outlier[match(full$scanid,tmp_mean$scanid)]
full$cnr_outlier<- tmp_mean$cnr_outlier[match(full$scanid,tmp_mean$scanid)]
full$snr_outlier<- tmp_mean$snr_outlier[match(full$scanid,tmp_mean$scanid)]
```

- Get the flag column names

```r
flags<-names(full)[which(!names(full) %in% c('bblid', 'scanid'))]
```

- Write the outlier file to /data/joy/BBL/studies/conte/subjectData/freesurfer/stats/all.flags.n[subjnum].csv

```r
noutliers.flags<-grep('noutlier', names(full), value=T)
full[,paste(noutliers.flags, 'outlier', sep="_")]<-as.numeric(scale(full[,noutliers.flags])>sdthresh)
write.csv(full, file.path(stats.dir, paste('all.flags.n', nrow(full), '.csv', sep='')), quote=FALSE, row.names=FALSE)
cat('wrote file to', file.path(stats.dir, paste('all.flags.n', nrow(full), '.csv', sep='')), '\n')
```
Notes:
- For the ROI-based measures, we first count the number of outlier ROIs for each subject, and then compute outliers across subjects on that count (the number of ROIs flagged); a minimal sketch of this two-stage approach follows these notes.
- The measures that are flagged are based on comments here: http://saturn/wiki/index.php/QA
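A minimal sketch of the two-stage ROI logic, assuming a numeric matrix `roi_thickness` with one row per subject and one column per ROI; the variable names here are illustrative, not the script's own (the real code is in flag_outliers.R above).

```r
# Illustrative sketch of the two-stage ROI outlier logic (names are not the script's own).
# Assumes roi_thickness is a numeric matrix: one row per subject, one column per ROI.
sdthresh <- 2

# Stage 1: z-score each ROI across subjects and count, per subject, how many ROIs fall
# more than 2 SD from that ROI's sample mean
roi_z <- scale(roi_thickness)
n_outlier_rois <- rowSums(abs(roi_z) > sdthresh, na.rm = TRUE)

# Stage 2: flag subjects whose outlier-ROI count is itself more than 2 SD above the mean
# count, mirroring the scale(...) > sdthresh step in flag_outliers.R
count_flag <- as.integer(scale(n_outlier_rois) > sdthresh)
```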
Flag (gray/csf flag, gray/white flag, euler number flag, number outliers rois thickness flag, total outliers)
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/cnr_euler_qa.R
Requires:
- Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
- full path to directory where csv's should be output (output_dir)
Steps:
- Sets variables based on those passed by the wrapper script and reads in the cnr and euler number csv's

```r
output.dir<-commandArgs(TRUE)[1]
cnr_data<- read.csv(paste(output.dir,"/cnr/cnr_buckner.csv",sep=""))
euler_data<- read.csv(paste(output.dir,"/cnr/euler_number.csv",sep=""))
```

- Merge the files together into one data frame and exclude the subjects that failed freesurfer processing

```r
#merge the files together by datexscanid
data<- cnr_data
data$left_euler<- euler_data$left_euler[match(data$scanid,euler_data$scanid)]
data$right_euler<- euler_data$right_euler[match(data$scanid,euler_data$scanid)]
```
- Create a data frame with mean values for gray/csf cnr, gray/white cnr, and euler numbers across hemispheres

```r
#create a dataframe which will get the flags based on euler and cnr calculations
flags<- data

#get mean values for gray/csf cnr, gray/white cnr and euler numbers (average across hemispheres)
flags$mean_euler<-(flags$left_euler+flags$right_euler)/2
flags$mean_graycsf_cnr<- (flags$graycsflh+flags$graycsfrh)/2
flags$mean_graywhite_cnr<- (flags$graywhitelh+flags$graywhiterh)/2

#subset data frame to only IDs and averages
flags<- flags[,c(1,2,10:12)]
```

- Create variables to be used as cutoff values for cnr and euler number flagging qa (mean - 2 SD)

```r
graycsf_cutoff<- mean(flags$mean_graycsf_cnr-(2*sd(flags$mean_graycsf_cnr,na.rm=T)),na.rm=T)
graywhite_cutoff<- mean(flags$mean_graywhite_cnr-(2*sd(flags$mean_graywhite_cnr,na.rm=T)),na.rm=T)
euler_cutoff<- mean(flags$mean_euler-(2*sd(flags$mean_euler,na.rm=T)),na.rm=T)
```
- Flag subjects for each measure based on the mean - 2 SD cutoff (binary flagged column: 1=yes, 0=no)

```r
#create a binary flag column (1=yes, 0=no) for average cnr and euler numbers (<2 SD =1, >2 SD=0)
flags$graycsf_flag<- NA
flags$graywhite_flag<- NA
flags$euler_flag<- NA

#remove any subjects with no data
flags<- flags[! is.na(flags$mean_euler),]

for (i in 1:nrow(flags)){
  if (flags$mean_graycsf_cnr[i]<= graycsf_cutoff){
    flags$graycsf_flag[i]<- 1
  } else if (flags$mean_graycsf_cnr[i]>graycsf_cutoff){
    flags$graycsf_flag[i]<- 0
  }
  if (flags$mean_graywhite_cnr[i]<=graywhite_cutoff){
    flags$graywhite_flag[i]<- 1
  } else if (flags$mean_graywhite_cnr[i]>graywhite_cutoff){
    flags$graywhite_flag[i]<- 0
  }
  if (flags$mean_euler[i]<=euler_cutoff){
    flags$euler_flag[i]<- 1
  } else if (flags$mean_euler[i]>euler_cutoff){
    flags$euler_flag[i]<- 0
  }
}
```
- Create a total outliers column in the data (which sums the number of flags across measures for each subject) and a binary flagged column, which gets a 1 for a subject if there is at least 1 flagged measure for that subject and a 0 if there were no outliers in any measure for that subject

```r
#subset data frame to only IDs and flags
flags<- flags[,c(1,2,6:8)]

#create a total outliers column which gets the number of total outliers and a column which gets a binary flag 1=yes, 0=no
flags$total_outliers<- NA
x<- cbind(flags$graycsf_flag,flags$graywhite_flag,flags$euler_flag)
flags$total_outliers<- apply(x,1,sum)
flags$flagged<- flags$total_outliers
flags$flagged[flags$flagged>0]<- 1
nsubj<- nrow(flags)
```
- Write out the flagged csv

```r
write.csv(flags,paste("/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr_euler_flags_n",nsubj,".csv",sep=""))
```
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
Steps:
- Loop through subjects (slist) and run the QA_TOOLS recon_checker without checking aseg data, output file order, or logfile status for outliers or errors, and without taking snapshots, and write its output to a temporary file

```bash
for i in $(cat $slist); do
    $QA_TOOLS/recon_checker -s $i -nocheck-aseg -nocheck-status -nocheck-outputFOF -no-snaps
done > temp.txt
```

- Grep the "wm-anat-snr results" lines for the subject id (bblid/datexscanid) and output it to temp2.txt

```bash
grep "wm-anat-snr results" temp.txt | cut -d"(" -f2 | cut -d")" -f1 > temp2.txt
```

- Loop through each "wm-anat-snr results" line in temp.txt, pull the SNR value (two lines below it) from temp.txt, and output it to temp3.txt

```bash
for i in $(cat -n temp.txt | grep "wm-anat-snr results" | cut -f1); do
    echo $(sed -n "$(echo $i +2 | bc)p" temp.txt | cut -f1)
done > temp3.txt
```

- Paste each subject's id and SNR value together and write them to the snr.txt file (/data/joy/BBL/studies/conte/subjectData/freesurfer/stats/cnr/snr.txt)

```bash
paste temp2.txt temp3.txt > $output_dir/cnr/snr.txt
```

- Remove the temporary files

```bash
rm -f temp*.txt
```
Notes:
- More information on the recon_checker command can be found on the FreeSurfer QA TOOLS wiki https://surfer.nmr.mgh.harvard.edu/fswiki/QATools
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
Steps:
- Create variables for the automatic qa output of flags and the cnr/euler number output

```bash
auto_flag=$output_dir/all.flags.n103.csv
euler_flag=$output_dir/cnr_euler_flags_n102.csv
```
Script: /data/joy/BBL/projects/conteReproc2017/freesurfer/sum_flags_auto_qa.R
Requires:
- Arguments passed by /data/joy/BBL/projects/conteReproc2017/freesurfer/QA.sh
- full path to directory where csv's should be output (output_dir)
- csv of automatic flags from QA (auto_flag)
- csv of automatic cnr/euler number flags from QA (euler_flag)
Steps:
- Sets variables based on those passed by the wrapper script and reads in the automatic qa csv's

```r
output.dir<-commandArgs(TRUE)[1]
auto_flag<-read.csv(commandArgs(TRUE)[2])
euler_flag<-read.csv(commandArgs(TRUE)[3])
```

- Creates a data sheet containing all of the columns used for QA flagging, and merges in the euler and cnr flag columns

```r
flags<- auto_flag[,c(1:2,12:16,18:20)]

#merge in the euler number columns (gray/white cnr, gray/csf cnr, euler number)
flags$graycsf_flag<- euler_flag$graycsf_flag[match(flags$scanid,euler_flag$scanid)]
flags$graywhite_flag<- euler_flag$graywhite_flag[match(flags$scanid,euler_flag$scanid)]
flags$euler_flag<- euler_flag$euler_flag[match(flags$scanid,euler_flag$scanid)]
```

- Creates a summary column that sums the QA flags in the csv to get the total number of outliers

```r
flags$total_outliers<- rowSums(flags[,3:13])
```

- Creates a column with a binary 1 (flagged) or 0 (not flagged) indicating whether the subject has one or more total outliers as defined above

```r
flags$fsFlag<- "NA"
flags$fsFlag[which(flags$total_outliers==0)]<- "0"
flags$fsFlag[which(flags$total_outliers>0)]<- "1"
```

- Outputs the summary qa file

```r
write.csv(flags, file.path(output.dir, "auto.qa.summary.flags.csv"), quote=FALSE, row.names=FALSE)
cat('wrote file to', file.path(output.dir, "auto.qa.summary.flags.csv"), '\n')
```