Extracting UK Biobank data - katielavigne/documentation GitHub Wiki
Extracting UKBB via NeuroHub and CBRAIN
Overview
The UKBB data is accessible to McGill students and faculty through NeuroHub. Instructions for gaining access are here. NeuroHub has raw neuroimaging data as well as CIVET-preprocessed structural MRI data for 39,000+ participants. Preprocessed data for other modalities will be added in the future, and progress can be followed here. Finally, more information on the full data acquired as part of the UKBB can be found in the UK Biobank showcase.
Extracting CIVET outputs
Once you gain access to the UKBB data, you will want to extract the MRI data you are interested in rather than downloading the entire dataset, which is ~50 TB. To do this, you can use the SimpleFileExtractor tool in CBRAIN.
Launching the SimpleFileExtractor for all subjects
On the [NeuroHub portal](https://portal.neurohub.ca/neurohub), click on the CB in the upper left corner to open CBRAIN, then click on Projects in the top ribbon. Once your access has been approved through the steps described in the link above, you should see the NeuroHub-UKBB-Access project. Inside the project you will find all the files associated with the UK Biobank, including raw data (MINC and NIfTI files) and processed data (CIVET outputs). Use the File Type filter to select only the files you want (e.g., Civet Output). Enter the number of files to display in the upper right (up to 1000), then click the box to the left of Filename to select all files on that page. You should see "1000 files currently selected" just above. Note that the SimpleFileExtractor we are using only works with up to 5000 files at a time. Then launch the SimpleFileExtractor tool by clicking Launch -> SimpleFileExtractor -> Select Server & Version (Converter-1 or Converter-2 for UKBB data). More information on launching, including ways to filter more specifically, is available on page 12 of the [CBRAIN Getting Started Guide](https://portal.cbrain.mcgill.ca/doc/manual/CBRAINGettingStartedGuideFeb15.pdf).
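Because each launch is capped at 5000 files and you can select at most 1000 files per page, covering the whole CIVET collection takes several launches. The planning is a ceiling division; below is a minimal shell sketch of it. The function names are hypothetical, and the total you pass in should be the count CBRAIN reports for your filtered selection (it may differ from the participant count if some participants have more than one session).

```sh
# Hypothetical helpers for planning SimpleFileExtractor launches.
# Defaults assume the 5000-files-per-task cap and the 1000-files-per-page
# display limit described above.

# Number of tasks needed for $1 files (integer ceiling division).
tasks_needed() {
  total=$1
  per_task=${2:-5000}
  echo $(( ($total + $per_task - 1) / $per_task ))
}

# Number of full pages to select to fill one task.
pages_per_task() {
  per_task=${1:-5000}
  per_page=${2:-1000}
  echo $(( ($per_task + $per_page - 1) / $per_page ))
}
```

For example, `tasks_needed 39000` prints 8 and `pages_per_task` prints 5, i.e. roughly eight launches of five pages each for about 39,000 CIVET outputs.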
Once you have launched the tool, a new page opens under the Tasks tab with several options. This is where you define which files to extract. Under Task Control -> Save results to, you can choose SFTP-1 and add a description, e.g., "Extract cortical thickness CIVET outputs". You can save or load preset options under Preset Management. Under Task Parameters, define which files you want to extract; more details are provided in the Help link. To extract CIVET outputs, you will need to know the filename patterns, which are described in the CIVET documentation section [Outputs of CIVET](http://www.bic.mni.mcgill.ca/ServicesSoftware/CIVET-2-1-0-Outputs-of-CIVET). To explore the files of a single subject, click on one of the CIVET outputs in the CBRAIN Files tab and view its directories and files under Content. Looking at these alongside the CIVET documentation will give you a feel for the data. For example, the UKBB CIVET outputs include cortical thickness at 0mm, 5mm, 10mm, 20mm, 30mm, and 40mm smoothing under the thickness directory, e.g., thickness/ukbb_1000011_ses2_native_rms_rsl_tlaplace_0mm_left.txt. To extract all the 0mm files, your SimpleFileExtractor patterns would be as follows:
```
*/thickness/*rms_rsl_tlaplace_0mm_left.txt
*/thickness/*rms_rsl_tlaplace_0mm_right.txt
```
These patterns are more specific than those described in the Help, to avoid extracting unnecessary files (e.g., asym, asym_hemi). You want the "rms_rsl" files because they are resampled to MNI 152 space (see [Outputs of CIVET](http://www.bic.mni.mcgill.ca/ServicesSoftware/CIVET-2-1-0-Outputs-of-CIVET)).
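Before launching a large extraction, it can help to check the patterns against a few filenames. The shell sketch below uses `case` glob matching, where `*` also matches `/`. Whether SimpleFileExtractor matches patterns in exactly this way is an assumption, so treat this as a quick sanity check rather than a guarantee. The function name is hypothetical, and the example paths (including the subject directory prefix and the asym_hemi filename) are illustrative, not taken from the dataset listing.

```sh
# Sketch: would a path be picked up by the 0mm thickness patterns?
# Assumption: patterns are shell-style globs matched against the path
# inside each CIVET output, and '*' may also match '/'.
matches_0mm_thickness() {
  case "$1" in
    */thickness/*rms_rsl_tlaplace_0mm_left.txt)  return 0 ;;
    */thickness/*rms_rsl_tlaplace_0mm_right.txt) return 0 ;;
    *)                                           return 1 ;;
  esac
}

# Illustrative paths; the "ukbb_1000011_ses2/" prefix and the
# asym_hemi filename are hypothetical.
for f in \
  ukbb_1000011_ses2/thickness/ukbb_1000011_ses2_native_rms_rsl_tlaplace_0mm_left.txt \
  ukbb_1000011_ses2/thickness/ukbb_1000011_ses2_native_rms_rsl_tlaplace_10mm_left.txt \
  ukbb_1000011_ses2/thickness/ukbb_1000011_ses2_native_rms_tlaplace_0mm_left.txt \
  ukbb_1000011_ses2/thickness/ukbb_1000011_ses2_native_rms_rsl_tlaplace_0mm_asym_hemi.txt
do
  if matches_0mm_thickness "$f"; then echo "keep: $f"; else echo "skip: $f"; fi
done
```

Only the first path is kept: the others differ in smoothing (10mm), lack the "rsl" resampling tag, or are asymmetry files.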
Add an output name to identify your files, e.g., CIVET_0mm, and click Start SimpleFileExtractor. The task runs in the background; a notification appears at the top of the screen when it starts and another when it completes. Extraction can take quite some time. You can then download your files using sftp from the terminal or with a program like FileZilla. For example:
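Open a terminal on your local machine, cd to the directory where you'd like to download your files, and connect to the SFTP server (replace the placeholder with your SFTP login and host):
```
cd /path/to/local/directory
sftp -o port=7500 <username>@<sftp-host>
```
At the sftp> prompt, download all files recursively:
```
get -r *
```
The files will download to your local machine. Once the transfer finishes, type exit to close the sftp session.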
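After the download, a quick check that every subject has both hemispheres can save surprises later. The shell sketch below is hypothetical: it assumes the left and right files share a name apart from the _left/_right suffix, as in the example filename above, and it makes no assumption about how the downloaded directories are nested.

```sh
# Sketch: report 0mm thickness files whose other-hemisphere partner
# is missing. Assumes names ending in ..._rms_rsl_tlaplace_0mm_left.txt
# and ..._rms_rsl_tlaplace_0mm_right.txt in the same directory.
missing_hemisphere() {
  find "$1" -type f \( -name '*rms_rsl_tlaplace_0mm_left.txt' \
      -o -name '*rms_rsl_tlaplace_0mm_right.txt' \) |
  while IFS= read -r f; do
    # Build the expected partner filename by swapping the suffix.
    case "$f" in
      *_left.txt) other="${f%_left.txt}_right.txt" ;;
      *)          other="${f%_right.txt}_left.txt" ;;
    esac
    [ -f "$other" ] || printf 'missing partner for %s\n' "$f"
  done
}
```

Run it as `missing_hemisphere /path/to/local/directory`; no output means every file found has its partner.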
Acquiring UKBB behavioural data
To acquire the behavioural data for UKBB, you can use the LORIS Data Query Tool and export it to NeuroHub.
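Linking the downloaded thickness files to the behavioural export requires a participant ID for each file. Going by the example filename above (ukbb_1000011_ses2_...), the ID and session can be read from the file name. The shell sketch below assumes that ukbb_&lt;ID&gt;_ses&lt;N&gt;_ pattern holds for every file and that the behavioural export identifies participants by the same ID; neither is confirmed here, so check a few rows by hand. The function name is hypothetical.

```sh
# Hypothetical helper: print "<ID><TAB><session>" for a file named like
# ukbb_<ID>_ses<N>_..., e.g. ukbb_1000011_ses2_native_rms_rsl_tlaplace_0mm_left.txt.
# Returns non-zero if the name does not follow that assumed pattern.
parse_ukbb_id() {
  base=${1##*/}                     # drop any leading directories
  case "$base" in
    ukbb_*_ses*_*) ;;               # assumed naming pattern
    *) return 1 ;;
  esac
  rest=${base#ukbb_}                # 1000011_ses2_native_...
  eid=${rest%%_*}                   # 1000011
  rest=${rest#*_ses}                # 2_native_...
  session=${rest%%_*}               # 2
  # Both parts must be non-empty and all digits.
  case "$eid" in ''|*[!0-9]*) return 1 ;; esac
  case "$session" in ''|*[!0-9]*) return 1 ;; esac
  printf '%s\t%s\n' "$eid" "$session"
}
```

For example, `parse_ukbb_id thickness/ukbb_1000011_ses2_native_rms_rsl_tlaplace_0mm_left.txt` prints 1000011 and 2 separated by a tab.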