UsageDocs QualityAssurance - ULJ-Yale/qunex GitHub Wiki
Quality Assurance is an important but highly tedious step in running QuNex preprocessing. The `run_qa` command helps ease this process, and supports the following:

- Raw Data QA (`--datatype=raw_data`)
- Config File QA (`--datatype=config`)
Quality Assurance (QA) is not to be confused with Quality Control (QC) and its command `run_qc`. In short, QA is responsible for processing efficiency and completion, whereas QC is responsible for the results of the processing.
```
qunex run_qa \
    --datatype=<Type of QA> \
    --sessionsfolder=<QuNex sessions folder> \
    --sessions=<Sessions to QA> \
    --configfile=<QA config file> \
    --tag=<Output identifier> \
    --overwrite=<Overwrite, yes or no>
```
This command will run QA on all specified sessions according to a highly-customizable user-created configuration YAML file. Usually, this entails checking specified files exist and that parameters have expected values.
Once complete, `run_qa` will output lists of sessions that have passed and failed the declared QA, as well as reports, both human- and machine-readable, that detail why and how these sessions failed.
The QA performed is highly dependent on two flags: `configfile` and `datatype`. The `configfile` flag should point towards a configuration file. The `datatype` flag must be a string referring to the type of data on which you want to run QA. This page will focus primarily on these. For more precise info on other flags and actually running the command, see the command's page.
Command inputs vary significantly depending on the QA specified, but at the bare minimum the command requires a configuration YAML file and a folder for each session, found within the `--sessionsfolder` directory.
The `run_qa` command generates four files: two lists containing sessions that have passed and failed QA, and two reports (one human-readable and one machine-readable, containing the same info). These are generated inside `processing/lists` and `processing/reports` respectively.
The first output is a file containing all sessions that have passed the specified QA. It is formatted as a QuNex `.list` file. This means each line corresponds to an individual session, with the format:

```
session id: {session_1}
session id: {session_2}
...
session id: {session_N}
```
As a result, this file can be input directly into qunex commands with the `--sessions` parameter:

```
--sessions="QA_pass_raw_data.list"
```
The goal is that users can use this list to continue processing without including problematic sessions or those that would require different processing. These list files also have other functionality; see more info on list files here.
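The `.list` format is simple enough to post-process yourself. Below is a minimal sketch of pulling session ids out of a pass/fail list; the `parse_list_file` helper (and the demo file) are illustrative, not part of QuNex:

```python
import os
import tempfile

def parse_list_file(path):
    """Collect session ids from a QuNex .list file ("session id: <id>" per line)."""
    sessions = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("session id:"):
                sessions.append(line.split(":", 1)[1].strip())
    return sessions

# Demo on a throwaway file mimicking a QA pass list (hypothetical session ids)
with tempfile.NamedTemporaryFile("w", suffix=".list", delete=False) as f:
    f.write("session id: OP101\nsession id: OP102\n")
    list_path = f.name
print(parse_list_file(list_path))  # ['OP101', 'OP102']
os.unlink(list_path)
```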
This output file has the same `.list` format as above, but instead contains all sessions that have failed QA. Similarly, this file can be input directly into QuNex commands with the `--sessions` parameter, even into `run_qa` itself if you wish to investigate the data further.
This file will contain a human-readable report of the QA outcomes, particularly for sessions that have failed QA. What exactly is contained within the report is highly dependent on the QA run, but it will typically explain why sessions failed QA and what precisely went wrong.

This output file has all the same information as the above, but in a machine-friendly `.yml` format. It also contains internal variables that may be useful to those developing pipelines off the outputs of `run_qa`.
Because the QA needed varies greatly between datasets, `run_qa` is designed to be highly user-customizable, controlled through a user-created configuration YAML file. If you're unfamiliar with the YAML format, see the documentation here.

In this file, you can define nested parameter-value pairs and sequences pertaining to your data. Essentially, it allows you to tell `run_qa` what you want to check in your data and what you expect the values to be.
The contents will be quite different depending on the QA type you're running and your data, but it should follow this basic format:

```yaml
datatypes:
  <Specified Data-type 1>:
    <param>: <value>
    <param>:
      <sub-param>: <value>
  <Specified Data-type 2>:
    - <sequence param>:
        <sub-param>: <value>
        <sub-param>:
          <sub-sub-param>: <value>
config:
  <Additional config options>
```
Parameters and sub-parameters must be within the scope of their corresponding datatype or parameter. They can be specified either directly as key-value pairs or as YAML sequences starting with `-`, depending on the data type. See below for data-type-specific parameters and config creation.
Only the below data types are currently supported.
Raw Data QA checks whether found scans are in line with the scan protocol, as defined by the user in the supplied config. Run after `import_<datatype>`, it performs various checks to ensure data is valid before processing. The main goal is to identify problematic sessions before you start processing, saving time and resources. It should also save users from needing to manually identify missing or misordered scans.
To specify Raw Data QA in your config, it must be added underneath `datatypes` as `raw_data`:

```yaml
datatypes:
  raw_data:
```
For each scan/image the user wishes to QA, they must add a corresponding scan-config in their configuration file with the tag `- scan`. Each scan-config must have the `series_description` parameter, which is used to identify which image (as labeled in the `session.txt` file) you are attempting to QA.
```yaml
datatypes:
  raw_data:
    - scan:
        series_description: T1w
    - scan:
        series_description: BOLD1
    - scan:
        series_description: BOLD2
```
Note: the `series_description` field also accepts the use of wildcards, `*`, or specifying multiple acceptable scans with `|`.
```yaml
- scan:
    series_description: T1w run-1|T1w run-2
- scan:
    series_description: BOLD1*
```
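The matching behaviour described above can be approximated with glob-style patterns. Here is a hedged sketch; the `matches_series` helper is hypothetical, and QuNex's actual matcher may differ (for example in how it treats other special characters):

```python
import fnmatch

def matches_series(description, pattern):
    """Match a session.txt series description against a config pattern,
    honouring '|' alternatives and '*' wildcards (illustrative only)."""
    return any(fnmatch.fnmatchcase(description, alt)
               for alt in pattern.split("|"))

print(matches_series("BOLD12", "BOLD1*"))                  # True
print(matches_series("T1w run-2", "T1w run-1|T1w run-2"))  # True
print(matches_series("T2w_SPC", "T1w*"))                   # False
```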
This is all you need to run a basic QA: `run_qa` will simply check each session has scans that explicitly match the `series_description` specified. One possible use-case for a config like this is in mapping-file verification for `create_session_info`.
To do more advanced QA, users can add a combination of parameters and sub-parameters. Aside from `series_description`, all are optional, though depending on the data not all are practical.
Here are all potential parameters that can be specified at the `scan` level:

```
- scan:
    series_description:  --> Scan identifier, looks in session.txt
    required:            --> Whether scan must be present for a session to pass QA
    dicoms:              --> The number of dicoms before Nifti conversion (from import_dicom)
    session:             --> Contains sub-parameters related to the session.txt file
      <sub-params>
    json:                --> Contains sub-parameters related to the sidecar .json file
      <sub-params>
    nifti:               --> Contains sub-parameters related to the Nifti file header
      <sub-params>
```
As you are likely familiar by now, the main identifier for scans in the QuNex session hierarchy is the `session.txt` file. The first level of QA, under the `session` key, pertains to the contents of this file.

Here are the potential parameters that can be specified at the `session` level, all of which are optional:
```
session:
  image_count:  --> Number of expected images
  image_number: --> Associated image number
  scan_index:   --> The scan's index, if multiple images found
  acquisition:  --> The scan's acquisition number, if split into multiple
```
The goal for these parameters is to help in the scenario where there are multiple images with the same `series_description`: they allow users to narrow down which images they actually want to QA.

In QuNex (and most modern neuroimaging formats), raw Nifti image files are coupled with a JSON file containing processing-related information and metadata, also known as 'sidecar JSON' files. These are located with their Nifti images in `sessions/<session_id>/nii`.
To allow for in-depth customization, `json` QA allows you to specify any key, so long as it corresponds with an actual value in the sidecar JSON. The only exception is the key `normalized`, which is an easier way to require image normalization than using the associated key. Below are some examples, but they are not exhaustive. We recommend checking the .json files of a pilot subject or similar dataset to find the keys associated with your protocol.
```
json:
  normalized:             --> Whether or not it is a normalized image
  RepetitionTime:         --> EXAMPLE
  DwellTime:              --> EXAMPLE
  PhaseEncodingDirection: --> EXAMPLE
  EffectiveEchoSpacing:   --> EXAMPLE
```
Note: data will be converted to String (text) for the actual comparison, which may cause issues with mathematical notation.
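To see why the string conversion matters, here is an illustrative sketch of sidecar checking; the `check_sidecar` helper is hypothetical, not QuNex code. Values that are numerically equal can still mismatch once both sides are stringified:

```python
import json
import os
import tempfile

def check_sidecar(path, expected):
    """Compare expected key/value pairs against a sidecar JSON file.
    Values are compared as strings, mirroring the conversion note above
    (illustrative sketch, not QuNex's implementation)."""
    with open(path) as f:
        sidecar = json.load(f)
    return {k: (v, sidecar.get(k))
            for k, v in expected.items()
            if str(sidecar.get(k)) != str(v)}

# Demo sidecar with made-up values
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"RepetitionTime": 2.5, "PhaseEncodingDirection": "j-"}, f)
    sidecar_path = f.name

print(check_sidecar(sidecar_path, {"RepetitionTime": 2.5}))     # {} -> passes
print(check_sidecar(sidecar_path, {"RepetitionTime": "2.50"}))  # fails: '2.5' != '2.50'
os.unlink(sidecar_path)
```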
Though sidecar JSON files contain a lot of useful information on Nifti files, some data is only available directly in the Nifti image's header. Therefore, `run_qa` is also able to read Nifti images' headers for key-value validation. Similar to JSON QA, this step has no requirements on keys other than that they exist in the data. The only exception is the `data_shape` key, which gives info not available in the header.
This is more advanced than JSON QA: Nifti images are not simple text files you can print, so you will need to use some other software to read them yourself. However, there are many options for this.
```
nifti:
  data_shape: --> Data shape of the acquired data, specified as an array
```
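For example, the data shape can be read straight out of a NIfTI-1 header with nothing but the standard library. This is a sketch for illustration, assuming an uncompressed little-endian `.nii`; in practice tools like nibabel or `fslhd` are more robust:

```python
import struct

def nii_data_shape(header: bytes):
    """Read the data shape from a NIfTI-1 header: 'dim' is 8 little-endian
    int16s at byte offset 40, with dim[0] holding the number of dimensions.
    (Sketch only: real tooling also handles big-endian files and .nii.gz.)"""
    dim = struct.unpack_from("<8h", header, 40)
    return list(dim[1:1 + dim[0]])

# Synthetic header for a 90x90x60 volume
hdr = bytearray(348)
struct.pack_into("<i", hdr, 0, 348)  # sizeof_hdr, always 348 for NIfTI-1
struct.pack_into("<8h", hdr, 40, 3, 90, 90, 60, 1, 1, 1, 1)
print(nii_data_shape(bytes(hdr)))    # [90, 90, 60]
```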
Here we will run an example `raw_data` scan QA. Though you can use this as a base, the keys and values must be adjusted according to your data and analysis.

Below is our `session.txt` file after initial import.
```
11: Localizer [1/3]
12: Localizer [2/3]
13: Localizer [3/3]
21: T1w_MPR
31: T1w_MPR
41: T2w_SPC
51: T2w_SPC
61: SpinEchoFieldMap_AP
71: SpinEchoFieldMap_PA
81: Resting_AP
91: Resting_AP_SBRef
101: Resting_PA
111: Resting_PA_SBRef
```
And here is our mapping file. A core use-case for `raw_data` QA is validating that HCP image mapping will work correctly, and the config should be set up with the mapping file in mind.
```
31 => T1w
51 => T2w
SpinEchoFieldMap_AP => SE-FM-AP
SpinEchoFieldMap_PA => SE-FM-PA
Resting_AP => bold:rest
Resting_AP_SBRef => boldref:rest
Resting_PA => bold:rest
Resting_PA_SBRef => boldref:rest
```
Let's start with the anatomical data, that being the T1 and T2 structural images.
You'll notice that we have two images for each scan: the original image and the normalized image. It is common in many protocols to save both, and it serves as a good example. In the mapping, we specify the second, normalized, image of each pair using the image number from our `session.txt`, as this is the one we want to use in preprocessing. If we were to map both, QuNex would attempt to average them, which we do not want. Therefore, we must ensure that image exists in our QA config using the `normalized` and `image_number` keys.
We may also want to ensure the data matches our defined protocol. Therefore, we will also check the data matrix shape, Repetition Time, Echo Time, and Dwell Time.
Furthermore, in this example dataset we have sessions from both GE and Siemens machines which have different protocols. We should evaluate the device manufacturer as well to help differentiate required processing.
```yaml
datatypes:
  raw_data:
    - scan:
        series_description: T1w_MPR
        session:
          image_number: 31
        nifti:
          data_shape: [208, 300, 320]
        json:
          RepetitionTime: 2.5
          EchoTime: 0.00207
          Manufacturer: "Siemens"
          DwellTime: 6.5e-06
          normalized: True
```
Above, we have only included the normalized image as that's the one we need. However, if you want to QA multiple images that share the same name, you can do so by specifying parameters as a list:
```yaml
- scan:
    series_description: T2w_SPC
    session:
      image_number: [41, 51]
    nifti:
      data_shape: [[208, 300, 320], [208, 300, 320]]
    json:
      EchoTime: 0.00207
      normalized: [False, True]
```
Here, for the T2 scan, we now require both the original and normalized images. Again, to ensure our mapping is correct, we require that the normalized image be the second scan using the `image_number` key. Otherwise, as these images are identical, we don't need to differentiate for any of the other parameters. You can either continue to specify values as a list but keep them the same (as for `data_shape`), or just leave them as one value (as for `EchoTime`). `run_qa` will extrapolate single values to multiple scans where possible, but if there is a mismatch between multiple values (e.g. you've specified three values for `EchoTime` but only two for `normalized`) it will cause an error.
Now let's evaluate the fieldmaps. We want to ensure we have a single SpinEcho for each direction, but as we do not care about the specific image number, we will instead use the `image_count` key.

In addition to our basic parameter validation, we also want to make sure that the direction (AP/PA) actually matches how the scan is labelled, using the `PhaseEncodingDirection` key.
```yaml
- scan:
    series_description: SpinEchoFieldMap_AP
    session:
      image_count: 1
    nifti:
      data_shape: [90, 90, 60]
    json:
      RepetitionTime: 6.2
      EchoTime: 0.06
      PhaseEncodingDirection: j-
- scan:
    series_description: SpinEchoFieldMap_PA
    session:
      image_count: 1
    nifti:
      data_shape: [90, 90, 60]
    json:
      RepetitionTime: 6.2
      EchoTime: 0.06
      PhaseEncodingDirection: j
```
Finally, we have our two resting-state functional images, one AP and one PA, and their reference images.

In this example, we know that some sessions were acquired with more than just two resting Bold images. However, as this will not negatively impact our preprocessing, we won't make this a condition for failing QA. Rather, for the sake of showcasing possibilities, we will simply require that sessions have at least one AP bold rest image and one PA. You can do this by neglecting to specify the `session` subconfig:
```yaml
- scan:
    series_description: Resting_AP
    nifti:
      data_shape: [90, 90, 60, 333]
    json:
      RepetitionTime: 0.9
      EchoTime: 0.035
      PhaseEncodingDirection: j-
- scan:
    series_description: Resting_AP_SBRef
    nifti:
      data_shape: [90, 90, 60]
    json:
      RepetitionTime: 0.9
      EchoTime: 0.035
      PhaseEncodingDirection: j-
- scan:
    series_description: Resting_PA
    nifti:
      data_shape: [90, 90, 60, 333]
    json:
      RepetitionTime: 0.9
      EchoTime: 0.035
      PhaseEncodingDirection: j
- scan:
    series_description: Resting_PA_SBRef
    nifti:
      data_shape: [90, 90, 60]
    json:
      RepetitionTime: 0.9
      EchoTime: 0.035
      PhaseEncodingDirection: j
```