UsageDocs QualityAssurance - ULJ-Yale/qunex GitHub Wiki

Quality Assurance in QuNex

Quality Assurance is an important but highly tedious step in running QuNex preprocessing. The run_qa command helps ease this process, and supports the following:

  • Raw Data QA (--datatype=raw_data)
  • Config File QA (--datatype=config)

Quality Assurance (QA) is not to be confused with Quality Control (QC) and its command run_qc. In short, QA verifies that processing can run efficiently and to completion, whereas QC evaluates the quality of the processing results.

Using the run_qa command

qunex run_qa \
    --datatype=<Type of QA> \
    --sessionsfolder=<QuNex sessions folder> \
    --sessions=<Sessions to QA> \
    --configfile=<QA config file> \
    --tag=<Output identifier> \
    --overwrite=<Overwrite, yes or no>

This command will run QA on all specified sessions according to a highly customizable, user-created YAML configuration file. Usually, this entails checking that specified files exist and that parameters have expected values.

Once complete, run_qa will output lists of sessions that have passed and failed the declared QA, as well as reports, both human and machine readable, that detail why and how these sessions failed.

The QA performed depends chiefly on two flags: configfile and datatype. The configfile flag should point to a configuration file. The datatype flag must be a string naming the type of data on which you want to run QA. This page will focus primarily on these two.

For more precise info on other flags and actually running the command, see the command's page.

Inputs

Command inputs vary significantly depending on the QA specified, but at the bare minimum the command requires a configuration YAML file and a folder for each session, found within the --sessionsfolder directory.

Outputs

The run_qa command generates four files: two lists containing the sessions that have passed or failed QA, and two reports (one human-readable and one machine-readable, containing the same information). These are generated inside processing/lists and processing/reports, respectively.

processing/lists/QA_pass_{datatype}{tag/config}.list

The first output is a file containing all sessions that have passed the specified QA. It is formatted as a QuNex .list file. This means each line corresponds to an individual session, with the format:

session id: {session_1}
session id: {session_2}
...
session id: {session_N}

As a result, this file can be input directly into qunex commands via the --sessions parameter:

--sessions="QA_pass_raw_data.list"

The goal is that users can use this list to continue processing without including problematic sessions or those that would require different processing. These list files also have other functionality; see more info on list files here.

processing/lists/QA_fail_{datatype}{tag/config}.list

This output file has the same .list format as above, but instead contains all sessions that have failed QA. Similarly, this file can be input directly into QuNex commands with the --sessions parameter, even into run_qa itself if you wish to investigate the data further.

processing/reports/QA_report{datatype}{tag/config}.txt

This file will contain a human-readable report of the QA outcomes, particularly for sessions that have failed QA. What exactly is contained within the report is highly dependent on the QA run, but will typically explain why sessions failed QA and what precisely went wrong.

processing/reports/QA_report{datatype}{tag/config}.yml

This output file contains all the same information as the above, but in a machine-friendly .yml format. It also contains internal variables that may be useful to those developing pipelines on top of the outputs of run_qa.

The Configuration file

Because the QA needed varies greatly between datasets, run_qa is designed to be highly user-customizable, controlled through a user-created configuration YAML file. If you're unfamiliar with YAML format, see the documentation here.

In this file, users can define nested parameter-value pairs and sequences pertaining to their data. In essence, it lets you tell run_qa what you want to check in your data and what values you expect.

The contents will be quite different depending on the QA type you're running and your data, but it should follow this basic format:

datatypes:
    <Specified Data-type 1>:
        <param>: <value>
        <param>:
            <sub-param>: <value>

    <Specified Data-type 2>:
        - <sequence param>:
            <sub-param>: <value>
            <sub-param>:
                <sub-sub-param>: <value>

config:
    <Additional config options>

Parameters and sub-parameters must be nested within the scope of their corresponding datatype or parameter. Depending on the data type, they can be specified either directly as key-value pairs or as YAML sequences whose items start with -. See below for data-type-specific parameters and config creation.

Data Types

Only the below data types are currently supported.

Raw Data QA (--datatype=raw_data)

Raw Data QA checks whether the scans found are in line with the scan protocol defined by the user in the supplied config. Run after import_<datatype>, it performs various checks to ensure data is valid before processing. The main goal is to identify problematic sessions before you start processing, saving time and resources. It should also spare users from manually identifying missing or misordered scans.

To specify Raw Data QA in your config, it must be added underneath datatypes as raw_data:

datatypes:
    raw_data:

- scan

For each scan/image the user wishes to QA, they must add a corresponding scan-config in their configuration file with the tag - scan. Each scan-config must have the series_description parameter, which is used to identify which image (as labeled in the session.txt file) you are attempting to QA.

datatypes:
    raw_data:
        - scan:
            series_description: T1w

        - scan:
            series_description: BOLD1

        - scan:
            series_description: BOLD2

Note: the series_description field also accepts the use of wildcards (*), as well as specifying multiple acceptable scans separated by |.

        - scan:
            series_description: T1w run-1|T1w run-2

        - scan:
            series_description: BOLD1*

This is all you need to run a basic QA: run_qa will simply check that each session has scans explicitly matching the specified series_description. One possible use-case for a config like this is verifying the mapping file for create_session_info.

To do more advanced QA, users can add a combination of parameters and sub-parameters. Aside from series_description, all are optional, though depending on the data not all are practical.

Here are all potential parameters that can be specified at the scan level:

        - scan:
            series_description: --> Scan identifier, looks in session.txt
            required:           --> Whether scan must be present for a session to pass QA
            dicoms:             --> The number of dicoms before Nifti conversion (from import_dicom)
            session:            --> Contains sub-parameters related to the session.txt file
                <sub-params>
            json:               --> Contains sub-parameters related to the sidecar .json file
                <sub-params>
            nii:                --> Contains sub-parameters related to the Nifti file header
                <sub-params>

session

As you are likely familiar by now, the main identifier for scans in the QuNex session hierarchy is the session.txt file. The first level of QA, under the session key, pertains to the contents of this file.

Here are the potential parameters that can be specified at the session level, all of which are optional:

            session:
                image_count:    --> Number of expected images
                image_number:   --> Associated image number
                scan_index:     --> The scan's index, if multiple images found
                acquisition:    --> The scan's acquisition number, if split into multiple

The goal of these parameters is to help in the scenario where there are multiple images with the same series_description: they let users narrow down which images they actually want to QA.
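For instance, the anatomical scans in the example later on this page come in pairs that share a series_description. A sketch like the following (values are hypothetical) would expect two such images and target the one numbered 31:

```yaml
# Hypothetical sketch: two T1w_MPR images are expected in session.txt,
# and QA targets the one with image number 31.
datatypes:
    raw_data:
        - scan:
            series_description: T1w_MPR
            session:
                image_count: 2
                image_number: 31
```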

json

In QuNex (and most modern neuroimaging formats), raw Nifti image files are coupled with a JSON file containing processing related information and metadata, also known as 'sidecar JSON' files. These are located with their Nifti images in sessions/<session_id>/nii.

To allow for in-depth customization, json QA allows you to specify any keys, so long as they correspond with actual values in the sidecar JSON. The only exception is the key normalized, which is an easier way to require image normalization than using the associated key. Below are some examples, but these are not exhaustive. We recommend checking the .json files of a pilot subject or a similar dataset to find the keys associated with your protocol.

            json:
                normalized:             --> Whether or not it is a normalized image
                RepetitionTime:         --> EXAMPLE
                DwellTime:              --> EXAMPLE
                PhaseEncodingDirection: --> EXAMPLE
                EffectiveEchoSpacing:   --> EXAMPLE

Note: values are converted to strings (text) for the actual comparison, which may cause issues with numbers written in scientific notation.
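To see why, consider a value such as the DwellTime of 6.5e-06 used in the example further below: compared as text, it will not match the same number written out in decimal form. A quick Python sketch of the pitfall (illustrative only; this is not QuNex's actual comparison code):

```python
# Illustrative only: why string-based comparison can trip on scientific notation.
json_value = 6.5e-06        # value as parsed from the sidecar JSON
config_value = "0.0000065"  # the same number, written in decimal in the config

# Compared as text, equal numbers can fail to match:
print(str(json_value) == config_value)  # False: "6.5e-06" != "0.0000065"

# Writing the config value in the same notation as the JSON avoids this:
print(str(json_value) == "6.5e-06")     # True
```

In practice, write numeric values in your config in the same notation that appears in the sidecar JSON files.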

nii

Though sidecar JSON files contain a lot of useful information about Nifti files, some data is only available directly in the Nifti image's header. Therefore, run_qa is also able to read Nifti image headers for key-value validation. As with JSON QA, this step imposes no requirements on keys other than that they exist in the data. The only exception is the data_shape key, which gives information not directly available in the header.

This is more advanced than JSON QA: Nifti images are not simple text files you can print, so you will need other software (for example, nibabel or FSL's fslhd) to read them yourself. However, there are many options for this.

            nii:
                data_shape:             --> Data shape of the acquired data, specified as an array

- scan Example QA

Here we will run an example raw_data - scan QA. Though you can use this as a base, the keys and values must be adjusted according to your data and analysis.

Below is our session.txt file after initial import.

11:   Localizer [1/3]
12:   Localizer [2/3]
13:   Localizer [3/3]
21:   T1w_MPR
31:   T1w_MPR
41:   T2w_SPC
51:   T2w_SPC
61:   SpinEchoFieldMap_AP
71:   SpinEchoFieldMap_PA
81:   Resting_AP
91:   Resting_AP_SBRef
101:  Resting_PA
111:  Resting_PA_SBRef

And here is our mapping file. A core use-case for raw_data QA is validating that HCP image mapping will work correctly, so the config should be set up with the mapping file in mind.

31                   => T1w
51                   => T2w
SpinEchoFieldMap_AP  => SE-FM-AP
SpinEchoFieldMap_PA  => SE-FM-PA
Resting_AP           => bold:rest
Resting_AP_SBRef     => boldref:rest
Resting_PA           => bold:rest
Resting_PA_SBRef     => boldref:rest

Let's start with the anatomical data, that being the T1 and T2 structural images.

You'll notice that we have two images for each structural scan: the original image and the normalized image. Many protocols save both, and this serves as a good example. In the mapping file, we specify the second (normalized) image of each pair using the image number from our session.txt, as this is the one we want to use in preprocessing. If we were to map both, QuNex would attempt to average them, which we do not want. Therefore, we must ensure that this image exists, using the normalized and image_number keys in our QA config.

We may also want to ensure the data matches our defined protocol. Therefore, we will also check the data matrix shape, Repetition Time, Echo Time, and Dwell Time.

Furthermore, in this example dataset we have sessions from both GE and Siemens machines which have different protocols. We should evaluate the device manufacturer as well to help differentiate required processing.

datatypes:
  raw_data:
    - scan:
        series_description: T1w_MPR
        session:
          image_number: 31
        nii:
          data_shape: [208, 300, 320]
        json:
          RepetitionTime: 2.5
          EchoTime: 0.00207
          Manufacturer: "Siemens"
          DwellTime: 6.5e-06
          normalized: True

Above, we have only included the normalized image as that's the one we need. However, if you want to QA multiple images that share the same name, you can do so by specifying parameters as a list:

    - scan:
        series_description: T2w_SPC
        session:
          image_number: [41, 51]
        nii:
          data_shape: [[208, 300, 320], [208, 300, 320]]
        json:
          EchoTime: 0.00207
          normalized: [False, True]

Here, for the T2 scan, we now require both the original and the normalized image. Again, to ensure our mapping is correct, we require that the normalized image be the second scan, using the image_number key. Otherwise, as these images are identical, we don't need to differentiate any of the other parameters. You can either continue to specify values as a list but keep them the same (as for data_shape), or just leave them as one value (as for EchoTime). Single values will be extrapolated to multiple scans where possible, but a mismatch between multiple values (e.g. three values specified for EchoTime, but only two for normalized) will cause an error.

Now let's evaluate the fieldmaps. We want to ensure we have a single SpinEcho for each direction, but as we do not care about the specific image number we will instead use the image_count key.

In addition to our basic parameter validation, we also want to make sure that the direction (AP/PA) actually matches how the scan is labelled, using the PhaseEncodingDirection key.

    - scan:
        series_description: SpinEchoFieldMap_AP
        session:
          image_count: 1
        nii:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 6.2
          EchoTime: 0.06
          PhaseEncodingDirection: j-
    
    - scan:
        series_description: SpinEchoFieldMap_PA
        session:
          image_count: 1
        nii:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 6.2
          EchoTime: 0.06
          PhaseEncodingDirection: j

Finally, we have our two resting-state functional images, one AP and one PA, and their reference images.

In this example, we know that some sessions were acquired with more than two resting BOLD images. However, as this will not negatively impact our preprocessing, we won't make it a condition for failing QA. Rather, for the sake of showcasing possibilities, we will simply require that sessions have at least one AP BOLD rest image and one PA. You can do this by omitting the session subconfig:

    - scan:
        series_description: Resting_AP
        nii:
          data_shape: [90, 90, 60, 333]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j-

    - scan:
        series_description: Resting_AP_SBRef
        nii:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j-

    - scan:
        series_description: Resting_PA
        nii:
          data_shape: [90, 90, 60, 333]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j

    - scan:
        series_description: Resting_PA_SBRef
        nii:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j