1. Description of Metrics


Images are input to the workflow as pairs: a test and a reference. The workflow is comprised of two components that are successively applied to each test-reference pair. First, an image processing component segments images into regions. These regions, or in some cases, response objects, are sets of pixels (or voxels) that characterize the region. In microwave tomography images, pixels can be classified into four types of tissue subject to their dielectric properties: adipose, transition, fibroglandular, and malignant. Based on the intensity of the backscattered energy, pixels within microwave radar images are classified into dominant scatterers (or malignant tissue). According to the degree of scattering at interfaces between tissues of disparate densities, ultrasound images are segmented to classify dense tissues and scattering from the skin surface. The output of the segmentation process is a mask which is a binary image that outlines the shape of the segmented region. The segmentation methodology and the preparation of images for analysis are summarized in section 1.1.

A mask embodies the borders, shapes, locations, and geometric features of a region. Consequently, masks are used to compare the geometric expression of regions across the test-reference image pair. This is implemented by the image analysis component that applies metrics to pairs of test-reference masks. The metrics used for image analysis are listed in Table 1. Metrics 1-8, 16, and 17 are applied to the tissue masks to evaluate the accuracy with which the geometric properties of the underlying structures are reconstructed. These metrics are described in more detail in section 1.2.

The masks are applied to the images to extract pixels bound by the regions represented by the masks. Hence, the extracted regions contain both geometric and pixel information. For the microwave tomography example, the pixel information is related to the dielectric properties reconstructed within the region by the reconstruction algorithm. Metrics 18-19 are applied to these regions to evaluate the accuracy with which both the geometric and dielectric properties of these underlying structures are reconstructed by microwave tomography. A detailed description of the metrics is in section 1.3.

For the microwave radar examples, pixel information is related to the backscattered energy reconstructed within the region by the reconstruction algorithm. Metrics 20-32 are applied to microwave radar images (i.e., backscattered energy maps) and are used for the analysis of backscattered energy properties and responses corresponding to regions where scatterers are located. These metrics are described in detail in section 1.4.


Figure 1. Demonstration of steps taken to prepare a region for image analysis. For this example, the reference and test images are segmented into regions dominated by glandular tissue. Masks represented as binary images are constructed from the regions. Segmented mask of glandular region is applied to reference and test images to extract reference and test tissues.

1.1 Preparation of images for analysis

To facilitate the assessment of image quality, the test and reference images are preprocessed to delineate the breast interior. Using a segmentation technique, the interior is further partitioned into regions dominated by a tissue type. The preprocessing steps and the segmentation approaches that are implemented are dependent on the reference image and the type of image that was reconstructed. Hence, these methods are discussed in more detail as each example is presented.

In general, when the segmentation technique is applied to the reference image, the image is segmented into regions that form binary images. There is a reference mask, refmask, for each region partitioned. For example, for cases where an image is partitioned into a region dominated by malignant tissue, the reference mask is interpreted as follows. For a pixel with a value of 1 in the reference mask, the corresponding pixel in the reference model is associated with malignant tissue. Likewise, for a pixel with a value of 0 in the reference mask, the corresponding pixel in the reference model is not associated with malignant tissue. The same interpretation applies to the reference masks for the other segmented regions.

The reference masks are applied to the reference image to extract the values of the associated tissues. These segmented property values are referred to as the reference tissue, reftissue. Figure 1 demonstrates an example of a reference mask constructed from a region of glandular tissue and a tissue mask formed when the refmask is applied to the reference image.

Likewise, each of the segmented regions of the test image forms a test mask, testmask, of the corresponding region of the breast. The test masks are binary images with the same interpretation as the reference masks. The test masks are applied to the test images to extract the values of the associated tissues. These segmented property values are referred to as the test tissue, testtissue. Figure 1 shows an example of a test mask constructed from a test image and the tissue mask formed when the testmask is applied to the test image.

To compare shapes, sizes, geometric features, and locations of regions across test-reference masks, metrics are applied to the masks. The metrics are described in detail in the following sections.


Figure 2. Regions of the test and reference masks used to form overlap metrics. Regions that denote True positive (TP), False Positive (FP), False negative (FN), and True Negative (TN) are shown. The region of interest is the region contained in the box that bounds the union of the test and reference masks.

Figure 3. Illustration of the Artifact-rejection Ratio (AR) and the Ratio-of-Detection (RD) metrics. (a) Reference mask (red) and test mask (black) intersect forming region of intersection (dark red). (b) Pixels outside region of intersection and reference region used to calculate Artifact-rejection Ratio. (c) Region of intersection used to calculate Ratio-of-Detection.


Table 1. List of metrics


1.2 Analysis of geometric properties of reference and test masks

The reference and test masks contain geometric property information. Therefore, metrics are applied to these test-reference mask pairs to evaluate the accuracy with which the geometric properties of the underlying structures are reconstructed. When applying the metrics, the test-reference masks are evaluated within a region of interest defined as the box that bounds the union between the reference and test masks and is shown in Figure 2. Each contour (or surface) and associated region (or object) is compared with the connected reference contour/surface and region/object.

The accuracy of the geometric properties of the underlying structures of the test regions is measured with the similarity between the reference and test masks using the normalized cross-correlation function given by [1],

where the two 2D masks (or 3D masks) to be compared are vectorized. The xcorrGeo value varies from 0 (no similarity) to 1 (perfect similarity). Distortion of the structure and the presence of artifacts decrease the value of the metric.
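As an illustration, the similarity measure can be sketched in a few lines of Python. The exact form of (1) is not reproduced in this text, so the sketch assumes the zero-lag, non-mean-subtracted normalized cross-correlation, chosen because it ranges from 0 to 1 for binary masks. The function name `xcorr_geo` and the toy masks are hypothetical.

```python
import math

def xcorr_geo(ref_mask, test_mask):
    """Normalized cross-correlation of two equally sized binary masks.

    Masks are nested lists of 0/1 values. Assumed form (not taken from the
    source): sum(ref * test) / sqrt(sum(ref^2) * sum(test^2)).
    """
    # Vectorize the 2D masks row by row.
    ref = [v for row in ref_mask for v in row]
    test = [v for row in test_mask for v in row]
    num = sum(r * t for r, t in zip(ref, test))
    den = math.sqrt(sum(r * r for r in ref) * sum(t * t for t in test))
    # Assumption: an empty mask yields 0 (no similarity) instead of an error.
    return num / den if den else 0.0

ref_mask = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
test_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 1]]

print(xcorr_geo(ref_mask, ref_mask))   # identical masks
print(xcorr_geo(ref_mask, test_mask))  # one missed pixel and one artifact
```

In this toy case the missed pixel and the artifact pixel both lower the value, which matches the behaviour described above.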

The goodness of fit between the reference and test masks is evaluated with the normalized root mean square error (NRMSE) cost function (see [2], for example) and is evaluated with,

The goodness of fit varies from –infinity (bad fit) to 1 (perfect fit).
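The stated range (minus infinity to 1) is consistent with the common NRMSE-based fit, which is 1 minus the error norm divided by the norm of the reference about its mean. The sketch below assumes that form, since (2) itself is not reproduced in this text. The function name `goodness_of_fit` and the toy masks are hypothetical.

```python
import math

def goodness_of_fit(ref_mask, test_mask):
    """NRMSE-based fit between two equally sized masks (nested lists).

    Assumed form: 1 - ||ref - test|| / ||ref - mean(ref)||.
    """
    ref = [v for row in ref_mask for v in row]
    test = [v for row in test_mask for v in row]
    mean_ref = sum(ref) / len(ref)
    err = math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)))
    spread = math.sqrt(sum((r - mean_ref) ** 2 for r in ref))
    if spread == 0:
        raise ValueError("reference is constant, so the fit is undefined")
    return 1.0 - err / spread

ref_mask = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
test_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 1]]
inverted = [[1 - v for v in row] for row in ref_mask]

print(goodness_of_fit(ref_mask, ref_mask))   # perfect fit
print(goodness_of_fit(ref_mask, test_mask))  # two mismatched pixels
print(goodness_of_fit(ref_mask, inverted))   # negative: worse than a constant guess
```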

The Jaccard similarity coefficient is defined as the size of the intersection (i.e., the portion of the test region that agrees with the reference region) divided by the size of the union (i.e., the portions that are in either the test or the reference region) of the regions (Figure 2), and is evaluated with [3],

The Dice similarity coefficient is defined as the size of the intersection divided by the average size of the test and reference regions, and is given by [3],

The Dice and Jaccard coefficients are referred to as overlap metrics and are similar except that the Dice measure gives twice the weight to agreements. The metrics are reviewed in [3], which suggests that neither metric is superior to the other.
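The two overlap coefficients follow directly from the definitions above. The sketch below is a minimal illustration on toy masks; the helper names and the convention of returning 1.0 when both masks are empty are assumptions rather than part of the framework.

```python
def overlap_counts(ref_mask, test_mask):
    """Return (|intersection|, |ref|, |test|) for two binary masks (nested lists)."""
    ref = [bool(v) for row in ref_mask for v in row]
    test = [bool(v) for row in test_mask for v in row]
    inter = sum(1 for r, t in zip(ref, test) if r and t)
    return inter, sum(ref), sum(test)

def jaccard(ref_mask, test_mask):
    inter, n_ref, n_test = overlap_counts(ref_mask, test_mask)
    union = n_ref + n_test - inter
    # Assumption: two empty masks are treated as identical.
    return inter / union if union else 1.0

def dice(ref_mask, test_mask):
    inter, n_ref, n_test = overlap_counts(ref_mask, test_mask)
    total = n_ref + n_test
    return 2 * inter / total if total else 1.0

ref_mask = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
test_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 1]]

j = jaccard(ref_mask, test_mask)
d = dice(ref_mask, test_mask)
print(j, d)
```

The two values are tied by the identity Dice = 2J / (1 + J), so they rank any set of test masks in the same order.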

The Ratio-of-Detection (RD) metric measures the proportion of the reference mask that has been correctly reconstructed (see Figure 3.c), and is defined as [1]:

where |∙| is the cardinality of non-zero pixels within a mask. The measure implies sensitivity of the reconstruction algorithm to the reference tissue. Values close to zero imply that the algorithm is insensitive to the reference tissue, as a very small proportion of the reference tissue is reconstructed correctly within the reference region. Conversely, values close to 1 imply that the reconstruction algorithm is sensitive to the reference tissue, as most of the reference tissue is correctly reconstructed within the reference region.

The Artefact-rejection Ratio (AR) metric measures the proportion of tissue incorrectly reconstructed as reference tissue outside the reference region and is given by [1],

where $|ref_{mask} \cap test_{mask}|$ is the cardinality of non-zero pixels (or voxels) that are in both the reference and test masks. The measure implies the specificity of the reconstruction algorithm to the reference tissue. For example, there may be artefacts in the fat and glandular regions of the reconstructed image having the same dielectric properties as malignant tissue that decrease the value of the AR. Hence, small or negative values of AR indicate that a large proportion of tissue has been incorrectly reconstructed as malignant tissue outside the tumor region. Conversely, values close to 1 imply that only a small proportion of the malignant tissue is reconstructed outside the tumor region. That is, there are very few artefacts in the fat and glandular regions incorrectly reconstructed as malignant tissue. Refer to Figure 3.b for an illustration of the Artefact-rejection Ratio metric.

The specificity of the reconstruction algorithm is measured directly with (see [4] for example),

where the true negative pixels/voxels (TN) are all pixels (or voxels) in the region of interest that are not in the reference or test masks (Figure 2). For the fat region, the region of interest is the breast interior (i.e., inside the skin-fat interface). For the glandular and malignant tissues, the region of interest is a box that bounds both the reference and test tumor masks (Figure 2). This measure is dependent on the size of the region of interest. Importantly, this means that the larger the region of interest, the closer the value of the measure is to 1. Hence, large values due to a large region of interest relative to the reconstructed and reference masks may not reflect the presence of artifacts that degrade image quality.

The precision of the reconstruction algorithm is measured with [4],

and is the proportion of the test mask that has been correctly reconstructed as the reference tissue.
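The Ratio-of-Detection, specificity, and precision can all be written in terms of the TP, FP, FN, and TN counts of Figure 2. The sketch below counts these inside the box that bounds the union of the two masks. Writing RD as TP / (TP + FN) is inferred from its description as the proportion of the reference mask that is correctly reconstructed. The Artefact-rejection Ratio is not sketched because its formula is not reproduced in this text. All names are hypothetical.

```python
def confusion_counts(ref_mask, test_mask):
    """Count TP, FP, FN, TN inside the box that bounds the union of both masks."""
    union = [(i, j) for i, row in enumerate(ref_mask) for j, r in enumerate(row)
             if r or test_mask[i][j]]
    if not union:
        raise ValueError("both masks are empty, so no region of interest exists")
    rows = [i for i, _ in union]
    cols = [j for _, j in union]
    tp = fp = fn = tn = 0
    for i in range(min(rows), max(rows) + 1):
        for j in range(min(cols), max(cols) + 1):
            in_ref, in_test = bool(ref_mask[i][j]), bool(test_mask[i][j])
            if in_ref and in_test:
                tp += 1
            elif in_test:
                fp += 1
            elif in_ref:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def ratio(numerator, denominator):
    # Assumption: an undefined ratio is reported as NaN instead of raising.
    return numerator / denominator if denominator else float("nan")

ref_mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
test_mask = [[0, 0, 0, 0],
             [0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

tp, fp, fn, tn = confusion_counts(ref_mask, test_mask)
ratio_of_detection = ratio(tp, tp + fn)   # share of the reference recovered
specificity = ratio(tn, tn + fp)          # depends on the size of the box
precision = ratio(tp, tp + fp)            # share of the test mask that is correct
print(tp, fp, fn, tn, ratio_of_detection, specificity, precision)
```

Padding the masks with more background does not change these counts, because the region of interest is the bounding box, not the full image. Enlarging the region of interest itself would add true negatives and push the specificity towards 1, as noted above.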

A radiologist may use the microwave/acoustic imaging tool for radiation planning. For this scenario, the shape fidelity of the outline of the tumor reconstructed by the microwave imaging technique to the true lesion is important. The ‘overlap ratio’ metrics are relatively insensitive to under- or overestimation of the tumor region [5], so they may not be appropriate for evaluating the effectiveness of the modality for shape fidelity. For this application, the Hausdorff metric may be more useful. For example, it is particularly sensitive to the ‘panhandle problem’ as described in [5]. The ‘panhandle problem’ refers to the scenario where the reconstructed region deviates from the reference region over some local region (e.g., along the margins of a tumor) that doesn't take up much area, but results in a large shape difference. The Hausdorff metric is also sensitive to cases where sub-regions within the reconstructed region are disconnected but are incorrectly identified as part of the reference region [5].

Figure 4. Computing Hausdorff forward distance. Edge point $a_1$, sampled from the boundary of $test_{mask}$, is selected. The distances between $a_1$ and all edge points $(b_1, b_2, \ldots, b_{N_b})$ sampled from $ref_{mask}$ are computed. The minimum over this set of distances is selected. This process is repeated for all edge points sampled from $test_{mask}$, resulting in a set of minimum distances. The maximum over the set of minimum distances is the Hausdorff forward distance.

Edge points sampled along the boundaries of $test_{mask}$ and $ref_{mask}$ are denoted as $test = \{a_1, a_2, \ldots, a_{N_a}\}$ and $ref = \{b_1, b_2, \ldots, b_{N_b}\}$, respectively. Here, $N_{a}$ and $N_{b}$ denote the number of edge points sampled along the boundaries of the $test_{mask}$ and $ref_{mask}$, respectively. The edge points spatially characterize the shape of the $test_{mask}$ and $ref_{mask}$. Accordingly, the Hausdorff distance provides a method to evaluate how closely the shape of the test mask matches the shape of the reference mask. Note that for 3D scenarios, in order to evaluate the average Hausdorff distance, a point cloud is used to estimate the surface of the extracted test and connected reference objects. The average Hausdorff distance metric is applied to the point clouds. It is assumed that the distance between the points in ref and test is defined by the Euclidean distance (or the L2 norm) given by,

This measure is then used to evaluate the distance between a point $a \in test$ and a set of points along the contour of the reference region, and is defined by,

The metric expressed by (10) is illustrated in Figure 4. This process is repeated for all edge points of the test mask, and the maximum distance (or mismatch) is selected. This is the Hausdorff forward distance, and is expressed as,

The Hausdorff forward distance identifies the point on the test interface that is furthest from any point on the reference interface and measures the distance from a nearest neighbor on the reference interface using the Euclidean distance. That is, (11) ranks each point on the test contour based on the distance to the nearest point on the reference contour and uses the largest ranked point as the distance. This point can be interpreted as the most mismatched point. Hence, all other points on the interface of the test mask, must be within this distance.

Similarly, the Hausdorff reverse distance is evaluated by applying the same procedure for each of the edge points of the reference mask. That is,

The Hausdorff distance combines the forward and reverse distances and is computed with [5],

The Hausdorff distance measures the degree of mismatch between the two sets by measuring the distance of the point on the test interface that is furthest from any point on the reference interface and vice versa.
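The forward, reverse, and combined distances described above can be sketched with a brute-force search over edge points. This is an illustration on hypothetical point sets, not the framework's implementation. It assumes the combined distance is the larger of the forward and reverse distances, which is the classical definition of the Hausdorff distance.

```python
import math

def directed_hausdorff(points_a, points_b):
    """Largest nearest-neighbour distance from points_a to points_b."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)

def hausdorff(test_pts, ref_pts):
    forward = directed_hausdorff(test_pts, ref_pts)   # test -> reference
    reverse = directed_hausdorff(ref_pts, test_pts)   # reference -> test
    return max(forward, reverse)

# Hypothetical edge points; (4, 0) mimics a thin "panhandle" on the test contour.
test_pts = [(0, 0), (0, 1), (1, 1), (4, 0)]
ref_pts = [(0, 0), (0, 1), (1, 1), (1, 0)]

print(directed_hausdorff(test_pts, ref_pts))  # forward distance
print(directed_hausdorff(ref_pts, test_pts))  # reverse distance
print(hausdorff(test_pts, ref_pts))
```

The single outlying point dominates the result even though three of the four test points coincide with reference points. This is the sensitivity to local shape deviations discussed above.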

In practice, taking the maximum of all the distances may provide a misleading indication of the mismatch between the shapes of the test and reference masks. For example, a segmentation error may manifest as an outlier which can significantly impact the Hausdorff distance. A possible remedy to mitigate the negative impacts on the accuracy of this measure due to the presence of outliers, is to implement a variant of the Hausdorff distance referred to as the average Hausdorff distance. For this variant, the average forward Hausdorff distance is given by,

and the average reverse Hausdorff distance is computed with,

The average Hausdorff distance proposed by [6] is evaluated with (14) and (15) and is expressed with,
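A minimal sketch of the averaged variant replaces the outer maximum with a mean over the nearest-neighbour distances. The rule that combines the forward and reverse averages in (16) is not reproduced in this text, so the `combine` argument (defaulting to the maximum) is an assumption, as are the function names and the point sets.

```python
import math

def average_directed_distance(points_a, points_b):
    """Mean nearest-neighbour distance from each point in points_a to points_b."""
    nearest = [min(math.dist(a, b) for b in points_b) for a in points_a]
    return sum(nearest) / len(nearest)

def average_hausdorff(test_pts, ref_pts, combine=max):
    forward = average_directed_distance(test_pts, ref_pts)
    reverse = average_directed_distance(ref_pts, test_pts)
    # Assumption: `combine` stands in for the combination rule of (16).
    return combine(forward, reverse)

# Hypothetical edge points with one outlier at (4, 0) on the test contour.
test_pts = [(0, 0), (0, 1), (1, 1), (4, 0)]
ref_pts = [(0, 0), (0, 1), (1, 1), (1, 0)]

print(average_directed_distance(test_pts, ref_pts))  # average forward distance
print(average_directed_distance(ref_pts, test_pts))  # average reverse distance
print(average_hausdorff(test_pts, ref_pts))
```

For these points the outlier sits 3 units from the reference contour, yet the averaged forward distance is only 0.75. This illustrates how averaging reduces the influence of isolated segmentation errors.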

The test and reference images are compared with the Normalized Root Mean Square Error (NRMSE) metric, given by [7]

and N is the number of pixels in the images. This metric provides a measure of the average relative difference between two images, with a higher number indicating a greater discrepancy.


1.3 Analysis of geometric and dielectric properties of tissue masks

In some cases, the reference image is set to the forward model used to generate numerical data. The reference masks constructed from these images correspond to the ground truth of the tissue type represented by the mask. This is demonstrated by the example shown in Figure 1, where a reference mask of the glandular tissue is constructed from the reference image.

The mask is applied to the reference image to extract the tissue properties associated with the glandular region. These segmented property values are referred to as the reference tissue, reftissue, of the regions as shown in Figure 1. Likewise, the test masks are applied to the test images. These segmented property values are referred to as the test tissue, testtissue, of the region.

The reference and test tissues contain both geometric and dielectric property information. Therefore, the metrics that are applied to these regions evaluate the accuracy with which both the geometric and dielectric properties of these underlying structures are reconstructed. This aspect of accuracy is measured with the similarity between the reference and test tissue profiles using the normalized cross-correlation function given by [1]

Distortion of the structure and the presence of artifacts are sensed by the metric. Furthermore, the metric given by (18) measures how accurately the dielectric properties are reconstructed within the structure.

Similarly, the goodness of fit between the reference and test tissues is evaluated with (see [2], for example)

The goodness of fit varies from –infinity (bad fit) to 1 (perfect fit).


Figure 5. Methodology used to calculate FWHM for 2D scenarios. A disk, centered on the region associated with the dominant scatterer, is created. Radius of disk incrementally increases until mean backscattered energy intensity computed over the disk is one half of the maximum backscattered energy intensity within the region associated with the dominant scatterer.

1.4 Analysis of backscatter energy properties

a) Analysis of 2D microwave radar images

For the 2D examples, the reference image is set to the forward model that is used to generate the numerical electromagnetic fields. The segmentation algorithm delineates the tumor region to construct a $ref_{mask}$. The test image is set to the backscattered energy image reconstructed from the backscattered fields. The test image is partitioned based on the intensity of the reconstructed backscatter energy. The segmentation algorithm delineates the region associated with a dominant scatterer to construct a test mask, $test_{mask}$.

Overlap metrics 4-6 (Dice coefficient, Ratio-of-Detection, and Artefact-rejection Ratio), are applied to the reference and test masks to compare how closely the region associated with a dominant scatterer in the backscattered energy image matches the tumor region within the forward model. To complement the overlap metrics, the similarity in shape between the region associated with a dominant scatterer and the tumor reference region is evaluated with (16). Metric 16 is the average Hausdorff distance.

The localization error is the distance between the center of the tumor test and reference masks, and is given by

The metric is used to imply the shift in the tumor response corresponding to the region associated with a dominant scatterer in the test image relative to the actual tumor location within the reference image.
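The localization error can be sketched as the Euclidean distance between the mask centroids. Interpreting "center" as the centroid of the non-zero pixels is an assumption, since (20) is not reproduced in this text. The `pixel_size` parameter, which converts from pixel units to physical units, is likewise hypothetical.

```python
import math

def centroid(mask):
    """Mean (row, column) position of the non-zero pixels of a non-empty mask."""
    coords = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    n = len(coords)
    return (sum(i for i, _ in coords) / n, sum(j for _, j in coords) / n)

def localization_error(ref_mask, test_mask, pixel_size=1.0):
    # Distance between centroids, scaled to physical units by pixel_size.
    return pixel_size * math.dist(centroid(ref_mask), centroid(test_mask))

ref_mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
test_mask = [[0, 0, 0, 0],
             [0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

print(centroid(ref_mask), centroid(test_mask))
print(localization_error(ref_mask, test_mask))
```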

The maximum backscattered energy intensity within the region associated with a dominant scatterer is determined and used to calculate the Full Width at Half Maximum (FWHM) (see [8], for example). The FWHM is calculated by creating a disk centered on the region associated with a dominant scatterer. The radius of the disk, which is the region of interest, incrementally increases. After each incremental increase in the size of the radius, the mean intensity over the area of the disk is computed. The incremental process continues until the mean intensity that is computed over the disk is one half of the maximum backscattered energy intensity within the region associated with a dominant scatterer of the backscatter energy image.

The methodology used to compute the FWHM is summarized in Figure 5. The calculation is expressed by (21) of Table 1 and is given by

The FWHM is a measure that may be used to imply the sharpness of the tumor response reconstructed in the test image. A higher value implies that the reconstructed response is spread out and smeared. Lower values are considered better, as they imply a sharp reconstructed response that has not spread out.
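The growing-disk procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the loop stops when the mean over the disk is at or below half of the peak, the radius is grown in fixed steps, and the function returns the final radius in pixel units. Whether (21) reports the radius or the diameter is not stated in this text. The function name, the `step` parameter, and the toy image are hypothetical.

```python
import math

def fwhm_radius(image, center, peak, step=1.0, max_radius=None):
    """Grow a disk about `center` until its mean intensity is at most peak / 2.

    `image` is a nested list of intensities and `center` is a (row, col) pair.
    Returns the radius at which the loop stops, or None if it never does.
    """
    n_rows, n_cols = len(image), len(image[0])
    if max_radius is None:
        max_radius = math.hypot(n_rows, n_cols)
    radius = step
    while radius <= max_radius:
        # Collect the pixels whose centers lie inside the current disk.
        values = [image[i][j] for i in range(n_rows) for j in range(n_cols)
                  if math.hypot(i - center[0], j - center[1]) <= radius]
        if values and sum(values) / len(values) <= peak / 2:
            return radius
        radius += step
    return None

# Toy backscattered energy map: a peak of 4 with a weaker halo of 2.
image = [[0, 0, 0, 0, 0],
         [0, 0, 2, 0, 0],
         [0, 2, 4, 2, 0],
         [0, 0, 2, 0, 0],
         [0, 0, 0, 0, 0]]

print(fwhm_radius(image, center=(2, 2), peak=4.0))
```

A broader halo around the peak would keep the disk mean above half of the maximum for longer and yield a larger radius, consistent with the interpretation that higher values imply a smeared response.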

Similar to the FWHM calculation, the signal-to-mean ratio (SMR) is calculated by first evaluating the maximum backscattered energy intensity of the region associated with a dominant scatterer. The $test_{tissue}$ (for the 2D case, or the $i^{th}$ segmented test response for the 3D case) is then subtracted from the breast interior, and the mean intensity over the resulting region is evaluated. The ratio of the maximum backscattered energy intensity of the region associated with a dominant scatterer to the mean intensity external to the response is used to determine the SMR using

where $M$ represents the non-zero pixels (or voxels for 3D scenarios) of the region that results when the region associated with a dominant scatterer is subtracted from the breast interior, $test(i)$ is the tumor response segmented from the imaging domain for 3D scenarios (i.e., comprised of voxels), and $test(i)$ is replaced with $test_{tissue}$ for 2D scenarios. For 2D scenarios, $test_{tissue}$ is the backscattered energy bound within the test mask of the dominant scatterer corresponding to the response that arises from the tumor region. It is constructed by applying the testmask to the test image.

Note that (22) is similar to the form presented in [9], except that the mean is calculated over the region outside the tumor response, rather than over the entire imaging domain. The SMR is an image quality metric that may be used to imply both the intensity of the response and the presence of clutter, noise, and artefacts within the imaging domain but external to the response.
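A minimal sketch of the SMR computation is given below. It returns the plain ratio of the peak intensity inside the response to the mean intensity over the breast interior outside the response. Any further scaling in (22), such as a conversion to decibels, is not reproduced in this text and is not assumed here. The function name, the mask layout, and the toy values are hypothetical.

```python
def signal_to_mean_ratio(image, interior_mask, response_mask):
    """Peak intensity inside the response over the mean intensity outside it.

    All inputs are equally sized nested lists; masks hold 0/1 values.
    """
    pixels = [(i, j) for i in range(len(image)) for j in range(len(image[0]))]
    peak = max(image[i][j] for i, j in pixels if response_mask[i][j])
    # Breast interior with the response subtracted.
    outside = [image[i][j] for i, j in pixels
               if interior_mask[i][j] and not response_mask[i][j]]
    return peak / (sum(outside) / len(outside))

image = [[1, 2, 1],
         [2, 8, 2],
         [1, 2, 1]]
interior_mask = [[1, 1, 1],
                 [1, 1, 1],
                 [1, 1, 1]]
response_mask = [[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]]

smr = signal_to_mean_ratio(image, interior_mask, response_mask)
print(smr)
```

Raising the background values (more clutter) or lowering the peak both reduce the ratio, which is why the SMR reflects the response intensity and the clutter level at the same time.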


Figure 6. Methodology used to calculate FWHM for 3D scenarios. A sphere (shown as pink object) with centroid that coincides with location of the region (shown as blue object) associated with a dominant scatterer is created. Radius of sphere incrementally increases until mean backscattered energy intensity computed over the sphere is one half of the maximum backscattered energy intensity within the region associated with a dominant scatterer.

b) Analysis of 3D microwave radar images

For the 3D examples, the reference and test images are set to two different backscattered energy images reconstructed from the same backscattered fields. However, there is a deviation in the parameters used by the reconstruction operator which leads to differences in the images. The aim is to apply metrics to the images and segmented regions within the images in order to measure the differences.

The segmentation algorithm is applied to the reference and test images to delineate responses in the images that arise due to scattering from malignant tissue or other high contrast interfaces (i.e., discontinuity of dielectric properties across tissue interface). The segmented regions form binary three-dimensional masks, referred to as response objects. Each response object is a region of connected voxels bound by a closed surface and is associated with a dominant scatterer in the reference or test image. The $i^{th}$ test object is identified as $test(i)$, and the $j^{th}$ reference object is identified as $ref(j)$.

1) General Analysis

Each test object is applied to the test image to extract voxels with values corresponding to the backscattered energy intensity within a region associated with a dominant scatterer. The maximum intensity, $I_{max,test(i)}$, over all voxels bound by the $i^{th}$ test object is determined with

where $I$ is the backscattered intensity value assigned to the $n^{th}$ voxel $v_n$ bound by the test object.

Next, the volume of the $i^{th}$ test object, $VOL_{test(i)}$, comprised of $N$ voxels $v_n$ is determined using

The value of $VOL_{test(i)}$ implies the extent of the reconstructed response in the direction of each coordinate axis. It may be used to complement the maximum intensity metric, $I_{max,test(i)}$, and the FWHM metric to characterize the response.
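The two per-object quantities can be sketched as follows. The object is represented as a list of voxel indices, and the volume is taken as the voxel count times the volume of a single voxel. Both are assumptions, since the corresponding equations are not reproduced in this text. The function names and the `voxel_volume` parameter are hypothetical.

```python
def max_intensity(image, voxels):
    """Return (peak value, voxel index) over the voxels bound by a response object."""
    peak_voxel = max(voxels, key=lambda v: image[v[0]][v[1]][v[2]])
    i, j, k = peak_voxel
    return image[i][j][k], peak_voxel

def object_volume(voxels, voxel_volume=1.0):
    # Assumption: every voxel has the same volume.
    return len(voxels) * voxel_volume

# Toy 2 x 2 x 2 backscattered energy image, indexed as image[i][j][k].
image = [[[0.1, 0.2], [0.3, 0.9]],
         [[0.4, 0.5], [0.2, 0.1]]]
test_object = [(0, 1, 0), (0, 1, 1), (1, 0, 1)]

peak, location = max_intensity(image, test_object)
volume = object_volume(test_object, voxel_volume=8.0)  # e.g. 2 mm cubic voxels
print(peak, location, volume)
```

The returned peak location is the point on which the 3D FWHM sphere is centered.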

The coordinates of the maximum intensity, $I_{max,test(i)}(\vec{r})$, within the test object are evaluated and are used to compute the FWHM of the extent of the response using (21). The FWHM is extended to 3D scenarios by using a sphere, embedded in the test image, having a centroid that coincides with the location of maximum intensity within the test object. The radius of the sphere, which is the region of interest, incrementally increases. After each incremental increase in size of the radius, the mean intensity of the voxel values bound by the sphere is computed. The incremental process continues until the mean intensity of the voxels bound by the sphere is one half of the maximum intensity at the centroid.

The result corresponds to the FWHM when the incremental process has terminated. The 3D variant of the methodology for evaluating the FWHM is summarized in Figure 6. As described for the 2D scenario, the FWHM is a measure that may be used to imply the sharpness of the tumor response reconstructed in the test image. A higher value implies that the reconstructed response is spread out and smeared. Lower values are considered better, as they imply a sharp reconstructed response that has not spread out.

Finally, the maximum backscattered energy intensity of the test object is used to compute the signal-to-mean ratio (SMR) with (22), which has been adapted for 3D scenarios. The evaluation is demonstrated with Figure 7. The $i^{th}$ test region, shown as the blue object in Figure 7, is subtracted from the breast interior (pink object), and the mean backscattered energy intensity over all voxels within the resulting region is evaluated. The ratio of the maximum backscattered energy intensity of the region associated with a dominant scatterer to the mean intensity is used to determine the SMR.

The analysis is repeated for each of the reference objects. That is, each reference object is applied to the reference image to extract voxels with values corresponding to the backscatter energy intensity within a region associated with a dominant scatterer. The maximum intensity, $I_{max,ref(j)}$, over all voxels bound by the $j^{th}$ reference object is determined with

where $I$ is the backscattered intensity value assigned to the $n^{th}$ voxel $v_n$ bound by the reference object.

The volume of the $j^{th}$ reference object, $VOL_{ref(j)}$, comprised of $N$ voxels $v_n$ is determined using

The values for FWHM and SMR are computed for each reference object with (21) and (22), respectively. When using (22), $ref(j)$ is used instead of $test(i)$.

To assist with the interpretation of the results, the maximum backscattered energy intensity of each test object is scaled to the maximum value over the imaging domain of the test image. This value, along with the other values calculated for the general analysis, is presented in a table. The objects are displayed within the imaging domain bound by the tessellated breast surface. A similar table and display are provided for the reference objects.

Figure 7. Example of SMR calculation for 3D scenarios. Maximum intensity over test object (shown in blue) is computed. The mean intensity is calculated over all voxel values within the breast interior (shown in pink), but outside the test object. Ratio of maximum intensity over the mean intensity is used to evaluate SMR.

2) Geometric analysis

For the 3D examples presented in sections 5 and 6, the parameter values used by the reconstruction operator to form the test image differ from those used to reconstruct the reference image. However, the reconstruction operator is applied to the same backscattered fields calculated from the same forward model. Accordingly, the backscatter energy intensities are reconstructed on voxels within an imaging domain with the same coordinates. Therefore, as a preprocessing step before applying the geometric metrics to a test object, the reference objects are transformed to a space shared by the test object.

The test object under investigation is then examined to determine if it is connected to any of the reference objects. The objects are connected if they share any of the same voxels. If this is the case, then the connected reference and test objects are evaluated within a region of interest defined as the box that bounds the union between the reference and test objects. The region is represented by a 3D variant of Figure 2. A comprehensive geometric property analysis is then performed to assess the overlap between these objects using the metrics described in section 1 (e.g., Dice coefficient, RD, AR). If the test object is not connected to any reference object, then the geometric analysis is not performed.
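The connectivity test and the region of interest can be sketched with sets of voxel indices. This is an illustration only: the set representation, the function names, and the toy objects are hypothetical.

```python
def are_connected(test_obj, ref_obj):
    """Objects are connected if they share at least one voxel."""
    return not test_obj.isdisjoint(ref_obj)

def union_bounding_box(test_obj, ref_obj):
    """Return the (lowest, highest) corner indices of the box bounding both objects."""
    union = test_obj | ref_obj
    lowest = tuple(min(v[d] for v in union) for d in range(3))
    highest = tuple(max(v[d] for v in union) for d in range(3))
    return lowest, highest

test_obj = {(1, 1, 1), (1, 1, 2), (1, 2, 2)}
ref_objs = {
    "ref_a": {(1, 1, 2), (2, 1, 2)},   # shares voxel (1, 1, 2) with the test object
    "ref_b": {(5, 5, 5)},              # isolated, so no geometric analysis
}

for name, ref_obj in ref_objs.items():
    if are_connected(test_obj, ref_obj):
        print(name, "connected, ROI:", union_bounding_box(test_obj, ref_obj))
    else:
        print(name, "not connected, geometric analysis skipped")
```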

Since the reference and test images are reconstructions, the aim of the overlap metrics is to measure how closely responses in the test image that arise due to scattering from high contrast interfaces match corresponding responses within the reference region. Results are presented in section 5.5, but are not included in the submitted manuscript [11]. When comparing reconstructions, the geometric analysis metric values are typically more meaningful when measuring changes to a dependent variable due to a small perturbation of an independent variable, rather than a large change. The metric values are also most meaningful when the ground truth is used as the reference image. That is, when the responses reconstructed within the test image are compared with malignant tissue regions within the forward model.

3) Comparative backscatter energy analysis

For each test object, the nearest reference object is identified (i.e., shortest distance between centroids). For example, for the $i^{th}$ test object $test(i)$ the nearest reference object is the $j^{th}$ reference object $ref(j)$.

The maximum backscattered energy intensities within $test(i)$ and $ref(j)$ are $I_{max,test(i)}$ and $I_{max,ref(j)}$, respectively. The coordinates of these maximum intensities, $I_{max,test(i)}(\vec{r})$ and $I_{max,ref(j)}(\vec{r})$, are used to evaluate the shift-in-response between $test(i)$ and $ref(j)$ with

where $\vec{r}$ is a coordinate within the imaging domain.

The intensity ratio measures the ratio between maximum intensities within $test(i)$ and the nearest reference object $ref(j)$ and is given by

A ratio that exceeds $1$ may imply a stronger and more intense response that has been reconstructed in the test image relative to the reference image. Otherwise, the response reconstructed in the test image is diminished and weaker, relative to the response in the reference image. Of course, the metric is most meaningful when the reconstructed test response under investigation is compared with a reference response that is in close proximity. This implies that the test response and the corresponding reference are associated with the same dominant scatterer.

The volume ratio measures the ratio between the volumes of $test(i)$ and the nearest reference object $ref(j)$ and is expressed with

The volume ratio is intended to complement the FWHM metric to characterize a change in the test response relative to the reference. A ratio that exceeds $1$ may be interpreted as a smearing of the response in the test image relative to the reference image. Otherwise, a more focused and sharper test response relative to the reference image may be implied. Similar to the shift-in-response (27) and the intensity ratio (28), the metric is most meaningful when the reconstructed test response under investigation is compared with a reference response that is in close proximity.
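The pairing step and the first three comparative metrics can be sketched as follows. The sketch assumes that the shift-in-response is the Euclidean distance between the locations of the two maximum intensities, and that both ratios are formed as test over reference, which is consistent with the interpretation of values above 1 given in the text. The dictionary layout and all numerical values are hypothetical.

```python
import math

def nearest_reference(test_obj, ref_objs):
    """Pick the reference object whose centroid is closest to the test centroid."""
    return min(ref_objs, key=lambda ref: math.dist(test_obj["centroid"], ref["centroid"]))

def compare(test_obj, ref_obj):
    return {
        # Assumption: shift is the distance between the peak locations.
        "shift": math.dist(test_obj["peak_location"], ref_obj["peak_location"]),
        "intensity_ratio": test_obj["peak"] / ref_obj["peak"],
        "volume_ratio": test_obj["volume"] / ref_obj["volume"],
    }

test_obj = {"centroid": (10, 10, 10), "peak_location": (10, 10, 11),
            "peak": 0.8, "volume": 120.0}
ref_objs = [
    {"name": "A", "centroid": (11, 10, 10), "peak_location": (13, 14, 11),
     "peak": 1.0, "volume": 100.0},
    {"name": "B", "centroid": (30, 30, 30), "peak_location": (30, 30, 30),
     "peak": 0.5, "volume": 40.0},
]

nearest = nearest_reference(test_obj, ref_objs)
result = compare(test_obj, nearest)
print(nearest["name"], result)
```

In this toy case the test response is weaker (intensity ratio below 1) and larger (volume ratio above 1) than its nearest reference response, which would be read as a diminished and smeared response.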

The change in FWHM, $\Delta FWHM$, is simply the difference in FWHM between the test object and the nearest reference object, and is given by

$$\Delta FWHM = FWHM_{test(i)} - FWHM_{ref(j)} \qquad (30)$$

A negative difference suggests the extent of the test response is reduced and more compact relative to that of the reference response. A positive difference suggests a broadening or smearing of the test response compared to the reference response. The metric is intended to complement the volume ratio (29) to characterize alterations of the test response relative to the reference. However, it is paramount that the shift-in-response (27) and intensity ratio (28) be included in the analysis to obtain an informed account of an alteration of a response.
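The intensity ratio, volume ratio, and change in FWHM described above can be gathered in one small comparison. The sketch below assumes the per-object maximum intensity, volume, and FWHM have already been computed elsewhere; the `Response` container and `compare` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """Summary of one segmented response object (hypothetical container)."""
    max_intensity: float  # maximum backscattered energy intensity
    volume: float         # volume of the object
    fwhm: float           # full width at half maximum of the response

def compare(test, ref):
    """Comparative descriptors of a test response against its nearest reference."""
    return {
        "intensity_ratio": test.max_intensity / ref.max_intensity,  # > 1: stronger test response
        "volume_ratio": test.volume / ref.volume,                   # > 1: smeared test response
        "delta_fwhm": test.fwhm - ref.fwhm,                         # > 0: broadened test response
    }

# Illustrative values only, not measurements.
test = Response(max_intensity=0.8, volume=150.0, fwhm=12.0)
ref = Response(max_intensity=1.0, volume=100.0, fwhm=10.0)
result = compare(test, ref)
```

In this made-up case the test response is weaker (ratio below $1$), occupies a larger volume, and has a larger FWHM than the reference, which together would point toward a smeared response.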

The percent change in SMR relative to the reference, $\Delta SMR$%, is calculated with

$$\Delta SMR\% = \frac{SMR_{test} - SMR_{ref}}{SMR_{ref}} \times 100 \qquad (31)$$

As indicated in section 1.4(a), the SMR is an image quality metric that may be used to imply both the intensity of the response and the presence of clutter, noise, and artefacts external to the response. Hence, a negative percent change may imply that the intensity of the test response has diminished relative to the reference. This possibility may be supported by a decrease in the intensity ratio (i.e., Intensity Ratio $<$ $1$). Alternatively, if there is no change in the relative intensity (i.e., Intensity Ratio $\simeq$ $1$), a negative percent change may imply an increase in clutter, noise, and artefacts within the test image. Of course, a negative change in SMR often implies a combination of factors, such as a decrease in the intensity of the test response together with an increase in clutter within the test image relative to the reference. A positive percent change may imply the opposite, such as an increase in the intensity of the test response or a decrease in clutter external to the test response relative to the reference.

Note that the SMR is calculated over the entire imaging domain that is external to the segmented response (or response object). Artefacts, clutter, and other responses that are present in the test image but are not in close proximity to the response being investigated may therefore influence the calculated SMR value without having any impact on the calculated FWHM value or volume ratio. For this reason, it is important to take all of the comparative metrics into account when assessing, for example, the impact that a change in a dependent variable has on a response.
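The percent change in SMR discussed above can be sketched as a one-line calculation. This assumes the SMR of the test and reference images has already been computed as described in section 1.4(a) and that the reference SMR is positive; the function name `percent_change_smr` is hypothetical.

```python
def percent_change_smr(smr_test, smr_ref):
    """Percent change in SMR of the test image relative to the reference.

    Assumes smr_ref is positive; a zero reference makes the ratio undefined.
    """
    if smr_ref == 0:
        raise ValueError("reference SMR must be non-zero")
    return 100.0 * (smr_test - smr_ref) / smr_ref

# Illustrative values only: the test SMR is lower than the reference SMR.
change = percent_change_smr(18.0, 20.0)
```

A negative result, as in this made-up case, would then be read alongside the intensity ratio to judge whether a weaker response or additional clutter is the more likely cause.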

4) Comparison of 2D slices along each coordinate axis

Two-dimensional slices of the test and reference imaging domains are compared along each of the coordinate axes with the Normalized Root Mean Square Error (NRMSE) [7]. The metric is given by (17) except $ref_{mask}$ and $test_{mask}$ are replaced with two-dimensional slices of the reference and test imaging domains, respectively, at a point along a coordinate axis.

Each slice is preprocessed by removing pixels not contained in the imaging domain. Inclusion of these pixels may reduce the accuracy of the metric. For example, if both the test and reference slices contain a large number of background pixels, then a low metric value (i.e., high degree of similarity) may be calculated. This may give a researcher a misleading indication that the slices are closely matched. The metric is then applied to vectorized versions of the preprocessed slices to measure the normalized distance between the test and reference vectors. Hence, it heuristically informs us how 'close' the test image is to the reference image. A value of zero indicates a perfect match between images; a large value indicates that there are significant differences between test and reference images. For the 3D examples, the NRMSE implies how much the test image has changed relative to the reference image due to the perturbation of the reconstruction operator parameters.

The NRMSE is applied over 2D slices evenly spaced along a coordinate axis, resulting in a set of values that can be plotted. The process is repeated for each coordinate axis. The plots may be used as a tool to evaluate how the test and reference images deviate from each other along each of the coordinate axes.
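The slice-wise procedure above can be sketched as follows. The normalization used here (Euclidean norm of the difference divided by the Euclidean norm of the reference vector) is an assumption, since the exact form is defined by (17). The helper names, the nested-list volume layout `volume[i][j][k]`, and the boolean `domain` mask are likewise hypothetical; plain lists are used only to keep the sketch dependency-free.

```python
import math

def take_slice(volume, axis, index):
    """Flattened 2D slice of a nested-list volume at `index` along `axis` (0, 1 or 2)."""
    n0, n1, n2 = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [volume[index][j][k] for j in range(n1) for k in range(n2)]
    if axis == 1:
        return [volume[i][index][k] for i in range(n0) for k in range(n2)]
    return [volume[i][j][index] for i in range(n0) for j in range(n1)]

def nrmse(test_vec, ref_vec):
    """Normalized distance between two vectors (assumed normalization: ||ref||)."""
    num = math.sqrt(sum((t - r) ** 2 for t, r in zip(test_vec, ref_vec)))
    den = math.sqrt(sum(r ** 2 for r in ref_vec))
    return num / den if den > 0 else float("nan")

def nrmse_profile(test, ref, domain, axis, step=1):
    """NRMSE for evenly spaced slices along `axis`, keeping only pixels
    that lie inside the imaging domain (domain value True)."""
    n = (len(test), len(test[0]), len(test[0][0]))[axis]
    profile = []
    for index in range(0, n, step):
        keep = take_slice(domain, axis, index)
        t = [v for v, m in zip(take_slice(test, axis, index), keep) if m]
        r = [v for v, m in zip(take_slice(ref, axis, index), keep) if m]
        profile.append(nrmse(t, r))
    return profile

# Tiny 2 x 2 x 2 example; one pixel of the second slice lies outside the domain.
ref = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
test = [[[1, 2], [3, 4]], [[2, 1], [1, 1]]]
domain = [[[True, True], [True, True]], [[True, True], [True, False]]]
profile = nrmse_profile(test, ref, domain, axis=0)
```

Calling `nrmse_profile` once per axis yields the three sets of values that can be plotted against slice position.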

To complement the NRMSE analysis, the Structural Similarity Index (SSIM) presented in [10] is applied over 2D slices of the test and reference imaging domains evenly spaced along each of the coordinate axes. The SSIM measures perceptual differences between two similar images. The computed values typically range from $0$ to $1$ (negative values can occur when the two images are anti-correlated), where $1$ indicates that there are no perceptual differences between the test and the reference images.

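The Structural Similarity Index (SSIM) of image $x$ using $y$ as the reference image is given by

$$SSIM(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (32)$$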

where $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, $\sigma_{xy}$ are the local means, standard deviations, and cross-covariance for images $x$, $y$; $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$, and $L$ is the dynamic range of the image data.

The variables $x$ and $y$ in (32) are replaced with two-dimensional slices of the test and reference imaging domains, respectively, at a point along a coordinate axis. Similar to the NRMSE metric, each slice is preprocessed by removing pixels not contained in the imaging domain.
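To make the roles of the terms concrete, the sketch below evaluates the SSIM expression once over all pixels of a preprocessed slice. This single-window form is a simplification: the index in [10] is computed from local statistics in a sliding window and then averaged, so this is not the framework's implementation. The names `ssim_global` and `inside_domain` are hypothetical.

```python
def ssim_global(x, y, dynamic_range):
    """SSIM expression evaluated with global (whole-slice) statistics.

    x, y: equal-length lists of pixel values (test and reference slice).
    dynamic_range: L, the dynamic range of the image data.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 pixels)")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def inside_domain(pixels, domain):
    """Keep only the pixels that lie inside the imaging domain."""
    return [p for p, keep in zip(pixels, domain) if keep]

# Flattened slices; the last pixel lies outside the imaging domain and is removed.
test_slice = [0.0, 0.5, 1.0, 0.5, 9.9]
ref_slice = [0.0, 0.5, 1.0, 0.5, 0.0]
domain = [True, True, True, True, False]
same = ssim_global(inside_domain(test_slice, domain),
                   inside_domain(ref_slice, domain), 1.0)
# A slice compared with its intensity-inverted copy scores far below 1.
inverted = ssim_global([1.0, 0.5, 0.0, 0.5], [0.0, 0.5, 1.0, 0.5], 1.0)
```

In the first comparison the two slices agree everywhere inside the domain, so the index is $1$ even though the discarded background pixel differs.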

As presented in [10], the motivation for the development of the metric was to quantify image quality degradation caused by processing such as data compression or by losses in data transmission. In this context, SSIM was used to measure subtle differences between two similar images. For the 3D examples presented in the study [11], the metric is used to quantify differences that arise in the test image relative to the reference image due to a perturbation of a reconstruction operator parameter. The parameter perturbation often manifests as significant changes within the test image, relative to the reference. Under these circumstances, as a similarity metric, the SSIM is of limited utility.
