
MedImageParse3D

Overview

Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such as CT and MRI play unique roles in clinical practice. MedImageParse 3D is a foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 3D medical images including CT and MRI. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.

MedImageParse 3D was trained on a large dataset comprising triples of image, segmentation mask, and textual description. It takes in a 3D medical image volume with a text prompt about the target object type (e.g. pancreas in CT) and outputs the corresponding segmentation mask as a 3D volume of the same shape as the input image. MedImageParse 3D is also able to identify invalid user inputs describing objects that do not exist in the image. In addition, MedImageParse 3D can perform object detection, which aims to locate a specific object of interest, including objects with irregular shapes or of small size.

Traditional segmentation models perform segmentation alone, require fully supervised masks during training, and typically need either manual bounding boxes or automatic proposals at inference if multiple objects are present. Such a model doesn't inherently know which object to segment unless it was trained specifically for that class, and it can't take a text query to switch targets. MedImageParse 3D can segment via text prompts describing the object, without needing a user-drawn bounding box. This semantic, prompt-based approach lets it parse the image and find relevant objects anywhere in the image.

In summary, MedImageParse 3D shows potential to be a building block for an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition. It is broadly applicable to different 3D image modalities through text prompting, which may pave the way for efficient and accurate image-based biomedical discovery when built upon and integrated into an application.

Model Architecture

MedImageParse 3D is built upon BiomedParse with the BoltzFormer architecture, optimized for locating small objects in 3D images. Leveraging a Boltzmann attention sampling mechanism, it excels at identifying subtle patterns corresponding to biomedical terminologies, as well as extracting contextually relevant information from dense scientific texts. The model is pre-trained on vast 3D medical image datasets, allowing it to generalize across various biomedical domains with high accuracy.
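
For intuition only, here is a toy sketch of Boltzmann (tempered-softmax) sampling over attention scores; the function name and every detail below are illustrative assumptions, not the model's actual BoltzFormer code:

import numpy as np

def boltzmann_sample(scores, temperature=1.0, k=16, seed=None):
    """Toy illustration: sample k token indices from a Boltzmann
    (tempered softmax) distribution over attention scores. Lower
    temperature concentrates samples on the highest-scoring tokens,
    which captures the general idea of focusing compute on a few
    likely locations of a small object. Not the actual BoltzFormer code."""
    rng = np.random.default_rng(seed)
    logits = scores / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(scores.size, size=k, replace=False, p=probs)

# Example: 1,000 flattened voxel tokens; attend to only 16 sampled locations
scores = np.random.default_rng(0).standard_normal(1000)
sampled_indices = boltzmann_sample(scores, temperature=0.5, k=16)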

Sample inputs and outputs (for real-time inference)

Input

import base64
data = {
    "input_data": {
        "columns": ["image", "text"],
        "index": [0],
        "data": [
            [
                # Base64-encoded .nii.gz data:
                base64.b64encode(open("./examples/example.nii.gz", "rb").read()).decode("utf-8"),
                # Example text/string input:
                "pancreas"
            ]
        ]
    }
}
  • "columns" describes the types of inputs your model expects (in this case, "image" and "text").
  • "data" is where you actually provide the values: the first element is the Base64-encoded NIfTI, and the second is a text parameter (e.g., "pancreas").

Output

[
  {
    "nifti_file": "{\"data\":\"<BASE64_ENCODED_BYTES>\"}"
  }
]

The "data" field contains the raw binary NIfTI bytes, encoded in Base64.

The provided function decode_base64_to_nifti handles the decoding logic:

import json
import base64
import tempfile
import numpy as np
import nibabel as nib

def decode_base64_to_nifti(base64_string: str) -> np.ndarray:
    """
    Decode a Base64 string back to a NIfTI volume.

    The function expects `base64_string` to be a JSON string
    of the form: '{"data": "<Base64EncodedBytes>"}' and returns
    the voxel data as a NumPy array.
    """
    # Convert the 'nifti_file' string to a Python dict, then extract the 'data' field
    base64_string = json.loads(base64_string)["data"]

    # Decode the Base64 string to raw bytes
    byte_data = base64.b64decode(base64_string)

    # Write these bytes to a temporary file and load it as a NIfTI image.
    # delete=False keeps the file on disk so nibabel can read from it lazily.
    with tempfile.NamedTemporaryFile(suffix='.nii.gz', delete=False) as temp_file:
        temp_file.write(byte_data)
        temp_file.flush()
        nifti_image = nib.load(temp_file.name)

    # Return the voxel data as a NumPy array
    return nifti_image.get_fdata()

The output can be parsed using:

import json

# Suppose `response` is the raw byte response from urllib
response_data = json.loads(response)

# Extract the JSON-stringified NIfTI
nifti_file_str = response_data[0]["nifti_file"]

# Decode to get the NIfTI volume as a NumPy array
segmentation_array = decode_base64_to_nifti(nifti_file_str)
print(segmentation_array.shape)  # e.g., (512, 512, 128)
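
To persist the decoded volume as its own NIfTI file, a minimal sketch follows; the identity affine is a placeholder, since the decoding helper above returns only voxel data, and you would normally reuse the affine of the input image:

import numpy as np
import nibabel as nib

# Placeholder affine; for correct world coordinates, reuse the affine of
# the original input, e.g. nib.load("./examples/example.nii.gz").affine
affine = np.eye(4)

mask_img = nib.Nifti1Image(segmentation_array.astype(np.uint8), affine)
nib.save(mask_img, "segmentation_mask.nii.gz")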

Optionally, the plot_segmentation_masks helper function displays the axial slices of the 3D array that contain non-zero mask content:

import matplotlib.pyplot as plt

def plot_segmentation_masks(segmentation_masks, max_slices=16):
    """
    Plot each axial slice (z-slice) of the segmentation that contains a
    non-zero mask, up to `max_slices` slices in a 4x4 grid.
    """
    # Collect the indices of slices that actually contain mask voxels
    nonzero_slices = [i for i in range(segmentation_masks.shape[2])
                      if segmentation_masks[:, :, i].sum() > 0]

    plt.figure(figsize=(15, 15))
    # Cap at max_slices so the subplot index never overflows the 4x4 grid
    for index, i in enumerate(nonzero_slices[:max_slices], start=1):
        plt.subplot(4, 4, index)
        plt.imshow(segmentation_masks[:, :, i], cmap='gray')
        plt.axis('off')
    plt.show()
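
For example, with the array decoded above:

plot_segmentation_masks(segmentation_array)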

Version: 1

Tags

Preview, Featured
task: image-segmentation
industry: health-and-life-sciences
displayName: MedImageParse 3D
author: Microsoft
hiddenlayerscanned
SharedComputeCapacityEnabled
license: mit
languages: en

Evaluation

We benchmarked MedImageParse 3D against task-specific nnU-Net models on the AMOS22 CT and MRI datasets. Note that we trained a single model to solve all the different tasks solely via text prompting, e.g. "gallbladder in abdomen MRI", while nnU-Net was trained as multiple expert models, one for each individual object in each modality. The comparison is therefore one single model vs. 27 task-specific models.

CT

| Dice score (%) | aorta | bladder | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.27 | 90.17 | 83.27 | 87.11 | 85.96 | 79.48 | 96.39 | 97.71 | 88.42 | 92.02 | 79.39 | 96.88 | 96.91 | 91.49 | 90.00 |
| nnU-Net | 95.20 | 87.52 | 80.72 | 87.31 | 83.06 | 78.06 | 95.39 | 96.09 | 86.57 | 90.38 | 78.24 | 93.19 | 96.91 | 89.79 | 88.35 |
| SegVol | 92.07 | 88.03 | 72.49 | 64.47 | 79.05 | 76.31 | 94.58 | 96.24 | 80.97 | 83.65 | 71.07 | 92.92 | 94.03 | 88.82 | 83.75 |

MRI

| Dice score (%) | aorta | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.73 | 76.03 | 81.38 | 66.58 | 63.35 | 96.92 | 97.65 | 88.70 | 87.26 | 68.14 | 96.69 | 96.88 | 88.93 | 84.94 |
| nnU-Net | 95.64 | 66.78 | 73.62 | 66.32 | 57.15 | 95.82 | 97.25 | 79.29 | 90.66 | 53.29 | 85.48 | 96.66 | 88.80 | 80.52 |


Intended Use

Primary Use Cases

  • Supported Data Input Format
  1. The model expects 3D NIfTI images by default.
  2. The model outputs pixel probabilities in the same shape as the input image. The probability threshold for the segmentation mask is 0.5 (see the sketch after this list).
  3. The model takes in text prompts for segmentation and doesn't have a fixed number of targets to handle. However, to ensure quality performance, we recommend the following tasks based on evaluation results. We will extend the model's capability to more object types, including tumors and nodules.
  • CT: abdomen: adrenal gland, aorta, bladder, duodenum, esophagus, gallbladder, kidney, left adrenal gland, left kidney, liver, pancreas, postcava, right adrenal gland, right kidney, spleen, stomach
  • MRI: abdomen: aorta, esophagus, gallbladder, kidney, left kidney, liver, pancreas, postcava, right kidney, spleen, stomach
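
As a minimal sketch of applying that threshold, assuming `segmentation_array` holds the decoded per-voxel probabilities (see the Output section above):

import numpy as np

# Voxels with probability above 0.5 are treated as part of the target object
binary_mask = (segmentation_array > 0.5).astype(np.uint8)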

Out-of-Scope Use Cases

This model is intended and provided as-is for research and model development exploration. MedImageParse 3D is not designed or intended to be deployed in clinical settings as-is, nor is it intended for use in the diagnosis or treatment of any health or medical condition, and the model's performance for such purposes has not been established. You bear sole responsibility and liability for any use of MedImageParse 3D, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. When evaluating the model for your use case, carefully consider the impacts of overreliance, both within the context of radiology specifically and for generative AI more generally (see "Appropriate reliance on Generative AI: Research synthesis", Microsoft Research).

Responsible AI Considerations

Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse. 

While testing the model with images and/or text, ensure that the data is PHI-free and that there is no patient information or information that can be traced to a patient's identity.

The model is not designed for the following use cases:

  • Use by clinicians to inform clinical decision-making, as a diagnostic tool, or as a medical device - Although MedImageParse 3D is highly accurate in parsing biomedical data, it is not designed or intended to be deployed in clinical settings as-is, nor is it for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute for the professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.

  • Scenarios without consent for data - Any scenario that uses health data for a purpose for which consent was not obtained.  

  • Use outside of health scenarios - Any scenario that uses non-medical images or serves purposes outside of the healthcare domain.

Please see Microsoft's Responsible AI Principles and approach available at https://www.microsoft.com/en-us/ai/principles-and-approach/

Training Data

The training data include AMOS22-CT and AMOS22-MRI.

License and where to send questions or comments about the model

The license for MedImageParse 3D is the MIT license. Please cite our Paper if you use the model for your research. For questions or comments, please contact: [email protected]

Citation

Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02499-w

Additional metadata

inputModalities: image
outputModalities: text, image
keywords: Multimodal
inference_compute_allow_list: ['Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2', 'Standard_NC40ads_H100_v5', 'Standard_NC80adis_H100_v5', 'Standard_ND96isr_H100_v5']
inference_supported_envs: ['hf']

View in Studio: https://ml.azure.com/registries/azureml/models/MedImageParse3D/version/1

License: mit

Properties

inference-min-sku-spec: 24|1|220|64

inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2, Standard_NC40ads_H100_v5, Standard_NC80adis_H100_v5, Standard_ND96isr_H100_v5

languages: en

SharedComputeCapacityEnabled: True
