MedImageParse3D
Biomedical image analysis is fundamental to biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such as CT and MRI play unique roles in clinical practice. MedImageParse 3D is a foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 3D medical images, including CT and MRI. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.
MedImageParse 3D was trained on a large dataset comprising triples of image, segmentation mask, and textual description. It takes in a 3D medical image volume with a text prompt about the target object type (e.g., pancreas in CT) and outputs the corresponding segmentation mask as a 3D volume of the same shape as the input image. MedImageParse 3D is also able to identify invalid user inputs describing objects that do not exist in the image. It can also perform object detection, which aims to locate a specific object of interest, including objects with irregular shapes or of small size.
Traditional segmentation models perform segmentation alone, requiring fully supervised masks during training, and typically need either manual bounding boxes or automatic proposals at inference if multiple objects are present. Such a model doesn't inherently know which object to segment unless trained specifically for that class, and it can't take a text query to switch targets. MedImageParse 3D can segment via text prompts describing the object, without needing a user-drawn bounding box. This semantic prompt-based approach lets it parse the image and find relevant objects anywhere in the image.
In summary, MedImageParse 3D shows potential to be a building block for an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition. It is broadly applicable to different 3D image modalities through text prompting, which may pave a future path for efficient and accurate image-based biomedical discovery when built upon and integrated into an application.
MedImageParse 3D is built upon BiomedParse with the BoltzFormer architecture, optimized for locating small objects in 3D images. Leveraging Boltzmann attention sampling mechanisms, it excels at identifying subtle patterns corresponding to biomedical terminologies, as well as extracting contextually relevant information from dense scientific texts. The model is pre-trained on vast 3D medical image datasets, allowing it to generalize across various biomedical domains with high accuracy.
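To give intuition for the Boltzmann attention sampling idea, here is a minimal conceptual sketch, not the published BoltzFormer implementation: attention over image patches is treated as a Boltzmann distribution whose temperature controls how exploratory the sampling is, which keeps small, low-salience objects from being starved of attention. The function name and shapes below are illustrative assumptions.

```python
import torch

def boltzmann_attention_sample(scores: torch.Tensor, temperature: float, k: int) -> torch.Tensor:
    """Conceptual sketch (not the exact BoltzFormer code): sample k patch
    indices from a Boltzmann distribution over attention scores. A higher
    temperature spreads probability mass toward low-scoring patches, so
    small or subtle objects still receive occasional attention."""
    probs = torch.softmax(scores / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=k, replacement=False)

# Example: sample 16 of 1024 patches at a moderately high temperature
indices = boltzmann_attention_sample(torch.randn(1024), temperature=2.0, k=16)
```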
Input
```python
import base64

# Load a local 3D NIfTI volume and Base64-encode it
# (the file path here is illustrative)
with open("abdomen_ct.nii.gz", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

data = {
    "input_data": {
        "columns": ["image", "text"],
        "index": [0],
        "data": [
            [
                base64_image,
                "CT imaging of the spleen within the abdomen & Presence of the right kidney detected in abdominal CT images & Abdominal CT showing the left kidney & CT scan of the gallbladder in the abdominal region & CT scan of the esophagus in the abdominal region & Visualization of the liver in abdominal CT imaging & CT imaging of the stomach in the abdomen & Abdominal CT showing aortic structures & Inferior vena cava in abdominal CT & CT scan of the pancreas in the abdominal region & CT imaging of the right adrenal gland in the abdomen & CT imaging of the left adrenal gland in the abdomen & Visualization of the duodenum in abdominal CT imaging & Bladder observed in abdominal CT scans & CT scan of the prostate/uterus in the abdominal region"
            ]
        ]
    }
}
```
- `"columns"` describes the types of inputs the model expects (in this case, `"image"` and `"text"`).
- `"data"` is where you actually provide the values: the first element is the Base64-encoded NIfTI, and the second is a text parameter (e.g., `"CT imaging of the spleen within the abdomen & Presence of the right kidney detected in abdominal CT images ..."`, where `&` separates multiple prompts).
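To submit this payload to a deployed online endpoint, a minimal sketch using `urllib` follows; the endpoint URL and API key are placeholders for your own deployment's values.

```python
import json
import urllib.request

# Placeholders: substitute the scoring URL and key of your own deployment
endpoint_url = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
api_key = "<your-api-key>"

body = json.dumps(data).encode("utf-8")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

request = urllib.request.Request(endpoint_url, body, headers)
with urllib.request.urlopen(request) as resp:
    response = resp.read()  # raw bytes; parsed in the Output section below
```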
Output
```json
[
    {
        "nifti_file": "{\"data\":\"<BASE64_ENCODED_BYTES>\"}"
    }
]
```
The `nifti_file` field is a JSON string whose `data` entry (shown above as `<BASE64_ENCODED_BYTES>`) contains the raw binary NIfTI data, encoded in Base64. The provided function `decode_base64_to_nifti` handles the decoding logic:
```python
import base64
import json
import tempfile

import nibabel as nib
import numpy as np


def decode_base64_to_nifti(base64_string: str) -> np.ndarray:
    """
    Decode a Base64 string back to a NIfTI voxel array.

    The function expects `base64_string` to be a JSON string
    of the form: '{"data": "<Base64EncodedBytes>"}'.
    """
    # Parse the 'nifti_file' JSON string, then extract the 'data' field
    base64_string = json.loads(base64_string)["data"]
    # Decode the Base64 string to raw bytes
    byte_data = base64.b64decode(base64_string)
    # Write the bytes to a temporary file (closed before loading, so the
    # contents are flushed to disk), then load it as a NIfTI image
    with tempfile.NamedTemporaryFile(suffix=".nii.gz", delete=False) as temp_file:
        temp_file.write(byte_data)
    nifti_image = nib.load(temp_file.name)
    # Return the voxel data as a NumPy array
    return nifti_image.get_fdata()
```
The output can be parsed using:
```python
import json

# Suppose `response` is the raw byte response from urllib
response_data = json.loads(response)

# Extract the JSON-stringified NIfTI
nifti_file_str = response_data[0]["nifti_file"]

# Decode to get the NIfTI volume as a NumPy array
segmentation_array = decode_base64_to_nifti(nifti_file_str)
print(segmentation_array.shape)
```
Optionally, to visualize the output:
```python
# --- Quick visualization of one axial slice ---
import matplotlib.pyplot as plt
import numpy as np

slice_id = 40  # choose an in-bounds slice index along the third axis (H, W, Z)
slice_ = segmentation_array[:, :, slice_id]

# Exclude background (0) when listing labels
labels = np.unique(slice_)
labels = labels[labels != 0]

plt.figure(figsize=(6, 6))
plt.imshow(slice_, cmap="gray")
plt.title(f"Masks @ slice {slice_id} | labels: {labels.tolist()}")
plt.axis("off")
plt.show()
```
Version: 3
Preview
Featured
task : image-segmentation
industry : health-and-life-sciences
displayName : MedImageParse 3D
author : Microsoft
hiddenlayerscanned
SharedComputeCapacityEnabled
license : mit
languages : en
evaluation :
We benchmarked MedImageParse 3D against task-specific nnU-Net models on the AMOS22 CT and MRI datasets. Note that we trained a single model to solve all tasks solely via text prompting, e.g. "gallbladder in abdomen MRI", while nnU-Net was trained as multiple expert models, one per object per modality. The comparison is therefore one single model vs. 27 task-specific models.
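For reference, the Dice score reported in the tables below measures volumetric overlap between a predicted mask and the ground truth. A minimal sketch of the computation, assuming binary NumPy masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    # Convention: two empty masks count as a perfect match
    return 2.0 * intersection / denom if denom else 1.0
```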
CT
| Dice score (%) | aorta | bladder | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.27 | 90.17 | 83.27 | 87.11 | 85.96 | 79.48 | 96.39 | 97.71 | 88.42 | 92.02 | 79.39 | 96.88 | 96.91 | 91.49 | 90.00 |
| nnU-Net | 95.20 | 87.52 | 80.72 | 87.31 | 83.06 | 78.06 | 95.39 | 96.09 | 86.57 | 90.38 | 78.24 | 93.19 | 96.91 | 89.79 | 88.35 |
| SegVol | 92.07 | 88.03 | 72.49 | 64.47 | 79.05 | 76.31 | 94.58 | 96.24 | 80.97 | 83.65 | 71.07 | 92.92 | 94.03 | 88.82 | 83.75 |
MRI
| Dice score (%) | aorta | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.73 | 76.03 | 81.38 | 66.58 | 63.35 | 96.92 | 97.65 | 88.70 | 87.26 | 68.14 | 96.69 | 96.88 | 88.93 | 84.94 |
| nnU-Net | 95.64 | 66.78 | 73.62 | 66.32 | 57.15 | 95.82 | 97.25 | 79.29 | 90.66 | 53.29 | | | | |
We evaluated MedImageParse 3D on the CVPR 2025 Foundation Models for Text-guided 3D Biomedical Image Segmentation open-challenge validation set, which includes both semantic and instance segmentation tasks. For semantic segmentation, we report Dice Similarity Coefficient (DSC) for region overlap and Normalized Surface Distance (NSD) for boundary accuracy. For instance segmentation, we report the F1 score at an IoU threshold of 0.5 and DSC for true-positive instances.
CT
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.7211 | 0.7227 | 0.2993 | 0.3717 |
| SAT | 0.6780 | 0.6726 | 0.2517 | 0.3954 |
| MedImageParse 3D | 0.8512 | 0.8965 | 0.5119 | 0.6749 |
MRI
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.5415 | 0.6193 | 0.1375 | 0.2813 |
| SAT | 0.5610 | 0.6669 | 0.1228 | 0.2728 |
| MedImageParse 3D | 0.7396 | 0.8664 | 0.5317 | 0.7053 |
Microscopy
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | – | – | 0.0313 | 0.3628 |
| SAT | – | – | 0.2006 | 0.4243 |
| MedImageParse 3D | – | – | 0.1939 | 0.6552 |
PET
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | – | – | 0.1098 | 0.2779 |
| SAT | – | – | 0.4200 | 0.7863 |
| MedImageParse 3D | – | – | 0.3132 | 0.7185 |
Ultrasound
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.8594 | 0.8360 | – | – |
| SAT | 0.8558 | 0.7924 | – | – |
| MedImageParse 3D | 0.9050 | 0.9135 | – | – |
Note: “–” indicates that the metric is not applicable for that modality/method.
notes :
- Supported Data Input Format
- The model expects 3D NIfTI images by default.
- The model outputs per-voxel probabilities in the same shape as the input image. The probability threshold for the segmentation mask is 0.5 (see the thresholding sketch after this list).
- The model takes in text prompts for segmentation and doesn't have a fixed number of targets to handle. However, to ensure quality performance, we recommend the following tasks based on evaluation results. We will extend the model's capability with more object types, including tumors and nodules.
- CT: oncology/pathology (adrenocortical carcinoma, kidney lesions/cysts L/R, liver tumors, lung lesions, pancreas tumors, head–neck cancer, colon cancer primaries, COVID-19, whole-body lesion, lymph nodes); thoracic (lungs L/R, lobes LUL/LLL/RUL/RML/RLL, trachea, airway tree); abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon, esophagus); GU/endocrine (kidneys L/R, adrenal glands L/R, bladder, prostate, uterus); vascular (aorta/tree, SVC, IVC, pulmonary vein, brachiocephalic trunk, subclavian/carotid arteries L/R, brachiocephalic veins L/R, left atrial appendage, portal/splenic vein, iliac arteries/veins L/R); cardiac (heart); head/neck (carotids L/R, submandibular/parotid/lacrimal glands L/R, thyroid, larynx glottic/supraglottic, lips, buccal mucosa, oral cavity, cervical esophagus, cricopharyngeal inlet, arytenoids, eyeball segments ant/post L/R, optic chiasm, optic nerves L/R, cochleae L/R, pituitary, brainstem, spinal cord); neuro/cranial (brain, skull, Circle of Willis CTA); spine/MSK (sacrum, vertebrae C1–S1, humeri/scapulae/clavicles/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
- MRI: abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon whole, esophagus, bladder, prostate, uterus); colon segments (cecum, appendix, ascending, transverse, descending, sigmoid, rectum); GU (prostate transition zone, prostate lesion); cardiac CMR (LV, RV, myocardium, LA, RA); thoracic (lungs L/R); vascular (aorta, pulmonary artery, SVC, IVC, portal/splenic vein, iliac arteries/veins L/R, carotid arteries L/R, jugular veins L/R); neuro tumors/ischemia (brain, brain tumor, stroke lesion, GTVp/GTVn tumor, vestibular schwannoma intra/extra-meatal, cochleae L/R); glioma components (non-enhancing tumor core, non-enhancing FLAIR hyperintensity, enhancing tissue, resection cavity); white matter disease (WM hyperintensities FLAIR/T1); neurovascular (Circle of Willis MRA); spine/MSK (sacrum, vertebrae regional, discs, spinal canal/cord, humeri/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
- Ultrasound: cardiac (LV, myocardium, LA), neck (thyroid, carotid artery, jugular vein), neuro (brain tumor), calf MSK (soleus, gastrocnemius medialis/lateralis).
- PET / PET-CT: whole-body lesion.
- Electron Microscopy: endolysosomes, mitochondria, nuclei, neuronal ultrastructure, synaptic clefts, axon.
- Light-Sheet Microscopy: brain neural activity, Alzheimer’s plaque, nuclei, vessel.
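As referenced in the list above, the per-voxel probabilities can be turned into a binary segmentation mask with the 0.5 threshold; a minimal sketch, assuming `prob_volume` is the decoded output array from the earlier example:

```python
import numpy as np

# prob_volume: per-voxel probabilities decoded from the endpoint response
binary_mask = (prob_volume >= 0.5).astype(np.uint8)
```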
This model is intended and provided as-is for research and model development exploration. MedImageParse 3D is not designed or intended to be deployed in clinical settings as-is, nor is it intended for use in the diagnosis or treatment of any health or medical condition, and the model's performance for such purposes has not been established. You bear sole responsibility and liability for any use of MedImageParse 3D, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. When evaluating the model for your use case, carefully consider the impacts of overreliance, both within the context of radiology specifically and for generative AI more generally (see Appropriate reliance on Generative AI: Research synthesis - Microsoft Research).
Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse.
While testing the model with images and/or text, ensure that the data is PHI-free and contains no patient information or information that can be traced to a patient's identity.
The model is not designed for the following use cases:
- Use by clinicians to inform clinical decision-making, as a diagnostic tool, or as a medical device - Although MedImageParse 3D is highly accurate in parsing biomedical data, it is not designed or intended to be deployed in clinical settings as-is, nor is it intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute for the professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
- Scenarios without consent for data - Any scenario that uses health data for a purpose for which consent was not obtained.
- Use outside of health scenarios - Any scenario that uses non-medical images and/or serves purposes outside of the healthcare domain.
Please see Microsoft's Responsible AI Principles and approach available at https://www.microsoft.com/en-us/ai/principles-and-approach/
Data description:
The dataset covers five commonly used 3D biomedical image modalities: CT, MR, PET, Ultrasound, and Microscopy. All the images are from public datasets with a license permitting redistribution. All images were processed to npz format with an intensity range of [0, 255]. Specifically, for CT images, the Hounsfield units were normalized using typical window width and level values: soft tissues (W:400, L:40), lung (W:1500, L:-160), brain (W:80, L:40), and bone (W:1800, L:400). Subsequently, the intensity values were rescaled to the range of [0, 255]. For other images, the intensity values were clipped to the range between the 0.5th and 99.5th percentiles before rescaling them to the range of [0, 255]. If the original intensity range was already in [0, 255], no preprocessing was applied.
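The CT windowing step described above can be sketched as follows; this is a minimal illustration, and the function name and the `ct_volume_hu` variable are ours, not from the released preprocessing code:

```python
import numpy as np

def window_and_rescale(hu: np.ndarray, width: float, level: float) -> np.ndarray:
    """Clip Hounsfield units to a window and rescale to [0, 255].

    Example windows from the description above: soft tissue (W=400, L=40),
    lung (W=1500, L=-160), brain (W=80, L=40), bone (W=1800, L=400).
    """
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu, lo, hi)
    return (clipped - lo) / (hi - lo) * 255.0

# e.g. soft-tissue window applied to a CT volume in Hounsfield units
normalized = window_and_rescale(ct_volume_hu, width=400, level=40)
```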
The license for MedImageParse 3D is the MIT license. Please cite our Paper if you use the model for your research.
For questions or comments, please contact: [email protected]
Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02499-w
inputModalities : image
outputModalities : text,image
keywords : Multimodal
inference_compute_allow_list : ['Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2', 'Standard_NC40ads_H100_v5', 'Standard_NC80adis_H100_v5', 'Standard_ND96isr_H100_v5']
inference_supported_envs : ['hf']
View in Studio: https://ml.azure.com/registries/azureml/models/MedImageParse3D/version/3
License: mit
inference-min-sku-spec: 24|1|220|64
inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2, Standard_NC40ads_H100_v5, Standard_NC80adis_H100_v5, Standard_ND96isr_H100_v5
languages: en
SharedComputeCapacityEnabled: True