MedImageParse3D
Biomedical image analysis is fundamental to biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such as CT and MRI play unique roles in clinical practice. MedImageParse 3D is a foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 3D medical images, including CT and MRI. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.
MedImageParse 3D was trained on a large dataset comprising triples of image, segmentation mask, and textual description. It takes in a 3D medical image volume with a text prompt about the target object type (e.g., pancreas in CT) and outputs the corresponding segmentation mask as a 3D volume of the same shape as the input image. MedImageParse 3D is also able to identify invalid user inputs describing objects that do not exist in the image. It can also perform object detection, which aims to locate a specific object of interest, including objects with irregular shapes or of small size.
Traditional segmentation models perform segmentation alone, requiring fully supervised masks during training, and typically need either manual bounding boxes or automatic proposals at inference if multiple objects are present. Such a model doesn't inherently know which object to segment unless trained specifically for that class, and it can't take a text query to switch targets. MedImageParse 3D can segment via text prompts describing the object, without needing a user-drawn bounding box. This semantic prompt-based approach lets it parse the image and find relevant objects anywhere in the image.
In summary, MedImageParse 3D shows potential to be a building block for an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition. It is broadly applicable to different 3D image modalities through text prompting, which may pave a future path for efficient and accurate image-based biomedical discovery when built upon and integrated into an application.
MedImageParse 3D is built upon BiomedParse with the BoltzFormer architecture, optimized for locating small objects in 3D images. Leveraging Boltzmann attention sampling mechanisms, it excels at identifying subtle patterns corresponding to biomedical terminologies, as well as extracting contextually relevant information from dense scientific texts. The model is pre-trained on vast 3D medical image datasets, allowing it to generalize across various biomedical domains with high accuracy.
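To give intuition for the Boltzmann attention sampling idea, here is a minimal conceptual sketch, not the published BoltzFormer implementation: attention over image patches is treated as a Boltzmann distribution whose temperature controls how exploratory the sampling is, which keeps small, low-salience objects from being starved of attention. The function name and shapes below are illustrative assumptions.

```python
import torch

def boltzmann_attention_sample(scores: torch.Tensor, temperature: float, k: int) -> torch.Tensor:
    """Conceptual sketch (not the exact BoltzFormer code): sample k patch
    indices from a Boltzmann distribution over attention scores. A higher
    temperature spreads probability mass toward low-scoring patches, so
    small or subtle objects still receive occasional attention."""
    probs = torch.softmax(scores / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=k, replacement=False)

# Example: sample 16 of 1024 patches at a moderately high temperature
indices = boltzmann_attention_sample(torch.randn(1024), temperature=2.0, k=16)
```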
Input
```python
import base64

# Load a local 3D NIfTI volume and Base64-encode it
# (the file path here is illustrative)
with open("abdomen_ct.nii.gz", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

data = {
    "input_data": {
        "columns": ["image", "text"],
        "index": [0],
        "data": [
            [
                base64_image,
                "CT imaging of the spleen within the abdomen & Presence of the right kidney detected in abdominal CT images & Abdominal CT showing the left kidney & CT scan of the gallbladder in the abdominal region & CT scan of the esophagus in the abdominal region & Visualization of the liver in abdominal CT imaging & CT imaging of the stomach in the abdomen & Abdominal CT showing aortic structures & Inferior vena cava in abdominal CT & CT scan of the pancreas in the abdominal region & CT imaging of the right adrenal gland in the abdomen & CT imaging of the left adrenal gland in the abdomen & Visualization of the duodenum in abdominal CT imaging & Bladder observed in abdominal CT scans & CT scan of the prostate/uterus in the abdominal region"
            ]
        ]
    }
}
```
- `"columns"` describes the types of inputs the model expects (in this case, `"image"` and `"text"`).
- `"data"` is where you actually provide the values: the first element is the Base64-encoded NIfTI, and the second is a text parameter (e.g., `"CT imaging of the spleen within the abdomen & Presence of the right kidney detected in abdominal CT images ..."`, where `&` separates multiple prompts).
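To submit this payload to a deployed online endpoint, a minimal sketch using `urllib` follows; the endpoint URL and API key are placeholders for your own deployment's values.

```python
import json
import urllib.request

# Placeholders: substitute the scoring URL and key of your own deployment
endpoint_url = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
api_key = "<your-api-key>"

body = json.dumps(data).encode("utf-8")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

request = urllib.request.Request(endpoint_url, body, headers)
with urllib.request.urlopen(request) as resp:
    response = resp.read()  # raw bytes; parsed in the Output section below
```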
Output
```json
[
    {
        "nifti_file": "{\"data\":\"<BASE64_ENCODED_BYTES>\"}"
    }
]
```
The `nifti_file` field is a JSON string whose `data` entry (shown above as `<BASE64_ENCODED_BYTES>`) contains the raw binary NIfTI data, encoded in Base64. The provided function `decode_base64_to_nifti` handles the decoding logic:
```python
import base64
import json
import tempfile

import nibabel as nib
import numpy as np


def decode_base64_to_nifti(base64_string: str) -> np.ndarray:
    """
    Decode a Base64 string back to a NIfTI voxel array.

    The function expects `base64_string` to be a JSON string
    of the form: '{"data": "<Base64EncodedBytes>"}'.
    """
    # Parse the 'nifti_file' JSON string, then extract the 'data' field
    base64_string = json.loads(base64_string)["data"]
    # Decode the Base64 string to raw bytes
    byte_data = base64.b64decode(base64_string)
    # Write the bytes to a temporary file (closed before loading, so the
    # contents are flushed to disk), then load it as a NIfTI image
    with tempfile.NamedTemporaryFile(suffix=".nii.gz", delete=False) as temp_file:
        temp_file.write(byte_data)
    nifti_image = nib.load(temp_file.name)
    # Return the voxel data as a NumPy array
    return nifti_image.get_fdata()
```
The output can be parsed using:
```python
import json

# Suppose `response` is the raw byte response from urllib
response_data = json.loads(response)

# Extract the JSON-stringified NIfTI
nifti_file_str = response_data[0]["nifti_file"]

# Decode to get the NIfTI volume as a NumPy array
segmentation_array = decode_base64_to_nifti(nifti_file_str)
print(segmentation_array.shape)
```
Optionally, to visualize the output:
```python
# --- Quick visualization of one axial slice ---
import matplotlib.pyplot as plt
import numpy as np

slice_id = 40  # choose an in-bounds slice index along the third axis (H, W, Z)
slice_ = segmentation_array[:, :, slice_id]

# Exclude background (0) when listing labels
labels = np.unique(slice_)
labels = labels[labels != 0]

plt.figure(figsize=(6, 6))
plt.imshow(slice_, cmap="gray")
plt.title(f"Masks @ slice {slice_id} | labels: {labels.tolist()}")
plt.axis("off")
plt.show()
```
Version: 3
Preview
Featured
task : image-segmentation
industry : health-and-life-sciences
displayName : MedImageParse 3D
author : Microsoft
hiddenlayerscanned
SharedComputeCapacityEnabled
license : mit
languages : en
evaluation :
We benchmarked MedImageParse 3D against task-specific nnU-Net models on the AMOS22 CT and MRI datasets. Note that we trained a single model to solve all tasks solely via text prompting, e.g. "gallbladder in abdomen MRI", while nnU-Net was trained as multiple expert models, one per object per modality. The comparison is therefore one single model vs. 27 task-specific models.
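For reference, the Dice score reported in the tables below measures volumetric overlap between a predicted mask and the ground truth. A minimal sketch of the computation, assuming binary NumPy masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    # Convention: two empty masks count as a perfect match
    return 2.0 * intersection / denom if denom else 1.0
```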
CT
| Dice score (%) | aorta | bladder | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.27 | 90.17 | 83.27 | 87.11 | 85.96 | 79.48 | 96.39 | 97.71 | 88.42 | 92.02 | 79.39 | 96.88 | 96.91 | 91.49 | 90.00 |
| nnU-Net | 95.20 | 87.52 | 80.72 | 87.31 | 83.06 | 78.06 | 95.39 | 96.09 | 86.57 | 90.38 | 78.24 | 93.19 | 96.91 | 89.79 | 88.35 |
| SegVol | 92.07 | 88.03 | 72.49 | 64.47 | 79.05 | 76.31 | 94.58 | 96.24 | 80.97 | 83.65 | 71.07 | 92.92 | 94.03 | 88.82 | 83.75 |
MRI
| Dice score (%) | aorta | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MedImageParse 3D | 95.73 | 76.03 | 81.38 | 66.58 | 63.35 | 96.92 | 97.65 | 88.70 | 87.26 | 68.14 | 96.69 | 96.88 | 88.93 | 84.94 |
| nnU-Net | 95.64 | 66.78 | 73.62 | 66.32 | 57.15 | 95.82 | 97.25 | 79.29 | 90.66 | 53.29 | | | | |
We evaluated MedImageParse 3D on the CVPR 2025 Foundation Models for Text-guided 3D Biomedical Image Segmentation open-challenge validation set, which includes both semantic and instance segmentation tasks. For semantic segmentation, we report Dice Similarity Coefficient (DSC) for region overlap and Normalized Surface Distance (NSD) for boundary accuracy. For instance segmentation, we report the F1 score at an IoU threshold of 0.5 and DSC for true-positive instances.
CT
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.7211 | 0.7227 | 0.2993 | 0.3717 |
| SAT | 0.6780 | 0.6726 | 0.2517 | 0.3954 |
| MedImageParse 3D | 0.8512 | 0.8965 | 0.5119 | 0.6749 |
MRI
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.5415 | 0.6193 | 0.1375 | 0.2813 |
| SAT | 0.5610 | 0.6669 | 0.1228 | 0.2728 |
| MedImageParse 3D | 0.7396 | 0.8664 | 0.5317 | 0.7053 |
Microscopy
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | – | – | 0.0313 | 0.3628 |
| SAT | – | – | 0.2006 | 0.4243 |
| MedImageParse 3D | – | – | 0.1939 | 0.6552 |
PET
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | – | – | 0.1098 | 0.2779 |
| SAT | – | – | 0.4200 | 0.7863 |
| MedImageParse 3D | – | – | 0.3132 | 0.7185 |
Ultrasound
| Method | DSC (Semantic) | NSD (Semantic) | F1 (Instance) | DSC TP (Instance) |
|---|---|---|---|---|
| CAT | 0.8594 | 0.8360 | – | – |
| SAT | 0.8558 | 0.7924 | – | – |
| MedImageParse 3D | 0.9050 | 0.9135 | – | – |
Note: “–” indicates that the metric is not applicable for that modality/method.
notes :
- Supported Data Input Format
- The model expects 3D NIfTI images by default.
- The model outputs per-voxel probabilities in the same shape as the input image. The probability threshold for the segmentation mask is 0.5 (see the thresholding sketch after this list).
- The model takes in text prompts for segmentation and doesn't have a fixed number of targets to handle. However, to ensure quality performance, we recommend the following tasks based on evaluation results. We will extend the model's capability with more object types, including tumors and nodules.
- CT: oncology/pathology (adrenocortical carcinoma, kidney lesions/cysts L/R, liver tumors, lung lesions, pancreas tumors, head–neck cancer, colon cancer primaries, COVID-19, whole-body lesion, lymph nodes); thoracic (lungs L/R, lobes LUL/LLL/RUL/RML/RLL, trachea, airway tree); abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon, esophagus); GU/endocrine (kidneys L/R, adrenal glands L/R, bladder, prostate, uterus); vascular (aorta/tree, SVC, IVC, pulmonary vein, brachiocephalic trunk, subclavian/carotid arteries L/R, brachiocephalic veins L/R, left atrial appendage, portal/splenic vein, iliac arteries/veins L/R); cardiac (heart); head/neck (carotids L/R, submandibular/parotid/lacrimal glands L/R, thyroid, larynx glottic/supraglottic, lips, buccal mucosa, oral cavity, cervical esophagus, cricopharyngeal inlet, arytenoids, eyeball segments ant/post L/R, optic chiasm, optic nerves L/R, cochleae L/R, pituitary, brainstem, spinal cord); neuro/cranial (brain, skull, Circle of Willis CTA); spine/MSK (sacrum, vertebrae C1–S1, humeri/scapulae/clavicles/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
- MRI: abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon whole, esophagus, bladder, prostate, uterus); colon segments (cecum, appendix, ascending, transverse, descending, sigmoid, rectum); GU (prostate transition zone, prostate lesion); cardiac CMR (LV, RV, myocardium, LA, RA); thoracic (lungs L/R); vascular (aorta, pulmonary artery, SVC, IVC, portal/splenic vein, iliac arteries/veins L/R, carotid arteries L/R, jugular veins L/R); neuro tumors/ischemia (brain, brain tumor, stroke lesion, GTVp/GTVn tumor, vestibular schwannoma intra/extra-meatal, cochleae L/R); glioma components (non-enhancing tumor core, non-enhancing FLAIR hyperintensity, enhancing tissue, resection cavity); white matter disease (WM hyperintensities FLAIR/T1); neurovascular (Circle of Willis MRA); spine/MSK (sacrum, vertebrae regional, discs, spinal canal/cord, humeri/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
- Ultrasound: cardiac (LV, myocardium, LA), neck (thyroid, carotid artery, jugular vein), neuro (brain tumor), calf MSK (soleus, gastrocnemius medialis/lateralis).
- PET / PET-CT: whole-body lesion.
- Electron Microscopy: endolysosomes, mitochondria, nuclei, neuronal ultrastructure, synaptic clefts, axon.
- Light-Sheet Microscopy: brain neural activity, Alzheimer’s plaque, nuclei, vessel.
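As referenced in the list above, the per-voxel probabilities can be turned into a binary segmentation mask with the 0.5 threshold; a minimal sketch, assuming `prob_volume` is the decoded output array from the earlier example:

```python
import numpy as np

# prob_volume: per-voxel probabilities decoded from the endpoint response
binary_mask = (prob_volume >= 0.5).astype(np.uint8)
```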
This model is intended and provided as-is for research and model development exploration. MedImageParse 3D is not designed or intended to be deployed in clinical settings as-is, nor is it intended for use in the diagnosis or treatment of any health or medical condition, and the model's performance for such purposes has not been established. You bear sole responsibility and liability for any use of MedImageParse 3D, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. When evaluating the model for your use case, carefully consider the impacts of overreliance, both within the context of radiology specifically and for generative AI more generally (see Appropriate reliance on Generative AI: Research synthesis - Microsoft Research).
Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse.
While testing the model with images and/or text, ensure that the data is PHI-free and contains no patient information or information that can be traced to a patient's identity.
The model is not designed for the following use cases:
- Use by clinicians to inform clinical decision-making, as a diagnostic tool, or as a medical device - Although MedImageParse 3D is highly accurate in parsing biomedical data, it is not designed or intended to be deployed in clinical settings as-is, nor is it intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute for the professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
- Scenarios without consent for data - Any scenario that uses health data for a purpose for which consent was not obtained.
- Use outside of health scenarios - Any scenario that uses non-medical images and/or serves purposes outside of the healthcare domain.
Please see Microsoft's Responsible AI Principles and approach available at https://www.microsoft.com/en-us/ai/principles-and-approach/
Data description:
The dataset covers five commonly used 3D biomedical image modalities: CT, MR, PET, Ultrasound, and Microscopy. All the images are from public datasets with a license permitting redistribution. All images were processed to npz format with an intensity range of [0, 255]. Specifically, for CT images, the Hounsfield units were normalized using typical window width and level values: soft tissues (W:400, L:40), lung (W:1500, L:-160), brain (W:80, L:40), and bone (W:1800, L:400). Subsequently, the intensity values were rescaled to the range of [0, 255]. For other images, the intensity values were clipped to the range between the 0.5th and 99.5th percentiles before rescaling them to the range of [0, 255]. If the original intensity range was already in [0, 255], no preprocessing was applied.
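The CT windowing step described above can be sketched as follows; this is a minimal illustration, and the function name and the `ct_volume_hu` variable are ours, not from the released preprocessing code:

```python
import numpy as np

def window_and_rescale(hu: np.ndarray, width: float, level: float) -> np.ndarray:
    """Clip Hounsfield units to a window and rescale to [0, 255].

    Example windows from the description above: soft tissue (W=400, L=40),
    lung (W=1500, L=-160), brain (W=80, L=40), bone (W=1800, L=400).
    """
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu, lo, hi)
    return (clipped - lo) / (hi - lo) * 255.0

# e.g. soft-tissue window applied to a CT volume in Hounsfield units
normalized = window_and_rescale(ct_volume_hu, width=400, level=40)
```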
The license for MedImageParse 3D is the MIT license. Please cite our Paper if you use the model for your research.
For questions or comments, please contact: [email protected]
Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02499-w
inputModalities : image
outputModalities : text,image
keywords : Multimodal
inference_compute_allow_list : ['Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2', 'Standard_NC40ads_H100_v5', 'Standard_NC80adis_H100_v5', 'Standard_ND96isr_H100_v5']
inference_supported_envs : ['hf']
View in Studio: https://ml.azure.com/registries/azureml/models/MedImageParse3D/version/3
License: mit
inference-min-sku-spec: 24|1|220|64
inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2, Standard_NC40ads_H100_v5, Standard_NC80adis_H100_v5, Standard_ND96isr_H100_v5
languages: en
SharedComputeCapacityEnabled: True