models hibou b - Azure/azureml-assets GitHub Wiki
Hibou-B is a foundational vision transformer developed for digital pathology, designed to generate high-quality feature representations from histology image patches. These representations can be leveraged for a range of downstream tasks, including classification, segmentation, and detection.
Built on the ViT-B/14 architecture, Hibou-B processes 224 × 224 input patches and was trained on 512 million tissue patches extracted from a diverse dataset of 1.3 million whole slide images. The model was trained using self-supervised learning with DINOv2, incorporating stain normalization and extensive geometric augmentations to enable robust generalization across varied histopathological domains.
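As a quick sanity check on the input geometry, the sketch below works out how many patch tokens a 224 × 224 input yields, assuming the "/14" in ViT-B/14 denotes a 14-pixel patch size as in standard ViT naming. The helper name `patch_grid` is hypothetical, and the count excludes the class token and any other extra tokens the model may add.

```python
def patch_grid(image_size=224, patch_size=14):
    """Return (patches per side, total patch tokens) for a square input.

    Assumes non-overlapping square patches, as in a standard ViT.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, n_tokens = patch_grid()
print(per_side, n_tokens)  # 16 256
```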
To understand the capabilities of Hibou-B, we evaluated it across a range of public digital pathology benchmarks. The model was tested at both patch-level and slide-level granularity to assess its generalization and diagnostic utility across different cancer types and tissue modalities.
Category | Benchmark | Hibou-B |
---|---|---|
Patch-level | CRC-100K | 95.5 |
Patch-level | PCAM | 94.6 |
Patch-level | MHIST | 81.2 |
Patch-level | MSI-CRC | 77.9 |
Patch-level | MSI-STAD | 79.7 |
Patch-level | TIL-DET | 94.2 |
Slide-level | BRCA | 92.9 |
Slide-level | NSCLC | 95.2 |
Slide-level | RCC | 99.3 |
Single Image Input:
data = {
    "input_data": {
        "columns": ["image"],
        "data": [
            ["base64_encoded_image_string"]
        ]
    }
}
Multiple Image Input:
data = {
    "input_data": {
        "columns": ["image"],
        "data": [
            ["base64_encoded_image_string_1"],
            ["base64_encoded_image_string_2"]
        ]
    }
}
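The placeholder strings in the input samples stand for base64-encoded image bytes. A minimal sketch of building such a payload from raw image bytes follows; the helper name `build_payload` and the file name in the usage comment are hypothetical, and only the `input_data`, `columns`, and `data` structure comes from the samples.

```python
import base64
import json


def build_payload(image_bytes_list):
    """Wrap raw image bytes in the request structure shown in the samples.

    Each image becomes one row with a single base64 string in the "image" column.
    """
    rows = [[base64.b64encode(raw).decode("utf-8")] for raw in image_bytes_list]
    return {"input_data": {"columns": ["image"], "data": rows}}


# Usage (hypothetical file name):
# with open("patch.png", "rb") as f:
#     payload = build_payload([f.read()])
# body = json.dumps(payload)  # JSON string to send as the request body
```

Passing one item produces the single-image form, and passing several produces the multiple-image form.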
Output Sample:
[
    {
        "image_features": [2.6749861240386963, -0.7507642507553101, 0.2108164280653, ...]
    }
]
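For downstream use, it is often convenient to stack the returned vectors into a single array. The sketch below assumes that a multi-image request returns one `image_features` dict per image, in input order; the output sample only shows a single item, so treat that as an assumption. The helper names `embeddings_to_matrix` and `cosine_similarity_matrix` are hypothetical.

```python
import numpy as np


def embeddings_to_matrix(result, expected_dim=768):
    """Stack each item's 'image_features' into an (n_images, expected_dim) array."""
    rows = []
    for i, item in enumerate(result):
        if not isinstance(item, dict) or "image_features" not in item:
            raise ValueError(f"item {i} has no 'image_features' key")
        vec = np.asarray(item["image_features"], dtype=np.float32)
        if vec.ndim != 1 or vec.shape[0] != expected_dim:
            raise ValueError(f"item {i} has shape {vec.shape}, expected ({expected_dim},)")
        rows.append(vec)
    if not rows:
        return np.empty((0, expected_dim), dtype=np.float32)
    return np.stack(rows)


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T
```

Cosine similarity is one common way to compare embeddings; a classifier or clustering step on the stacked matrix would work equally well.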
Output Processing Example:
import matplotlib.pyplot as plt
import numpy as np


def process_hibou_predictions(result):
    """Process Hibou-B embedding predictions."""
    if not result:
        print("No predictions found")
        return None
    # Handle the response format: [{'image_features': [embedding_list]}]
    if isinstance(result, list) and len(result) > 0:
        first_result = result[0]
        if isinstance(first_result, dict) and 'image_features' in first_result:
            embeddings = first_result['image_features']
            embedding_dim = len(embeddings)
            print(f"Received embeddings with dimension: {embedding_dim}")
            # Calculate statistics
            embedding_array = np.array(embeddings)
            print("Embedding statistics:")
            print(f" - Mean: {np.mean(embedding_array):.4f}")
            print(f" - Std: {np.std(embedding_array):.4f}")
            print(f" - Min: {np.min(embedding_array):.4f}")
            print(f" - Max: {np.max(embedding_array):.4f}")
            return embeddings
        else:
            print("Unexpected result format - missing 'image_features' key")
            print(f"Available keys: {list(first_result.keys()) if isinstance(first_result, dict) else 'Not a dict'}")
    else:
        print(f"Unexpected result format: {type(result)}")
    return None


def visualize_embeddings_results(original_image, embeddings, save_path=None):
    """Visualize the embedding results with distribution plot."""
    # Works for both lists and NumPy arrays (a bare truth test fails on arrays)
    has_embeddings = embeddings is not None and len(embeddings) > 0
    plt.figure(figsize=(15, 6))
    plt.subplot(1, 3, 1)
    plt.imshow(original_image)
    plt.title('Original Image')
    plt.axis('off')
    # Same image again, annotated with the embedding dimension
    plt.subplot(1, 3, 2)
    plt.imshow(original_image)
    embedding_dim = len(embeddings) if has_embeddings else 0
    plt.title(f'Processed - Embedding dim: {embedding_dim}')
    plt.axis('off')
    # Show embedding distribution
    plt.subplot(1, 3, 3)
    if has_embeddings:
        embedding_array = np.array(embeddings)
        plt.hist(embedding_array, bins=50, alpha=0.7, color='blue')
        plt.title(f'Embedding Distribution\nMean: {np.mean(embedding_array):.3f}\nStd: {np.std(embedding_array):.3f}')
        plt.xlabel('Embedding Value')
        plt.ylabel('Frequency')
    else:
        plt.text(0.5, 0.5, 'No embeddings', ha='center', va='center')
        plt.axis('off')
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path, bbox_inches='tight', dpi=300)
    plt.show()
- Supported Data Input Format
  - Input Format: The model accepts histopathology images in PNG format, provided as base64-encoded image strings.
  - Output Format: The model returns a dense 768-dimensional embedding vector representing the visual features of each histopathology image.
  - Data Sources and Technical Details: For comprehensive information about training datasets, model architecture, and validation results, refer to the official Hibou repository (https://github.com/HistAI/hibou).
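Because the input format is PNG, it can help to check the bytes before encoding them. The sketch below compares the first eight bytes against the standard PNG file signature; the helper names `is_png` and `encode_png` are hypothetical, and the check confirms only the signature, not that the file is a valid or complete image.

```python
import base64

# Standard 8-byte PNG file signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(image_bytes):
    """Return True if the bytes start with the PNG file signature."""
    return image_bytes[:8] == PNG_SIGNATURE


def encode_png(image_bytes):
    """Base64-encode PNG bytes, rejecting input without a PNG signature."""
    if not is_png(image_bytes):
        raise ValueError("expected PNG-encoded image bytes")
    return base64.b64encode(image_bytes).decode("ascii")
```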
Version: 1
task : embeddings
industry : health-and-life-sciences
Preview
licenseDescription : This model is provided under the License Terms available at <https://github.com/HistAI/hibou/blob/main/LICENSE>.
inference_supported_envs : ['hf']
license : apache-2.0
author : HistAI
hiddenlayerscanned
SharedComputeCapacityEnabled
inference_compute_allow_list : ['Standard_NC4as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2', 'Standard_NC40ads_H100_v5', 'Standard_NC80adis_H100_v5', 'Standard_ND96isr_H100_v5']
View in Studio: https://ml.azure.com/registries/azureml/models/hibou-b/version/1
License: apache-2.0
inference-min-sku-spec: 4|1|28|64
inference-recommended-sku: Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2, Standard_NC40ads_H100_v5, Standard_NC80adis_H100_v5, Standard_ND96isr_H100_v5
languages: en
SharedComputeCapacityEnabled: True