models Phi 3.5 vision instruct - Azure/azureml-assets GitHub Wiki

Phi-3.5-vision-instruct

Overview

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and the multimodal version supports a context length of 128K tokens. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.

Resources

🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook

Model Summary

Architecture: Phi-3.5-vision has 4.2B parameters and contains an image encoder, a connector, a projector, and the Phi-3 Mini language model.
Inputs: Text and image. It's best suited for prompts using the chat format.
Context length: 128K tokens
GPUs: 256 A100-80G
Training time: 6 days
Training data: 500B tokens (vision tokens + text tokens)
Outputs: Generated text in response to the input
Dates: Trained between July and August 2024
Status: This is a static model trained on an offline text dataset with cutoff date March 15, 2024. Future versions of the tuned models may be released as we improve models.
Release date: August 20, 2024
License: MIT
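
To put the training figures in the summary into perspective, the short calculation below derives total GPU-hours and an average token throughput per GPU. It is only a back-of-the-envelope sketch: it assumes all 256 GPUs ran continuously for the full six days, which the summary does not state, so the throughput is a rough average and not a measured number.

```python
# Figures taken from the Model Summary above.
NUM_GPUS = 256
TRAINING_DAYS = 6
TRAINING_TOKENS = 500e9  # vision tokens + text tokens

# Assumption: every GPU was busy for the whole 6 days.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_hours * 3600)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Average tokens per GPU-second: {tokens_per_gpu_second:,.0f}")
```

Under that assumption the run amounts to 36,864 A100 GPU-hours, or roughly 3,800 tokens per GPU per second on average.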

Version: 2

Tags

maas-inference : true
disable-batch : true
_aml_system_vanity_registry : azureml-phi
inference_supported_envs : ['vllm']
model_specific_defaults : {'apply_deepspeed': 'true', 'deepspeed_stage': 2, 'apply_lora': 'true', 'apply_ort': 'false', 'precision': 16, 'ignore_mismatched_sizes': 'false', 'num_train_epochs': 1, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 1, 'learning_rate': 5e-06, 'lr_scheduler_type': 'cosine', 'logging_strategy': 'steps', 'logging_steps': 10, 'save_total_limit': 1}
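
The `model_specific_defaults` tag is written as a Python-style dictionary literal, so it can be read with the standard library. The sketch below parses it and computes the effective training batch size implied by the defaults. The helper names `as_bool` and `effective_batch_size` and the device count of 4 are hypothetical, chosen for illustration; how Azure ML itself consumes these tags is not described here.

```python
import ast

# Copied verbatim from the model_specific_defaults tag.
raw_defaults = (
    "{'apply_deepspeed': 'true', 'deepspeed_stage': 2, 'apply_lora': 'true', "
    "'apply_ort': 'false', 'precision': 16, 'ignore_mismatched_sizes': 'false', "
    "'num_train_epochs': 1, 'per_device_train_batch_size': 1, "
    "'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 1, "
    "'learning_rate': 5e-06, 'lr_scheduler_type': 'cosine', "
    "'logging_strategy': 'steps', 'logging_steps': 10, 'save_total_limit': 1}"
)

# literal_eval only accepts literals, so it is safe for untrusted strings.
defaults = ast.literal_eval(raw_defaults)


def as_bool(value):
    """Flags are stored as the strings 'true'/'false', not as booleans."""
    return str(value).lower() == "true"


def effective_batch_size(cfg, num_devices):
    """Samples consumed per optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


num_devices = 4  # hypothetical device count, not taken from the tags
print("LoRA enabled:", as_bool(defaults["apply_lora"]))
print("ONNX Runtime enabled:", as_bool(defaults["apply_ort"]))
print("Effective batch size:", effective_batch_size(defaults, num_devices))
```

Note that the flags are strings: `'false'` is truthy in Python, so a check such as `if defaults['apply_ort']:` would give the wrong answer, which is why the sketch compares the text explicitly.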

View in Studio: https://ml.azure.com/registries/azureml/models/Phi-3.5-vision-instruct/version/2

Properties

SharedComputeCapacityEnabled: True

languages: en

inference-min-sku-spec: 24|1|220|64

inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4
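
The `inference-min-sku-spec` value is a pipe-delimited string. The sketch below splits it into named fields. The field order used here (vCPUs, GPUs, memory in GB, storage in GB) is an assumption that this page does not confirm; it was chosen because the numbers line up with the smallest recommended SKU, Standard_NC24ads_A100_v4. `MinSkuSpec` and `parse_min_sku_spec` are hypothetical names.

```python
from typing import NamedTuple


class MinSkuSpec(NamedTuple):
    # Field meanings are assumed, not documented on this page.
    vcpus: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_min_sku_spec(spec: str) -> MinSkuSpec:
    """Split a 'a|b|c|d' spec string into four integer fields."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {spec!r}")
    return MinSkuSpec(*(int(p) for p in parts))


spec = parse_min_sku_spec("24|1|220|64")
print(spec)
```

A parsed tuple like this makes it straightforward to compare a candidate VM size against the minimum before attempting a deployment.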
