models Phi 3.5 vision instruct - Azure/azureml-assets GitHub Wiki
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
π‘ Phi-3 Portal
π° Phi-3 Microsoft Blog
π Phi-3 Technical Report
π©βπ³ Phi-3 Cookbook
| Architecture | Phi-3.5-vision has 4.2B parameters and contains image encoder, connector, projector, and Phi-3 Mini language model. |
| Inputs | Text and Image. Itβs best suited for prompts using the chat format. |
| Context length | 128K tokens |
| GPUs | 256 A100-80G |
| Training time | 6 days |
| Training data | 500B tokens (vision tokens + text tokens) |
| Outputs | Generated text in response to the input |
| Dates | Trained between July and August 2024 |
| Status | This is a static model trained on an offline text dataset with cutoff date March 15, 2024. Future versions of the tuned models may be released as we improve models. |
| Release date | August 20, 2024 |
| License | MIT |
Version: 2
maas-inference : true disable-batch : true _aml_system_vanity_registry : azureml-phi inference_supported_envs : ['vllm'] model_specific_defaults : {'apply_deepspeed': 'true', 'deepspeed_stage': 2, 'apply_lora': 'true', 'apply_ort': 'false', 'precision': 16, 'ignore_mismatched_sizes': 'false', 'num_train_epochs': 1, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 1, 'learning_rate': 5e-06, 'lr_scheduler_type': 'cosine', 'logging_strategy': 'steps', 'logging_steps': 10, 'save_total_limit': 1}
View in Studio: https://ml.azure.com/registries/azureml/models/Phi-3.5-vision-instruct/version/2
SharedComputeCapacityEnabled: True
languages: en
inference-min-sku-spec: 24|1|220|64
inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4