models Phi 3.5 MoE instruct - Azure/azureml-assets GitHub Wiki

Phi-3.5-MoE-instruct

Overview

Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model is multilingual and supports a context length of 128K tokens. It underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization, to ensure precise instruction adherence and robust safety measures.

Resources

🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3.5-MoE has 16x3.8B parameters, of which 6.6B are active when using 2 experts. The model is a mixture-of-experts, decoder-only Transformer that uses a tokenizer with a vocabulary size of 32,064.
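To illustrate what "2 of 16 experts active" means, here is a minimal sketch of generic top-2 expert routing in plain Python. It is not taken from the model's implementation: the gating function, the toy scalar experts, and the names `top_k_route` and `moe_forward` are assumptions made for illustration only, and the actual router used by Phi-3.5-MoE may differ.

```python
import math

NUM_EXPERTS = 16  # number of experts, from the model card
TOP_K = 2         # experts active per token, from the model card


def top_k_route(gate_logits, k=TOP_K):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    (Hypothetical helper; generic top-k gating, not the model's own router.)
    """
    if len(gate_logits) < k:
        raise ValueError("need at least k gate logits")
    # Indices of the k largest logits, highest first (ties keep index order).
    chosen = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, gate_logits, experts):
    """Weighted sum of the selected experts' outputs; other experts are skipped."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits))


# Toy experts: expert i multiplies its scalar input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

# Hypothetical gate scores for one token: experts 3 and 7 score highest.
logits = [0.0] * NUM_EXPERTS
logits[3] = 2.0
logits[7] = 2.0

routes = top_k_route(logits)
output = moe_forward(1.0, logits, experts)
```

Only the two selected experts are evaluated for each token, which is why the active parameter count is much smaller than the total.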

Training Data

This is a static model trained on an offline dataset of 4.9T tokens, with a cutoff date of October 2023 for publicly available data. Future versions of the tuned models may be released as we improve models.

Version: 5

Tags

maas-inference : true

disable-batch : true

_aml_system_vanity_registry : azureml-phi

maas-finetuning-deploy-regions : ['eastus2', 'eastus', 'northcentralus', 'westus3', 'westus', 'southcentralus']

inference_supported_envs : ['vllm']

model_specific_defaults : {'apply_deepspeed': 'true', 'deepspeed_stage': 3, 'apply_lora': 'true', 'apply_ort': 'false', 'precision': 16, 'max_seq_length': 16384, 'ignore_mismatched_sizes': 'false', 'num_train_epochs': 1, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 1, 'learning_rate': 2e-05, 'learning_rate_min': 2e-06, 'learning_rate_max': 0.0002, 'lr_scheduler_type': 'cosine', 'logging_strategy': 'steps', 'logging_steps': 10, 'save_total_limit': 1}
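As a rough aid for reading the fine-tuning defaults, the sketch below copies a few values from `model_specific_defaults` and derives the effective global batch size. The GPU count of 4 is a hypothetical example, and treating `learning_rate_min` and `learning_rate_max` as bounds on the permitted learning rate is an assumption, not something this page states.

```python
# Subset of model_specific_defaults, copied from the tags on this page.
defaults = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "max_seq_length": 16384,
    "learning_rate": 2e-05,
    "learning_rate_min": 2e-06,
    "learning_rate_max": 0.0002,
}


def effective_batch_size(cfg, num_gpus):
    """Sequences consumed per optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_gpus
    )


def lr_within_bounds(cfg):
    """Assumption: min/max are limits on the configurable learning rate."""
    return cfg["learning_rate_min"] <= cfg["learning_rate"] <= cfg["learning_rate_max"]


NUM_GPUS = 4  # hypothetical device count, chosen only for this example

batch = effective_batch_size(defaults, NUM_GPUS)
# Upper bound on tokens per optimizer step if every sequence is full length.
max_tokens_per_step = batch * defaults["max_seq_length"]
lr_ok = lr_within_bounds(defaults)
```

With these defaults, each optimizer step sees one sequence per GPU, so raising `gradient_accumulation_steps` is the simplest way to increase the effective batch size without adding memory pressure.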

View in Studio: https://ml.azure.com/registries/azureml/models/Phi-3.5-MoE-instruct/version/5

Properties

SharedComputeCapacityEnabled: True

languages: en

inference-min-sku-spec: 48|2|440|128

inference-recommended-sku: Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4

finetuning-tasks: chat-completion

finetune-min-sku-spec: 96|4|880|256

finetune-recommended-sku: Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4
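The `*-min-sku-spec` values above are pipe-separated integers. This page does not define the fields, so the sketch below assumes the order is vCPUs, GPUs, memory in GB, and storage in GB; the field names and the `SkuSpec`, `parse_sku_spec`, and `meets` helpers are hypothetical and only illustrate how such a string could be parsed and compared.

```python
from typing import NamedTuple


class SkuSpec(NamedTuple):
    # Field meanings are an assumption; the page only gives the raw numbers.
    vcpus: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_sku_spec(spec: str) -> SkuSpec:
    """Parse a 'a|b|c|d' spec string into four integers."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    return SkuSpec(*(int(p) for p in parts))


def meets(candidate: SkuSpec, minimum: SkuSpec) -> bool:
    """True if every field of candidate is at least the minimum's value."""
    return all(c >= m for c, m in zip(candidate, minimum))


inference_min = parse_sku_spec("48|2|440|128")  # inference-min-sku-spec
finetune_min = parse_sku_spec("96|4|880|256")   # finetune-min-sku-spec

# A machine that satisfies the fine-tuning minimum also satisfies inference.
finetune_covers_inference = meets(finetune_min, inference_min)
```

Under this reading, fine-tuning needs twice the resources of inference in every field.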
