Phi-3-medium-4k-instruct

Overview

Phi-3-Medium-4K-Instruct is a lightweight, state-of-the-art open model with 14B parameters. It is trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Medium version comes in two variants, 4K and 128K, named for the context length (in tokens) that each can support.
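As a rough illustration of what the context length means in practice, the sketch below checks whether a prompt plus the requested generation budget fits in a variant's window. The helper name `fits_context` is hypothetical, and the exact token counts are assumptions: 4096 for 4K (consistent with the `max_seq_length` default listed under Tags) and 131072 for 128K.

```python
# Assumed token limits for the two Medium variants; the 128K value
# (128 * 1024) is an assumption, not stated on this page.
CONTEXT_LENGTHS = {"4k": 4096, "128k": 131072}


def fits_context(prompt_tokens: int, max_new_tokens: int, variant: str = "4k") -> bool:
    """Return True if prompt and generated tokens together fit in the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= CONTEXT_LENGTHS[variant]


if __name__ == "__main__":
    # A 3,500-token prompt with 1,000 new tokens exceeds the 4K window.
    print(fits_context(3500, 1000, "4k"))
    print(fits_context(3500, 1000, "128k"))
```

Token counts depend on the tokenizer, so in a real application the `prompt_tokens` value would come from tokenizing the prompt rather than from counting characters or words.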

The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Medium-4K-Instruct showed robust, state-of-the-art performance among models of the same size and the next size up.

Resources

🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
🛠️ Phi-3 on Azure AI Studio
👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3-Medium-4K-Instruct is a dense decoder-only Transformer model with 14B parameters. The model is fine-tuned with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.

Training Datasets

Our training data includes a wide variety of sources, totaling 4.8 trillion tokens (including 10% multilingual), and is a combination of:

Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code;
Newly created synthetic, "textbook-like" data for the purpose of teaching math, coding, common sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.);
High-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction following, truthfulness, honesty, and helpfulness.

We focus on data quality that could improve the model's reasoning ability, and we filter the publicly available documents to contain the appropriate level of knowledge. For example, the result of a Premier League game on a particular day might be good training data for frontier models, but we need to remove such information to leave more model capacity for reasoning in small models. More details about the data can be found in the Phi-3 Technical Report.
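The token totals quoted at the start of this section imply a simple split, worked out in the sketch below. It assumes that "10% multilingual" means 10% of the 4.8 trillion tokens and treats both figures as exact, although the source most likely rounds them. The variable names are illustrative only.

```python
# Figures taken from the section above; treated as exact for illustration.
TOTAL_TOKENS = 4_800_000_000_000   # 4.8 trillion
MULTILINGUAL_PERCENT = 10

# Integer arithmetic avoids floating-point rounding on large counts.
multilingual_tokens = TOTAL_TOKENS * MULTILINGUAL_PERCENT // 100
other_tokens = TOTAL_TOKENS - multilingual_tokens

if __name__ == "__main__":
    print(f"multilingual: {multilingual_tokens / 1e12:.2f}T tokens")
    print(f"other:        {other_tokens / 1e12:.2f}T tokens")
```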

Version: 6

Tags

_aml_system_vanity_registry : azureml-phi

model_specific_defaults : {'apply_deepspeed': 'true', 'deepspeed_stage': 3, 'apply_lora': 'false', 'apply_ort': 'false', 'precision': 16, 'ignore_mismatched_sizes': 'false', 'num_train_epochs': 1, 'per_device_train_batch_size': 4, 'per_device_eval_batch_size': 4, 'gradient_accumulation_steps': 4, 'learning_rate': 5e-06, 'learning_rate_min': 5e-07, 'learning_rate_max': 5e-05, 'lr_scheduler_type': 'cosine', 'logging_strategy': 'steps', 'logging_steps': 10, 'save_total_limit': 1, 'max_seq_length': 4096}
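To make the fine-tuning defaults above more concrete, the following sketch derives the effective batch size and evaluates a plain cosine decay from the default `learning_rate`. Several things here are assumptions rather than facts from this page: the helper names are hypothetical, the device count is a free parameter, and the schedule shown decays to zero with no warmup, which may differ from what the actual training pipeline does. The exact role of `learning_rate_min` and `learning_rate_max` is not stated in the source, so they are left out.

```python
import math

# Subset of model_specific_defaults copied from the Tags section.
defaults = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-06,
    "max_seq_length": 4096,
}


def effective_batch_size(cfg: dict, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0 (assumed: no warmup, floor of 0)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # num_devices=8 is an illustrative value, not taken from the source.
    batch = effective_batch_size(defaults, num_devices=8)
    print("sequences per optimizer step:", batch)
    print("max tokens per optimizer step:", batch * defaults["max_seq_length"])
    for step in (0, 50, 100):
        print(step, cosine_lr(step, 100, defaults["learning_rate"]))
```

With gradient accumulation, each device processes 4 sequences per forward pass and accumulates 4 passes before an optimizer step, so the per-device contribution is 16 sequences regardless of how many devices are used.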

View in Studio: https://ml.azure.com/registries/azureml/models/Phi-3-medium-4k-instruct/version/6
