models microsoft phi 1 5 - Azure/azureml-assets GitHub Wiki

microsoft-phi-1-5

Overview

Microsoft Phi-1.5

Phi-1.5 is a Transformer-based language model with 1.3 billion parameters. It was trained on a combination of data sources, including an additional source of synthetic NLP texts. Among models with fewer than 10 billion parameters, Phi-1.5 achieves nearly state-of-the-art performance on benchmarks testing common sense, language understanding, and logical reasoning. The model is open-source and intended for research purposes, in particular to explore safety challenges in language models.

Intended Uses

Phi-1.5 is best suited for prompts using the QA format, the chat format, and the code format. Note that Phi-1.5, being a base model, often produces irrelevant text following the main answer.
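As a rough illustration of the three prompt formats, the sketch below wraps one prompt of each kind in the request shape used by the samples on this page. The helper name `build_request` is hypothetical, and the default parameter values are simply copied from those samples; they are not tuned recommendations.

```python
import json

def build_request(prompt, max_new_tokens=100, temperature=0.6, top_p=0.9):
    """Wrap one prompt in the request shape used by the samples on this page.

    build_request is an illustrative helper, not part of any SDK.
    """
    return {
        "input_data": {
            "input_string": [prompt],
            "parameters": {
                "top_p": top_p,
                "temperature": temperature,
                "max_new_tokens": max_new_tokens,
                "do_sample": True,
            },
        }
    }

# One prompt per format: a bare question, a speaker-tagged turn, a code prefix.
qa_prompt = "What is a fermi paradox?"
chat_prompt = "Alice: What is a fermi paradox?"
code_prompt = "def is_prime("

for prompt in (qa_prompt, chat_prompt, code_prompt):
    print(json.dumps(build_request(prompt), indent=2))
```

Because the model is a base model, a smaller `max_new_tokens` is one simple way to limit how much trailing text it can append after the main answer.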

Limitations

  • Generate Inaccurate Code and Facts: The model often produces incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions.
  • Limited Scope for code: If the model generates Python scripts that utilize uncommon packages or scripts in other languages, we strongly recommend users manually verify all API uses.
  • Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users.
  • Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other language outside of English might pose challenges to its comprehension, leading to potential misinterpretations or errors in response.
  • Potential Societal Biases: Regardless of the safe data used for its training, the model is not entirely free from societal biases. There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs.
  • Toxicity: Although the model is trained on carefully selected data, it can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model for research purposes only. We hope to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining.

Training:

  • The training dataset contains 30 billion tokens, and the model was trained on 150 billion tokens in total, using 32 GPUs over 8 days.
  • Software used includes PyTorch, DeepSpeed, and flash-attention.
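The training figures above allow a quick back-of-the-envelope calculation. The sketch below assumes that the 30 billion figure is the dataset size and the 150 billion figure is the total number of tokens processed during training, and that all 32 GPUs ran for the full 8 days; the resulting throughput is only an average under those assumptions.

```python
# Figures from the Training section; their interpretation is an assumption.
DATASET_TOKENS = 30_000_000_000    # assumed: size of the training dataset
TRAINING_TOKENS = 150_000_000_000  # assumed: tokens processed during training
GPUS = 32
DAYS = 8

passes_over_data = TRAINING_TOKENS / DATASET_TOKENS
gpu_days = GPUS * DAYS
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_days * 24 * 60 * 60)

print(f"passes over the dataset: {passes_over_data:.1f}")
print(f"GPU-days: {gpu_days}")
print(f"average tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
```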

License:

The model is licensed under the Research License.

Sample inputs and outputs (for real-time inference)

Sample Question-Answering input

{
  "input_data": {
    "input_string": [
      "What is a fermi paradox?"
    ],
    "parameters": {
      "top_p": 0.9,
      "temperature": 0.6,
      "max_new_tokens": 200,
      "do_sample": true
    }
  }
}

Sample output

{
  "output": [
    "What is a fermi paradox? A fermi paradox is a question that asks why there are no signs of intelligent life outside of Earth. If there is life out there, why haven't we heard from them? What is the Drake equation? The Drake equation is a way to estimate the number of civilizations in our galaxy that could communicate with us. It takes into account factors like the number of stars, the number of planets that could support life, and the likelihood of life evolving to the point of developing technology. What is the Fermi paradox? The Fermi paradox is a question that asks why there are no signs of intelligent life outside of Earth. If there is life out there, why haven't we heard from them? What is the Drake equation? The Drake equation is a way to estimate the number of civilizations in our galaxy that could communicate with us. It takes into account factors like the number of stars, the number of planets that could"
  ]
}
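The sample output above echoes the prompt and then drifts into new questions, as expected from a base model. A minimal post-processing sketch, assuming the caller chooses stop markers to suit the prompt: the helper `trim_completion` and the marker `" What is "` are illustrative choices, not part of the endpoint.

```python
def trim_completion(prompt, output, stop_markers):
    """Remove the echoed prompt, then cut the text at the earliest stop marker."""
    text = output[len(prompt):] if output.startswith(prompt) else output
    hits = [pos for pos in (text.find(marker) for marker in stop_markers) if pos != -1]
    if hits:
        text = text[:min(hits)]
    return text.strip()

prompt = "What is a fermi paradox?"
# Shortened copy of the sample output above.
output = (
    "What is a fermi paradox? A fermi paradox is a question that asks why there "
    "are no signs of intelligent life outside of Earth. If there is life out "
    "there, why haven't we heard from them? What is the Drake equation? The "
    "Drake equation is a way to estimate the number of civilizations in our galaxy"
)

# Heuristic: treat the start of a new "What is ...?" question as the end of the answer.
answer = trim_completion(prompt, output, stop_markers=[" What is "])
print(answer)
```

This is a heuristic only: an answer that legitimately contains the marker text would be cut short.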

Sample Chat input

{
  "input_data": {
    "input_string": [
      "Alice: What is a fermi paradox?"
    ],
    "parameters": {
      "top_p": 0.9,
      "temperature": 0.6,
      "max_new_tokens": 100,
      "do_sample": true
    }
  }
}

Sample output

{
  "output": [
    "Alice: What is a fermi paradox? Bob: It's a paradox in cosmology that asks why we haven't encountered extraterrestrial civilizations yet, given the vastness of the universe and the potential for life. Alice: That's a tough one. I guess it could be because we haven't found any yet, or because they're too far away to detect. Bob: Yeah, there are a lot of different theories about it. But one thing's for sure, the universe is full of mysteries that we"
  ]
}
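In the chat format the model keeps writing both sides of the conversation. One way to keep only the first reply is sketched below, assuming the user's turns are tagged with a known speaker name ("Alice" in the sample); the helper `first_reply` is hypothetical.

```python
import re

def first_reply(prompt, output, user="Alice"):
    """Return only the first reply that follows the echoed prompt."""
    text = output[len(prompt):] if output.startswith(prompt) else output
    # Stop where the model starts writing the user's next turn.
    text = text.split(f"{user}:", 1)[0]
    # Drop the responder's name tag ("Bob:" in the sample) if one is present.
    return re.sub(r"^\s*\w+:\s*", "", text).strip()

prompt = "Alice: What is a fermi paradox?"
# Shortened copy of the sample output above.
output = (
    "Alice: What is a fermi paradox? Bob: It's a paradox in cosmology that asks "
    "why we haven't encountered extraterrestrial civilizations yet, given the "
    "vastness of the universe and the potential for life. Alice: That's a tough "
    "one. I guess it could be because we haven't found any yet"
)

reply = first_reply(prompt, output)
print(reply)
```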

Sample Code input

{
  "input_data": {
    "input_string": [
      "def is_prime("
    ],
    "parameters": {
      "top_p": 0.9,
      "temperature": 0.6,
      "max_new_tokens": 100,
      "do_sample": true
    }
  }
}

Sample output

{
  "output": [
    "def is_prime(n: int) -> bool: if n < 2: return False for i in range(2, int(math.sqrt(n))+1): if n % i == 0: return False return True def get_next_prime(n: int) -> int: while not is_prime(n): n += 1 return n def get_next_multiple_"
  ]
}
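The generated code above calls `math.sqrt` without importing `math`, a small instance of the inaccurate-code limitation described earlier. Below is a runnable reading of the two complete functions. The line breaks and indentation are reconstructed (an assumption, since the sample string shows none), `import math` is the only addition, and the truncated third function is omitted.

```python
import math  # added: the generated code uses math.sqrt but never imports math

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    # Trial division up to the square root of n.
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def get_next_prime(n: int) -> int:
    # Returns n itself when n is already prime, otherwise the next larger prime.
    while not is_prime(n):
        n += 1
    return n

print(is_prime(97), get_next_prime(14))
```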

Version: 9

Tags

  • SharedComputeCapacityEnabled
  • license : other
  • task : text-generation
  • hiddenlayerscanned
  • author : Microsoft
  • evaluation_compute_allow_list : ['Standard_DS4_v2', 'Standard_DS5_v2', 'Standard_DS12_v2', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
  • inference_compute_allow_list : ['Standard_DS3_v2', 'Standard_DS4_v2', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4']

View in Studio: https://ml.azure.com/registries/azureml/models/microsoft-phi-1-5/version/9

License: other

Properties

SharedComputeCapacityEnabled: True

SHA: 92557d03bb12543040c8bb5f0475cbdd9968f05f

datasets: StackOverflow, Stackv1.2, CodeContests, gpt-3.5-turbo-0301

inference-min-sku-spec: 4|0|14|28

inference-recommended-sku: Standard_DS3_v2, Standard_DS4_v2, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_ND40rs_v2, Standard_ND96asr_v4

languages: en

evaluation-min-sku-spec: 4|0|28|56

evaluation-recommended-sku: Standard_DS4_v2, Standard_DS5_v2, Standard_DS12_v2, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24rs_v3, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4
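The `inference-min-sku-spec` and `evaluation-min-sku-spec` values are pipe-separated integers. The sketch below splits them into named fields, assuming the order is CPU cores, GPUs, memory in GB, and storage in GB. That field order is an assumption not stated on this page, and `MinSkuSpec` and `parse_min_sku_spec` are hypothetical names.

```python
from typing import NamedTuple

class MinSkuSpec(NamedTuple):
    # Field meanings are assumed, not documented on this page.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int

def parse_min_sku_spec(spec: str) -> MinSkuSpec:
    """Split a 'a|b|c|d' spec string into four integer fields."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    return MinSkuSpec(*(int(part) for part in parts))

inference_spec = parse_min_sku_spec("4|0|14|28")
evaluation_spec = parse_min_sku_spec("4|0|28|56")
print(inference_spec)
print(evaluation_spec)
```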
