models mistralai Mistral 7B v01 - Azure/azureml-assets GitHub Wiki

mistralai-Mistral-7B-v01

Overview

Model Details

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks tested.

For full details of this model, please read the paper and the release blog post.

Model Architecture

Mistral-7B-v0.1 is a transformer model, with the following architecture choices:

  • Grouped-Query Attention
  • Sliding-Window Attention
  • Byte-fallback BPE tokenizer
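To make the first two architecture choices concrete, here is a minimal sketch in plain Python. It is not the model's implementation: the function names are hypothetical, and the sequence length, window size, and head counts are toy values chosen for display, not the model's actual configuration. The first function builds a causal sliding-window mask (each position attends only to itself and a fixed number of preceding positions); the second shows how grouped-query attention maps several query heads onto one shared key/value head.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one KV head."""
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Toy sizes, for illustration only.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx

# With 8 query heads and 2 KV heads, heads 0-3 share KV head 0, heads 4-7 share KV head 1.
kv_assignment = [kv_head_for_query(h, n_q_heads=8, n_kv_heads=2) for h in range(8)]
print(kv_assignment)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The mask keeps per-token attention cost bounded by the window size rather than the full sequence length, and sharing KV heads shrinks the key/value cache that must be kept during generation.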

Beyond surpassing Llama 2 13B on all evaluated benchmarks, Mistral 7B v0.1 also outperforms Llama 1 34B in reasoning, mathematics, and code generation tasks.

Notice

Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.

Finetuning samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Text Generation | question-answering | truthful_qa | abstractive_qna_with_text_gen.ipynb | text-generation.sh |

Model Evaluation Sample

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Text generation | Text generation | cnn_dailymail | evaluate-model-text-generation.ipynb | evaluate-model-text-generation.yml |

Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- |
| Real time | text-generation-online-endpoint.ipynb | text-generation-online-endpoint.sh |
| Batch | text-generation-batch-endpoint.ipynb | coming soon |

Sample inputs and outputs (for real-time inference)

Sample input

{
    "input_data": {
        "input_string": [
            "What is your favourite condiment?",
            "Do you have mayonnaise recipes?"
        ],
        "parameters": {
            "max_new_tokens": 100,
            "do_sample": true,
            "return_full_text": false
        }
    }
}
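The sample input above can be assembled and wrapped in an HTTP request with the Python standard library. This is a sketch under assumptions: `build_scoring_request` is a hypothetical helper, the URL and key are placeholders to be replaced with the values of your own deployed endpoint, and bearer-key authentication is assumed. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url: str, api_key: str, prompts: list[str],
                          max_new_tokens: int = 100) -> urllib.request.Request:
    """Build a POST request whose body matches the sample input format."""
    payload = {
        "input_data": {
            "input_string": prompts,
            "parameters": {
                "max_new_tokens": max_new_tokens,
                "do_sample": True,
                "return_full_text": False,
            },
        }
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed key-based auth
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder URL, not a real endpoint
    "<your-endpoint-key>",            # placeholder credential
    ["What is your favourite condiment?", "Do you have mayonnaise recipes?"],
)
body = json.loads(request.data)
print(json.dumps(body, indent=4))
```

Once real values are filled in, `urllib.request.urlopen(request)` would send the request. Because `do_sample` is true, repeated calls with the same prompts can return different completions.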

Sample output

[
  {
    "0": "\n\nMayonnaise - can't be beat.\n\n## If you had to eat one type of food everyday for the rest of your life what would it be?\n\nMango. I'm an avid fruit and vegetable eater.\n\n## What is your favourite fruit and/or vegetable?\n\nMango! I eat an acre of these a year, which is almost two pounds a day.\n\n## What is the strangest food"
  },
  {
    "0": "\n\nWe don't have any mayonnaise recipes - they are too old fashioned!\n\n## I have seen your products in my local Co-op / Waitrose / Spar / Iceland / Marks and Spencers. Where can I buy more?\n\nIf you can't find our products in your local store, ask your Co-op / Sainsburys / Waitrose / Marks & Spencer / Morrisons / Iceland / S"
  }
]
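The sample output is a JSON array with one object per input prompt, each holding the generated text under the key "0". A minimal sketch for pulling the completions out of such a response follows; `extract_generations` is a hypothetical helper, and the response strings are shortened versions of the sample output above.

```python
import json


def extract_generations(response_body: str) -> list[str]:
    """Return one generated string per prompt, in request order."""
    items = json.loads(response_body)
    # Each item looks like {"0": "<generated text>"}; strip the leading newlines.
    return [item["0"].strip() for item in items]


# Shortened from the sample output for readability.
sample_response = json.dumps([
    {"0": "\n\nMayonnaise - can't be beat."},
    {"0": "\n\nWe don't have any mayonnaise recipes - they are too old fashioned!"},
])

generations = extract_generations(sample_response)
for text in generations:
    print(text)
```

Since this is a base model, completions often continue past the answer into unrelated text (as the full sample output shows), so callers may want to truncate at a delimiter of their choosing.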

Version: 17

Tags

  • Featured
  • SharedComputeCapacityEnabled
  • hiddenlayerscanned
  • huggingface_model_id : mistralai/Mistral-7B-v0.1
  • evaluation_compute_allow_list : ['Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_ND40rs_v2', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']
  • batch_compute_allow_list : ['Standard_ND40rs_v2', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']
  • inference_compute_allow_list : ['Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']
  • finetune_compute_allow_list : ['Standard_ND40rs_v2', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']
  • model_specific_defaults : ordereddict({'precision': '16', 'deepspeed_stage': '2', 'apply_deepspeed': 'true', 'apply_ort': 'true', 'apply_lora': 'true', 'ignore_mismatched_sizes': 'false'})
  • inference_supported_envs : ['vllm', 'ds_mii']
  • license : apache-2.0
  • task : text-generation
  • author : Mistral AI
  • benchmark : quality
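The `*_compute_allow_list` tag values above are written as Python list literals, so they can be parsed with `ast.literal_eval` to check which tasks a given VM size is listed for. The sketch below copies two of the lists verbatim; the `TAGS` dictionary and both helper functions are hypothetical names introduced for illustration.

```python
import ast

# Tag values copied from the Tags section (two of the four allow lists).
TAGS = {
    "inference_compute_allow_list": (
        "['Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', "
        "'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', "
        "'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']"
    ),
    "finetune_compute_allow_list": (
        "['Standard_ND40rs_v2', 'Standard_NC48ads_A100_v4', "
        "'Standard_NC96ads_A100_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']"
    ),
}


def allowed_skus(tag_value: str) -> set[str]:
    """Parse a list-literal tag value into a set of SKU names."""
    return set(ast.literal_eval(tag_value))


def tasks_supporting(sku: str) -> list[str]:
    """Return the task names whose allow list contains the given SKU."""
    return sorted(
        name.removesuffix("_compute_allow_list")
        for name, value in TAGS.items()
        if sku in allowed_skus(value)
    )


print(tasks_supporting("Standard_NC12s_v3"))   # ['inference']
print(tasks_supporting("Standard_ND40rs_v2"))  # ['finetune', 'inference']
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for strings copied from metadata.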

View in Studio: https://ml.azure.com/registries/azureml/models/mistralai-Mistral-7B-v01/version/17

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

SHA: 26bca36bde8333b5d7f72e9ed20ccda6a618af24

inference-min-sku-spec: 12|1|220|64

inference-recommended-sku: Standard_NC12s_v3, Standard_NC24s_v3, Standard_ND40rs_v2, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4, Standard_ND96asr_v4

evaluation-min-sku-spec: 6|1|112|128

evaluation-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24rs_v3, Standard_ND40rs_v2, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4, Standard_ND96asr_v4

finetune-min-sku-spec: 40|2|440|128

finetune-recommended-sku: Standard_ND40rs_v2, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96amsr_A100_v4, Standard_ND96asr_v4
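The three `*-min-sku-spec` values above are pipe-delimited. The page does not document the field order; the sketch below assumes it is CPU cores, GPU count, memory in GB, and storage in GB, which should be verified before relying on it. `MinSkuSpec` and `parse_min_sku_spec` are hypothetical names.

```python
from typing import NamedTuple


class MinSkuSpec(NamedTuple):
    # Field meanings are an assumption; the source page does not define them.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_min_sku_spec(spec: str) -> MinSkuSpec:
    """Split a 'a|b|c|d' spec string into four integers."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    return MinSkuSpec(*(int(part) for part in parts))


# Values copied from the Properties section.
specs = {
    "inference": parse_min_sku_spec("12|1|220|64"),
    "evaluation": parse_min_sku_spec("6|1|112|128"),
    "finetune": parse_min_sku_spec("40|2|440|128"),
}

for task, spec in specs.items():
    print(task, spec)
```

Under the assumed field order, fine-tuning has the highest minimum requirements of the three tasks, which is consistent with its shorter recommended SKU list.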

finetuning-tasks: text-generation, text-classification

languages: EN
