models tiiuae falcon 40b instruct - Azure/azureml-assets GitHub Wiki

tiiuae-falcon-40b-instruct

Overview

Description

Falcon-40B-Instruct is a large language model with 40 billion parameters, developed by TII (Technology Innovation Institute). It is a causal decoder-only model fine-tuned on a mixture of Baize data and released under the Apache 2.0 license. The model is optimized for inference, using FlashAttention and multiquery attention. It is primarily designed for chat and instruct applications in English and French, but it may not be well suited to further fine-tuning.

Key Details:

Model Type: Causal decoder-only
Languages: English and French
License: Apache 2.0
Training Data: Fine-tuned on 150 million tokens from Baize mixed with 5% of RefinedWeb data
Architecture: Based on GPT-3 with optimizations including rotary positional embeddings, FlashAttention, and multiquery attention
Hardware: Trained on AWS SageMaker using 64 A100 40GB GPUs in P4d instances
Software: Uses a custom distributed training codebase called Gigatron

Recommendations and Limitations:

Falcon-40B-Instruct may carry biases commonly found online due to its training data. Users are advised to implement guardrails and take precautions for production use. It's mostly suited for English and French and may not generalize well to other languages.

The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

Training Details

Training Data

Falcon-40B-Instruct was fine-tuned on 150M tokens from Baize mixed with 5% of RefinedWeb data.

The data was tokenized with the Falcon-7B/40B tokenizer.

Training Procedure

Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.

Evaluation

Paper coming soon.

See the OpenLLM Leaderboard for early results.

Model Architecture and Objective

Falcon-40B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
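To make the next-token objective concrete, here is a minimal pure-Python sketch. It is not TII's training code. It assumes a toy 8-token vocabulary and hand-written logits, and the helpers `prefix_target_pairs`, `log_softmax`, and `causal_lm_loss` are illustrative names. Each target token is predicted only from the prefix before it, and the loss is the mean negative log-likelihood of those targets.

```python
import math

def prefix_target_pairs(token_ids):
    """Pair every prefix of the sequence with the token that follows it.

    A causal decoder predicts token i from tokens 0..i-1 only.
    """
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

def log_softmax(logits):
    """Log-probabilities over the vocabulary, stabilized by subtracting the max."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def causal_lm_loss(logits_per_position, targets):
    """Mean negative log-likelihood of the target token at each position."""
    if len(logits_per_position) != len(targets):
        raise ValueError("need one logit vector per target token")
    nll = [-log_softmax(logits)[t] for logits, t in zip(logits_per_position, targets)]
    return sum(nll) / len(nll)

# Hypothetical 4-token sequence over an 8-token toy vocabulary.
tokens = [5, 2, 7, 1]
pairs = prefix_target_pairs(tokens)
targets = [target for _, target in pairs]
# Uniform logits give every token probability 1/8, so the loss equals log(8).
print(pairs)
print(causal_lm_loss([[0.0] * 8] * len(targets), targets))
```

In the real model the logits come from the decoder's final layer over the 65,024-token vocabulary. The shape of the objective is the same.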

The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences: rotary positional embeddings, multiquery attention, and FlashAttention.

For multiquery, we use an internal variant that keeps independent keys and values per tensor-parallel degree.
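One way to read the sentence about independent keys and values per tensor-parallel degree is sketched below. Instead of a single key/value head shared by all query heads, each tensor-parallel rank keeps its own key/value group for the query heads it owns. This is an interpretation, not TII's code. The tensor-parallel degree of 8 and the helper names are hypothetical. The count of 128 query heads assumes that d_model / head_dim gives the number of heads, using the values in the hyperparameter table.

```python
def heads_on_rank(rank, n_query_heads, tp_degree):
    """Contiguous slice of query heads owned by one tensor-parallel rank."""
    if n_query_heads % tp_degree != 0:
        raise ValueError("query heads must split evenly across ranks")
    per_rank = n_query_heads // tp_degree
    return range(rank * per_rank, (rank + 1) * per_rank)

def kv_group_for_head(head, n_query_heads, n_kv_groups):
    """Index of the shared key/value group that a query head attends with."""
    if n_query_heads % n_kv_groups != 0:
        raise ValueError("query heads must split evenly across KV groups")
    return head // (n_query_heads // n_kv_groups)

N_QUERY_HEADS = 8192 // 64   # d_model / head_dim from the table: 128 heads
TP_DEGREE = 8                # hypothetical tensor-parallel degree

# Classic multiquery: one K/V head shared by all 128 query heads.
assert {kv_group_for_head(h, N_QUERY_HEADS, 1) for h in range(N_QUERY_HEADS)} == {0}

# Variant sketched here: one K/V group per tensor-parallel rank, so every
# head a rank owns reads keys/values that live on that same rank.
for rank in range(TP_DEGREE):
    groups = {kv_group_for_head(h, N_QUERY_HEADS, TP_DEGREE)
              for h in heads_on_rank(rank, N_QUERY_HEADS, TP_DEGREE)}
    print(rank, groups)
```

Under this reading, each rank can compute attention for its heads without fetching keys or values from other ranks. The cost is storing one K/V group per rank instead of a single shared one.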

Hyperparameter    Value    Comment
Layers            60
d_model           8192
head_dim          64       Reduced to optimise for FlashAttention
Vocabulary        65024
Sequence length   2048
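The table fixes enough values to estimate the attention geometry and the memory a key/value cache needs for one full-length sequence. The sketch below makes three assumptions, none of which is stated in the table:

- d_model equals the number of query heads times head_dim.
- Cache entries are 2 bytes (fp16/bf16).
- The multiquery variant uses a hypothetical 8 key/value groups.

It contrasts one K/V head per query head (standard multi-head attention), 8 groups, and a single shared K/V head.

```python
# Values from the hyperparameter table above.
LAYERS = 60
D_MODEL = 8192
HEAD_DIM = 64
SEQ_LEN = 2048

def n_query_heads(d_model=D_MODEL, head_dim=HEAD_DIM):
    """Query heads tile the model width: d_model = heads * head_dim (assumed)."""
    if d_model % head_dim != 0:
        raise ValueError("head_dim must divide d_model")
    return d_model // head_dim

def kv_cache_bytes(n_kv_heads, seq_len=SEQ_LEN, layers=LAYERS,
                   head_dim=HEAD_DIM, bytes_per_value=2):
    """Bytes to cache keys and values for one full-length sequence.

    Factor 2 covers keys plus values; bytes_per_value=2 assumes fp16/bf16.
    """
    return 2 * layers * seq_len * n_kv_heads * head_dim * bytes_per_value

GIB = 1024 ** 3
heads = n_query_heads()                        # 128
full = kv_cache_bytes(n_kv_heads=heads)        # one K/V head per query head
grouped = kv_cache_bytes(n_kv_heads=8)         # hypothetical 8 K/V groups
single = kv_cache_bytes(n_kv_heads=1)          # classic multiquery
print(f"{heads} heads: {full / GIB:.2f} GiB vs {grouped / GIB:.3f} GiB vs {single / GIB:.3f} GiB")
```

The cache shrinks in direct proportion to the number of K/V heads. This is one reason multiquery-style attention helps inference at long sequence lengths.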

Compute Infrastructure

Hardware

Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.

Software

Falcon-40B-Instruct was trained using a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).
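"3D parallelism" splits the GPUs into tensor-parallel, pipeline-parallel, and data-parallel groups whose sizes multiply to the world size. Gigatron's actual degrees and rank ordering are not given here, so the sketch below is purely illustrative. It assumes a tensor-parallel degree of 8 (one P4d node holds 8 A100 GPUs) and a hypothetical 4 pipeline stages. It then derives the data-parallel degree and maps each global rank to its coordinates.

```python
from typing import NamedTuple

class Coord(NamedTuple):
    data: int
    pipeline: int
    tensor: int

def data_parallel_degree(world_size, tp, pp):
    """Whatever factor of the world size is left after tensor and pipeline splits."""
    if world_size % (tp * pp) != 0:
        raise ValueError("tp * pp must divide the world size")
    return world_size // (tp * pp)

def rank_to_coord(rank, tp, pp, dp):
    """Map a global rank to its (data, pipeline, tensor) coordinates.

    Tensor-parallel ranks are innermost so they share a node; this ordering
    is an illustrative convention, not Gigatron's documented layout.
    """
    world = tp * pp * dp
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} outside world size {world}")
    return Coord(data=rank // (tp * pp),
                 pipeline=(rank // tp) % pp,
                 tensor=rank % tp)

WORLD = 64         # A100 GPUs stated in the text
TP, PP = 8, 4      # hypothetical degrees: 8 GPUs per P4d node, 4 pipeline stages
DP = data_parallel_degree(WORLD, TP, PP)
print(DP, rank_to_coord(0, TP, PP, DP), rank_to_coord(63, TP, PP, DP))
```

With ZeRO on top, optimizer state would typically be sharded across the data-parallel ranks that hold the same (pipeline, tensor) slice of the model. The 8 x 4 x 2 split is only one factorization of 64.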

License

Falcon-40B is made available under the Apache 2.0 license.

Model Evaluation Samples

Task              Use case          Dataset          Python sample (Notebook)                 CLI with YAML
Text generation   Text generation   cnn_dailymail    evaluate-model-text-generation.ipynb     evaluate-model-text-generation.yml

Inference samples

Inference type    Python sample (Notebook)                        CLI with YAML
Real time         text-generation-online-endpoint-dolly.ipynb     text-generation-online-endpoint-dolly.sh
Batch             text-generation-batch-endpoint.ipynb            coming soon

Sample input (for real-time inference)

{
  "input_data": {
    "input_string": ["Develop a Python function to sort a list of integers in ascending order"]
  }
}
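A minimal way to send the sample input to a deployed real-time endpoint, using only the Python standard library, is sketched below. It is not the code from the notebook listed above. It assumes key-based authentication, where Azure ML managed online endpoints expect the endpoint key as a bearer token in the Authorization header. The scoring URI and key are placeholders for your own deployment's values.

```python
import json
import urllib.request

def build_payload(prompts):
    """Wrap prompts in the input_data/input_string shape of the sample input."""
    return json.dumps({"input_data": {"input_string": list(prompts)}})

def score(scoring_uri, api_key, prompts, timeout=120):
    """POST prompts to an online endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(prompts).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Key-based auth: the endpoint key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

print(build_payload(["Develop a Python function to sort a list of integers in ascending order"]))
```

Calling `score("https://<endpoint-name>.<region>.inference.ml.azure.com/score", "<endpoint-key>", [prompt])` with your endpoint's real values should return a list shaped like the sample output.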

Sample output

[
  {
    "0": "You can use the sorted() function in Python to sort a list of integers in ascending order. Here's an example: my_list = [3,1,6,4,1,5] sorted_list = sorted(my_list) print(sorted_list) This will output: [1,1,3,4,5,6]"
  }
]
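The sample output is a JSON list whose records map a string index to the generated text. A small, hypothetical helper for extracting those strings is sketched below. It assumes the keys are stringified integers ("0", "1", ...). That assumption is inferred from this single sample rather than from a documented schema, and the test string is a shortened copy of the sample.

```python
import json

def generated_texts(response_body):
    """Return generated strings from a response shaped like the sample output.

    Accepts either the raw JSON text or an already-parsed list of records.
    """
    records = json.loads(response_body) if isinstance(response_body, str) else response_body
    texts = []
    for record in records:
        # Order each record's generations by their numeric index.
        for key in sorted(record, key=int):
            texts.append(record[key])
    return texts

sample = '[{"0": "You can use the sorted() function in Python to sort a list."}]'
print(generated_texts(sample))
```

Sorting with `key=int` keeps "10" after "2" if a record ever carries more than ten generations.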

Version: 5

Tags

Featured
license : apache-2.0
SharedComputeCapacityEnabled
task : text-generation
author : tiiuae
huggingface_model_id : tiiuae/falcon-40b-instruct
evaluation_compute_allow_list : ['Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
inference_compute_allow_list : ['Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
inference_supported_envs : ['vllm']

View in Studio: https://ml.azure.com/registries/azureml/models/tiiuae-falcon-40b-instruct/version/5

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

SHA: ca78eac0ed45bf64445ff0687fabba1598daebf3

datasets: tiiuae/falcon-refinedweb

languages: en

evaluation-min-sku-spec: 24|4|448|2900

evaluation-recommended-sku: Standard_NC24s_v3, Standard_NC24rs_v3, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4

inference-min-sku-spec: 40|8|672|2900

inference-recommended-sku: Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4
