models tiiuae falcon 7b - Azure/azureml-assets GitHub Wiki
Falcon-7B is a large language model with 7 billion parameters. It is a causal decoder-only model developed by TII and trained on 1,500 billion tokens of the RefinedWeb dataset, which was enhanced with curated corpora. The model is available under the Apache 2.0 license. It outperforms comparable open-source models and features an architecture optimized for inference. However, it is a raw, pretrained model that should be further finetuned for most use cases.
The model is recommended for research on large language models and as a foundation for further specialization and finetuning for specific tasks. It should not be used in production without adequate assessment of risks and mitigation. The model carries biases commonly encountered online and is trained on English and French data only.
The training details of Falcon-7B include information about the training data, training procedure, and hyperparameters used. It was trained on 384 A100 40GB GPUs using a 2D parallelism strategy combined with ZeRO. The model description mentions the architectural adaptations from the GPT-3 model, such as rotary positional embeddings, multiquery attention, and FlashAttention.
The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model. Some of the content has been made available below.
Falcon-7B was trained on 1,500B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated corpora were inspired by The Pile (Gao et al., 2020).
Data source | Fraction | Tokens | Sources |
---|---|---|---|
RefinedWeb-English | 79% | 1,185B | massive web crawl |
Books | 7% | 110B | |
Conversations | 6% | 85B | Reddit, StackOverflow, HackerNews |
Code | 3% | 45B | |
RefinedWeb-French | 3% | 45B | massive web crawl |
Technical | 2% | 30B | arXiv, PubMed, USPTO, etc. |
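As a quick sanity check on the table above, the listed fractions and token counts can be added up. The following is a minimal sketch that uses only the numbers in the table; the dictionary name `mixture` is illustrative and not part of any Falcon tooling.

```python
# Data mixture as listed in the table: (fraction in percent, tokens in billions).
mixture = {
    "RefinedWeb-English": (79, 1185),
    "Books": (7, 110),
    "Conversations": (6, 85),
    "Code": (3, 45),
    "RefinedWeb-French": (3, 45),
    "Technical": (2, 30),
}

total_fraction = sum(frac for frac, _ in mixture.values())
total_tokens = sum(tokens for _, tokens in mixture.values())

print(f"fractions sum to {total_fraction}%")
print(f"tokens sum to {total_tokens}B")

# Compare each listed (rounded) fraction with the share implied by its token count.
for name, (frac, tokens) in mixture.items():
    implied = 100 * tokens / total_tokens
    print(f"{name:20s} listed {frac:2d}%  implied {implied:5.2f}%")
```

The listed percentages are rounded, so small differences between the listed and implied shares are expected.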
The data was tokenized with the Falcon-7B/40B tokenizer.
Falcon-7B was trained on 384 A100 40GB GPUs, using a 2D parallelism strategy (PP=2, DP=192) combined with ZeRO.
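The GPU count follows from the two parallelism degrees. The sketch below works through that arithmetic. The per-replica batch split assumes the batch size of 2304 from the hyperparameter table counts sequences and is divided evenly across data-parallel replicas, and the instance count assumes 8 GPUs per P4d instance; neither assumption is stated in this card.

```python
PIPELINE_PARALLEL = 2   # PP: the layers are split into 2 pipeline stages
DATA_PARALLEL = 192     # DP: 192 model replicas, each seeing different data

# Every replica occupies PP GPUs, so the total is the product of the two degrees.
total_gpus = PIPELINE_PARALLEL * DATA_PARALLEL
print("total GPUs:", total_gpus)

# Assumption: the global batch of 2304 counts sequences, split evenly over replicas.
GLOBAL_BATCH = 2304
sequences_per_replica = GLOBAL_BATCH // DATA_PARALLEL
print("sequences per replica per step:", sequences_per_replica)

# Assumption: each P4d instance hosts 8 A100 GPUs.
GPUS_PER_INSTANCE = 8
instances = total_gpus // GPUS_PER_INSTANCE
print("P4d instances:", instances)
```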
Hyperparameter | Value | Comment |
---|---|---|
Precision | bfloat16 | |
Optimizer | AdamW | |
Learning rate | 6e-4 | 4B tokens warm-up, cosine decay to 1.2e-5 |
Weight decay | 1e-1 | |
Z-loss | 1e-4 | |
Batch size | 2304 | 30B tokens ramp-up |
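The learning-rate row describes a warm-up followed by cosine decay. The sketch below shows one common way to implement such a schedule. It assumes a linear warm-up from zero and a decay that spans the rest of the 1,500B-token run; the card gives only the peak, the floor, and the warm-up length, so the shape of the warm-up and the decay horizon are assumptions, and `learning_rate` is a hypothetical helper.

```python
import math

PEAK_LR = 6e-4          # from the table
MIN_LR = 1.2e-5         # from the table
WARMUP_TOKENS = 4e9     # from the table
TOTAL_TOKENS = 1500e9   # assumption: decay ends with the 1,500B-token run


def learning_rate(tokens_seen: float) -> float:
    """Return the learning rate after `tokens_seen` training tokens."""
    if tokens_seen < WARMUP_TOKENS:
        # Assumption: linear warm-up from 0 to the peak value.
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (tokens_seen - WARMUP_TOKENS) / (TOTAL_TOKENS - WARMUP_TOKENS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for tokens in (0, 2e9, 4e9, 750e9, 1500e9):
    print(f"{tokens / 1e9:7.0f}B tokens -> lr = {learning_rate(tokens):.3e}")
```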
Training happened in early March 2023 and took about two weeks.
Paper coming soon.
See the OpenLLM Leaderboard for early results.
Falcon-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences:
- Positional embeddings: rotary (Su et al., 2021);
- Attention: multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022);
- Decoder-block: parallel attention/MLP with a single layer norm.
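One practical effect of multiquery attention is a smaller key/value cache at inference time, because all query heads share a single key head and a single value head. The sketch below estimates the cache size for one full-length sequence using the layer count, `d_model`, `head_dim`, and sequence length listed in this card. It assumes the cache is stored in bfloat16 (2 bytes per value), which the card does not state; `kv_cache_bytes` is a hypothetical helper.

```python
D_MODEL = 4544
HEAD_DIM = 64
N_LAYERS = 32
SEQ_LEN = 2048
BYTES_PER_VALUE = 2  # assumption: cache kept in bfloat16

n_query_heads = D_MODEL // HEAD_DIM


def kv_cache_bytes(n_kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one sequence of SEQ_LEN tokens."""
    # Factor 2 covers keys plus values; each is stored per layer and per position.
    return 2 * n_kv_heads * HEAD_DIM * SEQ_LEN * N_LAYERS * BYTES_PER_VALUE


multi_head = kv_cache_bytes(n_query_heads)  # one K/V head per query head
multi_query = kv_cache_bytes(1)             # a single shared K/V head

print("query heads:", n_query_heads)
print(f"multi-head cache:  {multi_head / 2**20:.0f} MiB")
print(f"multiquery cache:  {multi_query / 2**20:.0f} MiB")
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads.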
Hyperparameter | Value | Comment |
---|---|---|
Layers | 32 | |
d_model | 4544 | Increased to compensate for multiquery |
head_dim | 64 | Reduced to optimise for FlashAttention |
Vocabulary | 65024 | |
Sequence length | 2048 | |
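The hyperparameters above are enough for a rough parameter-count estimate. The sketch below makes several assumptions that are not stated in this card: the MLP expands to 4 × `d_model`, linear layers have no bias terms, input and output embeddings share weights, and layer-norm parameters are ignored. Treat the result as a back-of-the-envelope figure, not an official count.

```python
N_LAYERS = 32
D_MODEL = 4544
HEAD_DIM = 64
VOCAB = 65024
MLP_EXPANSION = 4  # assumption: hidden MLP width is 4 * d_model

# Multiquery attention: full-width queries plus one shared key head and one value head.
qkv_params = D_MODEL * (D_MODEL + 2 * HEAD_DIM)
attn_out_params = D_MODEL * D_MODEL
# MLP: up-projection and down-projection.
mlp_params = 2 * D_MODEL * (MLP_EXPANSION * D_MODEL)

per_layer = qkv_params + attn_out_params + mlp_params
embedding_params = VOCAB * D_MODEL  # assumption: tied input/output embeddings

total_params = N_LAYERS * per_layer + embedding_params
print(f"per layer: {per_layer / 1e6:.1f}M")
print(f"embedding: {embedding_params / 1e6:.1f}M")
print(f"total:     {total_params / 1e9:.2f}B")
```

The estimate lands close to the 7 billion parameters quoted for the model, which suggests the assumptions are at least plausible.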
Falcon-7B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
Falcon-7B was trained using a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).
Falcon-7B is made available under the Apache 2.0 license.
Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
---|---|---|---|---|
Text Classification | Emotion Detection | Emotion | emotion-detection.ipynb | emotion-detection.sh |
Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
---|---|---|---|---|
Text generation | Text generation | cnn_dailymail | evaluate-model-text-generation.ipynb | evaluate-model-text-generation.yml |
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | text-generation-online-endpoint.ipynb | text-generation-online-endpoint.sh |
Batch | text-generation-batch-endpoint.ipynb | coming soon |
Sample input
{
  "input_data": {
    "input_string": ["the meaning of life is"]
  }
}
Sample output
[
  {
    "0": "the meaning of life is to find your gift. the purpose of life is to give it away."
  }
]
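The sample request and response above can be produced and consumed with the standard `json` module. The sketch below only builds the payload and parses a response body; it does not call an endpoint, since the scoring URL and key are deployment-specific and not given on this page. The helper names `build_request` and `parse_response` are hypothetical, and the handling of several entries per response item is an assumption based on the single sample shown.

```python
import json


def build_request(prompts):
    """Serialize prompts into the request shape shown in the sample input."""
    return json.dumps({"input_data": {"input_string": list(prompts)}})


def parse_response(body):
    """Extract generated strings from a response shaped like the sample output."""
    results = json.loads(body)
    texts = []
    for item in results:
        # Assumption: keys are string indices ("0", "1", ...); read them in order.
        for key in sorted(item, key=int):
            texts.append(item[key])
    return texts


payload = build_request(["the meaning of life is"])
print(payload)

# Response body copied from the sample output on this page.
sample_response = (
    '[{"0": "the meaning of life is to find your gift. '
    'the purpose of life is to give it away."}]'
)
generated = parse_response(sample_response)
print(generated[0])
```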
Version: 10
Featured
license : apache-2.0
SharedComputeCapacityEnabled
task : text-generation
hiddenlayerscanned
author : tiiuae
huggingface_model_id : tiiuae/falcon-7b
inference_compute_allow_list : ['Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
finetune_compute_allow_list : ['Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
evaluation_compute_allow_list : ['Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4']
model_specific_defaults : ordereddict({'apply_lora': 'true', 'precision': '16', 'apply_deepspeed': 'true', 'ignore_mismatched_sizes': 'false'})
inference_supported_envs : ['vllm']
View in Studio: https://ml.azure.com/registries/azureml/models/tiiuae-falcon-7b/version/10
License: apache-2.0
SharedComputeCapacityEnabled: True
SHA: f7796529e36b2d49094450fb038cc7c4c86afa44
datasets: tiiuae/falcon-refinedweb
languages: en
inference-min-sku-spec: 6|1|112|736
inference-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4
evaluation-min-sku-spec: 24|4|448|2900
evaluation-recommended-sku: Standard_NC24s_v3, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4
finetune-min-sku-spec: 24|4|448|2900
finetune-recommended-sku: Standard_NC24s_v3, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4
finetuning-tasks: text-classification
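The `*-min-sku-spec` values above are pipe-separated numbers. The sketch below parses them into named fields. The field meanings (CPU cores, GPUs, memory in GB, storage in GB) are an assumption inferred from the listed SKUs and are not documented on this page; `SkuSpec` and `parse_sku_spec` are hypothetical names.

```python
from typing import NamedTuple


class SkuSpec(NamedTuple):
    # Assumed field order; this page does not document it.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_sku_spec(spec: str) -> SkuSpec:
    """Parse a 'cores|gpus|memory|storage' string into a SkuSpec."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    return SkuSpec(*(int(part) for part in parts))


inference_min = parse_sku_spec("6|1|112|736")
finetune_min = parse_sku_spec("24|4|448|2900")

print(inference_min)
print(finetune_min)
```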