models MLflow Transformers documentation - Azure/azureml-assets GitHub Wiki
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate input...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.
It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains.
OSCAR or Open...
-
Databricks'
dolly-v2-12b
, an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. Based onpythia-12b
, Dolly is trained on ~15k instruction/response fine tuning records [databricks-dolly-15k
](https://github.com/d... -
seed=42
batch_size = 12
n_epochs = 4
base_LM_model = "microsoft/MiniLM-L12-H384-uncased"
max_seq_len = 384
learning_rate = 4e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride=128
max_query_length=64
grad_acc_steps=4
-
This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.
-
DistilBERT, a transformers model, is designed to be smaller and quicker than BERT. It underwent pretraining on the same dataset in a self-supervised manner, utilizing the BERT base model as a reference. This entails training solely on raw texts, without human annotation, thus enabling the utiliza...
-
distilbert-base-cased-distilled-squad
The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, adistilled version of BERT, and the paper [DistilBERT, adistilled version of BERT: smaller, faster, cheaper and lighter](https://...
-
DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lot...
-
distilbert-base-uncased-distilled-squad
DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, adistilled version of BERT, and the paper [DistilBERT, adistilled version of BERT: smaller, faster, cheaper and lighter](https://arxi...
-
distilbert-base-uncased-finetuned-sst-2-english
DistilBERT base uncased finetuned SST-2 model is a fine-tune checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, Bert bert-base-uncased version reaches an accuracy ...
-
DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, li...
-
distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found [here](https://github.com/hugg...
-
BART is a transformer model that combines a bidirectional encoder similar to BERT with an autoregressive decoder akin to GPT. It is trained using two main techniques: (1) corrupting text with a chosen noising function, and (2) training a model to reconstruct the original text.
When fine-tuned fo...
-
finiteautomata-bertweet-base-sentiment-analysis
Repository: https://github.com/finiteautomata/pysentimiento/
Model trained with SemEval 2017 corpus (around ~40k tweets). Base model is BERTweet, a RoBERTa model trained on English tweets.
Uses `POS...
-
GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generat...
-
GPT-2 Large is the 774M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM)
See the [associated paper](https://d4mucfpksywv.cloudfront.net/bet...
-
GPT-2 Medium is the 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.
See the [associated paper](https://d4mucfpksywv.c...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data.
Please check the [offi...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data.
Please check the [offi...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data...
-
Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
-
Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
-
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
-
The RoBERTa base OpenAI Detector functions as a model designed to detect outputs generated by the GPT-2 model. It was created by refining a RoBERTa base model using the outputs of the 1.5B-parameter GPT-2 model. This detector is utilized to determine whether text was generated by a GPT-2 model. O...
-
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
-
roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The model is a pretrained model on English language text using a masked language modeling ...
-
RoBERTa large OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa large model with the outputs of the 1.5B-parameter GPT-2 model. The model can be used to predict if text was generated by a GPT-2 model. This model was released by OpenAI at the same time as Op...
-
The RoBERTa Large model is a large transformer-based language model that was developed by the Hugging Face team. It is pre-trained on masked language modeling and can be used for tasks such as sequence classification, token classification, or question answering. Its primary usage is as a fine-tun...
-
The developers of the Text-To-Text Transfer Transformer (T5) write:
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...