models Foundation Models documentation - Azure/azureml-assets GitHub Wiki
ALLaM is a series of powerful language models designed to advance Arabic Language Technology (ALT), developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and AI Authority (SDAIA). ALLaM-2-7b-instruct is traine...
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
AutoML-Image-Instance-Segmentation
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
AutoML-Named-Entity-Recognition
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate input...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
-
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the i...
-
CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.
It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains.
OSCAR or Open...
-
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The Stable-Diffusion-v1-4 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 225k steps at resolution 5...
-
The CXRReportGen model utilizes a multimodal architecture, integrating a BiomedCLIP image encoder with a Phi-3-Mini text encoder to help an application interpret complex medical imaging studies of chest X-rays. CXRReportGen follows the same framework as **[MAIRA-2](https://www.micros...
-
Databricks' dolly-v2-12b, an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine tuning records [databricks-dolly-15k](https://github.com/d...
-
DeciDiffusion 1.0 is an 820 million parameter latent diffusion model designed for text-to-image conversion. Trained initially on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset, the model's training involved advanced techniques to improve speed, training performance, and achieve su...
-
seed = 42
batch_size = 12
n_epochs = 4
base_LM_model = "microsoft/MiniLM-L12-H384-uncased"
max_seq_len = 384
learning_rate = 4e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
grad_acc_steps = 4
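As a rough illustration of what these hyperparameters control, the sketch below shows how a long document might be split into overlapping windows and how a warmup schedule could shape the learning rate. It is a minimal sketch under stated assumptions, not the training code used for this model: `doc_stride` is assumed to be the step between window starts (the convention of the original BERT SQuAD script), three positions are assumed to be reserved for special tokens, and `LinearWarmup` is assumed to mean a linear ramp-up followed by linear decay to zero. The function names `doc_windows` and `linear_warmup_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
MAX_SEQ_LEN = 384
DOC_STRIDE = 128
MAX_QUERY_LENGTH = 64
LEARNING_RATE = 4e-5
WARMUP_PROPORTION = 0.2
BATCH_SIZE = 12
GRAD_ACC_STEPS = 4


def doc_windows(doc_len, query_len):
    """Return (start, length) token windows covering a document.

    Assumption: doc_stride is the step between window starts, and three
    positions are reserved for special tokens such as [CLS] and [SEP].
    """
    query_len = min(query_len, MAX_QUERY_LENGTH)  # long queries are truncated
    max_doc_tokens = MAX_SEQ_LEN - query_len - 3
    windows = []
    start = 0
    while start < doc_len:
        length = min(doc_len - start, max_doc_tokens)
        windows.append((start, length))
        if start + length == doc_len:
            break  # the last window reaches the end of the document
        start += min(length, DOC_STRIDE)
    return windows


def linear_warmup_lr(step, total_steps):
    """Learning rate at an optimizer step.

    Assumption: linear ramp-up over the warmup steps, then linear decay to 0.
    """
    warmup_steps = int(total_steps * WARMUP_PROPORTION)
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = total_steps - step
    return LEARNING_RATE * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Gradient accumulation multiplies the number of examples per optimizer step.
effective_batch_size = BATCH_SIZE * GRAD_ACC_STEPS
```

For example, `doc_windows(1000, 10)` yields six overlapping windows, and the effective batch size per optimizer step is 12 × 4 = 48.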
-
This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.
-
DistilBERT, a transformers model, is designed to be smaller and quicker than BERT. It underwent pretraining on the same dataset in a self-supervised manner, utilizing the BERT base model as a reference. This entails training solely on raw texts, without human annotation, thus enabling the utiliza...
-
distilbert-base-cased-distilled-squad
The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://...
-
DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lot...
-
distilbert-base-uncased-distilled-squad
The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxi...
-
distilbert-base-uncased-finetuned-sst-2-english
The DistilBERT base uncased finetuned SST-2 model is a fine-tuned checkpoint of DistilBERT-base-uncased, trained on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy ...
-
DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, li...
-
distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found [here](https://github.com/hugg...
-
BART is a transformer model that combines a bidirectional encoder similar to BERT with an autoregressive decoder akin to GPT. It is trained using two main techniques: (1) corrupting text with a chosen noising function, and (2) training a model to reconstruct the original text.
When fine-tuned fo...
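The two-step recipe above (corrupt, then reconstruct) can be pictured with a toy example. The sketch below builds a (corrupted input, original target) training pair using simple random token masking. This is an illustrative noising function chosen for brevity, not necessarily the one BART uses; the function names and the `<mask>` token string are hypothetical.

```python
import random


def mask_tokens(tokens, mask_prob, rng, mask_token="<mask>"):
    """Replace each token with mask_token with probability mask_prob."""
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]


def make_denoising_pair(text, mask_prob=0.3, seed=0):
    """Return (corrupted_source, original_target) for denoising training.

    The model would see the corrupted source in its encoder and be trained
    to generate the original target with its decoder.
    """
    tokens = text.split()
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    corrupted = mask_tokens(tokens, mask_prob, rng)
    return " ".join(corrupted), " ".join(tokens)


source, target = make_denoising_pair("the quick brown fox jumps over the lazy dog")
```

The target is always the uncorrupted text, so the training signal comes entirely from how well the model undoes the noise.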
-
facebook-deit-base-patch16-224
DeiT (Data-efficient image Transformers) is an image transformer that does not require very large amounts of data for training. This is achieved through a novel distillation procedure using a teacher-student strategy, which results in high throughput and accuracy. DeiT is pre-trained and fine-tuned o...
-
facebook-dinov2-base-imagenet1k-1-layer
Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released...
-
Facebook-DinoV2-Image-Embeddings-ViT-Base
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.
Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...
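To make the patch-sequence idea concrete, here is a minimal sketch that splits a small single-channel image into fixed-size patches and counts the resulting tokens, including one extra position for the [CLS] token. The image and patch sizes are illustrative assumptions rather than values taken from this model card, and the learned linear embedding step is omitted.

```python
def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened patch x patch blocks.

    Patches are returned in row-major order, each flattened to a 1D list.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


def sequence_length(image_size, patch):
    """Number of tokens for a square image: one per patch plus one [CLS]."""
    return (image_size // patch) ** 2 + 1


# A 4x4 toy image with pixel values 0..15, split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = split_into_patches(toy_image, 2)
```

With these toy sizes the image yields four patches, so the transformer would process a sequence of five tokens once the [CLS] token is added.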
-
Facebook-DinoV2-Image-Embeddings-ViT-Giant
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.
Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...
-
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
-
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
-
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
-
Description
The adapted AI model for financial reports analysis (preview) is a state-of-the-art small language model (SLM) based on the Phi-3-small-128k architecture, designed specifically for analyzing financial reports. It has been fine-tuned on a few hundred million tokens derived ...
-
finiteautomata-bertweet-base-sentiment-analysis
Repository: https://github.com/finiteautomata/pysentimiento/
Model trained with the SemEval 2017 corpus (around 40k tweets). Base model is BERTweet, a RoBERTa model trained on English tweets.
Uses `POS...
-
The Vision Transformer (ViT) model, as introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al., underwent pre-training on ImageNet-21k with a resolution of 224x224. Su...
-
GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generat...
-
GPT-2 Large is the 774M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English-language text using a causal language modeling (CLM) objective.
See the [associated paper](https://d4mucfpksywv.cloudfront.net/bet...
-
GPT-2 Medium is the 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English-language text using a causal language modeling (CLM) objective.
See the [associated paper](https://d4mucfpksywv.c...
-
Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
-
mask_rcnn_swin-t-p4-w7_fpn_1x_coco
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...
-
Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like chest X-rays. This training approach is data- and computationally inefficient, requiring ~6-12 months per finding [1], and often fails to generalize in real-world environments. By f...
-
Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. MedImageParse is a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalit...
-
microsoft-beit-base-patch16-224-pt22k-ft22k
BEiT (Bidirectional Encoder representation from Image Transformers) is a vision transformer (ViT) pre-trained with Masked Image Modeling (MIM), a self-supervised pre-training method inspired by BERT from NLP, followed by intermediate fine-tuning using the ImageNet-22k dataset. It is then fine-tuned f...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on a majority of NLU tasks with 80GB training data.
Please check the [offi...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on a majority of NLU tasks with 80GB training data.
Please check the [offi...
-
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
-
microsoft-llava-med-v1.5-mistral-7b
LLaVA-Med v1.5, using mistralai/Mistral-7B-Instruct-v0.2 as LLM for a better commercial license
Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum l...
-
Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
-
Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
-
microsoft-swinv2-base-patch4-window12-192-22k
The Swin Transformer V2 model, a type of Vision Transformer pre-trained on ImageNet-21k at a resolution of 192x192, was introduced in the research paper titled "Swin Transformer V2: Scaling Up Capacity and Resolution" authored by ...
-
mistral-community-Mixtral-8x22B-v0-1
The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/H...
-
mistralai-Mistral-7B-Instruct-v0-2
The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.
Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1:
- 32k context window (vs 8k context in v0.1)
- Rope-theta = 1e6
- No Sliding-Window Attention
For full details...
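To give a feel for what the Rope-theta change in the list above affects, the sketch below computes the rotary position embedding (RoPE) inverse frequencies for two values of theta and compares their longest wavelengths. It is a minimal sketch under stated assumptions: the head dimension of 128 and the comparison value of 1e4 are assumptions for illustration and are not stated in this entry, and the function names are hypothetical.

```python
import math

HEAD_DIM = 128       # assumed attention head dimension (not from the source)
THETA_SMALL = 1e4    # assumed comparison base (a commonly used RoPE default)
THETA_V02 = 1e6      # Rope-theta value listed for Mistral-7B-v0.2


def rope_inverse_frequencies(theta, head_dim):
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta, head_dim):
    """Wavelength, in token positions, of the slowest-rotating dimension pair."""
    slowest = rope_inverse_frequencies(theta, head_dim)[-1]
    return 2 * math.pi / slowest


wavelength_small = longest_wavelength(THETA_SMALL, HEAD_DIM)
wavelength_v02 = longest_wavelength(THETA_V02, HEAD_DIM)
```

A larger theta stretches the slowest rotations over many more positions, which is one reason a larger base is often paired with a longer context window.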
-
mistralai-Mistral-7B-Instruct-v0-3
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2:
- Extended vocabulary to 32768 ...
-
mistralai-Mistral-7B-Instruct-v01
The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets.
For full details of this mod...
The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks tested.
For full details of this model please read paper and [releas...
-
mistralai-Mixtral-8x22B-Instruct-v0-1
The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1.
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | <a href="https://aka.ms/... |
-
The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/H...
-
mistralai-Mixtral-8x7B-Instruct-v01
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
Mixtral-8x7B-v0.1 is a decoder-only model with 8 distinct groups of parameters, or "experts". At every layer, for every tok...
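The sparse mixture-of-experts idea can be sketched in a few lines. In the toy example below, a router scores 8 experts for one token, the top-scoring experts are selected, their scores are normalized with a softmax, and the expert outputs are combined with those weights. Choosing k = 2 experts per token is an assumption based on the public Mixtral description (this entry is truncated before stating it), and the scalar "experts" and function names are purely illustrative.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight toy experts: expert s simply scales its input by s.
toy_experts = [lambda x, s=s: s * x for s in range(8)]
toy_logits = [0.1, 2.0, -1.0, 2.0, 0.0, 0.5, 0.3, -0.2]
toy_output = moe_forward(10.0, toy_experts, toy_logits)
```

Only the selected experts run for a given token, which is why a sparse model can hold many parameters while using a fraction of them per token.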
The Mixtral-8x7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mixtral-8x7B-v0.1 outperforms Llama 2 70B on most benchmarks with 6x faster inference.
For full details of this model please read [release blog post](https://mi...
-
mmd-3x-deformable-detr_refine_twostage_r50_16xb2-50e_coco
deformable-detr_refine_twostage_r50_16xb2-50e_coco model is from OpenMMLab's MMDetection library. DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while...
-
mmd-3x-mask-rcnn_swin-t-p4-w7_fpn_1x_coco
mask-rcnn_swin-t-p4-w7_fpn_1x_coco model is from OpenMMLab's MMDetection library. This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for comp...
-
mmd-3x-sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco
sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object ...
-
mmd-3x-sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco
sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object d...
-
mmd-3x-vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco
vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. ...
-
mmd-3x-vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco
vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high perfor...
-
mmd-3x-yolof_r50_c5_8x8_1x_coco
yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-an...
-
Multimodal Early Fusion Transformer (MMEFT) is a transformer-based model tailored for processing both structured and unstructured data.
It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modes, includ...
-
ocsort_yolox_x_crowdhuman_mot17-private-half
ocsort_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Multi-Object Tracking (MOT) has rapidly progressed with the development of object detection and re-identification. Howev...
-
OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32
OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
-
OpenAI-CLIP-Image-Text-Embeddings-ViT-Large-Patch14-336
The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general ...
-
OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
-
OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
-
The Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Ph...
-
The Phi-3-Medium-4K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-...
-
The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
After initi...
-
The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3...
-
The Phi-3-Small-128K-Instruct is a 7B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model supports 128K conte...
-
The Phi-3-Small-8K-Instruct is a 7B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model supports 8K context l...
-
Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 mo...
-
Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports 128K token context length. Th...
-
Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model is multilingual and comes with a 128K context length (in tokens). The mo...
-
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and t...
-
Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced r...
-
PRISM is a multi-modal generative foundation model for slide-level analysis of H&E-stained histopathology images. Utilizing Virchow tile embeddings and clinical report texts for pre-training, PRISM combines these embeddings into a single slide embedding and generates a text-based diagnostic repor...
-
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles[^1],[^2],[^3]. Previous models often rely predominantly on tile-level predictions, which can overlook critical slide-level context and spatial depen...
-
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
-
The RoBERTa base OpenAI Detector functions as a model designed to detect outputs generated by the GPT-2 model. It was created by refining a RoBERTa base model using the outputs of the 1.5B-parameter GPT-2 model. This detector is utilized to determine whether text was generated by a GPT-2 model. O...
-
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
-
roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The model is a pretrained model on English language text using a masked language modeling ...
-
RoBERTa large OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa large model with the outputs of the 1.5B-parameter GPT-2 model. The model can be used to predict if text was generated by a GPT-2 model. This model was released by OpenAI at the same time as Op...
-
runwayml-stable-diffusion-inpainting
Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
The Stable-Diffusion-Inpainting was initialized with the weights of the Stable-Diffus...
-
runwayml-stable-diffusion-v1-5
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 5...
-
Salesforce-BLIP-2-opt-2-7b-image-to-text
The BLIP-2 model, utilizing OPT-2.7b (a large language model with 2.7 billion parameters), is presented in the paper titled "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models". T...
-
Salesforce-BLIP-2-opt-2-7b-vqa
The BLIP-2 model, utilizing OPT-2.7b (a large language model with 2.7 billion parameters), is presented in the paper titled "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models". Th...
-
Salesforce-BLIP-image-captioning-base
BLIP (Bootstrapping Language-Image Pre-training) designed for unified vision-language understanding and generation is a new VLP framework that expands the scope of downstream tasks compared to existing methods. The framework encompasses two key contributions from both model and data perspective...
-
BLIP (Bootstrapping Language-Image Pre-training) designed for unified vision-language understanding and generation is a new VLP framework that expands the scope of downstream tasks compared to existing methods. The framework encompasses two key contributions from both model and data perspective...
-
Arctic is a dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of Arctic under an Apache-2.0 license. This means you can use them freely in your ow...
Arctic is a dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of Arctic under an Apache-2.0 license. This means you can use them freely in your ow...
-
sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco
sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d078...
-
sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco
sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787...
-
The RoBERTa Large model is a large transformer-based language model that was developed by the Hugging Face team. It is pre-trained on masked language modeling and can be used for tasks such as sequence classification, token classification, or question answering. Its primary usage is as a fine-tun...
-
stabilityai-stable-diffusion-2-1
This stable-diffusion-2-1 model is fine-tuned from stable-diffusion-2 (768-v-ema.ckpt) with an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k extra steps with punsafe=0.98.
The mod...
-
stabilityai-stable-diffusion-2-inpainting
This stable-diffusion-2-inpainting model is resumed from stable-diffusion-2-base (512-base-ema.ckpt) and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA wh...
-
stabilityai-stable-diffusion-xl-base-1-0
SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents, wh...
-
stabilityai-stable-diffusion-xl-refiner-1-0
SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents, wh...
-
supply-chain-trade-regulations
Description
The adapted AI model for supply chain trade regulations analysis (preview) is a 3.8B parameter, lightweight, state-of-the-art open model, trained using synthetic supply chain domain-specific datasets, focused on trade regulations.
The model is fine-tuned on the base model...
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...
Falcon-40B is a large language model (LLM) developed by the Technology Innovation Institute (TII) with 40 billion parameters. It is a causal decoder-only model trained on 1 trillion tokens from the RefinedWeb dataset, enhanced with curated corpora. Falcon-40B supports English, Germa...
Falcon-40B-Instruct is a large language model with 40 billion parameters, developed by TII. It is a causal decoder-only model fine-tuned on a mixture of Baize data and is released under the Apache 2.0 license. This model is optimized for inference and features FlashAttention and mul...
Falcon-7B is a large language model with 7 billion parameters. It is a causal decoder-only model developed by TII and trained on 1,500 billion tokens of RefinedWeb dataset, which was enhanced with curated corpora. The model is available under the Apache 2.0 license. It outperforms c...
Falcon-7B-Instruct is a large language model with 7 billion parameters, developed by TII. It is a causal decoder-only model and is released under the Apache 2.0 license. This model is optimized for inference and features FlashAttention and multiquery architectures. It is primarily d...
-
vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco
vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92f...
-
vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco
vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...
-
Virchow is a self-supervised vision transformer pretrained using 1.5M whole slide histopathology images. The model can be used as a tile-level feature extractor (frozen or finetuned) to achieve state-of-the-art results for a wide variety of downstream computational pathology use cases.
-
Virchow2 is a self-supervised vision transformer pretrained using 3.1M whole slide histopathology images. The model can be used as a tile-level feature extractor (frozen or finetuned) to achieve state-of-the-art results for a wide variety of downstream computational pathology use cases.
-
yolof_r50_c5_8x8_1x_coco
model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92fa09e48fb1/configs/...