Models

All models

ALLaM-2-7b-instruct

Description

ALLaM is a series of powerful language models designed to advance Arabic Language Technology (ALT), developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and AI Authority (SDAIA). ALLaM-2-7b-instruct is traine...

analyze-conversations

"Analyze Conversations" is a standard model that uses Azure AI Language to perform various analyses on text-based conversations. Azure AI Language hosts pre-trained, task-oriented, and optimized conversation-focused ML models, including various summarization aspects, PII entity extraction...
analyze-documents

"Analyze Documents" is a standard model that uses Azure AI Language to perform various analyses on text-based documents. Azure AI Language hosts pre-trained, task-oriented, and optimized document-focused ML models for tasks such as summarization, sentiment analysis, and entity extraction.

...

ask-wikipedia

"Ask Wikipedia" is a Q&A model that uses GPT-3.5 to answer questions using information sourced from Wikipedia, which makes responses more grounded. The process identifies the relevant Wikipedia link and extracts its contents. These contents are then used as an augmented prompt, en...
Atomica

ATOMICA is a hierarchical geometric deep learning model trained on over 2.1 million molecular interaction interfaces. It represents interaction complexes using an all-atom graph structure, where nodes correspond to atoms or grouped chemical blocks, and edges reflect both intra- and intermolecular...
Aurora

Aurora is a machine learning model that can predict general environmental variables, such as temperature and wind speed. It is a foundation model, which means that it was first trained broadly on a large amount of data and can then be adapted to specialised environmental forecasting tasks with relati...
AutoML-Image-Classification

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Image-Instance-Segmentation

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Image-Object-Detection

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Named-Entity-Recognition

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Text-Classification

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
bert-base-cased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
bert-base-uncased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate input...
bert-large-cased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
bert-large-uncased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the i...
Bleu-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1]: higher means better quality. |
| What is this metric? | BLEU (Bilingual Evaluation Understudy) score is commonly used in natural language processing (NLP) and machine translation. It measures how closely the generated text matches the reference text....

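
To make the idea of "how closely the generated text matches the reference text" concrete, here is a minimal, simplified sketch of a sentence-level BLEU computation. It assumes whitespace tokenization, a single reference, n-grams up to `max_n = 2`, and no smoothing; `simple_bleu` is an illustrative name, and the evaluator itself may tokenize, weight, and smooth differently.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often as it occurs in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


score = simple_bleu("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))
```

The result stays in the [0, 1] range described above: identical texts score 1.0, and texts with no shared n-grams score 0.0.
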
bring-your-own-data-chat-qna

"Bring Your Own Data Chat QnA" is a pre-trained chat model, enhanced by GPT-3.5, that uses your own indexed data and chat history to deliver more concrete and relevant answers. It processes the raw query through an embedding procedure, followed by a "Vector Search" to pin...
bring-your-own-data-qna

"Bring Your Own Data QnA" is a pre-trained Q&A model, enhanced by GPT-3.5, that uses your own indexed data to deliver more concrete and relevant answers. It processes the raw query through an embedding procedure, followed by a "Vector Search" to pinpoint the most pertinen...
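
The embedding-then-vector-search step can be sketched as follows. This is only an illustration under stated assumptions: the toy three-dimensional vectors stand in for real embedding output, cosine similarity is assumed as the ranking measure (the description does not say which one is used), and `cosine_similarity`, `top_k`, and the document ids are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k indexed chunks most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical index: chunk id -> embedding vector (toy values, not real embeddings).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "warranty": [0.7, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded user query
context_ids = top_k(query_vec, index)
print(context_ids)
```

The retrieved chunks would then be placed into the prompt as context before the model answers.
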
bytetrack_yolox_x_crowdhuman_mot17-private-half

bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtai...
camembert-base

CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.

It is now available on Hugging Face in 6 different versions with varying numbers of parameters, amounts of pretraining data, and pretraining data source domains.

Training Details

Training Data

OSCAR or Open...

Cell2Sentence-Embedding

Cell2Sentence (C2S) is a framework designed to apply Large Language Models (LLMs) to single-cell transcriptomics. It transforms single-cell RNA sequencing (scRNA-seq) data into a format that LLMs can natively understand. The core idea is to convert the gene expression vector of each cell into a "...
chat-quality-safety-eval

The chat quality and safety evaluation flow evaluates chat systems by using state-of-the-art Large Language Models (LLMs) to measure the quality and safety of your LLM responses. Using a GPT model to assist with measurements aims to achieve high agreement with human evaluatio...
chat-with-wikipedia

"Chat with Wikipedia" is a pre-trained chat model that uses GPT-3.5. It combines conversation history and information from Wikipedia to make answers more grounded. It finds a relevant Wikipedia link and retrieves the page contents for the question. It can remember previous interactions and ...
classification-accuracy-eval

"Classification Accuracy Evaluation" is a model designed to assess the effectiveness of a data classification system. It matches each prediction against the ground truth and assigns a "Correct" or "Incorrect" score. The cumulative results are then used to generate p...
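
The grading step described here can be sketched in a few lines. This assumes exact string matching between prediction and ground truth, and it aggregates the grades into a single accuracy value, which the evaluator's name suggests but the description does not spell out; `grade_predictions` and `accuracy` are illustrative names.

```python
from typing import Sequence


def grade_predictions(predictions: Sequence[str], ground_truths: Sequence[str]) -> list[str]:
    """Label each prediction "Correct" or "Incorrect" by exact match against its ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    return ["Correct" if p == g else "Incorrect" for p, g in zip(predictions, ground_truths)]


def accuracy(grades: Sequence[str]) -> float:
    """Fraction of grades equal to "Correct" (0.0 for an empty input)."""
    if not grades:
        return 0.0
    return sum(g == "Correct" for g in grades) / len(grades)


grades = grade_predictions(["cat", "dog", "bird", "cat"], ["cat", "dog", "cat", "cat"])
print(grades, accuracy(grades))
```
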
Coherence-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Measures how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. |
| How does it work? | The coherence...

compvis-stable-diffusion-v1-4

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The Stable-Diffusion-v1-4 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 225k steps at resolution 5...
Content-Safety-Evaluator

| | |
| -- | -- |
| Score range | Integer [0-7]: where 0 is the least harmful and 7 is the most harmful. A text label is also provided. |
| What is this metric? | Measures comprehensively the severity level of the content harm of a response, covering violence, sexual, self-harm, and hate and u...

count-cars

"Count Cars" is a model designed to count specific vehicles, particularly red cars, in given images. Using Azure OpenAI GPT-4 Turbo with Vision, this system analyzes each image, identifies and counts red cars, out...
CxrReportGen

Overview

The CXRReportGen model utilizes a multimodal architecture, integrating a BiomedCLIP image encoder with a Phi-3-Mini text encoder to help an application interpret complex medical imaging studies of chest X-rays. CXRReportGen follows the same framework as **[MAIRA-2](https://www.micros...

databricks-dolly-v2-12b

Databricks' dolly-v2-12b is an instruction-following large language model, trained on the Databricks machine learning platform, that is licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine tuning records [databricks-dolly-15k](https://github.com/d...
Deci-DeciCoder-1b

The Model Card for DeciCoder 1B provides details about a 1 billion parameter decoder-only code completion model developed by Deci. The model was trained on Python, Java, and JavaScript subsets of Starcoder Training Dataset and uses Grouped Query Attention with a context window of 2048 tokens. It ...
deci-decidiffusion-v1-0

DeciDiffusion 1.0 is an 820 million parameter latent diffusion model designed for text-to-image conversion. Trained initially on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset, the model's training involved advanced techniques to improve speed, training performance, and achieve su...
Deci-DeciLM-7B

DeciLM-7B is a decoder-only text generation model with 7.04 billion parameters, released by Deci under the Apache 2.0 license. It is the top-performing 7B base language model on the Open LLM Leaderboard and uses variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy...
Deci-DeciLM-7B-instruct

DeciLM-7B-instruct is a model for short-form instruction following, built by LoRA fine-tuning on the SlimOrca dataset. It is a derivative of the recently released DeciLM-7B language model, a pre-trained, high-efficiency generative text model with 7 billion parameters. DeciLM-7B-instruct is one of...
deepseek-r1-distill-llama-8b

This model is an optimized version of DeepSeek-R1-Distill-Llama-8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...
deepseek-r1-distill-llama-8b-cuda-gpu

This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CUDA GPUs. This model uses RTN quantization.
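
RTN (round-to-nearest) quantization maps each weight to the closest value on a fixed integer grid. The sketch below assumes symmetric 4-bit quantization with one scale per group of weights; the bit width, grouping, and scale choice actually used for this model are not stated here, and `rtn_quantize` and `dequantize` are illustrative names.

```python
def rtn_quantize(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric round-to-nearest quantization of one weight group to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight to the nearest integer, then clamp to the representable range.
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


weights = [1.4, -0.6, 0.25, 0.0]  # toy values, not real model weights
codes, scale = rtn_quantize(weights)
restored = dequantize(codes, scale)
print(codes, restored)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the error that round-to-nearest trades for a smaller model.
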

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R...
deepseek-r1-distill-llama-8b-generic-cpu

This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dis...
deepseek-r1-distill-llama-8b-generic-gpu

This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dis...
deepseek-r1-distill-qwen-1.5b

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to e...
deepseek-r1-distill-qwen-1.5b-cuda-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-...
deepseek-r1-distill-qwen-1.5b-generic-cpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Di...
deepseek-r1-distill-qwen-1.5b-generic-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Di...
DeepSeek-R1-Distill-Qwen-1.5B-openvino-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local in...
deepseek-r1-distill-qwen-1.5b-qnn-npu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of th...
DeepSeek-R1-Distill-Qwen-1.5B-trtrtx-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for l...
deepseek-r1-distill-qwen-14b

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...
deepseek-r1-distill-qwen-14b-cuda-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R...
deepseek-r1-distill-qwen-14b-generic-cpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dis...
deepseek-r1-distill-qwen-14b-generic-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dis...
DeepSeek-R1-Distill-Qwen-14B-openvino-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local infe...
deepseek-r1-distill-qwen-14b-qnn-npu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the...
deepseek-r1-distill-qwen-14b-trtrtx-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for loc...
deepseek-r1-distill-qwen-7b

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited ...
deepseek-r1-distill-qwen-7b-cuda-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1...
deepseek-r1-distill-qwen-7b-generic-cpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dist...
deepseek-r1-distill-qwen-7b-generic-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Dist...
DeepSeek-R1-Distill-Qwen-7B-openvino-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local infere...
DeepSeek-R1-Distill-Qwen-7B-openvino-npu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local infere...
deepseek-r1-distill-qwen-7b-qnn-npu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the ...
DeepSeek-R1-Distill-Qwen-7B-trtrtx-gpu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local...
DeepSeek-R1-Distill-Qwen-7B-vitis-npu

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inferenc...
deepset-minilm-uncased-squad2

Training Details

Hyperparameters

seed = 42
batch_size = 12
n_epochs = 4
base_LM_model = "microsoft/MiniLM-L12-H384-uncased"
max_seq_len = 384
learning_rate = 4e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
grad_acc_steps = 4

Evaluation Res...

deepset-roberta-base-squad2

This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.

Training Details

Hype...

deformable_detr_twostage_refine_r50_16x2_50e_coco

deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...
detect-defects

"Detect Defects" is a model designed for detailed examination of images. It uses GPT-4 Turbo with Vision to compare a test image against a reference image. Each analysis identifies variances or anomalies and classifies them as defects. This methodical comparison e...
distilbert-base-cased

DistilBERT, a transformers model, is designed to be smaller and quicker than BERT. It underwent pretraining on the same dataset in a self-supervised manner, utilizing the BERT base model as a reference. This entails training solely on raw texts, without human annotation, thus enabling the utiliza...
distilbert-base-cased-distilled-squad

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://...
distilbert-base-uncased

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lot...
distilbert-base-uncased-distilled-squad

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxi...
distilbert-base-uncased-finetuned-sst-2-english

The DistilBERT base uncased finetuned SST-2 model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy ...
distilgpt2

DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, li...
distilroberta-base

distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found [here](https://github.com/hugg...
ECI-Evaluator

Definition

Election Critical Information (ECI) refers to any content related to elections, including voting processes, candidate information, and election results. The ECI evaluator uses the Azure AI Safety Evaluation service to assess the generated responses for ECI without a disclaimer.

#...

F1Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1]: higher means better quality. |
| What is this metric? | F1 score measures the similarity by shared tokens between the generated text and the ground truth, focusing on both precision and recall. |
| How does it work? | The F1-score computes the ratio...

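
A token-level F1 score of the kind described above can be sketched as follows. The sketch assumes lowercasing and whitespace tokenization, which may differ from the evaluator's own normalization; `token_f1` is an illustrative name.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between a response and the ground truth."""
    resp_tokens = response.lower().split()  # assumption: whitespace tokenization
    truth_tokens = ground_truth.lower().split()
    if not resp_tokens or not truth_tokens:
        return 0.0
    # Shared tokens, counted with multiplicity.
    shared = sum((Counter(resp_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(resp_tokens)  # share of response tokens found in the ground truth
    recall = shared / len(truth_tokens)    # share of ground-truth tokens found in the response
    return 2 * precision * recall / (precision + recall)


f1 = token_f1("the cat sat on the mat", "the cat is on the mat")
print(round(f1, 4))
```

In this example five of the six tokens are shared on each side, so precision and recall are both 5/6, and so is the F1 score.
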
facebook-bart-large-cnn

BART is a transformer model that combines a bidirectional encoder similar to BERT with an autoregressive decoder akin to GPT. It is trained using two main techniques: (1) corrupting text with a chosen noising function, and (2) training a model to reconstruct the original text.

When fine-tuned fo...

facebook-deit-base-patch16-224

DeiT (Data-efficient image Transformers) is an image transformer that does not require very large amounts of data for training. This is achieved through a novel distillation procedure using a teacher-student strategy, which results in high throughput and accuracy. DeiT is pre-trained and fine-tuned o...
facebook-dinov2-base-imagenet1k-1-layer

Vision Transformer (base-sized model) trained using DINOv2

Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released...

Facebook-DinoV2-Image-Embeddings-ViT-Base

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...

Facebook-DinoV2-Image-Embeddings-ViT-Giant

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...

facebook-sam-vit-base

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
facebook-sam-vit-huge

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
facebook-sam-vit-large

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
financial-reports-analysis

Description

The adapted AI model for financial reports analysis (preview) is a state-of-the-art small language model (SLM) based on the Phi-3-small-128k architecture, designed specifically for analyzing financial reports. It has been fine-tuned on a few hundred million tokens derived ...

financial-reports-analysis-v2

The Adapted AI model for financial reports analysis (Phi-4, preview) is a state-of-the-art small language model (SLM) based on the Phi-4 architecture, de...

finiteautomata-bertweet-base-sentiment-analysis

Repository: https://github.com/finiteautomata/pysentimiento/

Model trained with the SemEval 2017 corpus (~40k tweets). Base model is BERTweet, a RoBERTa model trained on English tweets.

Uses `POS...

Fluency-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Fluency measures the effectiveness and clarity of written communication, focusing on grammatical accuracy, vocabulary range, sentence complexity, coherence, and overa...

Gleu-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1]: higher means better quality. |
| What is this metric? | The GLEU (Google-BLEU) score measures the similarity by shared n-grams between the generated text and ground truth, similar to the BLEU score, focusing on both precision and recall. But it addre...

google-vit-base-patch16-224

The Vision Transformer (ViT) model, as introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al., underwent pre-training on ImageNet-21k with a resolution of 224x224. Su...
gpt-oss-20b-generic-cpu

This model is an optimized version of gpt-oss-20b to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: Apache-2.0
License Description: Use of this model is subject to the terms of the Ap...
gpt-oss-20b-generic-gpu

This model is an optimized version of gpt-oss-20b to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: Apache-2.0
License Description: Use of this model is subject to the terms of the Ap...
gpt2

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generat...
gpt2-large

GPT-2 Large is the 774M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

Training Details

See the [associated paper](https://d4mucfpksywv.cloudfront.net/bet...

gpt2-medium

GPT-2 Medium is the 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

Training Details

See the [associated paper](https://d4mucfpksywv.c...

Groundedness-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Groundedness measures how well the generated response aligns with the given context in a retrieval-augmented generation scenario, focusing on its relevance and accura...

Groundedness-Pro-Evaluator

| | |
| -- | -- |
| Score range | Boolean: [true, false]: false if response is ungrounded and true if it's grounded. |
| What is this metric? | Groundedness Pro (powered by Azure AI Content Safety) detects whether the generated text response is consistent or accurate with respect to the given ...

Hate-and-Unfairness-Evaluator

Definition

Hateful and unfair content refers to any language pertaining to hate toward or unfair representations of individuals and social groups along factors including but not limited to race, ethnicity, nationality, gender, sexual orientation, religion, immigration status, ability, persona...

hibou-b

Hibou-B is a foundational vision transformer developed for digital pathology, designed to generate high-quality feature representations from histology image patches. These representations can be leveraged for a range of downstream tasks, including classification, segmentation, and detection.

Bui...

hibou-l

Hibou-L is a foundational vision transformer developed for digital pathology, designed to generate high-quality feature representations from histology image patches. These representations can be leveraged for a range of downstream tasks, including classification, segmentation, and detection.

Bui...

how-to-use-functions-with-GPT-chat-API

"Use Functions with Chat Models" is a chat model that illustrates how to employ the LLM tool's Chat API with external functions, thereby expanding the capabilities of GPT models. The Chat Completion API includes an optional 'functions' parameter, which can be used to stipulate function specificati...
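
As a rough illustration of the 'functions' parameter, the sketch below builds one function specification and dispatches a function call locally. No API request is made: the `function_call` dictionary is a hand-written stand-in for what a chat model might return, and `get_current_weather`, `REGISTRY`, and `dispatch` are hypothetical names introduced for this example.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical external function; returns canned data instead of calling a weather service."""
    return {"location": location, "temperature": 21, "unit": unit}


# Function specification: a name, a description, and JSON Schema for the arguments.
FUNCTIONS = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

# Maps the function names the model may request to local Python callables.
REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(function_call: dict) -> dict:
    """Run the function the model asked for, decoding its JSON-encoded arguments."""
    func = REGISTRY[function_call["name"]]
    arguments = json.loads(function_call["arguments"])
    return func(**arguments)


# Stand-in for a model response that requests a function call.
function_call = {"name": "get_current_weather", "arguments": '{"location": "Paris"}'}
result = dispatch(function_call)
print(result)
```

In a full flow, `FUNCTIONS` would be sent along with the chat messages, and `result` would be returned to the model as a follow-up message so it can compose the final answer.
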
Indirect-Attack-Evaluator

Definition

Indirect attacks, also known as cross-domain prompt injected attacks (XPIA), occur when jailbreak attacks are injected into the context of a document or source, which may result in altered, unexpected behavior.

Indirect attack evaluations are broken down into three subcategories: ...

Jean-Baptiste-camembert-ner

Summary: camembert-ner is a NER model fine-tuned from CamemBERT on the Wikiner-fr dataset and validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model has four entity classes (MISC, PER, ORG, and LOC) plus O for tokens outside any entity. The model can be loaded using Hugging...
Llama-2-13b

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
Llama-2-13b-chat

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
Llama-2-70b

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
Llama-2-70b-chat

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
Llama-2-7b

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
Llama-2-7b-chat

Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. L...
mask_rcnn_swin-t-p4-w7_fpn_1x_coco

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...
MatterSim

MatterSim is a large-scale pretrained deep learning model for efficient materials emulations and property predictions.

MatterSim is a deep learning model for general materials design tasks. It supports efficient atomistic simulations at first-principles level and accurate prediction of broad mat...

MedImageInsight

Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like chest X-rays. This training approach is data- and computationally inefficient, requiring ~6-12 months per finding[1], and often fails to generalize in real world environments. By f...
MedImageInsight-onnx

Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like chest X-rays. This training approach is data- and computationally inefficient, requiring ~6-12 months per finding[1], and often fails to generalize in real world environments. By...
MedImageParse

Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. MedImageParse is a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalit...
MedImageParse3D

Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such...

MedSAM2

MedSAM2 is a breakthrough medical image segmentation foundation model that addresses the critical need for efficient and accurate segmentation in precision medicine. Built upon the Segment Anything Model (SAM) 2.1 architecture, it uniquely bridges the gap between 2D and 3D medical image segmentat...
MedVAE-8-4-2d

MedVAE is a family of large-scale, generalizable 2D and 3D variational autoencoders (VAEs) designed to address critical efficiency and storage challenges in medical imaging. Trained on over one million images across multiple modalities and anatomical regions, MedVAE excels at encoding high-resolu...
Meteor-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1]: higher means better quality. |
| What is this metric? | METEOR score measures the similarity by shared n-grams between the generated text and the ground truth, similar to the BLEU score, focusing on precision and recall. But it addresses limitations ...
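
To make the precision and recall idea concrete, here is a simplified sketch, not the evaluator's actual implementation. It counts exact unigram matches only and combines them with the recall-weighted harmonic mean from the original METEOR formulation, F = 10PR / (R + 9P). Stemming, synonym matching, and the fragmentation penalty are deliberately omitted, and the function name `unigram_f_mean` is an assumption.

```python
from collections import Counter


def unigram_f_mean(candidate, reference):
    """Recall-weighted harmonic mean of unigram precision and recall."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped count of tokens shared by the candidate and the reference.
    matches = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(cand_tokens)
    recall = matches / len(ref_tokens)
    # Recall is weighted nine times more heavily than precision.
    return 10 * precision * recall / (recall + 9 * precision)


print(unigram_f_mean("the cat sat", "the cat sat on the mat"))
```

Because recall dominates, a short candidate that omits half of the reference scores close to its recall rather than its perfect precision.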
microsoft-beit-base-patch16-224-pt22k-ft22k

BEiT (Bidirectional Encoder representation from Image Transformers) is a vision transformer (ViT) pre-trained with Masked Image Modeling (MIM), a self-supervised pre-training method inspired by BERT from NLP, followed by intermediate fine-tuning using the ImageNet-22k dataset. It is then fine-tuned f...
microsoft-deberta-base

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
microsoft-deberta-base-mnli

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on a majority of NLU tasks with 80GB training data.

Please check the [offi...

microsoft-deberta-large

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
microsoft-deberta-large-mnli

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on a majority of NLU tasks with 80GB training data.

Please check the [offi...

microsoft-deberta-xlarge

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data...
microsoft-llava-med-v1.5-mistral-7b

LLaVA-Med v1.5, using mistralai/Mistral-7B-Instruct-v0.2 as LLM for a better commercial license

Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum l...

microsoft-Orca-2-13b

Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
microsoft-Orca-2-7b

Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the [Orca 2 ...
microsoft-phi-1-5

Microsoft Phi-1.5

Phi-1.5 is a Transformer-based language model with 1.3 billion parameters. It was trained on a combination of data sources, including an additional source of NLP synthetic texts. Phi-1.5 performs exceptionally well on benchmarks testing common sense, language understandi...

microsoft-phi-2

Microsoft Phi-2

Phi-2 is a language model with 2.7 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assesse...

microsoft-rad-dino

Model Description

Model card for RAD-DINO

RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2.

RAD-DINO is described in detail in [RAD-DINO: Exploring Sca...

microsoft-swinv2-base-patch4-window12-192-22k

The Swin Transformer V2 model, a type of Vision Transformer pre-trained on ImageNet-21k at a resolution of 192x192, was introduced in the research paper titled "Swin Transformer V2: Scaling Up Capacity and Resolution" authored by ...
ministral-3-3b-instruct-2512

This model is an optimized version of Ministral-3-3B-Instruct-2512 for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...
ministral-3-3b-instruct-2512-cuda-gpu

This model is an optimized version of Ministral-3-3B-Instruct-2512 for local inference on CUDA GPUs. This optimized model is published here in ONNX format to run on CUDA-capable GPU devices, with the precision best suited to this target.

ONNX Models

Here are some of the optimized configuration...

ministral-3-3b-instruct-2512-generic-cpu

This model is an optimized version of Ministral-3-3B-Instruct-2512 for local inference on CPU and mobile devices. This optimized model is published here in ONNX format to run on CPU and mobile targets, with the precision best suited to this target.

ONNX Models

Here are some of the optimized co...

ministral-3-3b-instruct-2512-generic-gpu

This model is an optimized version of Ministral-3-3B-Instruct-2512 for local inference on WebGPU-capable devices. This optimized model is published here in ONNX format to run with the WebGPU execution provider, with the precision best suited to this target.

ONNX Models

Here are some of the opt...

Mistral-7B-Instruct-v0-2-openvino-gpu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local infer...
Mistral-7B-Instruct-v0-2-openvino-npu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local infer...
Mistral-7B-Instruct-v0-2-vitis-npu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local inferen...
mistral-community-Mixtral-8x22B-v0-1

The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.

Evaluation Results

[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/H...

mistral-nemo-12b-instruct

This model is an optimized version of Mistral-Nemo-Instruct-2407 for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of th...
mistral-nemo-12b-instruct-cuda-gpu

This model is an optimized version of Mistral-Nemo-Instruct-2407 to enable local inference on CUDA GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mist...
mistral-nemo-12b-instruct-generic-cpu

This model is an optimized version of Mistral-Nemo-Instruct-2407 to enable local inference on CPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-N...
mistral-nemo-12b-instruct-generic-gpu

This model is an optimized version of Mistral-Nemo-Instruct-2407 to enable local inference on GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-N...
mistralai-Mistral-7B-Instruct-v0-2

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1:

32k context window (vs 8k context in v0.1)
Rope-theta = 1e6
No Sliding-Window Attention
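
To show what the Rope-theta setting in the list above controls, the following minimal sketch computes rotary position embedding (RoPE) inverse frequencies, where the i-th frequency is theta^(-2i/d). The head dimension of 128 and the comparison base of 1e4 are assumptions chosen for illustration; neither is stated in this entry.

```python
import math


def rope_inverse_frequencies(head_dim, theta):
    """Inverse frequencies theta^(-2i/d) for each rotary dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, theta):
    """Wavelength, in token positions, of the slowest-rotating pair."""
    slowest = rope_inverse_frequencies(head_dim, theta)[-1]
    return 2 * math.pi / slowest


HEAD_DIM = 128  # assumed head dimension, for illustration only
for theta in (1e4, 1e6):  # 1e4 is an assumed baseline; 1e6 is the value listed above
    print(theta, round(longest_wavelength(HEAD_DIM, theta)))
```

A larger theta stretches the slowest rotations over many more positions, which is one reason a higher base is often paired with a longer context window.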

For full details...

mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CUDA GPUs. This model uses RTN quantization.
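
For readers unfamiliar with RTN (round-to-nearest) quantization, the sketch below shows the basic idea on a small list of weights: pick a scale from the largest magnitude, divide, round to the nearest integer, and clamp. The 4-bit width and the single shared scale are assumptions for illustration. The entry does not state the bit width or group size used for this model, and real pipelines typically keep one scale per block of weights.

```python
def rtn_quantize(weights, bits=4):
    """Symmetric round-to-nearest quantization with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight to the nearest integer, then clamp to range.
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


weights = [0.7, -0.35, 0.1, 0.0]
codes, scale = rtn_quantize(weights)
print(codes, dequantize(codes, scale))
```

The reconstruction error for each weight is at most about half of `scale`, which is why smaller groups with their own scales usually preserve accuracy better.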

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral...
mistralai-Mistral-7B-Instruct-v0-2-generic-cpu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-In...
mistralai-Mistral-7B-Instruct-v0-2-generic-gpu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-I...
mistralai-Mistral-7B-Instruct-v0-2-trtrtx-gpu

This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for loc...
mistralai-Mistral-7B-Instruct-v0-3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2:

Extended vocabulary to 32768 ...
mistralai-Mistral-7B-Instruct-v01

Model Details

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets.

For full details of this mod...

mistralai-Mistral-7B-v01

Model Details

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks tested.

For full details of this model please read the paper and [releas...

mistralai-Mixtral-8x22B-Instruct-v0-1

The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1.

Inference samples

Inference type	Python sample (Notebook)	CLI with YAML
Real time	<a href="https://aka.ms/...

mistralai-Mixtral-8x22B-v0-1

The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.

Evaluation Results

[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/H...

mistralai-Mixtral-8x7B-Instruct-v01

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.

Mixtral-8x7B-v0.1 is a decoder-only model with 8 distinct groups, or "experts". At every layer, for every tok...
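
The expert routing idea can be sketched generically as follows. This is an illustrative top-k mixture-of-experts layer, not Mixtral's actual code. The choice of `top_k=2`, the softmax over the selected router scores, and the toy scalar experts are all assumptions made for the example.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_layer(token, experts, router_logits, top_k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](token) for i, w in route_token(router_logits, top_k))


# Toy experts: expert k simply multiplies its input by k.
experts = [lambda x, k=k: x * k for k in range(8)]
print(moe_layer(1.0, experts, [0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0]))
```

Only the selected experts run for a given token, so the compute per token is a fraction of what the total parameter count would suggest.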

mistralai-Mixtral-8x7B-v01

Model Details

The Mixtral-8x7B-v0.1 Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B-v0.1 outperforms Llama 2 70B on most benchmarks with 6x faster inference.

For full details of this model please read the [release blog post](https://mi...

mmd-3x-deformable-detr_refine_twostage_r50_16xb2-50e_coco

deformable-detr_refine_twostage_r50_16xb2-50e_coco model is from OpenMMLab's MMDetection library. DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while...
mmd-3x-mask-rcnn_swin-t-p4-w7_fpn_1x_coco

mask-rcnn_swin-t-p4-w7_fpn_1x_coco model is from OpenMMLab's MMDetection library. This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for comp...
mmd-3x-rtmdet-ins_x_8xb16-300e_coco

rtmdet-ins_x_8xb16-300e_coco model is from OpenMMLab's MMDetection library. In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many o...
mmd-3x-sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco

sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object ...
mmd-3x-sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco

sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object d...
mmd-3x-vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco

vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. ...
mmd-3x-vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco

vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high perfor...
mmd-3x-yolof_r50_c5_8x8_1x_coco

yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This paper revisits feature pyramid networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-an...
mmeft

Multimodal Early Fusion Transformer (MMEFT) is a transformer-based model tailored for processing both structured and unstructured data.

It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modes, includ...

multi-index-rerank-qna

This "Multi-Source Rerank Q&A" demonstrates Q&A application, enabled by reranking data from multiple sources and powered by GPT. It utilizes indexed files and the rerank tool from Azure Machine Learning to provide grounded answers. You can ask a wide range of questions and receive responses based...
nemotron-3.5-asr-streaming-0.6b

This model is an optimized version of nemotron-3.5-asr-streaming-0.6b for local inference. Optimized models are published here in ONNX format to run on CPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs.

Model Description

Developed by: Micros...
nemotron-3.5-asr-streaming-0.6b-cuda-gpu

This model is an optimized version of nemotron-3.5-asr-streaming-0.6b to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is an optimized version of the nem...
nemotron-3.5-asr-streaming-0.6b-generic-cpu

This model is an optimized version of nemotron-3.5-asr-streaming-0.6b to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is an optimized version of the nem...
nemotron-speech-streaming-en-0.6b

This model is an optimized version of nemotron-speech-streaming-en-0.6b for local inference. Optimized models are published here in ONNX format to run on CPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs.

Model Description

Developed by: Micr...
nemotron-speech-streaming-en-0.6b-cuda-gpu

This model is an optimized version of nemotron-speech-streaming-en-0.6b to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is an optimized version of the n...
nemotron-speech-streaming-en-0.6b-generic-cpu

This model is an optimized version of nemotron-speech-streaming-en-0.6b to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is an optimized version of the n...
nemotron-speech-streaming-es-0.6b-ft-cuda-gpu

This model is a fine-tuned and optimized derivative of nemotron-speech-streaming-en-0.6b, adapted for Spanish speech recognition. The model is optimized for local inference on GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
**Model Descri...
nemotron-speech-streaming-es-0.6b-ft-generic-cpu

This model is a fine-tuned and optimized derivative of nemotron-speech-streaming-en-0.6b, adapted for Spanish speech recognition. The model is optimized for local inference.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description:...
ocsort_yolox_x_crowdhuman_mot17-private-half

ocsort_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Multi-Object Tracking (MOT) has rapidly progressed with the development of object detection and re-identification. Howev...
olmo-3-7b-instruct

This model is an optimized version of Olmo-3-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targ...
olmo-3-7b-instruct-cuda-gpu

This model is an optimized version of Olmo-3-7B-Instruct to enable local inference on CUDA GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Olmo-3-7B-In...
olmo-3-7b-instruct-generic-cpu

This model is an optimized version of Olmo-3-7B-Instruct to enable local inference on CPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Olmo-3-7B-Instruc...
olmo-3-7b-instruct-generic-gpu

This model is an optimized version of Olmo-3-7B-Instruct to enable local inference on GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Olmo-3-7B-Instruc...
OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32

OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
OpenAI-CLIP-Image-Text-Embeddings-ViT-Large-Patch14-336

The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general ...
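
Zero-shot classification with a CLIP-style model can be sketched as comparing one image embedding against a text embedding per candidate label. The toy vectors, the function names, and the similarity scale of 100 below are assumptions for illustration. A real pipeline would obtain the embeddings from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings, scale=100.0):
    """Softmax over scaled image-to-label similarities."""
    labels = list(label_embeddings)
    logits = [scale * cosine(image_embedding, label_embeddings[name]) for name in labels]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}


# Hypothetical 3-dimensional embeddings standing in for encoder outputs.
image = [1.0, 0.1, 0.0]
labels = {"a photo of a cat": [1.0, 0.0, 0.0], "a photo of a dog": [0.0, 1.0, 0.0]}
print(zero_shot_classify(image, labels))
```

No label-specific training happens here: adding a new class only requires embedding one more text prompt.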
openai-clip-vit-base-patch32

OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
openai-clip-vit-large-patch14

OpenAI's CLIP (Contrastive Language–Image Pre-training) model was designed to investigate the factors that contribute to the robustness of computer vision tasks. It can seamlessly adapt to a range of image classification tasks without requiring specific training for each, demonstrating efficiency...
openai-whisper-base

This model is an optimized version of Whisper Base for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original...

openai-whisper-base-cuda-gpu

This model is an optimized version of Whisper Base for local inference. Optimized models are published here in ONNX format to run on CUDA devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https:/...

openai-whisper-base-generic-cpu

This model is an optimized version of Whisper Base for local inference. Optimized models are published here in ONNX format to run on CPU devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https://...

openai-whisper-large

Whisper is an OpenAI pre-trained speech recognition model with potential applications for ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data represen...
openai-whisper-large-v3

Whisper is a model that can recognize and translate speech using deep learning. It was trained on a large amount of data from different sources and languages. Whisper models can handle various tasks and domains without needing to adjust the model.

Whisper large-v3 is similar to the previous larg...

openai-whisper-large-v3-turbo

This model is an optimized version of Whisper Large V3 Turbo for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the...

openai-whisper-large-v3-turbo-cuda-gpu

Whisper Large V3 Turbo is an advanced speech recognition model, optimized for high-performance GPU inference. It is suitable for automatic speech recognition (ASR) tasks in various domains, leveraging large-scale training data for robust multilingual transcription. This model is an optimized vers...
openai-whisper-large-v3-turbo-generic-cpu

Whisper Large V3 Turbo is an advanced speech recognition model, optimized for high-performance CPU inference. It is suitable for automatic speech recognition (ASR) tasks in various domains, leveraging large-scale training data for robust multilingual transcription. This model is designed for scen...
openai-whisper-medium

This model is an optimized version of Whisper Medium for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [origin...

openai-whisper-medium-cuda-gpu

This model is an optimized version of Whisper Medium for local inference. Optimized models are published here in ONNX format to run on CUDA devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https...

openai-whisper-medium-generic-cpu

This model is an optimized version of Whisper Medium for local inference. Optimized models are published here in ONNX format to run on CPU devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https:...

openai-whisper-small

This model is an optimized version of Whisper Small for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [origina...

openai-whisper-small-cuda-gpu

This model is an optimized version of Whisper Small for local inference. Optimized models are published here in ONNX format to run on CUDA devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https:...

openai-whisper-small-generic-cpu

This model is an optimized version of Whisper Small for local inference. Optimized models are published here in ONNX format to run on CPU devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https:/...

openai-whisper-tiny

This model is an optimized version of Whisper Tiny for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original...

openai-whisper-tiny-cuda-gpu

This model is an optimized version of Whisper Tiny for local inference. Optimized models are published here in ONNX format to run on CUDA devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https:/...

openai-whisper-tiny-generic-cpu

This model is an optimized version of Whisper Tiny for local inference. Optimized models are published here in ONNX format to run on CPU devices, including server platforms, desktops, and mobile, with the precision best suited to each of these targets.

Review the [original model card](https://...

parakeet-tdt-0.6b-v2

This model is an optimized version of parakeet-tdt-0.6b-v2 for local inference. Optimized models are published here in ONNX format to run on CPU and CUDA GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs.

Model Description

Developed by: Micr...
parakeet-tdt-0.6b-v2-cuda-gpu

This model is an optimized version of parakeet-tdt-0.6b-v2 for local inference on CUDA GPUs. This optimized model is published here in ONNX format to run on CUDA-capable GPU devices, with the precision best suited to this target.

Model Description

Developed by: Microsoft
**Model type:*...
parakeet-tdt-0.6b-v2-generic-cpu

This model is an optimized version of parakeet-tdt-0.6b-v2 for local inference. Optimized models are published here in ONNX format to run on CPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs.

Model Description

Developed by: Microsoft
**Mod...
Phi-3-medium-128k-instruct

The Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Ph...
Phi-3-medium-4k-instruct

The Phi-3-Medium-4K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-...
Phi-3-mini-128k-instruct

The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.

After initi...

Phi-3-mini-128k-instruct-cuda-gpu

This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-128...
Phi-3-mini-128k-instruct-generic-cpu

This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...
Phi-3-mini-128k-instruct-generic-gpu

This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...
Phi-3-mini-128k-instruct-openvino-gpu

This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on...
phi-3-mini-128k-instruct-qnn-npu

This model is an optimized version of phi-3-mini-128k-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the phi-3-mini-128k-instruct for local inference on Q...
phi-3-mini-128k-instruct-trtrtx-gpu

This model is an optimized version of phi-3-mini-128k-instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the phi-3-mini-128k-instruct for local infer...
phi-3-mini-128k-instruct-vitis-npu

This model is an optimized version of Phi-3-mini-128k-instruct to enable local inference on AMD NPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-mini-128k...
Phi-3-mini-4k-instruct

The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3...
Phi-3-mini-4k-instruct-cuda-gpu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-In...
Phi-3-mini-4k-instruct-generic-cpu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...
Phi-3-mini-4k-instruct-generic-gpu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...
Phi-3-mini-4k-instruct-openvino-gpu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on Int...
Phi-3-mini-4k-instruct-openvino-npu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on Int...
phi-3-mini-4k-instruct-qnn-npu

This model is an optimized version of phi-3-mini-4k-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the phi-3-mini-4k-instruct for local inference on QNN N...
phi-3-mini-4k-instruct-trtrtx-gpu

This model is an optimized version of phi-3-mini-4k-instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the phi-3-mini-4k-instruct for local inference...
Phi-3-mini-4k-instruct-vitis-npu

This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on AMD NPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3-Mini-4K-Ins...
Phi-3-small-128k-instruct

The Phi-3-Small-128K-Instruct is a 7B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model supports 128K conte...
Phi-3-small-8k-instruct

The Phi-3-Small-8K-Instruct is a 7B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model supports 8K context l...
Phi-3-vision-128k-instruct

Model Summary

Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 mo...

Phi-3.5-mini-instruct

Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length. Th...
Phi-3.5-mini-instruct-cuda-gpu

This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3.5-mini-inst...
Phi-3.5-mini-instruct-generic-cpu

This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3.5-mini-instruct ...
Phi-3.5-mini-instruct-generic-gpu

This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3.5-mini-instruct ...
Phi-3.5-mini-instruct-openvino-gpu

This model is an optimized version of Phi-3.5-Mini-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3.5-Mini-Instruct for local inference on Intel...
phi-3.5-mini-instruct-qnn-npu

This model is an optimized version of phi-3.5-mini-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the phi-3.5-mini-instruct for local inference on QNN NPU...
phi-3.5-mini-instruct-trtrtx-gpu

This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-3.5-mini-instruct for local inference o...
Phi-3.5-MoE-instruct

Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model is multilingual and supports a 128K-token context length. The mo...
Phi-3.5-vision-instruct

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and t...
Phi-4

Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small, capable models were trained with data focused on high quality and advanced r...
Phi-4-cuda-gpu

This model is an optimized version of Phi-4 to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4 for local inference on CUDA...
Phi-4-generic-cpu

This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4 for local inference on CPUs.
*...
Phi-4-generic-gpu

This model is an optimized version of Phi-4 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4 for local inference on GPUs.
*...
Phi-4-mini-instruct-cuda-gpu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct...
Phi-4-mini-instruct-generic-cpu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct for ...
Phi-4-mini-instruct-generic-cpu-dq44

This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. This model uses DiscQuant quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruc...
Phi-4-mini-instruct-generic-gpu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct for ...
phi-4-mini-instruct-openvino-gpu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct for local inference on Intel GPU...
phi-4-mini-instruct-openvino-npu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct for local inference on Intel NPU...
phi-4-mini-instruct-vitis-npu

This model is an optimized version of Phi-4-mini-instruct to enable local inference on AMD NPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-instruct ...
Phi-4-mini-reasoning-cuda-gpu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoni...
Phi-4-mini-reasoning-generic-cpu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoning fo...
Phi-4-mini-reasoning-generic-gpu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoning fo...
Phi-4-mini-reasoning-openvino-gpu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on Intel GP...
Phi-4-mini-reasoning-openvino-npu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on Intel NP...
Phi-4-mini-reasoning-qnn-npu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-m...
Phi-4-mini-reasoning-vitis-npu

This model is an optimized version of Phi-4-mini-reasoning to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on AMD NPUs. ...
phi-4-openvino-gpu

This model is an optimized version of Phi-4 to enable local inference on Intel GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4 for local inference on Int...
Phi-4-reasoning-cuda-gpu

This model is an optimized version of Phi-4-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-reasoning for loc...
Phi-4-reasoning-generic-cpu

This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-reasoning for local in...
Phi-4-reasoning-generic-gpu

This model is an optimized version of Phi-4-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4-reasoning for local in...
Phi-4-trtrtx-gpu

This model is an optimized version of Phi-4 to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Phi-4 for local inference on TensorRT-RTX GPUs.
**Disclai...
playground-ayod-rag

This flow template is an advanced RAG flow modeled on the implementation of Azure AI Playground - on Your Data. The flow consists of tools that rewrite the user's query into one or more queries based on chat history context using an LLM, retrieve data for the rewritten queries from the data index and ...
Prism

PRISM is a multi-modal generative foundation model for slide-level analysis of H&E-stained histopathology images. Utilizing Virchow tile embeddings and clinical report texts for pre-training, PRISM combines these embeddings into a single slide embedding and generates a text-based diagnostic repor...
projecte-aina-aguila-7b

Model Description

Aguila-7b

Model description
Intended uses and limitations
How to use
Limitations and bias
[Training...
projecte-aina-FLOR-1-3B

Model Description

FLOR-1.3B

Model description
Intended uses and limitations
How to use
Limitations and bias
[Training...
projecte-aina-FLOR-1-3B-Instructed

Model Description

FLOR-1.3B Instructed

Model description
Intended uses and limitations
How to use
Limitations and bias...
projecte-aina-FLOR-6-3B

Model Description

FLOR-6.3B

Model description
Intended uses and limitations
How to use
Limitations and bias
[Training...
projecte-aina-FLOR-6-3B-Instructed

Model Description

FLOR-6.3B Instructed

Model description
Intended uses and limitations
How to use
Limitations and bias...
Protected-Material-Evaluator

Definition

Protected material is any text that is under copyright, including song lyrics, recipes, and articles. Protected material evaluation leverages the Azure AI Content Safety Protected Material for Text service to perform the classification.

Labeling

Protected Material evaluations ...

Prov-GigaPath

Description

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles[^1],[^2],[^3]. Previous models often rely predominantly on tile-level predictions, which can overlook critical slide-level context and spatial depen...

QA-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1] for the F1 score evaluator: the higher the score, the more similar the response is to the ground truth. Integer [1-5] for AI-assisted quality evaluators for question-and-answering (QA) scenarios, where 1 is bad and 5 is good |
| What is this metric? | Measures compr...
qna-ada-similarity-eval

The "QnA Ada Similarity Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to...
qna-coherence-eval

The "QnA Coherence Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to achi...
qna-f1-score-eval

The "QnA F1 Score Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems using an F1 score based on the word counts in the predicted answer and the ground truth.
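
A word-count F1 score of this kind can be sketched as follows. This is a minimal illustration, not the flow's actual implementation: the tokenization (lowercasing and splitting on whitespace) and the function name `f1_score` are assumptions, and the real flow may normalize punctuation or articles differently.

```python
from collections import Counter


def f1_score(predicted: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and the ground truth.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    pred_tokens = predicted.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not pred_tokens or not truth_tokens:
        # Two empty answers match exactly; one empty answer scores zero.
        return float(pred_tokens == truth_tokens)

    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)   # shared words / predicted words
    recall = overlap / len(truth_tokens)     # shared words / ground-truth words
    return 2 * precision * recall / (precision + recall)


print(f1_score("the cat", "the cat sat on the mat"))
```

Precision penalizes extra words in the prediction, recall penalizes missing ones, and the harmonic mean keeps the score low unless both are reasonably high.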

Inference samples

Inference type	CLI	VS Code Extension
Real time	<a href="https://microsoft.github.io...

qna-fluency-eval

The "QnA Fluency Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to achiev...
qna-gpt-similarity-eval

The "QnA GPT Similarity Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to...
qna-groundedness-eval

The "QnA Groundedness Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to a...
qna-non-rag-metrics-eval

The Q&A evaluation flow evaluates Q&A systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT and a GPT embedding model to assist with measurements aims to achieve a high agreement with human evaluations compa...
qna-quality-safety-eval

The Q&A quality and safety evaluation flow evaluates Q&A systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT and a GPT embedding model to assist with measurements aims to achieve a high agreement with huma...
qna-rag-metrics-eval

The Q&A RAG (Retrieval Augmented Generation) evaluation flow evaluates Q&A RAG systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing a GPT model to assist with measurements aims to achieve a high agreement with...
qna-relevance-eval

The "QnA Relevance Evaluation" is a model that evaluates Q&A Retrieval Augmented Generation systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. Utilizing GPT-3.5 as the Language Model to assist with measurements aims to achi...
qna-with-your-own-data-using-faiss-index

The "QnA with Your Own Data Using Faiss Index" is a Q&A model that uses GPT3.5 with information from vector search to make the answer more grounded. It involves embedding the user's question with an LLM, and then using Faiss Index Lookup to find relevant documents based on vectors. By utilizing vector searc...
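
The embed-then-look-up pattern described for this flow can be illustrated with a small, dependency-free sketch. Everything here is a stand-in: a toy bag-of-words vector replaces the LLM embedding, a brute-force cosine-similarity scan replaces the Faiss Index Lookup, and the documents, helper names, and prompt wording are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy bag-of-words embedding (assumption: stands in for an LLM embedding)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=1):
    """Brute-force nearest-neighbour scan (assumption: stands in for an index lookup)."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


docs = [
    "faiss builds an index over dense vectors",
    "the capital of france is paris",
    "vector search finds the nearest documents",
]
vocab = sorted({word for doc in docs for word in doc.split()})
doc_vecs = [embed(doc, vocab) for doc in docs]

question = "what is the capital of france"
hits = top_k(embed(question, vocab), doc_vecs, k=1)

# The retrieved text is placed in the prompt so the answer is grounded in it.
context = "\n".join(docs[i] for i in hits)
prompt = f"Answer using only this context:\n{context}\nQuestion: {question}"
print(prompt)
```

A real index trades this exhaustive scan for a data structure that answers the same nearest-neighbour question much faster over large document collections.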
qwen2.5-0.5b-instruct

This model is an optimized version of Qwen2.5-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...
qwen2.5-0.5b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0....
qwen2.5-0.5b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0.5B-In...
qwen2.5-0.5b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0.5B-In...
qwen2.5-0.5b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on Intel GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0...
qwen2.5-0.5b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on Intel NPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0...
qwen2.5-0.5b-instruct-qnn-npu

This model is an optimized version of qwen2.5-0.5b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-0.5b-instruct for local inference on QNN NPU...
qwen2.5-0.5b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on TensorRT-RTX GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qw...
qwen2.5-0.5b-instruct-vitis-npu

This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on AMD NPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-0.5...
qwen2.5-1.5b-instruct

This model is an optimized version of Qwen2.5-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...
qwen2.5-1.5b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-1....
qwen2.5-1.5b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-1.5B-In...
qwen2.5-1.5b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-1.5B-In...
qwen2.5-1.5b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-1.5B-Instruct for local inference o...
qwen2.5-1.5b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-1.5B-Instruct for local inference o...
qwen2.5-1.5b-instruct-qnn-npu

This model is an optimized version of qwen2.5-1.5b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-1.5b-instruct for local inference on QNN NPU...
qwen2.5-1.5b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-1.5b-instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the Qwen2.5-1.5b-instruct for local inference o...
qwen2.5-14b-instruct

This model is an optimized version of Qwen2.5-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen2.5-14b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-14B...
qwen2.5-14b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-14B-Inst...
qwen2.5-14b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-14B-Inst...
qwen2.5-14b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-14B-Instruct for local inference on ...
qwen2.5-14b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on TensorRT-RTX GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwe...
qwen2.5-3b-instruct

This model is an optimized version of Qwen2.5-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...
qwen2.5-3b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-3B-I...
qwen2.5-3b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-3B-Instru...
qwen2.5-3b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-3B-Instru...
qwen2.5-7b-instruct

This model is an optimized version of Qwen2.5-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...
qwen2.5-7b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-I...
qwen2.5-7b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instru...
qwen2.5-7b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instru...
qwen2.5-7b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on In...
qwen2.5-7b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on In...
qwen2.5-7b-instruct-qnn-npu

This model is an optimized version of qwen2.5-7b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-7b-instruct for local inference on QNN NPUs. -...
qwen2.5-7b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inferenc...
qwen2.5-7b-instruct-vitis-npu

This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on AMD ...
qwen2.5-coder-0.5b-instruct

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...
qwen2.5-coder-0.5b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen...
qwen2.5-coder-0.5b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-C...
qwen2.5-coder-0.5b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-C...
qwen2.5-coder-0.5b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local...
qwen2.5-coder-0.5b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local...
qwen2.5-coder-0.5b-instruct-qnn-npu

This model is an optimized version of qwen2.5-coder-0.5b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-coder-0.5b-instruct for local inferenc...
qwen2.5-coder-0.5b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on TensorRT-RTX GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of ...
qwen2.5-coder-0.5b-instruct-vitis-npu

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local in...
qwen2.5-coder-1.5b-instruct

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...
qwen2.5-coder-1.5b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen...
qwen2.5-coder-1.5b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-C...
qwen2.5-coder-1.5b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-C...
qwen2.5-coder-1.5b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local...
qwen2.5-coder-1.5b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local...
qwen2.5-coder-1.5b-instruct-qnn-npu

This model is an optimized version of qwen2.5-coder-1.5b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-coder-1.5b-instruct for local inferenc...
qwen2.5-coder-1.5b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on TensorRT-RTX GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of ...
qwen2.5-coder-1.5b-instruct-vitis-npu

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local i...
qwen2.5-coder-14b-instruct

This model is an optimized version of Qwen2.5-Coder-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of th...
qwen2.5-coder-14b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2...
qwen2.5-coder-14b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Co...
qwen2.5-coder-14b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Co...
qwen2.5-coder-14b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-14B-Instruct for local i...
qwen2.5-coder-14b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-14B-Instruct for ...
qwen2.5-coder-3b-instruct

This model is an optimized version of Qwen2.5-Coder-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...
qwen2.5-coder-3b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2....
qwen2.5-coder-3b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Cod...
qwen2.5-coder-3b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Cod...
qwen2.5-coder-7b-instruct

This model is an optimized version of Qwen2.5-Coder-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...
qwen2.5-coder-7b-instruct-cuda-gpu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2....
qwen2.5-coder-7b-instruct-generic-cpu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Cod...
qwen2.5-coder-7b-instruct-generic-gpu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Cod...
qwen2.5-coder-7b-instruct-openvino-gpu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on Intel GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inf...
qwen2.5-coder-7b-instruct-openvino-npu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on Intel NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inf...
qwen2.5-coder-7b-instruct-qnn-npu

This model is an optimized version of qwen2.5-coder-7b-instruct to enable local inference on QNN NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: MIT
Model Description: This is a conversion of the qwen2.5-coder-7b-instruct for local inference on...
qwen2.5-coder-7b-instruct-trtrtx-gpu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on TensorRT-RTX GPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for lo...
qwen2.5-coder-7b-instruct-vitis-npu

This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on AMD NPUs.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local infer...
qwen3-0.6b

This model is an optimized version of Qwen3-0.6B-Finetuned for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen3-0.6b-cuda-gpu

This model is an optimized version of Qwen3-0.6B to enable local inference on CUDA GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-0.6B f...
qwen3-0.6b-generic-cpu

This model is an optimized version of Qwen3-0.6B to enable local inference on CPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-0.6B for lo...
qwen3-0.6b-generic-cpu-dq44

This model is an optimized version of Qwen3-0.6B to enable local inference on CPUs. This model uses DiscQuant quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-0.6B for local...
qwen3-0.6b-generic-gpu

This model is an optimized version of Qwen3-0.6B to enable local inference on GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-0.6B for lo...
qwen3-0.6b-pp-finetuned-generic-cpu

This model is an optimized version of Qwen3-0.6B to enable local inference on CPUs. This model uses DiscQuant mixed-precision quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen...
qwen3-0.6b-pp-finetuned-mtt-generic-cpu

This model is an optimized version of Qwen3-0.6B-MTT-Finetuned to enable local inference on CPUs. This model uses DiscQuant mixed-precision quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversi...
qwen3-1.7b

This model is an optimized version of Qwen3-1.7B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

qwen3-1.7b-cuda-gpu

This model is an optimized version of Qwen3-1.7B to enable local inference on CUDA GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-1.7B f...
qwen3-1.7b-generic-cpu

This model is an optimized version of Qwen3-1.7B to enable local inference on CPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-1.7B for lo...
qwen3-1.7b-generic-gpu

This model is an optimized version of Qwen3-1.7B to enable local inference on GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-1.7B for lo...
qwen3-14b

This model is an optimized version of Qwen3-14B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

O...

qwen3-14b-cuda-gpu

This model is an optimized version of Qwen3-14B to enable local inference on CUDA GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-14B for local i...
qwen3-14b-generic-cpu

This model is an optimized version of Qwen3-14B to enable local inference on CPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-14B for local infere...
qwen3-14b-generic-gpu

This model is an optimized version of Qwen3-14B to enable local inference on GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-14B for local infere...
qwen3-4b

This model is an optimized version of Qwen3-4B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

ON...

qwen3-4b-cuda-gpu

This model is an optimized version of Qwen3-4B to enable local inference on CUDA GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-4B for l...
qwen3-4b-generic-cpu

This model is an optimized version of Qwen3-4B to enable local inference on CPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-4B for local ...
qwen3-4b-generic-gpu

This model is an optimized version of Qwen3-4B to enable local inference on GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-4B for local ...
qwen3-8b

This model is an optimized version of Qwen3-8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

ON...

qwen3-8b-cuda-gpu

This model is an optimized version of Qwen3-8B to enable local inference on CUDA GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-8B for l...
qwen3-8b-generic-cpu

This model is an optimized version of Qwen3-8B to enable local inference on CPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-8B for local ...
qwen3-8b-generic-gpu

This model is an optimized version of Qwen3-8B to enable local inference on GPUs. This model uses KLD Gradient quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-8B for local ...
qwen3-embedding-0.6b

Qwen3 Embedding 0.6B

This is the parent model entry for qwen3-embedding-0.6b in the Foundry Local catalog.

Model Details

Model Type: Text Embedding
Architecture: Transformer (Qwen3)
Parameters: 0.6 billion
Context Length: 32K tokens
Embedding Dimension: Up to ...
qwen3-embedding-0.6b-cuda-gpu

Qwen3 Embedding 0.6B CUDA GPU

This is the GPU (NVIDIA CUDA)-optimized variant of qwen3-embedding-0.6b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 0.6 billion
...
qwen3-embedding-0.6b-generic-cpu

Qwen3 Embedding 0.6B Generic CPU

This is the CPU-optimized variant of qwen3-embedding-0.6b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 0.6 billion
Context Le...
qwen3-embedding-0.6b-generic-gpu

Qwen3 Embedding 0.6B WebGPU GPU

This is the GPU (WebGPU)-optimized variant of qwen3-embedding-0.6b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 0.6 billion
Co...
qwen3-embedding-8b

Qwen3 Embedding 8B

This is the parent model entry for qwen3-embedding-8b in the Foundry Local catalog.

Model Details

Model Type: Text Embedding
Architecture: Transformer (Qwen3)
Parameters: 8 billion
Context Length: 32K tokens
Embedding Dimension: Up to 4096 -...
qwen3-embedding-8b-cuda-gpu

Qwen3 Embedding 8B CUDA GPU

This is the GPU (NVIDIA CUDA)-optimized variant of qwen3-embedding-8b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 8 billion
Conte...
qwen3-embedding-8b-generic-cpu

Qwen3 Embedding 8B Generic CPU

This is the CPU-optimized variant of qwen3-embedding-8b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 8 billion
Context Length...
qwen3-embedding-8b-generic-gpu

Qwen3 Embedding 8B WebGPU GPU

This is the GPU (WebGPU)-optimized variant of qwen3-embedding-8b, a text embedding model from the Qwen3 family developed by Alibaba Cloud and optimized by Microsoft.

Model Details

Model Type: Text Embedding (ONNX)
Parameters: 8 billion
Context ...
qwen3-vl-2b-instruct

This model is an optimized version of Qwen3-VL-2B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen3-vl-2b-instruct-cuda-gpu

This model is an optimized version of Qwen3-VL-2B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-2B...
qwen3-vl-2b-instruct-generic-cpu

This model is an optimized version of Qwen3-VL-2B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-2B-Inst...
qwen3-vl-4b-instruct

This model is an optimized version of Qwen3-VL-4B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen3-vl-4b-instruct-cuda-gpu

This model is an optimized version of Qwen3-VL-4B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-4B...
qwen3-vl-4b-instruct-generic-cpu

This model is an optimized version of Qwen3-VL-4B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-4B-Inst...
qwen3-vl-8b-instruct

This model is an optimized version of Qwen3-VL-8B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen3-vl-8b-instruct-cuda-gpu

This model is an optimized version of Qwen3-VL-8B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-8B...
qwen3-vl-8b-instruct-generic-cpu

This model is an optimized version of Qwen3-VL-8B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3-VL-8B-Inst...
qwen3.5-0.8b

This model is an optimized version of Qwen3.5-0.8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

qwen3.5-0.8b-cuda-gpu

This model is an optimized version of Qwen3.5-0.8B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-0.8B for lo...
qwen3.5-0.8b-generic-cpu

This model is an optimized version of Qwen3.5-0.8B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-0.8B for local i...
qwen3.5-0.8b-generic-gpu

This model is an optimized version of Qwen3.5-0.8B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-0.8B for local i...
qwen3.5-2b

This model is an optimized version of Qwen3.5-2B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

qwen3.5-2b-cuda-gpu

This model is an optimized version of Qwen3.5-2B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-2B for local ...
qwen3.5-2b-generic-cpu

This model is an optimized version of Qwen3.5-2B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-2B for local infer...
qwen3.5-2b-generic-gpu

This model is an optimized version of Qwen3.5-2B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-2B for local infer...
qwen3.5-2b-text

This model is an optimized text-only version of Qwen3.5-2B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...
qwen3.5-2b-text-cuda-gpu

This model is an optimized text-only version of Qwen3.5-2B to enable local inference on GPUs with CUDA. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3....
qwen3.5-2b-text-generic-cpu

This model is an optimized text-only version of Qwen3.5-2B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-2B text ...
qwen3.5-2b-text-generic-gpu

This model is an optimized text-only version of Qwen3.5-2B to enable local inference on GPUs with WebGPU. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen...
qwen3.5-4b

This model is an optimized version of Qwen3.5-4B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

qwen3.5-4b-cuda-gpu

This model is an optimized version of Qwen3.5-4B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-4B for local ...
qwen3.5-4b-generic-cpu

This model is an optimized version of Qwen3.5-4B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-4B for local infer...
qwen3.5-4b-generic-gpu

This model is an optimized version of Qwen3.5-4B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-4B for local infer...
qwen3.5-9b

This model is an optimized version of Qwen3.5-9B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

qwen3.5-9b-cuda-gpu

This model is an optimized version of Qwen3.5-9B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-9B for local ...
qwen3.5-9b-generic-cpu

This model is an optimized version of Qwen3.5-9B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-9B for local infer...
qwen3.5-9b-generic-gpu

This model is an optimized version of Qwen3.5-9B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the Qwen3.5-9B for local infer...
rai-eval-ui-dag-flow

The Q&A quality and safety evaluation flow evaluates Q&A systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. It uses GPT and GPT embedding models to assist with measurements, aiming to achieve high agreement with huma...
rai-qna-quality-safety-eval

The Q&A quality and safety evaluation flow evaluates Q&A systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of your responses. It uses GPT and GPT embedding models to assist with measurements, aiming to achieve high agreement with huma...
Relevance-Evaluator

Score range: Integer [1-5]: 1 is the lowest quality and 5 is the highest quality.
What is this metric? Coherence measures the logical and orderly presentation of ideas in a response, allowing the reader to easily follow and understand the writer's train of thought. A c...
rerank-qna

This "Index Data Rerank Q&A" demonstrates a Q&A application, enabled by reranking data from vector index stores and powered by GPT. It uses index stores and the rerank tool from Azure Machine Learning to provide grounded answers. You can ask a wide range of questions and receive responses based...
Retrieval-Evaluator

Score range: Integer [1-5]: 1 is the lowest quality and 5 is the highest quality.
What is this metric? Retrieval measures the quality of search without ground truth. It focuses on how relevant the context chunks (encoded as a string) are to address a query and how the ...
roberta-base

RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
roberta-base-openai-detector

The RoBERTa base OpenAI Detector functions as a model designed to detect outputs generated by the GPT-2 model. It was created by refining a RoBERTa base model using the outputs of the 1.5B-parameter GPT-2 model. This detector is utilized to determine whether text was generated by a GPT-2 model. O...
roberta-large

RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate i...
roberta-large-mnli

roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The model is a pretrained model on English language text using a masked language modeling ...
roberta-large-openai-detector

RoBERTa large OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa large model with the outputs of the 1.5B-parameter GPT-2 model. The model can be used to predict if text was generated by a GPT-2 model. This model was released by OpenAI at the same time as Op...
Rouge-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1]: higher means better quality. |
| What is this metric? | ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used to evaluate automatic summarization and machine translation. It measures the overlap between generated text and...
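
As a rough illustration of the overlap idea behind ROUGE, the sketch below computes a simplified unigram (ROUGE-1) score in plain Python. It is not the evaluator's implementation: the function name `rouge_1`, the lowercase whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Assumes lowercase whitespace tokenization, which is cruder than
    what production ROUGE implementations typically use.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Counter intersection keeps the minimum count of each shared token,
    # so a repeated word is only credited as often as it appears in both.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example texts: five of six unigrams are shared.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

All three values fall in the same [0, 1] range as the score range listed above, with higher values indicating more overlap.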
runwayml-stable-diffusion-inpainting

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

The Stable-Diffusion-Inpainting was initialized with the weights of the Stable-Diffus...

runwayml-stable-diffusion-v1-5

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 5...
Salesforce-BLIP-2-opt-2-7b-image-to-text

The BLIP-2 model, utilizing OPT-2.7b (a large language model with 2.7 billion parameters), is presented in the paper titled "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models". T...

Salesforce-BLIP-2-opt-2-7b-vqa

The BLIP-2 model, utilizing OPT-2.7b (a large language model with 2.7 billion parameters), is presented in the paper titled "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models". Th...
Salesforce-BLIP-image-captioning-base

BLIP (Bootstrapping Language-Image Pre-training), designed for unified vision-language understanding and generation, is a new VLP framework that expands the scope of downstream tasks compared to existing methods. The framework encompasses two key contributions from both model and data perspective...
Salesforce-BLIP-vqa-base

BLIP (Bootstrapping Language-Image Pre-training), designed for unified vision-language understanding and generation, is a new VLP framework that expands the scope of downstream tasks compared to existing methods. The framework encompasses two key contributions from both model and data perspective...
Self-Harm-Related-Content-Evaluator

Definition

Self-harm-related content includes language pertaining to actions intended to hurt, injure, or damage one's body or kill oneself.

Severity scale

Safety evaluations annotate self-harm-related content using a 0-7 scale.

Very Low (0-1) refers to

Content that contains self-...

Sexual-Content-Evaluator

Definition

Sexual content includes language pertaining to anatomical organs and genitals, romantic relationships, acts portrayed in erotic terms, pregnancy, physical sexual acts (including assault or sexual violence), prostitution, pornography, and sexual abuse.

Severity scale

Safety eva...

Similarity-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: 1 is the lowest quality and 5 is the highest quality. |
| What is this metric? | Similarity measures the degrees of similarity between the generated text and its ground truth with respect to a query. |
| How does it work? | The similarity metric i...
smollm3-3b

This model is an optimized version of SmolLM3-3B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

...

smollm3-3b-cuda-gpu

This model is an optimized version of SmolLM3-3B to enable local inference on CUDA GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the SmolLM3-3B for local...
smollm3-3b-generic-cpu

This model is an optimized version of SmolLM3-3B to enable local inference on CPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the SmolLM3-3B for local infe...
smollm3-3b-generic-gpu

This model is an optimized version of SmolLM3-3B to enable local inference on GPUs. This model uses GPTQ quantization.

Model Description

Developed by: Microsoft
Model type: ONNX
License: apache-2.0
Model Description: This is a conversion of the SmolLM3-3B for local infe...
snowflake-arctic-base

Model Overview

Arctic is a dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of Arctic under an Apache-2.0 license. This means you can use them freely in your ow...

snowflake-arctic-instruct

Model Overview

sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco

sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d078...
sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco

sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787...
sshleifer-distilbart-cnn-12-6

The RoBERTa Large model is a large transformer-based language model that was developed by the Hugging Face team. It is pre-trained on masked language modeling and can be used for tasks such as sequence classification, token classification, or question answering. Its primary usage is as a fine-tun...
stabilityai-stable-diffusion-2-1

This stable-diffusion-2-1 model is fine-tuned from stable-diffusion-2 (768-v-ema.ckpt) with an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k extra steps with punsafe=0.98.

The mod...

stabilityai-stable-diffusion-2-inpainting

This stable-diffusion-2-inpainting model is resumed from stable-diffusion-2-base (512-base-ema.ckpt) and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA wh...
stabilityai-stable-diffusion-xl-base-1-0

SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents, wh...
stabilityai-stable-diffusion-xl-refiner-1-0

SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents, wh...
supply-chain-trade-regulations

Description

The adapted AI model for supply chain trade regulations analysis (preview) is a 3.8B-parameter, lightweight, state-of-the-art open model, trained using synthetic supply chain domain-specific datasets, focused on trade regulations.

The model is fine-tuned on the base model...

supply-chain-trade-regulations-v2

The adapted AI model for supply chain trade regulations analysis (preview) is a 14B-parameter, lightweight, state-of-the-art open model trained using synthet...

t5-base

The developers of the Text-To-Text Transfer Transformer (T5) write:

With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...

t5-large

The developers of the Text-To-Text Transfer Transformer (T5) write:

With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...

t5-small

The developers of the Text-To-Text Transfer Transformer (T5) write:

With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to B...

TamGen

TamGen is a 100-million-parameter model that can generate compounds based on the input protein information. TamGen is pre-trained on 10 million compounds from PubChem and fine-tuned on CrossDocked and PDB datasets. We evaluate TamGen on existing benchmarks and achieve top performance. Further...
template-chat-flow

The "Template Chat Flow" is a chat model using GPT3.5 that generates the next message based on the conversation history and the latest chat content.

Inference samples

Inference type	CLI	VS Code Extension
Real time	<a href="https://microsoft.github.io/promptflow/how-to-guides/dep...

template-eval-flow

The "Template Evaluation Flow" is an evaluation model that measures how well the output matches the expected criteria and goals.

Inference samples

Inference type	CLI	VS Code Extension
Real time	<a href="https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/index.html" tar...

template-standard-flow

The "Template Standard Flow" is a model using GPT3.5 to generate a joke based on user input.

Inference samples

Inference type	CLI	VS Code Extension
Real time	deploy-promptflow...

tiiuae-falcon-40b

Description

Falcon-40B is a large language model (LLM) developed by the Technology Innovation Institute (TII) with 40 billion parameters. It is a causal decoder-only model trained on 1 trillion tokens from the RefinedWeb dataset, enhanced with curated corpora. Falcon-40B supports English, Germa...

tiiuae-falcon-40b-instruct

Description

Falcon-40B-Instruct is a large language model with 40 billion parameters, developed by TII. It is a causal decoder-only model fine-tuned on a mixture of Baize data and is released under the Apache 2.0 license. This model is optimized for inference and features FlashAttention and mul...

tiiuae-falcon-7b

Description

Falcon-7B is a large language model with 7 billion parameters. It is a causal decoder-only model developed by TII and trained on 1,500 billion tokens of the RefinedWeb dataset, which was enhanced with curated corpora. The model is available under the Apache 2.0 license. It outperforms c...

tiiuae-falcon-7b-instruct

Description

Falcon-7B-Instruct is a large language model with 7 billion parameters, developed by TII. It is a causal decoder-only model and is released under the Apache 2.0 license. This model is optimized for inference and features FlashAttention and multiquery architectures. It is primarily d...

vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco

vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92f...
vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco

vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...
Violent-Content-Evaluator

Definition

Violent content includes language pertaining to physical actions intended to hurt, injure, damage, or kill someone or something. It also includes descriptions of weapons and guns (and related entities such as manufacturers and associations).

Severity scale

Safety evaluations ...

Virchow

Virchow is a self-supervised vision transformer pretrained using 1.5M whole slide histopathology images. The model can be used as a tile-level feature extractor (frozen or finetuned) to achieve state-of-the-art results for a wide variety of downstream computational pathology use cases.

Model ...

Virchow2

Virchow2 is a self-supervised vision transformer pretrained using 3.1M whole slide histopathology images. The model can be used as a tile-level feature extractor (frozen or finetuned) to achieve state-of-the-art results for a wide variety of downstream computational pathology use cases.

Mode...

web-classification

The "Web Classification" is a model demonstrating multi-class classification with an LLM. Given a URL, it classifies the URL into one web category using just a few shots and simple summarization and classification prompts.

Inference samples

Inference type	CLI	VS Code Extension
Real...

yolof_r50_c5_8x8_1x_coco

yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6061f92fa09e48fb1/configs/...

models documentation - Azure/azureml-assets GitHub Wiki
