components documentation - Azure/azureml-assets GitHub Wiki
-
action_analyzer_correlation_test
Perform correlation test on different groups to generate actions.
-
action_analyzer_identify_problem_traffic
Separate bad queries into different groups.
-
action_analyzer_metrics_calculation
Calculate further metrics for generating actions.
-
action_analyzer_output_actions
Merge and output actions.
-
Pipeline component for proxy fine-tuning with AOAI
-
Upload data to Azure OpenAI resource, finetune model and delete data
-
Inference component for AutoML Forecasting.
-
automl_hts_automl_training_step
-
automl_hts_data_aggregation_step
-
Enables inference for hts components.
-
automl_hts_inference_collect_step
-
automl_hts_inference_setup_step
-
Enables AutoML Training for hts components.
-
automl_hts_training_collect_step
-
automl_hts_training_setup_step
-
Inference components for AutoML many models.
-
automl_many_models_inference_collect_step
-
automl_many_models_inference_setup_step
-
automl_many_models_inference_step
-
Enables AutoML many models training.
-
automl_many_models_training_collection_step
-
automl_many_models_training_setup_step
-
automl_many_models_training_step
-
automl_tabular_data_partitioning
Enables dataset partitioning for AutoML many models and hierarchical timeseries solution accelerators using spark.
-
batch_benchmark_config_generator
Generates the config for the batch score component.
-
Component for benchmarking an embedding model via MTEB.
-
chat_completion_datapreprocess
Component to preprocess data for chat completion task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for chat completion task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline Component to finetune Hugging Face pretrained models for chat completion task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Component converts models from supported frameworks to MLflow model packaging format
-
Delete data file from Azure OpenAI resource
-
Compute data drift metrics given a baseline and a deployment's model data input.
-
Computes the data drift between a baseline and production data assets.
-
Compute data quality metrics leveraged by the data quality monitor.
-
Compute data statistics leveraged by the data quality monitor.
-
Join baseline and target data quality metrics into a single output.
-
Computes the data quality of a target dataset with reference to a baseline.
-
Component to upload user's data from AzureML workspace to Azure OpenAI resource
-
feature_attribution_drift_compute_metrics
Feature attribution drift using model monitoring.
-
feature_attribution_drift_signal_monitor
Computes the feature attribution between a baseline and production data assets.
-
Feature importance for model monitoring.
-
Retrieval component to be used to retrieve offline features from feature store.
-
Component to submit FT job to Azure OpenAI resource
-
Component to validate the finetune job against Validation Service
-
Component to convert the finetune job output to pytorch and mlflow model
-
Filters the raw span log based on the window provided, and aggregates it to trace level.
-
genai_token_statistics_compute_metrics
Compute token statistics metrics.
-
genai_token_statistics_signal_monitor
Computes the token and cost metrics over LLM outputs.
-
generation_safety_quality_signal_monitor
Computes the content generation safety metrics over LLM outputs.
-
gsq_annotation_compute_histogram
Compute annotation histogram given a deployment's model data input.
-
gsq_annotation_compute_metrics
Compute annotation metrics given a deployment's model data input.
-
Adapt data to fit into GSQ component.
-
Import a model into a workspace or a registry
-
llm_ingest_dataset_to_acs_basic
Single job pipeline to chunk data from AzureML data asset, and create ACS embeddings index
-
llm_ingest_dataset_to_acs_user_id
Single job pipeline to chunk data from AzureML data asset, and create ACS embeddings index
-
llm_ingest_dataset_to_faiss_basic
Single job pipeline to chunk data from AzureML data asset, and create FAISS embeddings index
-
llm_ingest_dataset_to_faiss_user_id
Single job pipeline to chunk data from AzureML data asset, and create FAISS embeddings index
-
Creates chunks no larger than chunk_size from input_data; extracted document titles are prepended to each chunk.
LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time as only so much context ca...
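The chunking behaviour described above can be sketched as follows. This is a minimal illustration, not the component's implementation: the function name `chunk_text`, the `Title:` prefix format, and the use of whitespace-separated words as a stand-in for tokens are all assumptions, and the prepended title is not counted against `chunk_size` here.

```python
def chunk_text(text: str, chunk_size: int, title: str = "") -> list[str]:
    """Split text into chunks of at most chunk_size words, prefixing each with the title.

    Hypothetical helper: words approximate tokens, which the real component may count differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Assumed prefix format; the source only says titles are prepended to each chunk.
    prefix = f"Title: {title}\n\n" if title else ""
    chunks = []
    for start in range(0, len(words), chunk_size):
        body = " ".join(words[start:start + chunk_size])
        chunks.append(prefix + body)
    return chunks


chunks = chunk_text("a b c d e", chunk_size=2, title="Doc")
```

Keeping the title on every chunk means each chunk still carries some document-level context when it is embedded or retrieved on its own.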
-
llm_rag_crack_and_chunk_and_embed
Creates chunks no larger than chunk_size from input_data; extracted document titles are prepended to each chunk.
LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time as only so much context ca...
-
llm_rag_crack_chunk_embed_index_and_register
Creates chunks no larger than chunk_size from input_data; extracted document titles are prepended to each chunk.
LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time as only so much contex...
-
Crawls the given URL and nested links to max_crawl_depth. Data is stored to output_path.
-
Creates a FAISS index from embeddings. The index will be saved to the output folder. The index will be registered as a Data Asset named asset_name if register_output is set to True.
-
This component is used to create a RAG flow based on your mlindex data and best prompts. The flow will look into your indexed data and give answers based on your own data context. The flow also provides the capability to bulk test with any built-in or custom evaluation flows.
-
Collects documents from Azure Cognitive Search Index, extracts their contents, saves them to a uri folder, and creates an MLIndex yaml file to represent the search index.
Documents collected can then be used in other components without having to query the ACS index again, allowing for a consiste...
-
Generates embedding vectors for data chunks read from chunks_source.
chunks_source is expected to contain csv files containing two columns:
- "Chunk" - Chunk of text to be embedded
- "Metadata" - JSON object containing metadata for the chunk
If embeddings_container is supplied, input c...
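As an illustration of the two-column csv layout described above, the following sketch parses such a file with the Python standard library. The column names "Chunk" and "Metadata" come from the description; the function name `read_chunks`, the sample row, and the `source` metadata key are hypothetical.

```python
import csv
import io
import json


def read_chunks(csv_text: str) -> list[tuple[str, dict]]:
    """Parse csv text with "Chunk" and "Metadata" columns into (text, metadata) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # "Metadata" is described as a JSON object, so decode it into a dict.
        rows.append((row["Chunk"], json.loads(row["Metadata"])))
    return rows


# Hypothetical sample; inner double quotes are escaped by doubling, per csv rules.
sample = 'Chunk,Metadata\n"Hello world","{""source"": ""a.md""}"\n'
rows = read_chunks(sample)
```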
-
llm_rag_generate_embeddings_parallel
Generates embedding vectors for data chunks read from chunks_source.
chunks_source is expected to contain csv files containing two columns:
- "Chunk" - Chunk of text to be embedded
- "Metadata" - JSON object containing metadata for the chunk
If previous_embeddings is supplied, input ch...
-
Clones a git repository to output_data path
-
Embeds input images and stores them in Azure Cognitive Search index with metadata using Florence embedding resource. MLIndex is stored to output_path.
-
Generates a test dataset of questions and answers based on the input documents.
A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. These question, answer, and context sets are saved as either a csv or j...
-
llm_rag_register_mlindex_asset
Registers an MLIndex yaml and supporting files as an AzureML data asset
-
llm_rag_register_qa_data_asset
Registers a QA data csv or json and supporting files as an AzureML data asset
-
Uploads embeddings into the Azure Cognitive Search instance specified in acs_config. The Index will be created if it doesn't exist.
The Index will have the following fields populated:
- "id", String, key=True
- "content", String
- "contentVector", Collection(Single)
- "category", String
- "url",...
-
llm_rag_update_cosmos_mongo_vcore_index
Uploads embeddings into the Azure Cosmos Mongo vCore collection/index specified in azure_cosmos_mongo_vcore_config. The collection/index will be created if it doesn't exist.
The collection/index will have the following fields populated:
- "_id", String, key=True
- "content", String
- "contentVec...
-
Uploads embeddings into the Milvus collection/index specified in milvus_config. The collection/index will be created if it doesn't exist.
The collection/index will have the following fields populated:
- "id", String, key=True
- "content", String
- "contentVector", Collection(Single)
- "url", Str...
-
Uploads embeddings into the Pinecone index specified in pinecone_config. The Index will be created if it doesn't exist.
Each record in the Index will have the following metadata populated:
- "id", String
- "content", String
- "url", String
- "filepath", String
- "title", String
- "metadata_json_...
-
Validates that the completion model, embedding model, and Azure Cognitive Search resource deployments are successful and that connections work. For default AOAI, it attempts to create the deployments if not valid or present. This validation is done only if the customer is using Azure OpenAI models or creatin...
-
medical_image_embedding_datapreprocessing
To generate embeddings for medical images.
-
medimage_embedding_adapter_merge
Integrates labels and generates a classification model
-
Pipeline Component to finetune MedImageInsight Model.
-
Component to finetune the model using the medical image data
-
Component to finetune the model using the medical image data
-
mmdetection_image_objectdetection_instancesegmentation_pipeline
Pipeline component for image object detection and instance segmentation using MMDetection models.
-
model_data_collector_preprocessor
Filters the data based on the window provided.
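Window-based filtering of this kind can be sketched as below. This is an assumption-laden illustration rather than the component's logic: the function name `filter_by_window`, the `timestamp` key, and the half-open window `[start, end)` are all hypothetical choices.

```python
from datetime import datetime


def filter_by_window(records, start, end, time_key="timestamp"):
    """Keep records whose timestamp falls in the half-open window [start, end)."""
    return [r for r in records if start <= r[time_key] < end]


# Hypothetical sample records.
records = [
    {"id": 1, "timestamp": datetime(2024, 1, 1, 0, 0)},
    {"id": 2, "timestamp": datetime(2024, 1, 1, 12, 0)},
    {"id": 3, "timestamp": datetime(2024, 1, 2, 0, 0)},
]
kept = filter_by_window(records, datetime(2024, 1, 1), datetime(2024, 1, 2))
```

A half-open window is used so that consecutive windows never count the same record twice.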
-
Generate and output actions to the default datastore.
-
Generate and output actions
-
model_monitor_azmon_metric_publisher
Azure Monitor Publisher for the computed model monitor metrics.
-
model_monitor_compute_histogram
Compute a histogram given input data and associated histogram buckets.
-
model_monitor_compute_histogram_buckets
Compute histogram buckets given up to two datasets.
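One way to read the two histogram components is that bucket edges are derived once from the combined range of up to two datasets, and each dataset is then counted against those shared edges so the histograms are comparable. The sketch below illustrates that idea with equal-width buckets; the function names, the `num_buckets` parameter, and the equal-width strategy are assumptions, not the components' documented behaviour.

```python
import bisect


def compute_bucket_edges(baseline, target=None, num_buckets=10):
    """Return equal-width bucket edges spanning the combined range of up to two datasets."""
    values = list(baseline) + (list(target) if target is not None else [])
    if not values:
        raise ValueError("at least one value is required")
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi]  # degenerate case: a single bucket
    width = (hi - lo) / num_buckets
    return [lo + i * width for i in range(num_buckets)] + [hi]


def compute_histogram(data, edges):
    """Count values per bucket; the last bucket is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the bucket range
        idx = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


edges = compute_bucket_edges([0, 1, 2], [3, 4], num_buckets=4)
baseline_hist = compute_histogram([0, 1, 2], edges)
target_hist = compute_histogram([3, 4], edges)
```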
-
Creates the model monitor metric manifest.
-
Joins two data assets on the given columns for model monitor.
-
model_monitor_evaluate_metrics_threshold
Evaluate signal metrics against the threshold provided in the monitoring signal.
-
model_monitor_feature_selector
Selects features to compute signal metrics on.
-
model_monitor_metric_outputter
Output the computed model monitor metrics.
-
Output the computed model monitor metrics to the default datastore.
-
model_performance_compute_metrics
Compute model performance metrics leveraged by the model performance monitor.
-
model_performance_signal_monitor
Computes the model performance
-
Generate predictions on a given mlflow model for supported tasks.
-
model_prediction_with_container
Optimized Distributed inference component for LLMs.
-
nlp_multiclass_datapreprocessing
Component to preprocess data for automl nlp multiclass classification task
-
nlp_multilabel_datapreprocessing
Component to preprocess data for automl nlp multilabel classification task
-
nlp_textclassification_multiclass
Pipeline component for AutoML NLP Multiclass Text classification
-
nlp_textclassification_multilabel
Pipeline component for AutoML NLP Multilabel Text classification
-
FTaaS component to finetune model for Chat Completion task
-
FTaaS Pipeline component for chat completion
-
oss_distillation_batchscoring_datagen_pipeline
Component to generate data from teacher model endpoint by invoking it in batch.
-
oss_distillation_data_generation_batch_scoring_selector
Component to select the Batch Scoring Selector based on the task type
-
oss_distillation_data_generation_file_selector
Component to select the Batch Scoring Selector based on the task type
-
oss_distillation_data_generation_validation_file_checker
Component to check whether the validation file is present
-
oss_distillation_generate_data
Component to generate data from teacher model endpoint
-
oss_distillation_generate_data_batch_postprocess
Component to prepare data returned from teacher model endpoint in batch
-
oss_distillation_generate_data_batch_preprocess
Component to prepare data to invoke teacher model endpoint in batch
-
oss_distillation_seq_scoring_pipeline
Component to generate data from teacher model endpoint (sequentially) and finetune student model on generated dataset
-
oss_distillation_validate_pipeline
Component to validate inputs to the distillation pipeline
-
oss_text_generation_data_import
FTaaS component to copy user training data to output
-
FTaaS component to finetune model for Text Generation task
-
FTaaS Pipeline component for text generation
-
prediction_drift_signal_monitor
Computes the prediction drift between baseline and target data assets.
-
question_answering_datapreprocess
Component to preprocess data for question answering task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for extractive question answering task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
question_answering_model_import
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline Component to finetune Hugging Face pretrained models for extractive question answering task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Add Causal to RAI Insights Dashboard Learn More
-
Add Counterfactuals to RAI Insights Dashboard Learn More
-
Add Error Analysis to RAI Insights Dashboard Learn More
-
Add Explanation to RAI Insights Dashboard Learn More
-
rai_tabular_insight_constructor
RAI Insights Dashboard Constructor Learn More
-
Gather RAI Insights Dashboard Learn More
-
Generate rai insight score card Learn More
-
Component to preprocess data for summarization task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for summarization task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline Component to finetune Hugging Face pretrained models for summarization task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
text_classification_datapreprocess
Component to preprocess data for single label classification task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for text classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
text_classification_model_import
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline component to finetune Hugging Face pretrained models for text classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
text_generation_datapreprocess
Component to preprocess data for text generation task
-
Component to finetune model for Text Generation task
-
Import PyTorch / MLFlow model
-
Pipeline component for text generation
-
text_generation_pipeline_singularity_basic_high
Pipeline component for text generation
-
text_generation_pipeline_singularity_basic_low
Pipeline component for text generation
-
text_generation_pipeline_singularity_basic_medium
Pipeline component for text generation
-
text_generation_pipeline_singularity_premium_high
Pipeline component for text generation
-
text_generation_pipeline_singularity_premium_low
Pipeline component for text generation
-
text_generation_pipeline_singularity_premium_medium
Pipeline component for text generation
-
text_generation_pipeline_singularity_standard_high
Pipeline component for text generation
-
text_generation_pipeline_singularity_standard_low
Pipeline component for text generation
-
text_generation_pipeline_singularity_standard_medium
Pipeline component for text generation
-
token_classification_datapreprocess
Component to preprocess data for token classification task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for token classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
token_classification_model_import
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline component to finetune Hugging Face pretrained models for token classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
token_statistics_compute_metrics
Compute token statistics metrics.
-
transformers_image_classification_pipeline
Pipeline component for image classification using HuggingFace transformers models.
-
Component to preprocess data for translation task. See docs to learn more.
-
Component to finetune Hugging Face pretrained models for translation task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Component to import PyTorch / MLFlow model. See docs to learn more.
-
Pipeline component to finetune Hugging Face pretrained models for translation task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See docs to learn more.
-
Component for enabling validation of import pipeline.