# Retrieval Augmented Generation components (Azure/azureml-assets GitHub Wiki)
- llm_dbcopilot_create_promptflow
- llm_dbcopilot_grounding_ground_samples
- llm_ingest_dataset_to_acs_basic: Single job pipeline to chunk data from an AzureML data asset and create an ACS embeddings index
- llm_ingest_dataset_to_acs_user_id: Single job pipeline to chunk data from an AzureML data asset and create an ACS embeddings index
- llm_ingest_dataset_to_faiss_basic: Single job pipeline to chunk data from an AzureML data asset and create a FAISS embeddings index
- llm_ingest_dataset_to_faiss_user_id: Single job pipeline to chunk data from an AzureML data asset and create a FAISS embeddings index
- llm_ingest_db_to_acs: Single job pipeline to chunk data from an AzureML SQL datastore and create an ACS embeddings index
- llm_ingest_db_to_faiss: Single job pipeline to chunk data from an AzureML SQL datastore and create a FAISS embeddings index
- llm_ingest_dbcopilot_acs_e2e: Single job pipeline to chunk data from an AzureML DB datastore and create an ACS embeddings index
- llm_ingest_dbcopilot_faiss_e2e: Single job pipeline to chunk data from an AzureML DB datastore and create a FAISS embeddings index
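The ingest pipelines above are published as pipeline components in the shared `azureml` registry, so they can be pulled into your own pipeline with the AzureML Python SDK v2. A minimal sketch, using placeholder workspace values; the `input_data` parameter name is an assumption, so inspect the resolved component's `inputs` for the exact signature of the version you pull:

```python
from azure.ai.ml import MLClient, Input, dsl
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Client scoped to the shared "azureml" registry that hosts these components.
registry = MLClient(credential=credential, registry_name="azureml")

# Client scoped to the workspace where the pipeline job will run (placeholder values).
workspace = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Resolve the published pipeline component; label="latest" picks the newest version.
ingest = registry.components.get(name="llm_ingest_dataset_to_faiss_basic", label="latest")

@dsl.pipeline()
def rag_ingest(source_data: Input):
    # Keyword arguments here depend on the component version; `input_data` is an
    # assumption, so check `ingest.inputs` for the real names (chunk size, embedding
    # connection, asset name, etc.).
    ingest(input_data=source_data)

pipeline_job = rag_ingest(source_data=Input(type="uri_folder", path="azureml:my_docs_asset:1"))
workspace.jobs.create_or_update(pipeline_job, experiment_name="rag-ingest-demo")
```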
- llm_rag_crack_and_chunk: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much context ca...
- llm_rag_crack_and_chunk_and_embed: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much context ca...
- llm_rag_crack_chunk_embed_index_and_register: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much contex...
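The chunking behaviour shared by the three crack-and-chunk components above amounts to splitting each source document into pieces that fit a size budget and prefixing each piece with the extracted title. A minimal sketch, measuring size in characters rather than tokens (the real components count tokens and handle many file formats):

```python
def chunk_with_title(text: str, title: str, chunk_size: int) -> list[str]:
    """Split `text` into pieces no larger than `chunk_size` characters and
    prepend the extracted document title to each piece (illustration only)."""
    header = f"Title: {title}\n\n"
    budget = max(chunk_size - len(header), 1)  # room left for body text per chunk
    return [header + text[i:i + budget] for i in range(0, len(text), budget)]

# Example: break a long document into title-prefixed chunks of at most 512 characters.
chunks = chunk_with_title("very long document text " * 200, "Pricing FAQ", chunk_size=512)
```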
- Crawls the given URL and nested links to `max_crawl_depth`. Data is stored to `output_path`.
- Creates a FAISS index from embeddings. The index will be saved to the output folder. The index will be registered as a Data Asset named `asset_name` if `register_output` is set to `True`.
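Building a FAISS index from embeddings is conceptually the same as the plain faiss operation below. This is only a sketch with made-up vector sizes and file names; the component additionally writes the MLIndex metadata needed to map search hits back to document chunks:

```python
import faiss
import numpy as np

# Stand-in embeddings: 1000 chunks embedded into 1536-dimensional vectors (assumed size).
vectors = np.random.rand(1000, 1536).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])  # exact (brute-force) L2 search
index.add(vectors)
faiss.write_index(index, "faiss_index.bin")  # hypothetical output file name

# Query: the ids of the 5 chunks nearest to the first vector.
distances, ids = index.search(vectors[:1], 5)
```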
- This component is used to create a RAG flow based on your MLIndex data and best prompts. The flow will look into your indexed data and give answers based on your own data context. The flow also provides the capability to bulk test with any built-in or custom evaluation flows.
- Collects documents from an Azure Cognitive Search index, extracts their contents, saves them to a URI folder, and creates an MLIndex yaml file to represent the search index. Documents collected can then be used in other components without having to query the ACS index again, allowing for a consiste...
- llm_rag_generate_embeddings: Generates embeddings vectors for data chunks read from `chunks_source`. `chunks_source` is expected to contain csv files with two columns: "Chunk" (the chunk of text to be embedded) and "Metadata" (a JSON object containing metadata for the chunk). If `embeddings_container` is supplied, input c...
- llm_rag_generate_embeddings_parallel: Generates embeddings vectors for data chunks read from `chunks_source`. `chunks_source` is expected to contain csv files with two columns: "Chunk" (the chunk of text to be embedded) and "Metadata" (a JSON object containing metadata for the chunk). If `previous_embeddings` is supplied, input ch...
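The two-column csv layout expected by both embedding components can be produced with a few lines of pandas. The file name, folder, and metadata shape below are illustrative examples, not a contract defined by the components:

```python
import json
from pathlib import Path

import pandas as pd

Path("chunks_source").mkdir(exist_ok=True)

# Two example chunks in the "Chunk" / "Metadata" layout described above.
chunks = pd.DataFrame(
    {
        "Chunk": [
            "Title: Pricing FAQ\n\nThe basic tier includes 5 GB of storage...",
            "Title: Pricing FAQ\n\nThe enterprise tier adds SSO and audit logs...",
        ],
        "Metadata": [
            json.dumps({"source": {"filename": "pricing_faq.md"}, "title": "Pricing FAQ"}),
            json.dumps({"source": {"filename": "pricing_faq.md"}, "title": "Pricing FAQ"}),
        ],
    }
)

# The folder written here is what gets passed as `chunks_source`.
chunks.to_csv("chunks_source/chunks_0.csv", index=False)
```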
- Clones a git repository to the `output_data` path
- Embeds input images and stores them, with metadata, in an Azure Cognitive Search index using a Florence embedding resource. The MLIndex is stored to `output_path`.
- Generates a test dataset of questions and answers based on the input documents. A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. These question, answer, and context sets are saved as either a csv or j...
- llm_rag_register_mlindex_asset: Registers an MLIndex yaml and supporting files as an AzureML data asset
- llm_rag_register_qa_data_asset: Registers a QA data csv or json and supporting files as an AzureML data asset
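Assets registered by these components are ordinary AzureML data assets, so downstream code can resolve them with the workspace `MLClient`. A minimal sketch, with placeholder workspace values and a hypothetical asset name:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Resolve the registered MLIndex data asset; label="latest" picks the newest version.
mlindex_asset = ml_client.data.get(name="product_docs_mlindex", label="latest")
print(mlindex_asset.version, mlindex_asset.path)  # path points at the stored MLIndex folder
```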
- llm_rag_update_acs_index: Uploads `embeddings` into the Azure Cognitive Search instance specified in `acs_config`. The index will be created if it doesn't exist. The index will have the following fields populated (see the index-schema sketch after this group of components):
  - "id", String, key=True
  - "content", String
  - "contentVector", Collection(Single)
  - "category", String
  - "url",...
- llm_rag_update_cosmos_mongo_vcore_index: Uploads `embeddings` into the Azure Cosmos Mongo vCore collection/index specified in `azure_cosmos_mongo_vcore_config`. The collection/index will be created if it doesn't exist. The collection/index will have the following fields populated:
  - "_id", String, key=True
  - "content", String
  - "contentVec...
- llm_rag_update_milvus_index: Uploads `embeddings` into the Milvus collection/index specified in `milvus_config`. The collection/index will be created if it doesn't exist. The collection/index will have the following fields populated:
  - "id", String, key=True
  - "content", String
  - "contentVector", Collection(Single)
  - "url", Str...
- llm_rag_update_pinecone_index: Uploads `embeddings` into the Pinecone index specified in `pinecone_config`. The index will be created if it doesn't exist. Each record in the index will have the following metadata populated:
  - "id", String
  - "content", String
  - "url", String
  - "filepath", String
  - "title", String
  - "metadata_json_...
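For reference, the ACS field layout listed above corresponds to an index you could declare yourself with the azure-search-documents SDK. This is only a sketch: the component creates and manages the real index from `acs_config`, and the vector dimensions, profile names, endpoint, and key below are assumptions.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(
        name="contentVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,  # assumption: ada-002-sized embedding vectors
        vector_search_profile_name="vector-profile",
    ),
    SimpleField(name="category", type=SearchFieldDataType.String, filterable=True),
    SimpleField(name="url", type=SearchFieldDataType.String),
    # ...plus the remaining metadata fields elided in the truncated listings above.
]

index = SearchIndex(
    name="my-rag-index",
    fields=fields,
    vector_search=VectorSearch(
        profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw")],
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
    ),
)

client = SearchIndexClient("https://<search-service>.search.windows.net", AzureKeyCredential("<admin-key>"))
client.create_or_update_index(index)
```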
- Validates that the completion model, embedding model, and Azure Cognitive Search resource deployments are successful and the connections work. For default AOAI, it attempts to create the deployments if they are not valid or present. This validation is done only if the customer is using Azure OpenAI models or creatin...