# Retrieval Augmented Generation components (Azure/azureml-assets GitHub Wiki)
- llm_dbcopilot_create_promptflow
- llm_dbcopilot_grounding_ground_samples
- llm_ingest_dataset_to_acs_basic: Single job pipeline to chunk data from an AzureML data asset and create an ACS embeddings index
- llm_ingest_dataset_to_acs_user_id: Single job pipeline to chunk data from an AzureML data asset and create an ACS embeddings index
- llm_ingest_dataset_to_faiss_basic: Single job pipeline to chunk data from an AzureML data asset and create a FAISS embeddings index
- llm_ingest_dataset_to_faiss_user_id: Single job pipeline to chunk data from an AzureML data asset and create a FAISS embeddings index
- llm_ingest_db_to_acs: Single job pipeline to chunk data from an AzureML SQL datastore and create an ACS embeddings index
- llm_ingest_db_to_faiss: Single job pipeline to chunk data from an AzureML SQL datastore and create a FAISS embeddings index
- llm_ingest_dbcopilot_acs_e2e: Single job pipeline to chunk data from an AzureML DB datastore and create an ACS embeddings index
- llm_ingest_dbcopilot_faiss_e2e: Single job pipeline to chunk data from an AzureML DB datastore and create a FAISS embeddings index
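The ingest pipelines above are published as pipeline components in the shared `azureml` registry, so they can be pulled into your own pipeline with the AzureML Python SDK v2. A minimal sketch, using placeholder workspace values; the `input_data` parameter name is an assumption, so inspect the resolved component's `inputs` for the exact signature of the version you pull:

```python
from azure.ai.ml import MLClient, Input, dsl
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Client scoped to the shared "azureml" registry that hosts these components.
registry = MLClient(credential=credential, registry_name="azureml")

# Client scoped to the workspace where the pipeline job will run (placeholder values).
workspace = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Resolve the published pipeline component; label="latest" picks the newest version.
ingest = registry.components.get(name="llm_ingest_dataset_to_faiss_basic", label="latest")

@dsl.pipeline()
def rag_ingest(source_data: Input):
    # Keyword arguments here depend on the component version; `input_data` is an
    # assumption, so check `ingest.inputs` for the real names (chunk size, embedding
    # connection, asset name, etc.).
    ingest(input_data=source_data)

pipeline_job = rag_ingest(source_data=Input(type="uri_folder", path="azureml:my_docs_asset:1"))
workspace.jobs.create_or_update(pipeline_job, experiment_name="rag-ingest-demo")
```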
- llm_rag_crack_and_chunk: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much context ca...
- llm_rag_crack_and_chunk_and_embed: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much context ca...
- llm_rag_crack_chunk_embed_index_and_register: Creates chunks no larger than `chunk_size` from `input_data`; extracted document titles are prepended to each chunk. LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much contex...
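The chunking behaviour shared by the three crack-and-chunk components above amounts to splitting each source document into pieces that fit a size budget and prefixing each piece with the extracted title. A minimal sketch, measuring size in characters rather than tokens (the real components count tokens and handle many file formats):

```python
def chunk_with_title(text: str, title: str, chunk_size: int) -> list[str]:
    """Split `text` into pieces no larger than `chunk_size` characters and
    prepend the extracted document title to each piece (illustration only)."""
    header = f"Title: {title}\n\n"
    budget = max(chunk_size - len(header), 1)  # room left for body text per chunk
    return [header + text[i:i + budget] for i in range(0, len(text), budget)]

# Example: break a long document into title-prefixed chunks of at most 512 characters.
chunks = chunk_with_title("very long document text " * 200, "Pricing FAQ", chunk_size=512)
```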
- Crawls the given URL and nested links to `max_crawl_depth`. Data is stored to `output_path`.
- Creates a FAISS index from embeddings. The index will be saved to the output folder. The index will be registered as a Data Asset named `asset_name` if `register_output` is set to `True`.
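Building a FAISS index from embeddings is conceptually the same as the plain faiss operation below. This is only a sketch with made-up vector sizes and file names; the component additionally writes the MLIndex metadata needed to map search hits back to document chunks:

```python
import faiss
import numpy as np

# Stand-in embeddings: 1000 chunks embedded into 1536-dimensional vectors (assumed size).
vectors = np.random.rand(1000, 1536).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])  # exact (brute-force) L2 search
index.add(vectors)
faiss.write_index(index, "faiss_index.bin")  # hypothetical output file name

# Query: the ids of the 5 chunks nearest to the first vector.
distances, ids = index.search(vectors[:1], 5)
```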
- This component is used to create a RAG flow based on your MLIndex data and best prompts. The flow will look into your indexed data and give answers based on your own data context. The flow also provides the capability to bulk test with any built-in or custom evaluation flows.
- Collects documents from an Azure Cognitive Search index, extracts their contents, saves them to a URI folder, and creates an MLIndex yaml file to represent the search index. Documents collected can then be used in other components without having to query the ACS index again, allowing for a consiste...
- llm_rag_generate_embeddings: Generates embeddings vectors for data chunks read from `chunks_source`. `chunks_source` is expected to contain csv files with two columns: "Chunk" (the chunk of text to be embedded) and "Metadata" (a JSON object containing metadata for the chunk). If `embeddings_container` is supplied, input c...
- llm_rag_generate_embeddings_parallel: Generates embeddings vectors for data chunks read from `chunks_source`. `chunks_source` is expected to contain csv files with two columns: "Chunk" (the chunk of text to be embedded) and "Metadata" (a JSON object containing metadata for the chunk). If `previous_embeddings` is supplied, input ch...
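The two-column csv layout expected by both embedding components can be produced with a few lines of pandas. The file name, folder, and metadata shape below are illustrative examples, not a contract defined by the components:

```python
import json
from pathlib import Path

import pandas as pd

Path("chunks_source").mkdir(exist_ok=True)

# Two example chunks in the "Chunk" / "Metadata" layout described above.
chunks = pd.DataFrame(
    {
        "Chunk": [
            "Title: Pricing FAQ\n\nThe basic tier includes 5 GB of storage...",
            "Title: Pricing FAQ\n\nThe enterprise tier adds SSO and audit logs...",
        ],
        "Metadata": [
            json.dumps({"source": {"filename": "pricing_faq.md"}, "title": "Pricing FAQ"}),
            json.dumps({"source": {"filename": "pricing_faq.md"}, "title": "Pricing FAQ"}),
        ],
    }
)

# The folder written here is what gets passed as `chunks_source`.
chunks.to_csv("chunks_source/chunks_0.csv", index=False)
```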
- Clones a git repository to the `output_data` path
- Embeds input images and stores them, with metadata, in an Azure Cognitive Search index using a Florence embedding resource. The MLIndex is stored to `output_path`.
- Generates a test dataset of questions and answers based on the input documents. A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. These question, answer, and context sets are saved as either a csv or j...
- llm_rag_register_mlindex_asset: Registers an MLIndex yaml and supporting files as an AzureML data asset
- llm_rag_register_qa_data_asset: Registers a QA data csv or json and supporting files as an AzureML data asset
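Assets registered by these components are ordinary AzureML data assets, so downstream code can resolve them with the workspace `MLClient`. A minimal sketch, with placeholder workspace values and a hypothetical asset name:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Resolve the registered MLIndex data asset; label="latest" picks the newest version.
mlindex_asset = ml_client.data.get(name="product_docs_mlindex", label="latest")
print(mlindex_asset.version, mlindex_asset.path)  # path points at the stored MLIndex folder
```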
- llm_rag_update_acs_index: Uploads `embeddings` into the Azure Cognitive Search instance specified in `acs_config`. The index will be created if it doesn't exist. The index will have the following fields populated (see the index-schema sketch after this group of components):
  - "id", String, key=True
  - "content", String
  - "contentVector", Collection(Single)
  - "category", String
  - "url",...
- llm_rag_update_cosmos_mongo_vcore_index: Uploads `embeddings` into the Azure Cosmos Mongo vCore collection/index specified in `azure_cosmos_mongo_vcore_config`. The collection/index will be created if it doesn't exist. The collection/index will have the following fields populated:
  - "_id", String, key=True
  - "content", String
  - "contentVec...
- llm_rag_update_milvus_index: Uploads `embeddings` into the Milvus collection/index specified in `milvus_config`. The collection/index will be created if it doesn't exist. The collection/index will have the following fields populated:
  - "id", String, key=True
  - "content", String
  - "contentVector", Collection(Single)
  - "url", Str...
- llm_rag_update_pinecone_index: Uploads `embeddings` into the Pinecone index specified in `pinecone_config`. The index will be created if it doesn't exist. Each record in the index will have the following metadata populated:
  - "id", String
  - "content", String
  - "url", String
  - "filepath", String
  - "title", String
  - "metadata_json_...
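For reference, the ACS field layout listed above corresponds to an index you could declare yourself with the azure-search-documents SDK. This is only a sketch: the component creates and manages the real index from `acs_config`, and the vector dimensions, profile names, endpoint, and key below are assumptions.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(
        name="contentVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,  # assumption: ada-002-sized embedding vectors
        vector_search_profile_name="vector-profile",
    ),
    SimpleField(name="category", type=SearchFieldDataType.String, filterable=True),
    SimpleField(name="url", type=SearchFieldDataType.String),
    # ...plus the remaining metadata fields elided in the truncated listings above.
]

index = SearchIndex(
    name="my-rag-index",
    fields=fields,
    vector_search=VectorSearch(
        profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw")],
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
    ),
)

client = SearchIndexClient("https://<search-service>.search.windows.net", AzureKeyCredential("<admin-key>"))
client.create_or_update_index(index)
```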
- Validates that the completion model, embedding model, and Azure Cognitive Search resource deployments are successful and the connections work. For default AOAI, it attempts to create the deployments if they are not valid or present. This validation is done only if the customer is using Azure OpenAI models or creatin...