LLM - SQL Datastore to FAISS Pipeline

llm_ingest_db_to_faiss

Overview

Single-job pipeline that chunks data from an AzureML SQL datastore and creates a FAISS embeddings index.

Version: 0.0.97

Tags

Preview

View in Studio: https://ml.azure.com/registries/azureml/components/llm_ingest_db_to_faiss/version/0.0.97

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| db_datastore | database datastore uri in the format of 'azureml://datastores/{datastore_name}' | string | | | |
| embeddings_model | The model used to generate embeddings. 'azure_open_ai://endpoint/{endpoint_name}/deployment/{deployment_name}/model/{model_name}' | string | | | |
| chat_aoai_deployment_name | The name of the chat AOAI deployment | string | | True | |
| embedding_aoai_deployment_name | The name of the embedding AOAI deployment | string | | | |
| embeddings_dataset_name | The name of the faiss index | string | | | |
| max_tables | | integer | | True | |
| max_columns | | integer | | True | |
| max_rows | | integer | | True | |
| max_sampling_rows | | integer | | True | |
| max_text_length | | integer | | True | |
| max_knowledge_pieces | | integer | | True | |
| selected_tables | | string | | True | |
| column_settings | | string | | True | |
| llm_config | The name of the llm config | string | | True | |
| serverless_instance_count | | integer | 1 | True | |
| serverless_instance_type | | string | Standard_DS3_v2 | True | |
| embedding_connection | Azure OpenAI workspace connection ARM ID for embeddings | string | | True | |
| llm_connection | Azure OpenAI workspace connection ARM ID for LLM | string | | True | |
| runtime | The name of the runtime | string | | False | |
| sample_data | Sample data to be used for data ingestion. format: 'azureml:samples-test:1' | uri_folder | | True | |
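
The two URI formats in the table above can be assembled with plain string formatting. The following is a minimal sketch, not an official client: the helper names and every resource name (`my_sql_datastore`, `my-aoai`, and so on) are hypothetical placeholders, and the set of inputs shown is only those the table does not mark as optional.

```python
def datastore_uri(datastore_name: str) -> str:
    """Build a db_datastore value in the documented format."""
    return f"azureml://datastores/{datastore_name}"


def embeddings_model_uri(endpoint_name: str, deployment_name: str, model_name: str) -> str:
    """Build an embeddings_model value in the documented format."""
    return (
        f"azure_open_ai://endpoint/{endpoint_name}"
        f"/deployment/{deployment_name}/model/{model_name}"
    )


# Hypothetical values: replace with the names used in your own workspace.
inputs = {
    "db_datastore": datastore_uri("my_sql_datastore"),
    "embeddings_model": embeddings_model_uri(
        "my-aoai", "my-embedding-deployment", "my-embedding-model"
    ),
    "embedding_aoai_deployment_name": "my-embedding-deployment",
    "embeddings_dataset_name": "my_faiss_index",
    "runtime": "my-runtime",
}

for name, value in inputs.items():
    print(f"{name}: {value}")
```

Keeping the formatting in small helpers makes it harder to mistype a path segment; the resulting dictionary could then be passed to whatever job-submission code you already use.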

Example sample_data path: "azureml:samples-test:1"

Data ingest settings

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| include_builtin_examples | | boolean | True | True | |
| tools | The name of the tools for dbcopilot. Supported tools: "tsql", "python". Format: ["tsql", "python"] | string | | True | |
| knowledge_pieces | The list of knowledge pieces to be used for grounding. | string | | True | |
| include_views | Whether to turn on views. | boolean | | True | |
| instruct_template | The instruct template for the LLM. | string | | True | |
| managed_identity_enabled | Whether to connect using managed identity. | boolean | False | True | |
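
Because `tools` is typed as a string but documented with a list-like format, one way to produce it is to serialize a Python list as JSON. This is a sketch under that assumption; the helper `tools_value` and the `ingest_settings` dictionary are hypothetical, and the component may accept other spellings.

```python
import json

# Tool names listed as supported in the tools description.
SUPPORTED_TOOLS = ("tsql", "python")


def tools_value(tools):
    """Serialize a list of tool names into the documented string format."""
    unknown = [t for t in tools if t not in SUPPORTED_TOOLS]
    if unknown:
        raise ValueError(f"Unsupported tools: {unknown}")
    return json.dumps(list(tools))


# Hypothetical settings; the boolean values mirror the documented defaults.
ingest_settings = {
    "include_builtin_examples": True,
    "tools": tools_value(["tsql", "python"]),
    "managed_identity_enabled": False,
}

print(ingest_settings["tools"])
```

Validating against the supported names before serializing surfaces a typo locally rather than inside a submitted job.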

Outputs

| Name | Description | Type |
| --- | --- | --- |
| grounding_index | | uri_folder |
| db_context | | uri_folder |