
LLM - Generate Embeddings

llm_rag_generate_embeddings

Overview

Generates embedding vectors for data chunks read from chunks_source.

chunks_source is expected to contain CSV files with two columns:

  • "Chunk" - Chunk of text to be embedded
  • "Metadata" - JSON object containing metadata for the chunk
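
As a rough illustration of the input format described above, the following sketch writes and reads back a two-column CSV using only the Python standard library. The helper names and the metadata keys ("source", "page") are hypothetical; only the "Chunk" and "Metadata" column names come from this page, and the exact quoting the component expects is an assumption.

```python
import csv
import io
import json


def write_chunks_csv(rows, stream):
    """Write (chunk_text, metadata_dict) pairs as a two-column CSV."""
    writer = csv.DictWriter(stream, fieldnames=["Chunk", "Metadata"])
    writer.writeheader()
    for text, metadata in rows:
        # Metadata is stored as a JSON object serialized into one CSV cell.
        writer.writerow({"Chunk": text, "Metadata": json.dumps(metadata)})


def read_chunks_csv(stream):
    """Read the CSV back into (chunk_text, metadata_dict) pairs."""
    return [
        (row["Chunk"], json.loads(row["Metadata"]))
        for row in csv.DictReader(stream)
    ]


# Hypothetical sample data; the metadata keys are illustrative only.
rows = [
    ("Embeddings map text to vectors.", {"source": "intro.md", "page": 1}),
    ('A chunk with a comma, and "quotes".', {"source": "intro.md", "page": 2}),
]

buffer = io.StringIO()
write_chunks_csv(rows, buffer)
buffer.seek(0)
round_trip = read_chunks_csv(buffer)
```

Serializing the metadata with json.dumps and letting the csv module handle quoting keeps commas and quotes inside a chunk from breaking the column layout.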

If embeddings_container is supplied, input chunks are compared to the existing chunks in the Embeddings Container. Only new or changed chunks are embedded; unchanged chunks are reused.
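
The reuse behaviour can be pictured with a small sketch. This page does not say how the component detects changed chunks, so the content hash used here is an assumption, and chunk_key and plan_embedding are hypothetical names rather than part of the component.

```python
import hashlib


def chunk_key(text):
    """Derive a stable key from chunk text (assumed comparison strategy)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_embedding(input_chunks, existing):
    """Split input chunks into reusable vectors and chunks that need embedding.

    existing maps chunk_key(text) -> previously computed vector.
    Returns (reused, to_embed).
    """
    reused = {}
    to_embed = []
    for text in input_chunks:
        key = chunk_key(text)
        if key in existing:
            # Unchanged chunk: keep the previously generated vector.
            reused[key] = existing[key]
        else:
            # New or changed chunk: needs a fresh embedding.
            to_embed.append(text)
    return reused, to_embed


# Hypothetical previous run containing one chunk.
existing = {chunk_key("unchanged chunk"): [0.1, 0.2]}
reused, to_embed = plan_embedding(["unchanged chunk", "new chunk"], existing)
```

In this sketch a chunk whose text has been edited gets a different key, so it is treated the same way as a brand-new chunk.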

Version: 0.0.67

Tags

Preview

View in Studio: https://ml.azure.com/registries/azureml/components/llm_rag_generate_embeddings/version/0.0.67

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| chunks_source | Folder containing chunks to be embedded. | uri_folder | | | |

If adding to previously generated Embeddings

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| embeddings_container | Folder containing previously generated embeddings. Should be the parent folder of the 'embeddings' output path used for this component. Input data is compared to the existing embeddings and only changed/new data is embedded; existing chunks are reused. | uri_folder | | True | |

Embeddings settings

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| embeddings_model | The model to use to embed data, e.g. 'hugging_face://model/sentence-transformers/all-mpnet-base-v2' or 'azure_open_ai://deployment/{deployment_name}/model/{model_name}'. | string | | True | |
| batch_size | Batch size to use when embedding data. | integer | 100 | | |
| num_workers | Number of workers to use when embedding data. -1 means use half of all available CPUs. | integer | -1 | | |
| deployment_validation | URI file containing information on whether the Azure OpenAI deployments, if used, have been validated. | uri_file | | True | |
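
To make the embeddings_model and batch_size settings more concrete, here is a minimal sketch that splits the two URI shapes quoted above into their parts and groups chunks into batches. It is not the component's own parser: parse_embeddings_model and batches are hypothetical helpers, and the deployment name in the usage lines is made up.

```python
def parse_embeddings_model(uri):
    """Split an embeddings_model URI into its parts (illustrative only)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a model URI: {uri!r}")
    parts = rest.split("/")
    if scheme == "hugging_face":
        # hugging_face://model/<model id, which may itself contain slashes>
        if len(parts) < 2 or parts[0] != "model":
            raise ValueError(f"unexpected hugging_face URI: {uri!r}")
        return {"kind": scheme, "model": "/".join(parts[1:])}
    if scheme == "azure_open_ai":
        # azure_open_ai://deployment/<deployment_name>/model/<model_name>
        if len(parts) != 4 or parts[0] != "deployment" or parts[2] != "model":
            raise ValueError(f"unexpected azure_open_ai URI: {uri!r}")
        return {"kind": scheme, "deployment": parts[1], "model": parts[3]}
    raise ValueError(f"unsupported scheme: {scheme!r}")


def batches(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


hf = parse_embeddings_model(
    "hugging_face://model/sentence-transformers/all-mpnet-base-v2"
)
# "my-deployment" is a hypothetical deployment name.
aoai = parse_embeddings_model(
    "azure_open_ai://deployment/my-deployment/model/text-embedding-ada-002"
)
batch_lengths = [len(b) for b in batches(list(range(250)), batch_size=100)]
```

With the default batch_size of 100, 250 chunks would be processed as two full batches and one partial batch in this sketch.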

Outputs

| Name | Description | Type |
| --- | --- | --- |
| embeddings | Where to save data with embeddings. If previous embeddings are supplied, this should be a subfolder of them, typically named using '${name}', e.g. /my/prev/embeddings/${name}. | uri_folder |
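
The relationship between the embeddings output and a supplied embeddings_container can be checked with a short sketch like the one below. It assumes POSIX-style paths, is_subfolder_of is a hypothetical helper, and "run_1" stands in for whatever '${name}' expands to at run time.

```python
from pathlib import PurePosixPath


def is_subfolder_of(output_path, container_path):
    """Return True if output_path lies strictly inside container_path."""
    output = PurePosixPath(output_path)
    container = PurePosixPath(container_path)
    # parents lists every ancestor of the path, excluding the path itself.
    return container in output.parents


# "run_1" is a hypothetical expansion of '${name}'.
inside = is_subfolder_of("/my/prev/embeddings/run_1", "/my/prev/embeddings")
outside = is_subfolder_of("/my/other/run_1", "/my/prev/embeddings")
same = is_subfolder_of("/my/prev/embeddings", "/my/prev/embeddings")
```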

Environment

azureml:llm-rag-embeddings@latest
