LLM - Generate Embeddings

llm_rag_generate_embeddings

Overview

Generates embedding vectors for the data chunks read from chunks_source.

chunks_source is expected to contain CSV files with two columns:

"Chunk" - Chunk of text to be embedded
"Metadata" - JSON object containing metadata for the chunk
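As an illustration of the expected input layout, the sketch below writes and reads a CSV file with the "Chunk" and "Metadata" columns using only the Python standard library. The helper names (write_chunks_csv, read_chunks_csv) and the metadata key "source" are hypothetical and not part of the component; the only details taken from this page are the two column names and that Metadata holds a JSON object.

```python
import csv
import json


def write_chunks_csv(path, chunks):
    """Write (text, metadata_dict) pairs as a two-column CSV: Chunk, Metadata."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Chunk", "Metadata"])
        writer.writeheader()
        for text, metadata in chunks:
            # Metadata is stored as a JSON object serialized to a string.
            writer.writerow({"Chunk": text, "Metadata": json.dumps(metadata)})


def read_chunks_csv(path):
    """Read the CSV back into a list of (text, metadata_dict) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            (row["Chunk"], json.loads(row["Metadata"]))
            for row in csv.DictReader(f)
        ]
```

The csv module quotes fields as needed, so chunk text containing commas, quotes, or newlines round-trips without manual escaping.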

If embeddings_container is supplied, input chunks are compared to the existing chunks in the Embeddings Container. Only new or changed chunks are embedded; unchanged chunks are reused.
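The reuse behaviour can be pictured with a small sketch. This page does not say how the component compares chunks, so the content hash below is an assumption made purely for illustration, and chunk_key and plan_embedding are hypothetical names rather than component functions.

```python
import hashlib


def chunk_key(text, metadata_json):
    """Hypothetical identity for a chunk: a hash over its text and metadata."""
    h = hashlib.sha256()
    h.update(text.encode("utf-8"))
    h.update(b"\x00")  # separator so text and metadata cannot run together
    h.update(metadata_json.encode("utf-8"))
    return h.hexdigest()


def plan_embedding(chunks, existing):
    """Split input chunks into reused vectors and chunks that need embedding.

    chunks:   iterable of (text, metadata_json) pairs from chunks_source
    existing: dict mapping chunk_key -> previously generated vector
    """
    reused, to_embed = {}, []
    for text, metadata_json in chunks:
        key = chunk_key(text, metadata_json)
        if key in existing:
            reused[key] = existing[key]    # unchanged chunk: keep its vector
        else:
            to_embed.append((key, text))   # new or changed chunk: embed it
    return reused, to_embed
```

Under this assumption, editing either the text or the metadata of a chunk changes its key, so the chunk is treated as changed and embedded again.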

Version: 0.0.78

Inputs

Name	Description	Type	Default	Optional	Enum
chunks_source	Folder containing chunks to be embedded.	uri_folder

If adding to previously generated Embeddings

Name	Description	Type	Default	Optional	Enum
embeddings_container	Folder containing previously generated embeddings. Should be the parent folder of the 'embeddings' output path used for this component. Input data is compared to the existing embeddings and only changed/new data is embedded; existing chunks are reused.	uri_folder		True

Embeddings settings

Name	Description	Type	Default	Optional
embeddings_model	The model to use to embed data. E.g. 'hugging_face://model/sentence-transformers/all-mpnet-base-v2' or 'azure_open_ai://deployment/{deployment_name}/model/{model_name}'	string		True
batch_size	Batch size to use when embedding data	integer	100
num_workers	Number of workers to use when embedding data. -1 means use half of all available CPUs	integer	-1
deployment_validation	URI file indicating whether the Azure OpenAI deployments, if used, have been validated	uri_file		True
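The settings above can be read as follows; this is a minimal sketch, not the component's implementation. All three helper names are hypothetical. Rounding "half of all available CPUs" down with a minimum of one worker is an assumption, as is splitting the embeddings_model string at "://" into a provider kind and a remainder.

```python
import os


def resolve_num_workers(num_workers, cpu_count=None):
    """Map the -1 default to half of the available CPUs (assumed: floor, min 1)."""
    if num_workers != -1:
        return num_workers
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus // 2)


def batched(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def split_model_uri(uri):
    """Split e.g. 'hugging_face://model/...' into (kind, remainder)."""
    kind, sep, rest = uri.partition("://")
    if not sep or not kind or not rest:
        raise ValueError(f"not a model URI: {uri!r}")
    return kind, rest
```

With the default batch_size of 100, a list of 250 chunks would be processed as batches of 100, 100, and 50.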

Outputs

Name	Description	Type
embeddings	Where to save data with embeddings. If previous embeddings are supplied, this should be a subfolder of that folder, typically named using '${name}', e.g. /my/prev/embeddings/${name}	uri_folder
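The relationship between the embeddings output and embeddings_container can be checked with a short sketch like the one below. It uses plain POSIX-style paths for illustration only; is_under_container is a hypothetical helper, and "run_1" stands in for whatever '${name}' resolves to.

```python
from pathlib import PurePosixPath


def is_under_container(embeddings_output, embeddings_container):
    """Return True if the output path is a strict subfolder of the container."""
    out = PurePosixPath(embeddings_output)
    container = PurePosixPath(embeddings_container)
    # parents lists every ancestor of the output path, excluding the path itself.
    return container in out.parents


print(is_under_container("/my/prev/embeddings/run_1", "/my/prev/embeddings"))
print(is_under_container("/my/other/run_1", "/my/prev/embeddings"))
```

The first call prints True and the second prints False; passing the container path itself as the output also yields False, because a folder is not its own subfolder.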

Environment

azureml:llm-rag-embeddings:76

components llm_rag_generate_embeddings - Azure/azureml-assets GitHub Wiki
