components batch_benchmark_inference_claude - Azure/azureml-assets GitHub Wiki

Batch Benchmark Inference with claude support

batch_benchmark_inference_claude

Overview

Components for batch endpoint inference

Version: 0.0.2

View in Studio: https://ml.azure.com/registries/azureml/components/batch_benchmark_inference_claude/version/0.0.2

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| input_dataset | Input JSONL dataset that contains the prompts. Ignored when the performance test is enabled. | uri_folder | | True | |
| model_type | Type of model. One of 'aoai', 'oss', 'vision_oss', 'claude'. | string | | True | |
| batch_input_pattern | The batch input pattern: a payload template in which each `###<key>` placeholder is replaced with the value stored under that key in the input dataset. For example, for a Llama text-generation model whose input dataset has `prompt` for the payload and `_batch_request_metadata` storing the corresponding ground truth, one can use: `{ "input_data": { "input_string": ["###<prompt>"], "parameters": { "temperature": 0.6, "max_new_tokens": 100, "do_sample": true } }, "_batch_request_metadata": ###<_batch_request_metadata> }`. For an AOAI model, the following pattern can be used: `{ "messages": [ {"role": "user", "content": "###<prompt>" } ], "temperature": 0.7, "top_p": 0.95, "frequency_penalty": 0, "presence_penalty": 0, "max_tokens": 800, "stop": null }`. For Vision OSS, the input should be as follows: `{ "image": "image1", "text": "label1, label2, label3" }`. For a Claude model, a different pattern should be used: `{ "prompt": "Prompt text \n\nHuman:\n### Question: Question text\n###Answer:\n\nAssistant:", "prompt_length": 775, "completion": "The correct answer" }`. | string | | False | |
| endpoint_url | The endpoint URL. | string | | False | |
| is_performance_test | If true, the performance test is run and the input dataset is ignored. | boolean | False | | |
| deployment_name | The deployment name. Only needed for managed OSS deployments. | string | | True | |
| connections_name | Connections name for the endpoint. | string | | False | |
| label_column_name | The label column name. | string | | True | |
| n_samples | The number of top samples sent to the endpoint. When the performance test is enabled, this is the number of repeated samples sent to the endpoint. | integer | | True | |
| handle_response_failure | How the formatter handles a failed response. | string | use_fallback | False | ['use_fallback', 'neglect'] |
| fallback_value | The fallback value used when a request payload fails. If not provided, the fallback value is an empty string. | string | | True | |
| additional_headers | A stringified JSON object of additional headers to add to each request. | string | | True | |
| ensure_ascii | If ensure_ascii is true, all incoming non-ASCII characters are escaped in the output. If ensure_ascii is false, these characters are output as-is. More detailed information can be found at https://docs.python.org/3/library/json.html | boolean | False | False | |
| max_retry_time_interval | The maximum time (in seconds) spent retrying a payload. If unspecified, payloads are retried indefinitely. | integer | | True | |
| mini_batch_size | The mini-batch size for the parallel run. | string | 100KB | True | |
| initial_worker_count | The initial number of workers to use for scoring. | integer | 5 | False | |
| max_worker_count | Overrides initial_worker_count if necessary. | integer | 200 | False | |
| instance_count | Number of nodes in the compute cluster on which the train step runs. | integer | 1 | | |
| max_concurrency_per_instance | Number of processes run concurrently on any given node. This number should not exceed half the number of cores on an individual node in the specified cluster. | integer | 1 | | |
| debug_mode | Enabling debug mode prints all debug logs in the score step. | boolean | False | False | |
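
The placeholder substitution described for batch_input_pattern, together with additional_headers and ensure_ascii, can be illustrated with a minimal Python sketch. This is not the component's actual implementation. The helper names `render_payload` and `build_request`, the `x-example-header` header, and the sample row are hypothetical, and the sketch assumes the text-generation pattern uses a `###<prompt>` placeholder for the dataset's prompt value. It only handles placeholders that stand for a whole JSON value, quoted or bare.

```python
import json
import re

# Matches a placeholder that stands for a whole JSON value, either quoted
# ("###<key>") or bare (###<key>). Placeholders embedded inside a longer
# string are deliberately not handled by this sketch.
PLACEHOLDER = re.compile(r'"###<([^<>]+)>"|###<([^<>]+)>')


def render_payload(pattern, row):
    """Fill every ###<key> in the pattern with the JSON form of row[key]."""
    def substitute(match):
        key = match.group(1) or match.group(2)
        # json.dumps takes care of quoting and escaping, so the text stays valid JSON.
        return json.dumps(row[key])
    return json.loads(PLACEHOLDER.sub(substitute, pattern))


def build_request(pattern, row, additional_headers=None, ensure_ascii=False):
    """Return (body, headers) for one dataset row (hypothetical helper)."""
    body = json.dumps(render_payload(pattern, row), ensure_ascii=ensure_ascii)
    headers = {"Content-Type": "application/json"}
    if additional_headers:
        # additional_headers is a stringified JSON object, as in the inputs table.
        headers.update(json.loads(additional_headers))
    return body, headers


pattern = """{
  "input_data": {
    "input_string": ["###<prompt>"],
    "parameters": {"temperature": 0.6, "max_new_tokens": 100, "do_sample": true}
  },
  "_batch_request_metadata": ###<_batch_request_metadata>
}"""

# Hypothetical dataset row: the prompt plus its ground-truth metadata.
row = {"prompt": 'Say "héllo"', "_batch_request_metadata": {"label": "héllo"}}

body, headers = build_request(
    pattern,
    row,
    additional_headers='{"x-example-header": "demo"}',
    ensure_ascii=True,
)
print(body)
print(headers)
```

With `ensure_ascii=True` the non-ASCII character in the row is written as a `\u00e9` escape in the serialized body. With `ensure_ascii=False` it would be written as-is, which matches the behavior of Python's `json.dumps`.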

Outputs

| Name | Description | Type |
| --- | --- | --- |
| predictions | The prediction data. | uri_file |
| performance_metadata | The performance data. | uri_file |
| ground_truth | The ground truth data that has a one-to-one mapping with the prediction data. | uri_file |
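
To make the interaction between handle_response_failure, fallback_value, and the one-to-one mapping of predictions to ground_truth more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the component's formatter. The function name `format_outputs` is hypothetical, a failed response is represented as `None`, and 'neglect' is assumed to drop the failed row from both outputs so that the mapping is preserved.

```python
def format_outputs(results, handle_response_failure="use_fallback", fallback_value=""):
    """Split (prediction, ground_truth) pairs into two aligned lists.

    `results` is a list of (prediction_or_None, ground_truth) tuples, where
    None marks a failed response (an assumption made for this sketch).
    """
    if handle_response_failure not in ("use_fallback", "neglect"):
        raise ValueError(f"unsupported mode: {handle_response_failure!r}")

    predictions, ground_truth = [], []
    for prediction, truth in results:
        if prediction is None:
            if handle_response_failure == "neglect":
                # Assumed behavior: skip the row in both outputs to keep them aligned.
                continue
            # use_fallback: keep the row and substitute the fallback value,
            # which defaults to an empty string.
            prediction = fallback_value
        predictions.append(prediction)
        ground_truth.append(truth)
    return predictions, ground_truth


# Hypothetical results: the second request failed.
results = [("Paris", "Paris"), (None, "Rome"), ("Oslo", "Oslo")]
print(format_outputs(results))
print(format_outputs(results, handle_response_failure="neglect"))
```

In both modes the two returned lists have the same length, which is the property the ground_truth output describes.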