LiteLLM:option
When using LiteLLM, you can configure hyperparameters at both the provider and model-specific levels, as well as for fine-tuning jobs. Available hyperparameters include standard API parameters like `temperature` and `max_tokens`, provider-specific settings (e.g., `top_k`), and fine-tuning parameters such as `n_epochs` and `learning_rate_multiplier`. [1, 2, 3, 4]
Common configuration parameters
- Standard API parameters: These influence the model's response directly.
  - `temperature`: Controls randomness in the output.
  - `max_tokens`: Limits the length of the generated response.
  - `stream`: Enables or disables streaming responses.
- Model-specific parameters: These are defined when adding a new model to the proxy's `model_list` in `config.yaml` and can include provider-specific settings.
  - `model_name`: The user-facing alias for the model.
  - `litellm_params`: A dictionary for LiteLLM-specific configuration, including provider-specific options (e.g., for SageMaker models).
  - `top_k`: An example of a provider-specific parameter, as shown in [LiteLLM's documentation](https://docs.litellm.ai/docs/completion/provider_specific_params).
- Fine-tuning parameters: These are used for training a model and are specified separately, as detailed in [LiteLLM's fine-tuning documentation](https://docs.litellm.ai/docs/fine_tuning); see the sketch after this list.
  - `batch_size`: The number of examples in each training batch.
  - `n_epochs`: The number of full training cycles.
  - `learning_rate_multiplier`: A scaling factor for the learning rate. [1, 2, 3, 4, 5]
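A minimal sketch of passing these hyperparameters when creating a fine-tuning job through a LiteLLM proxy's OpenAI-compatible fine-tuning route; the proxy URL, API key, model, training file ID, and `custom_llm_provider` value below are placeholders and assume fine-tuning is configured per LiteLLM's docs:

```bash
# Hypothetical fine-tuning job creation via the proxy's OpenAI-compatible route.
# Adjust the base URL, key, model, provider, and training file for your deployment.
curl http://localhost:4000/v1/fine_tuning/jobs \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "training_file": "file-abc123",
    "custom_llm_provider": "openai",
    "hyperparameters": {
      "batch_size": 8,
      "n_epochs": 3,
      "learning_rate_multiplier": 0.1
    }
  }'
```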
Where to configure
- Input/API calls: You can pass standard parameters such as `temperature` or `max_tokens` directly in your API call (see the example after this list).
- `config.yaml` configuration: When setting up models, you can add provider-specific parameters and model-specific configurations within the `litellm_params` block in your `config.yaml` file.
- Fine-tuning: For fine-tuning jobs, hyperparameters are passed as part of the request body for the specific fine-tuning job. [2, 3, 4, 5, 6]
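For example, a per-request call against a LiteLLM proxy can carry standard parameters in the body exactly as with the OpenAI API; a sketch (the URL, key, and model name are placeholders):

```bash
# Hypothetical per-request parameters sent to a LiteLLM proxy.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
    "temperature": 0.2,
    "max_tokens": 256,
    "stream": false
  }'
```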
[1] https://docs.litellm.ai/docs/completion/input
[2] https://docs.litellm.ai/docs/fine_tuning
[3] https://docs.litellm.ai/docs/completion/provider_specific_params
[4] https://docs.litellm.ai/docs/providers/hyperbolic
[5] https://docs.litellm.ai/docs/proxy/model_management
[6] https://amaiya.github.io/onprem/
LiteLLM provides an OpenAI-compatible API server (often called the LiteLLM Proxy) that acts as a unified gateway to over 100 LLM providers (OpenAI, Azure, Anthropic, Cohere, Ollama, Hugging Face, Bedrock, Vertex AI, etc.). The exposed hyperparameters/options fall into two categories:
- Per-request options: passed in the `/chat/completions`, `/completions`, or `/embeddings` request body (exactly like the OpenAI API).
- Per-model defaults: set in the proxy's `config.yaml` under `litellm_params` for each model (or globally via provider configs).
LiteLLM accepts all standard OpenAI Chat Completion parameters and maps or passes them through to the underlying provider. If a parameter is not supported by the target model/provider, it can be dropped automatically (enable `drop_params: true` in the config or per call, as sketched below).
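A minimal sketch of enabling this in a proxy config and starting the server; the model entry, port, and file name are assumptions for illustration:

```bash
# Hypothetical proxy config with automatic dropping of unsupported params.
cat > config.yaml <<'EOF'
litellm_settings:
  drop_params: true        # strip params the target provider does not accept

model_list:
  - model_name: my-llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
EOF

# Start the proxy on port 4000 (requires `pip install 'litellm[proxy]'`).
litellm --config config.yaml --port 4000
```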
| Parameter | Type | Description | Supported by All Models? |
|---|---|---|---|
| `temperature` | float (0-2) | Controls randomness (higher = more creative) | Yes (most providers) |
| `top_p` | float (0-1) | Nucleus sampling | Yes |
| `top_k` | int | Top-K sampling (used by Anthropic, Cohere, Ollama, etc.) | Provider-specific |
| `max_tokens` / `max_completion_tokens` | int | Maximum tokens to generate | Yes |
| `stop` | str/array | Stop sequences | Yes |
| `presence_penalty` | float (-2 to 2) | Penalize new tokens that already appeared | Mostly OpenAI-like |
| `frequency_penalty` | float (-2 to 2) | Penalize tokens based on frequency in text so far | Mostly OpenAI-like |
| `logit_bias` | dict | Modify likelihood of specific tokens | OpenAI, Azure, some others |
| `seed` | int | For reproducible outputs | OpenAI, Azure, Cohere |
| `tools` / `functions` | array | Function calling / tool use | Varies widely |
| `tool_choice` / `function_call` | str/object | Force tool use | Varies |
| `response_format` | object | e.g., `{ "type": "json_object" }` | OpenAI, Gemini, some others |
| `n` | int | Number of completions | Mostly OpenAI-like |
| `stream` | bool | Streaming response | Yes |
| `extra_headers` | dict | Custom headers sent to provider | Yes |
| `metadata` / `user` | str/dict | Tracking / end-user identifier | Yes (LiteLLM internal) |
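Several of these options can be combined in one request; a hedged sketch against a local proxy, where the URL, key, and model name are placeholders and `response_format`/`seed` only take effect on providers that support them:

```bash
# Hypothetical request combining standard OpenAI-style parameters from the table.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Return a JSON object with one key named greeting."}],
    "response_format": {"type": "json_object"},
    "seed": 42,
    "max_tokens": 100
  }'
```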
You can also pass any provider-specific parameter directly in the request body; LiteLLM will forward it unchanged to the target provider (e.g., `top_k` for Claude, `num_predict` and `keep_alive` for Ollama, `repetition_penalty` for many open-source models).
Example request (curl):
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.1,
    "num_predict": 512,
    "repeat_penalty": 1.15,
    "top_k": 40
  }'
```

In the proxy config you can set defaults under `litellm_params` for each model entry:
```yaml
model_list:
  - model_name: my-llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://host.docker.internal:11434
      temperature: 0.7
      max_tokens: 8192
      top_p: 0.95
      repeat_penalty: 1.1
      num_predict: -1
      keep_alive: "30m"   # Ollama-specific
```

These act as defaults but can be overridden per-request, as in the sketch below.
Parameter handling is very much provider-dependent, and LiteLLM manages this for you:

- LiteLLM tries to accept everything and map it where possible (e.g., `max_tokens` → `max_new_tokens` for Hugging Face, `num_predict` for Ollama).
- If a parameter has no equivalent, it is either ignored or (with `drop_params: true`) stripped before sending to the provider.
- Some parameters only work on certain backends (e.g., `logit_bias` only on OpenAI/Azure, `mirostat` only on llama.cpp-based servers such as Ollama).
- You can see exactly which parameters are supported for a given model on your proxy by calling `/v1/model/info` or using the CLI `litellm --model <model> --debug`; a sketch follows this list.
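A hedged sketch of querying the proxy for its model registry (response fields may vary by LiteLLM version; the URL and key are placeholders):

```bash
# Hypothetical query of the proxy's model registry; the response lists each
# model's alias, litellm_params, and model_info as configured on the server.
curl -s http://localhost:4000/v1/model/info \
  -H "Authorization: Bearer sk-1234" | python -m json.tool
```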
In short: LiteLLM gives you the full OpenAI parameter set + any extra provider-specific params, and intelligently routes/drops them based on the selected model. This makes it extremely flexible compared to running Ollama directly. For the absolute latest list, check the official docs at https://docs.litellm.ai/docs/completion/input and the provider-specific pages.