LiteLLM:option

Gemini

When using LiteLLM, you can configure hyperparameters at both the provider and model-specific levels, as well as for fine-tuning jobs. Available hyperparameters include standard API parameters like temperature and max_tokens, provider-specific settings (e.g., top_k), and fine-tuning parameters such as batch_size and n_epochs. [1, 2, 3, 4]
Common configuration parameters

• Standard API parameters: These parameters influence the model's response directly (see the sketch after these lists).

• temperature: Controls randomness in the output.
• max_tokens: Limits the length of the generated response.
• stream: Enables or disables streaming responses.

• Model-specific parameters: These are often defined when adding a new model to the proxy's config.yaml and can include provider-specific settings.

• model_name: The user-facing alias for the model.
• litellm_params: A dictionary for LiteLLM-specific configurations, such as the provider settings required for SageMaker models.
• top_k: An example of a provider-specific parameter, as shown in [LiteLLM's documentation](https://docs.litellm.ai/docs/completion/provider_specific_params).

• Fine-tuning parameters: These are used for training a model and are specified separately, as detailed in LiteLLM's fine-tuning documentation (https://docs.litellm.ai/docs/fine_tuning).

• batch_size: The number of examples in each training batch.
• n_epochs: The number of full training cycles over the dataset.
• learning_rate_multiplier: A scaling factor for the learning rate. [1, 2, 3, 4, 5]
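
As a concrete illustration of the standard API parameters above, here is a minimal sketch using the litellm Python SDK. It assumes `pip install litellm` and a valid provider API key in the environment; the model name and values are illustrative, not prescriptive.

```python
import litellm

# Standard OpenAI-style parameters are passed directly to litellm.completion();
# LiteLLM maps them to whichever provider backs the chosen model string.
response = litellm.completion(
    model="gpt-4o-mini",   # illustrative model string; any LiteLLM-supported model works
    messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
    temperature=0.2,       # lower = more deterministic output
    max_tokens=256,        # cap on generated tokens
    stream=False,          # set True to iterate over streamed chunks instead
)
print(response.choices[0].message.content)
```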

Where to configure

• Input/API calls: You can pass standard parameters, such as temperature or max_tokens, directly in your API call.
• config.yaml configuration: When setting up models, you can add provider-specific parameters and model-specific configurations within the litellm_params entry in your config.yaml file.
• Fine-tuning: For fine-tuning jobs, hyperparameters are passed as part of the request body for the specific job, as sketched below. [2, 3, 4, 5, 6]
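
For the fine-tuning case, the sketch below sends the hyperparameters in the request body through an OpenAI-compatible client pointed at a LiteLLM deployment. It assumes a LiteLLM proxy at localhost:4000 that exposes the fine-tuning endpoints and an already-uploaded training file; the file ID, key, and base model are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

# Hyperparameters ride along in the fine-tuning job request body.
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",       # base model to fine-tune (placeholder)
    training_file="file-abc123",          # placeholder ID from a prior file upload
    hyperparameters={
        "batch_size": 8,                  # examples per training batch
        "n_epochs": 3,                    # full passes over the training data
        "learning_rate_multiplier": 0.1,  # scales the provider's base learning rate
    },
)
print(job.id, job.status)
```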


[1] https://docs.litellm.ai/docs/completion/input
[2] https://docs.litellm.ai/docs/fine_tuning
[3] https://docs.litellm.ai/docs/completion/provider_specific_params
[4] https://docs.litellm.ai/docs/providers/hyperbolic
[5] https://docs.litellm.ai/docs/proxy/model_management
[6] https://amaiya.github.io/onprem/

Grok

LiteLLM provides an OpenAI-compatible API server (often called the LiteLLM Proxy) that acts as a unified gateway to over 100 LLM providers (OpenAI, Azure, Anthropic, Cohere, Ollama, Hugging Face, Bedrock, Vertex AI, etc.). The exposed hyperparameters/options fall into two categories:

  1. Per-request options — Passed in the /chat/completions, /completions, or /embeddings request body (exactly like the OpenAI API).
  2. Per-model defaults — Set in the proxy's config.yaml under litellm_params for each model (or globally via provider configs).

Per-Request Hyperparameters (OpenAI-Compatible)

LiteLLM accepts all standard OpenAI Chat Completion parameters and intelligently maps or passes them to the underlying provider. If a parameter is not supported by the target model/provider, it can be automatically dropped (enable drop_params: true in config or per-call).
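
A small sketch of both forms of drop_params with the litellm Python SDK (the proxy-side equivalent is enabling drop_params: true in config.yaml, typically under litellm_settings); the model and parameter choices are illustrative.

```python
import litellm

# Globally strip any parameter the target provider cannot accept.
litellm.drop_params = True

# ...or opt in for a single call: logit_bias is OpenAI/Azure-only, so it is
# dropped before the request reaches the Ollama backend instead of erroring.
response = litellm.completion(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "Hello"}],
    logit_bias={"50256": -100},
    drop_params=True,
)
```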

| Parameter | Type | Description | Supported by All Models? |
|---|---|---|---|
| temperature | float (0-2) | Controls randomness (higher = more creative) | Yes (most providers) |
| top_p | float (0-1) | Nucleus sampling | Yes |
| top_k | int | Top-K sampling (used by Anthropic, Cohere, Ollama, etc.) | Provider-specific |
| max_tokens / max_completion_tokens | int | Maximum tokens to generate | Yes |
| stop | str/array | Stop sequences | Yes |
| presence_penalty | float (-2 to 2) | Penalize new tokens that already appeared | Mostly OpenAI-like |
| frequency_penalty | float (-2 to 2) | Penalize tokens based on frequency in text so far | Mostly OpenAI-like |
| logit_bias | dict | Modify likelihood of specific tokens | OpenAI, Azure, some others |
| seed | int | For reproducible outputs | OpenAI, Azure, Cohere |
| tools / functions | array | Function calling / tool use | Varies widely |
| tool_choice / function_call | str/object | Force tool use | Varies |
| response_format | object | e.g., { "type": "json_object" } | OpenAI, Gemini, some others |
| n | int | Number of completions | Mostly OpenAI-like |
| stream | bool | Streaming response | Yes |
| extra_headers | dict | Custom headers sent to provider | Yes |
| metadata / user | str/dict | Tracking / end-user identifier | Yes (LiteLLM internal) |
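
As a quick illustration, several of the parameters above can be combined in a single call; which ones actually take effect depends on the provider behind the chosen model. A hedged sketch with the litellm SDK (model name and values are illustrative):

```python
import litellm

response = litellm.completion(
    model="gpt-4o-mini",   # illustrative; pick any model your deployment serves
    messages=[{"role": "user", "content": "Reply with a JSON object containing a 'joke' key."}],
    temperature=0.7,
    top_p=0.95,
    seed=42,                                  # reproducibility where the provider supports it
    response_format={"type": "json_object"},  # JSON mode (OpenAI, Gemini, some others)
    max_tokens=200,
)
print(response.choices[0].message.content)
```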

You can also pass any provider-specific parameter directly in the request body; LiteLLM will forward it unchanged to the target provider (e.g., top_k for Claude, num_predict and keep_alive for Ollama, repetition_penalty for many open-source models).

Example request (curl):

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.1,
    "num_predict": 512,
    "repeat_penalty": 1.15,
    "top_k": 40
  }'
```
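
The same request from Python, using the OpenAI SDK pointed at the proxy; options the SDK does not model natively are passed through `extra_body`, which LiteLLM forwards to the Ollama backend. The endpoint, key, and values mirror the curl example above and are illustrative.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.1,
    extra_body={              # provider-specific params, forwarded unchanged
        "num_predict": 512,
        "repeat_penalty": 1.15,
        "top_k": 40,
    },
)
print(response.choices[0].message.content)
```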

Per-Model Default Hyperparameters (in config.yaml)

In the proxy config you can set defaults under litellm_params for each model entry:

```yaml
model_list:
  - model_name: my-llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://host.docker.internal:11434
      temperature: 0.7
      max_tokens: 8192
      top_p: 0.95
      repeat_penalty: 1.1
      num_predict: -1
      keep_alive: "30m"      # Ollama-specific
```

These act as defaults but can be overridden per-request.
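
For example, a request against the aliased model can override one of the defaults above for that call only (a sketch; endpoint and key are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="my-llama3",    # the alias defined under model_name in config.yaml
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.2,      # overrides the 0.7 default set in litellm_params
)
```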

Are They Model-Specific?

Yes, very much so:

  • LiteLLM tries to accept everything and map it where possible (e.g., max_tokens → max_new_tokens for Hugging Face, num_predict for Ollama).
  • If a parameter has no equivalent, it is either ignored or (with drop_params: true) stripped before sending to the provider.
  • Some parameters only work on certain backends (e.g., logit_bias only on OpenAI/Azure, mirostat only on llama.cpp-based servers like Ollama).
  • You can see exactly which parameters are supported for a given model on your proxy by calling /v1/model/info or using the CLI litellm --model <model> --debug (see the sketch below).
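
A hedged sketch of the model-info lookup mentioned above (URL, key, and the exact response shape are assumptions based on the endpoint named in this list):

```python
import requests

resp = requests.get(
    "http://localhost:4000/v1/model/info",
    headers={"Authorization": "Bearer sk-1234"},
    timeout=10,
)
resp.raise_for_status()
# Each entry is expected to describe one model_list entry and its parameters.
for entry in resp.json().get("data", []):
    print(entry.get("model_name"), "->", entry.get("model_info", {}))
```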

In short: LiteLLM gives you the full OpenAI parameter set + any extra provider-specific params, and intelligently routes/drops them based on the selected model. This makes it extremely flexible compared to running Ollama directly. For the absolute latest list, check the official docs at https://docs.litellm.ai/docs/completion/input and the provider-specific pages.
