LiteLLM:option
When using LiteLLM, you can configure hyperparameters at both the provider and model-specific levels, as well as for fine-tuning jobs. Available hyperparameters include standard API parameters like `temperature` and `max_tokens`, provider-specific settings (e.g., `top_k`), and fine-tuning parameters such as `n_epochs` and `learning_rate_multiplier`. [1, 2, 3, 4]
Common configuration parameters
- Standard API parameters: These influence the model's response directly.
  - `temperature`: Controls randomness in the output.
  - `max_tokens`: Limits the length of the generated response.
  - `stream`: Enables or disables streaming responses.
- Model-specific parameters: These are defined when adding a new model to the proxy's `model_list` in `config.yaml` and can include provider-specific settings.
  - `model_name`: The user-facing alias for the model.
  - `litellm_params`: A dictionary for LiteLLM-specific configuration, including provider-specific options (e.g., for SageMaker models).
  - `top_k`: An example of a provider-specific parameter, as shown in [LiteLLM's documentation](https://docs.litellm.ai/docs/completion/provider_specific_params).
- Fine-tuning parameters: These are used for training a model and are specified separately, as detailed in [LiteLLM's fine-tuning documentation](https://docs.litellm.ai/docs/fine_tuning); see the sketch after this list.
  - `batch_size`: The number of examples in each training batch.
  - `n_epochs`: The number of full training cycles.
  - `learning_rate_multiplier`: A scaling factor for the learning rate. [1, 2, 3, 4, 5]
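A minimal sketch of passing these hyperparameters when creating a fine-tuning job through a LiteLLM proxy's OpenAI-compatible fine-tuning route; the proxy URL, API key, model, training file ID, and `custom_llm_provider` value below are placeholders and assume fine-tuning is configured per LiteLLM's docs:

```bash
# Hypothetical fine-tuning job creation via the proxy's OpenAI-compatible route.
# Adjust the base URL, key, model, provider, and training file for your deployment.
curl http://localhost:4000/v1/fine_tuning/jobs \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "training_file": "file-abc123",
    "custom_llm_provider": "openai",
    "hyperparameters": {
      "batch_size": 8,
      "n_epochs": 3,
      "learning_rate_multiplier": 0.1
    }
  }'
```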
Where to configure
- Input/API calls: You can pass standard parameters such as `temperature` or `max_tokens` directly in your API call (see the example after this list).
- `config.yaml` configuration: When setting up models, you can add provider-specific parameters and model-specific configurations within the `litellm_params` block in your `config.yaml` file.
- Fine-tuning: For fine-tuning jobs, hyperparameters are passed as part of the request body for the specific fine-tuning job. [2, 3, 4, 5, 6]
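For example, a per-request call against a LiteLLM proxy can carry standard parameters in the body exactly as with the OpenAI API; a sketch (the URL, key, and model name are placeholders):

```bash
# Hypothetical per-request parameters sent to a LiteLLM proxy.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
    "temperature": 0.2,
    "max_tokens": 256,
    "stream": false
  }'
```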
[1] https://docs.litellm.ai/docs/completion/input
[2] https://docs.litellm.ai/docs/fine_tuning
[3] https://docs.litellm.ai/docs/completion/provider_specific_params
[4] https://docs.litellm.ai/docs/providers/hyperbolic
[5] https://docs.litellm.ai/docs/proxy/model_management
[6] https://amaiya.github.io/onprem/
LiteLLM provides an OpenAI-compatible API server (often called the LiteLLM Proxy) that acts as a unified gateway to over 100 LLM providers (OpenAI, Azure, Anthropic, Cohere, Ollama, Hugging Face, Bedrock, Vertex AI, etc.). The exposed hyperparameters/options fall into two categories:
- Per-request options: passed in the `/chat/completions`, `/completions`, or `/embeddings` request body (exactly like the OpenAI API).
- Per-model defaults: set in the proxy's `config.yaml` under `litellm_params` for each model (or globally via provider configs).
LiteLLM accepts all standard OpenAI Chat Completion parameters and maps or passes them through to the underlying provider. If a parameter is not supported by the target model/provider, it can be dropped automatically (enable `drop_params: true` in the config or per call, as sketched below).
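A minimal sketch of enabling this in a proxy config and starting the server; the model entry, port, and file name are assumptions for illustration:

```bash
# Hypothetical proxy config with automatic dropping of unsupported params.
cat > config.yaml <<'EOF'
litellm_settings:
  drop_params: true        # strip params the target provider does not accept

model_list:
  - model_name: my-llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
EOF

# Start the proxy on port 4000 (requires `pip install 'litellm[proxy]'`).
litellm --config config.yaml --port 4000
```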
| Parameter | Type | Description | Supported by All Models? |
|---|---|---|---|
| `temperature` | float (0-2) | Controls randomness (higher = more creative) | Yes (most providers) |
| `top_p` | float (0-1) | Nucleus sampling | Yes |
| `top_k` | int | Top-K sampling (used by Anthropic, Cohere, Ollama, etc.) | Provider-specific |
| `max_tokens` / `max_completion_tokens` | int | Maximum tokens to generate | Yes |
| `stop` | str/array | Stop sequences | Yes |
| `presence_penalty` | float (-2 to 2) | Penalize new tokens that already appeared | Mostly OpenAI-like |
| `frequency_penalty` | float (-2 to 2) | Penalize tokens based on frequency in text so far | Mostly OpenAI-like |
| `logit_bias` | dict | Modify likelihood of specific tokens | OpenAI, Azure, some others |
| `seed` | int | For reproducible outputs | OpenAI, Azure, Cohere |
| `tools` / `functions` | array | Function calling / tool use | Varies widely |
| `tool_choice` / `function_call` | str/object | Force tool use | Varies |
| `response_format` | object | e.g., `{ "type": "json_object" }` | OpenAI, Gemini, some others |
| `n` | int | Number of completions | Mostly OpenAI-like |
| `stream` | bool | Streaming response | Yes |
| `extra_headers` | dict | Custom headers sent to provider | Yes |
| `metadata` / `user` | str/dict | Tracking / end-user identifier | Yes (LiteLLM internal) |
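Several of these options can be combined in one request; a hedged sketch against a local proxy, where the URL, key, and model name are placeholders and `response_format`/`seed` only take effect on providers that support them:

```bash
# Hypothetical request combining standard OpenAI-style parameters from the table.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Return a JSON object with one key named greeting."}],
    "response_format": {"type": "json_object"},
    "seed": 42,
    "max_tokens": 100
  }'
```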
You can also pass any provider-specific parameter directly in the request body; LiteLLM will forward it unchanged to the target provider (e.g., `top_k` for Claude, `num_predict` and `keep_alive` for Ollama, `repetition_penalty` for many open-source models).
Example request (curl):
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.1,
    "num_predict": 512,
    "repeat_penalty": 1.15,
    "top_k": 40
  }'
```

In the proxy config you can set defaults under `litellm_params` for each model entry:
```yaml
model_list:
  - model_name: my-llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://host.docker.internal:11434
      temperature: 0.7
      max_tokens: 8192
      top_p: 0.95
      repeat_penalty: 1.1
      num_predict: -1
      keep_alive: "30m"   # Ollama-specific
```

These act as defaults but can be overridden per-request, as in the sketch below.
Parameter handling is very much provider-dependent, and LiteLLM manages this for you:

- LiteLLM tries to accept everything and map it where possible (e.g., `max_tokens` → `max_new_tokens` for Hugging Face, `num_predict` for Ollama).
- If a parameter has no equivalent, it is either ignored or (with `drop_params: true`) stripped before sending to the provider.
- Some parameters only work on certain backends (e.g., `logit_bias` only on OpenAI/Azure, `mirostat` only on llama.cpp-based servers such as Ollama).
- You can see exactly which parameters are supported for a given model on your proxy by calling `/v1/model/info` or using the CLI `litellm --model <model> --debug`; a sketch follows this list.
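A hedged sketch of querying the proxy for its model registry (response fields may vary by LiteLLM version; the URL and key are placeholders):

```bash
# Hypothetical query of the proxy's model registry; the response lists each
# model's alias, litellm_params, and model_info as configured on the server.
curl -s http://localhost:4000/v1/model/info \
  -H "Authorization: Bearer sk-1234" | python -m json.tool
```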
In short: LiteLLM gives you the full OpenAI parameter set + any extra provider-specific params, and intelligently routes/drops them based on the selected model. This makes it extremely flexible compared to running Ollama directly. For the absolute latest list, check the official docs at https://docs.litellm.ai/docs/completion/input and the provider-specific pages.