ollama:option - chunhualiao/public-docs GitHub Wiki
see also LiteLLM:option
When using models with Ollama (via the CLI command `ollama run`, the REST API endpoints such as `/api/generate` or `/api/chat`, or the client libraries), you can configure inference hyperparameters through two main mechanisms:
- **Default/fixed settings**: baked into the model when you create it (via a Modelfile using `ollama create`).
- **Runtime overrides**: passed per-request (e.g., temporarily overriding `temperature` for a single query without changing the model).
Both mechanisms accept the same set of options, derived from llama.cpp (the backend Ollama uses).
These are the currently supported options (as of late 2025; Ollama tracks llama.cpp updates closely):
| Parameter | Type | Default | Description |
|---|---|---|---|
| `num_ctx` | int | 2048 or model-specific | Context window size in tokens (how much history the model "sees"). Larger values use more VRAM/RAM. |
| `temperature` | float | 0.8 | Randomness/creativity (0 = deterministic, higher = more creative). |
| `top_k` | int | 40 | Limits sampling to the top K most probable tokens. |
| `top_p` | float | 0.9 | Nucleus sampling: considers the smallest set of tokens whose cumulative probability exceeds p. |
| `min_p` | float | 0.0 | Minimum probability threshold relative to the most likely token (a newer addition for diversity). |
| `repeat_penalty` | float | 1.1 | Penalizes repetition of recent tokens. |
| `repeat_last_n` | int | 64 | How many recent tokens to consider for `repeat_penalty` (-1 = entire context). |
| `num_predict` | int | -1 (unlimited) / -2 (fill context) | Maximum tokens to generate (-1 means "keep going until a stop token or the context is full", but Ollama caps it internally for safety). |
| `num_keep` | int | 0 | Number of initial prompt tokens to always keep (useful for forced prefixes). |
| `seed` | int | random | Random seed for reproducible outputs (use the same seed plus temperature 0 for fully deterministic output). |
| `stop` | string/array | model-specific | Stop sequences (the model stops generating when it hits one). |
| `tfs_z` | float | 1.0 | Tail Free Sampling zeta: reduces the probability of low-probability tokens. |
| `typical_p` | float | 1.0 | Local typicality sampling. |
| `presence_penalty` | float | 0.0 | Penalizes tokens that already appeared anywhere in the context. |
| `frequency_penalty` | float | 0.0 | Penalizes tokens based on how frequently they appeared in the context. |
| `mirostat` | int | 0 (disabled) | Mirostat sampling mode (0 = disabled, 1 = original, 2 = v2). |
| `mirostat_tau` | float | 5.0 | Target surprise/perplexity for Mirostat. |
| `mirostat_eta` | float | 0.1 | Learning rate for Mirostat. |
| `num_thread` | int | auto | Number of CPU threads to use (usually best left on auto). |
| `num_gpu` | int | auto | Number of GPU layers to offload (-1 = as many as possible). |
| `main_gpu` | int | 0 | Primary GPU index when using multiple GPUs. |
| `low_vram` | bool | false | Enable techniques to reduce VRAM usage (slower). |
| `num_batch` | int | 512 | Batch size for prompt processing. |
| `logit_bias` | map | none | Token ID → bias value to adjust the likelihood of specific tokens. |
There are also some less-common/advanced options, such as `grammar` and JSON mode, that appear in newer Ollama/llama.cpp versions.
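To make the two most commonly tuned sampling parameters concrete, here is a small pure-Python sketch of how `top_k` and `top_p` (nucleus) filtering restrict the candidate token set before sampling. This is an illustrative re-implementation, not the actual llama.cpp code, and the exact cutoff semantics there differ in minor details:

```python
def top_k_top_p_filter(probs, top_k=40, top_p=0.9):
    """Illustrative top_k / top_p (nucleus) filtering.

    probs: dict mapping token -> probability (summing to ~1).
    Returns the renormalized distribution that would actually be sampled from.
    """
    # Sort tokens by probability, most likely first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top_k: keep only the K most probable tokens.
    ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1 again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

For example, with the defaults (`top_k=40`, `top_p=0.9`) applied to `{'a': 0.5, 'b': 0.3, 'c': 0.15, 'd': 0.05}`, the low-probability token `d` is dropped, since `a`, `b`, and `c` already cover 95% of the mass.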
**1. In a Modelfile (permanent defaults)**

```
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.15
PARAMETER stop "<|eot_id|>"
```

Then run `ollama create my-llama3 -f Modelfile`. The model `my-llama3` will now always use those defaults.
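Since a Modelfile is plain text, it is also easy to generate one programmatically. A minimal sketch (the `render_modelfile` helper below is hypothetical, not part of any Ollama package):

```python
def render_modelfile(base_model, params, stop_sequences=()):
    """Build Modelfile text like the example above.

    base_model:     name passed to FROM (e.g. "llama3.2")
    params:         dict of PARAMETER name -> value
    stop_sequences: stop strings, each emitted as its own PARAMETER stop line
    """
    lines = [f"FROM {base_model}"]
    for name, value in params.items():
        lines.append(f"PARAMETER {name} {value}")
    for stop in stop_sequences:
        lines.append(f'PARAMETER stop "{stop}"')
    return "\n".join(lines) + "\n"


text = render_modelfile(
    "llama3.2",
    {"temperature": 0.7, "num_ctx": 8192},
    stop_sequences=["<|eot_id|>"],
)
# Write `text` to a file named Modelfile, then:
#   ollama create my-llama3 -f Modelfile
```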
**2. At runtime via CLI**

```bash
ollama run llama3
# then, inside the interactive session, override parameters with /set:
# >>> /set parameter num_ctx 8192
# >>> /set parameter temperature 0.7
```
**3. Via the REST API (most common for apps)**

Pass an `options` object in the request body, e.g. with `POST /api/generate`:

```json
{
  "model": "llama3",
  "prompt": "Explain quantum computing",
  "options": {
    "temperature": 0.2,
    "num_ctx": 8192,
    "top_p": 0.95,
    "num_predict": 512
  }
}
```

**4. In client libraries (Python, JS, etc.)**
```python
from ollama import chat

chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Hello'}],
    options={
        'temperature': 0.1,
        'num_ctx': 16384,
    },
)
```

These options give you full control over sampling, context length, repetition handling, reproducibility, and hardware usage when running any model through Ollama. The exact defaults can vary slightly per base model (e.g., Llama 3.2 defaults to a 128k context if the quantization supports it, but Ollama often ships with a lower default like 8k or 4k to save memory), and you can always override them at runtime. For the very latest list, check the official Ollama GitHub repo (`docs/modelfile.md` and `docs/api.md`) or run `ollama show --modelfile <model>` to see what a particular model is using.
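The JSON payload from section 3 can also be assembled and sent with nothing but the Python standard library. A sketch, assuming an Ollama server on the default `localhost:11434` (the network call is left commented out so the snippet stands alone; `build_generate_payload` is a hypothetical helper):

```python
import json
import urllib.request


def build_generate_payload(model, prompt, **options):
    """Assemble a /api/generate request body; any keyword argument becomes
    an entry in the "options" object (e.g. temperature, num_ctx, seed)."""
    return {"model": model, "prompt": prompt, "options": options}


# Fixed seed + temperature 0 for reproducible output (see the seed row above).
payload = build_generate_payload(
    "llama3",
    "Explain quantum computing",
    temperature=0,
    seed=42,
    num_predict=512,
)
body = json.dumps(payload).encode("utf-8")

# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     for line in resp:  # /api/generate streams one JSON object per line
#         print(json.loads(line).get("response", ""), end="")
```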