# Temperature and Sampling Parameters
Control creativity vs determinism with temperature, top-p, and other sampling parameters.
## What Is Temperature?
Temperature controls the randomness of model outputs. At temperature 0, the model picks the most probable token every time — deterministic and consistent. At higher temperatures, it samples from a broader distribution — more creative but less predictable.
Temperature is not a creativity dial — it's a reliability dial. High temperature = high variance.
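To make this concrete, here is a minimal, self-contained sketch of how temperature scaling works on raw logits, independent of any particular model API. The function name `apply_temperature` and the example logits are illustrative, not taken from a real library:

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw logits into a probability distribution at a given temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random). Temperature 0 is
    treated as greedy decoding: all mass on the most probable token.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate tokens with logits 2.0, 1.0, 0.1:
logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0))    # → [1.0, 0.0, 0.0] (greedy)
print(apply_temperature(logits, 0.5))  # sharp: top token dominates
print(apply_temperature(logits, 1.5))  # flat: more mass on the tail
```

At temperature 0.5 the top token's probability is noticeably higher than at 1.5 — the same ranking, but less variance when you sample.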
## The Practical Temperature Guide
| Temperature | Use Case |
|-------------|----------|
| 0 | Classification, extraction, factual Q&A, code generation |
| 0.3 | Structured summaries, technical writing |
| 0.7 | General writing, balanced creativity |
| 1.0 | Brainstorming, ideation, poetry |
| 1.5+ | Experimental; outputs can become incoherent |
Default for most production use cases: 0.2–0.5.
## Top-P (Nucleus Sampling)
Top-p restricts sampling to the smallest set of tokens whose cumulative probability reaches P. At top-p 0.9, the model samples from tokens covering 90% of the probability mass, excluding low-probability tail tokens.
Top-p and temperature both constrain the sampling space, so tune one rather than fighting both at once. Most practitioners adjust temperature and leave top-p at 1.0 unless there is a specific need.
## Top-K
Top-k limits the candidate tokens to the K most probable at each step. Less commonly used than temperature/top-p, but useful for very constrained outputs.
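For comparison with the top-p sketch above, here is the equivalent illustrative top-k filter (again over an explicit token–probability list; names and example data are hypothetical):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize.
    probs is a list of (token, prob) pairs."""
    ranked = sorted(probs, key=lambda tp: tp[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return [(token, prob / total) for token, prob in ranked]

probs = [("yes", 0.5), ("no", 0.3), ("maybe", 0.15), ("unsure", 0.05)]
print(top_k_filter(probs, 2))  # only "yes" and "no" survive
```

Note the difference in character: top-k keeps a fixed number of candidates regardless of how probability is distributed, while top-p keeps a variable number based on cumulative mass.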
## Practical Advice
For production systems handling real user data, start at temperature 0 and increase only when outputs feel too rigid. A/B test temperature values against your eval criteria before deploying.
Never use high temperatures in systems where accuracy matters more than variety — medical, legal, financial, or security-sensitive applications.