# Temperature and Sampling Parameters
Control creativity vs determinism with temperature, top-p, and other sampling parameters.
## What Is Temperature?
Temperature controls the randomness of model outputs. At temperature 0, the model picks the most probable token every time — deterministic and consistent. At higher temperatures, it samples from a broader distribution — more creative but less predictable.
Temperature is not a creativity dial — it's a reliability dial. High temperature = high variance.
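To make this concrete, here is a minimal, self-contained sketch of how temperature scaling works on raw logits, independent of any particular model API. The function name `apply_temperature` and the example logits are illustrative, not taken from a real library:

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw logits into a probability distribution at a given temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random). Temperature 0 is
    treated as greedy decoding: all mass on the most probable token.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate tokens with logits 2.0, 1.0, 0.1:
logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0))    # → [1.0, 0.0, 0.0] (greedy)
print(apply_temperature(logits, 0.5))  # sharp: top token dominates
print(apply_temperature(logits, 1.5))  # flat: more mass on the tail
```

At temperature 0.5 the top token's probability is noticeably higher than at 1.5 — the same ranking, but less variance when you sample.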
## The Practical Temperature Guide
| Temperature | Use Case |
|-------------|----------|
| 0 | Classification, extraction, factual Q&A, code generation |
| 0.3 | Structured summaries, technical writing |
| 0.7 | General writing, balanced creativity |
| 1.0 | Brainstorming, ideation, poetry |
| 1.5+ | Experimental; outputs can become incoherent |
Default for most production use cases: 0.2–0.5.
## Top-P (Nucleus Sampling)
Top-p restricts sampling to the smallest set of tokens whose cumulative probability reaches P. At top-p 0.9, the model samples from tokens covering 90% of the probability mass, excluding low-probability tail tokens.
Top-p and temperature both constrain the sampling space, so tune one rather than fighting both at once. Most practitioners adjust temperature and leave top-p at 1.0 unless there is a specific need.
## Top-K
Top-k limits the candidate tokens to the K most probable at each step. Less commonly used than temperature/top-p, but useful for very constrained outputs.
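For comparison with the top-p sketch above, here is the equivalent illustrative top-k filter (again over an explicit token–probability list; names and example data are hypothetical):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize.
    probs is a list of (token, prob) pairs."""
    ranked = sorted(probs, key=lambda tp: tp[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return [(token, prob / total) for token, prob in ranked]

probs = [("yes", 0.5), ("no", 0.3), ("maybe", 0.15), ("unsure", 0.05)]
print(top_k_filter(probs, 2))  # only "yes" and "no" survive
```

Note the difference in character: top-k keeps a fixed number of candidates regardless of how probability is distributed, while top-p keeps a variable number based on cumulative mass.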
## Practical Advice
For production systems handling real user data, start at temperature 0 and increase only when outputs feel too rigid. A/B test temperature values against your eval criteria before deploying.
Never use high temperatures in systems where accuracy matters more than variety — medical, legal, financial, or security-sensitive applications.