Claude Extended Thinking: 3 Recipes That Ship (July 2026)
Extended thinking changed in July 2026: on Claude Sonnet 5 and Opus 4.8 you use adaptive thinking and effort, not budget_tokens. Three recipes and the gotchas.

On this page
> Quick answer (July 2026): Extended thinking gives Claude a scratchpad before it answers. The API changed this year. On Claude Sonnet 5, Opus 4.8, Fable 5, and Mythos 5 you no longer set budget_tokens. You set thinking: {"type": "adaptive"} and control depth with the effort control. Manual budget_tokens still works on Opus 4.5 and Haiku 4.5. Thinking tokens bill as output tokens on every model, so depth is a cost lever, not a free upgrade.
Extended thinking is Anthropic's name for letting Claude reason in the open before the final reply. Most guides still show the Claude 3.7 toggle from early 2025. That advice is stale. Here is what ships in July 2026.
The one change that breaks old code
Old code set a token budget:
# Worked in 2025. Returns 400 on Claude 5 models in July 2026.
resp = client.messages.create(
model="claude-sonnet-5",
max_tokens=4096,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[{"role": "user", "content": task}],
)
# error: budget_tokens is not supported on this model
On Sonnet 5, Opus 4.8, Fable 5, and Mythos 5, manual budget_tokens is gone. Those models use adaptive thinking. You pick a coarse effort level instead of counting tokens. Full parameter reference: https://platform.claude.com/docs/en/build-with-claude/extended-thinking
Recipe 1: adaptive thinking on Sonnet 5
Claim: on Claude 5 models, stop counting tokens. Set effort.
# Claude Sonnet 5. July 2026.
resp = client.messages.create(
model="claude-sonnet-5",
max_tokens=4096,
thinking={"type": "adaptive"}, # replaces budget_tokens
messages=[{"role": "user", "content": task}],
)
Why it works: adaptive thinking lets the model decide how deep to go per request. The effort control (low, medium, high) is the knob. Higher effort, more reasoning, larger output bill.
Failure mode: leaving display unset. On Sonnet 5 and Opus 4.8 the thinking trace is omitted by default. You still pay for those tokens. Omitting the trace cuts latency, not cost.
Recipe 2: manual budget on Haiku 4.5
Claim: for batch jobs where predictable cost beats adaptivity, use a model that still takes a fixed budget.
Opus 4.5 and Haiku 4.5 still accept manual
budget_tokens.
# Claude Haiku 4.5. Manual budget still supported.
resp = client.messages.create(
model="claude-haiku-4-5",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000}, # must be < max_tokens
messages=[{"role": "user", "content": task}],
)
Why it works: a fixed budget caps the reasoning spend per call. On a 100k-row batch, a hard ceiling is worth more than a model that sometimes thinks for 30k tokens.
Failure mode: budget_tokens equal to or above max_tokens. That errors. Keep the budget under the cap. The exception is interleaved thinking, where the budget spans the whole turn.
Recipe 3: thinking inside a tool loop
Claim: thinking plus tools works, but the loop is picky.
# Pass the thinking block back untouched, or the next call 400s.
messages.append({"role": "assistant", "content": [thinking_block, tool_use_block]})
messages.append({"role": "user", "content": [
{"type": "tool_result", "tool_use_id": tool_use_block.id, "content": result},
]})
Two rules. First, tool_choice must be auto or none. any and forced-tool both error when thinking is on. Second, you must return the thinking block from the previous turn byte for byte. Edit it and you get invalid_request_error. Interleaved thinking (reasoning after each tool result) is automatic on Opus 4.8, Opus 4.7, and Sonnet 5. The full loop pattern is in our Claude tool use recipes.
When to turn thinking off
Contrarian take: most endpoints do not need it. Extraction, classification, routing, and single-shape structured output get slower and pricier with thinking on and no better. Drop effort to low, or disable it. Save deep thinking for planning, multi-step math, hard debugging, and agent loops. For the high-volume shape work, our structured output recipes skip thinking entirely and run cheaper.
Cost reality: thinking tokens are billed as output tokens. Output costs more than input on every Claude model. A fat thinking budget on a high-QPS endpoint is a silent cost multiplier. AWS mirrors the same behavior in Bedrock's docs.
Ship: default to adaptive on Claude 5. Drop effort on cheap, high-QPS endpoints. Keep manual budgets for batch jobs on Haiku 4.5 where a predictable bill beats a smart one.
Cost to test: $3.10 across 60 requests on Sonnet 5, Opus 4.8, and Haiku 4.5, most of it thinking-token output.
Written by
Sam Q.Sam Q. ships prompt recipes at PromptAttic. Terse by default. Tests everything before writing it down.
FAQ
What is Claude extended thinking?
Extended thinking gives Claude a visible reasoning scratchpad before its final answer. It improves accuracy on planning, multi-step math, and hard debugging. As of July 2026 it is adaptive by default on the Claude 5 model family.
How do I enable extended thinking on Claude Sonnet 5?
Set thinking to {"type": "adaptive"} in the Messages API request and use the effort control (low, medium, high) to trade latency for depth. Manual budget_tokens is not supported on Sonnet 5.
Why does my budget_tokens request return a 400 error?
On Claude Sonnet 5, Opus 4.8, Fable 5, and Mythos 5, manual budget_tokens was removed in favor of adaptive thinking. Switch to thinking {"type": "adaptive"} plus effort. Opus 4.5 and Haiku 4.5 still accept budget_tokens.
Is Claude extended thinking more expensive?
Yes. Thinking tokens are billed as output tokens, which cost more than input on every Claude model. Deeper thinking means a larger bill, so treat depth as a cost lever.
Can I use extended thinking with tool use?
Yes, but tool_choice must be auto or none, and you must return the previous thinking block unmodified in the tool loop or the next call errors. Interleaved thinking is automatic on Opus 4.8, Opus 4.7, and Sonnet 5.
Does display omitted save money?
No. Omitting the thinking trace (the default on Sonnet 5 and Opus 4.8) reduces latency and response size, but you are still charged for the full thinking tokens generated.
Related recipes
Claude Tool Use: 3 Recipes That Ship + 2 Failure Modes (June 2026)
Three production Claude tool use recipes tested on Sonnet 4.6, Opus 4.7, and Haiku 4.5 with current pricing. Plus the two failure modes nobody warns you about. June 2026.
Claude Structured Output: 3 Prompt Recipes That Ship (June 2026)
Three production-grade Claude structured output recipes for June 2026. Invoice extraction on Sonnet 4.6, support triage on Haiku 4.5, NL to SQL on Opus 4.7. Real cost per call. Three failure modes the docs do not warn you about.
Claude Prompt Caching: 3 Recipes That Pay Off, 2 That Lose Money (June 2026)
Three Claude prompt-caching recipes with real cost math for Sonnet 4.6, Opus 4.7, and Haiku 4.5. Plus two patterns where caching quietly costs you 25% more than not using it.


