Demonstrates and tests the four context management techniques used internally by Claude Code, reproduced using the plain Anthropic Python API. All tests use real API calls against claude-sonnet-4-6.
| # | Technique | How |
|---|---|---|
| 1 | Full Compaction | Send history + summarization prompt → replace history with summary |
| 2 | Time-Based Micro Compaction | Client-side: if gap since last assistant message > 60 min (cache TTL), replace old tool result bodies with [Old tool result content cleared] |
| 3 | API Thinking Clearing | context-management-2025-06-27 beta + clear_thinking_20251015 directive |
| 4 | API Tool Clearing | context-management-2025-06-27 beta + clear_tool_uses_20250919 directive |
Note: Token savings are measured only for API Tool Clearing and API Thinking Clearing (techniques 3 & 4). These are the only techniques that reduce
input_tokensreported by the API and are therefore directly measurable in a before/after comparison.Full Compaction and Time-Based Micro Compaction are tested for correctness only (right messages replaced, facts preserved) — not token savings. Full compaction resets context entirely so a simple before/after token count is not meaningful; time-based MC is pure client-side string replacement with no API call involved.
Turn: 1 2 3 4 5 6 7 8 9 10
Baseline: 1141 1712 2283 2854 3425 3996 4567 5138 5709 6280
Cleared: 1141 1712 2283 2357 2431 2505 2579 2653 2727 2801
| Cumulative tokens | Final turn tokens | |
|---|---|---|
| Baseline | 37,105 | 6,280 |
| Tool clearing | 23,189 | 2,801 |
| Saving | 37.5% | 55.4% |
Turn: 1 2 3 4 5 6 7 8
Baseline: 640 763 984 1307 1519 1747 1992 2220
Cleared: 640 763 890 1143 1292 1249 1392 1534
| Cumulative tokens | Final turn tokens | |
|---|---|---|
| Baseline | 11,172 | 2,220 |
| Thinking clearing | 8,903 | 1,534 |
| Saving | 20.3% | 30.9% |
Turn: 1 2 3 4 5 6 7 8
Baseline: 640 1763 1985 4346 4511 6753 6930 9166
Tool only: 640 1763 2058 4329 4519 4758 4967 5206
Think only:640 1763 1912 4153 4292 6535 6705 8929
Combined: 640 1763 1851 4092 4184 4428 4557 4795
| Strategy | Cumulative | Saving |
|---|---|---|
| Baseline | 36,094 | — |
| Tool clearing only | 28,240 | 21.8% |
| Thinking clearing only | 34,929 | 3.2% |
| Combined | 26,310 | 27.1% |
- Savings are roughly additive — combined (27.1%) ≈ tool-only (21.8%) + thinking-only (3.2%), with a small positive interaction effect.
- Tool clearing dominates when tool results are large. Thinking blocks contribute a smaller share unless
budget_tokensis very high. clear_thinking_20251015must be first in theeditsarray when combining both directives — the API returns 400 otherwise.tool_choice: "any"is incompatible with thinking mode — the API rejects it. Drive tool use via prompt instead.- Time-based MC is purely client-side — no special API features needed. The 60-minute threshold matches the Anthropic prompt cache TTL.
cp .env.example .env
# add your ANTHROPIC_API_KEY
uv run pytest test_compaction.py -v -sRequires Python 3.11+. Dependencies installed automatically by uv.