## What is prompt caching?
Prompt caching lets the API reuse previously processed tokens across requests. Instead of re-processing your system prompt every time, the model reads it from cache at 90% less cost.
## How much can you save?
| Scenario | Without caching | With caching | Savings |
|---|---|---|---|
| 4K system prompt × 100 calls | $1.20 | $0.13 | 89% |
| 10K system prompt × 500 calls | $15.00 | $1.65 | 89% |
| RAG context × 1000 calls | $45.00 | $4.95 | 89% |
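The table's numbers can be reproduced from the pricing model described below. This sketch assumes Claude Sonnet's standard rate of $3 per million input tokens, a one-time 1.25× surcharge when the cache is written, and 0.1× for every cache read:

```python
# Reproduce the first table row: 4K-token system prompt, 100 calls.
# Assumed rates (Claude Sonnet): $3.00 per million input tokens,
# cache writes billed at 1.25x input price, cache reads at 0.10x.
INPUT_PRICE = 3.00 / 1_000_000   # $ per token
CACHE_WRITE = 1.25 * INPUT_PRICE
CACHE_READ = 0.10 * INPUT_PRICE

tokens, calls = 4_000, 100

# Without caching: every call pays full input price on the prompt.
without_caching = tokens * calls * INPUT_PRICE

# With caching: one cache write, then (calls - 1) cheap cache reads.
with_caching = tokens * CACHE_WRITE + tokens * (calls - 1) * CACHE_READ

print(f"without: ${without_caching:.2f}")                     # without: $1.20
print(f"with:    ${with_caching:.2f}")                        # with:    $0.13
print(f"savings: {1 - with_caching / without_caching:.0%}")   # savings: 89%
```

The savings converge toward 90% as call count grows; the cache-write surcharge on the first call is what keeps the table at 89%.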
## How it works
- First request: full processing; the cached prefix is written at a one-time premium (1.25× the input price)
- Subsequent requests: cached tokens billed at 10% of the input price
- Cache lifetime: ~5 minutes of inactivity
- Minimum size: 1,024+ cacheable tokens (shorter prefixes are processed normally, without caching)
## Python implementation

```python
import anthropic

client = anthropic.Anthropic(
    api_key="izzi-YOUR_KEY",
    base_url="https://api.izziapi.com"
)

SYSTEM_PROMPT = """You are an expert code reviewer.
... (long system prompt with rules, examples, etc.)
"""

# The system prompt gets cached after the first call
def review_code(code: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }],
        messages=[{"role": "user", "content": f"Review this code:\n{code}"}]
    )
    return response.content[0].text

# First call: full price
review_code("def hello(): print('world')")

# All subsequent calls: 90% cheaper on the system prompt
for file in code_files:
    review_code(file)
```

## Node.js implementation
```javascript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'izzi-YOUR_KEY',
  baseURL: 'https://api.izziapi.com',
});

const SYSTEM_PROMPT = `You are an expert code reviewer...`;

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  system: [{
    type: 'text',
    text: SYSTEM_PROMPT,
    cache_control: { type: 'ephemeral' }  // Enable caching
  }],
  messages: [{ role: 'user', content: 'Review this code...' }],
});
```

## Best practices
- 🎯 Long system prompts — put your detailed instructions in the system message
- 🎯 RAG context — cache your retrieved documents
- 🎯 Few-shot examples — provide examples in a cached block
- 🎯 Keep cache warm — send requests within 5 minutes to maintain cache
- 🎯 Check usage — monitor `cache_creation_input_tokens` vs `cache_read_input_tokens` in the response's `usage` object
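To act on that last tip, the `response.usage` object returned by the Messages API exposes both counters alongside the regular `input_tokens` count. A small helper (hypothetical, not part of the SDK) can turn them into a cache hit rate:

```python
from types import SimpleNamespace


def cache_hit_rate(usage) -> float:
    """Fraction of prompt tokens served from cache for one response.

    `usage` is the `response.usage` object from the Messages API; it
    carries cache_creation_input_tokens and cache_read_input_tokens
    alongside the regular input_tokens count.
    """
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total = read + written + uncached
    return read / total if total else 0.0


# Illustrative counts for a warm-cache call: a 4,000-token cached
# system prompt plus a short 25-token user message.
usage = SimpleNamespace(
    cache_read_input_tokens=4000,
    cache_creation_input_tokens=0,
    input_tokens=25,
)
print(f"{cache_hit_rate(usage):.0%}")  # 99%
```

A hit rate near 100% after the first call means caching is working; a rate stuck at 0% usually means the cached block changed between requests or fell below the minimum token count.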
## Models that support caching
- ✅ Claude Opus 4
- ✅ Claude Sonnet 4
- ✅ Claude Haiku 4.5
- ✅ All Claude models via Izzi API
## Start caching today
If you're making repeated API calls with the same system prompt or context, prompt caching is the single biggest cost optimization you can implement. Sign up for Izzi API and start saving immediately.
