IzziAPI
Tips · Apr 5, 2026 · 7 min read

How Prompt Caching Cuts Your API Costs by 90%

Implement prompt caching to slash input token costs. Python and Node.js examples with real cost comparisons.

Izzi API Team
Engineering & DevRel
prompt-caching · cost-optimization · tips · claude

What is prompt caching?

Prompt caching lets the API reuse previously processed tokens across requests. Instead of re-processing your system prompt every time, the model reads it from cache at 90% less cost.

How much can you save?

| Scenario | Without caching | With caching | Savings |
| --- | --- | --- | --- |
| 4K system prompt × 100 calls | $1.20 | $0.13 | 89% |
| 10K system prompt × 500 calls | $15.00 | $1.65 | 89% |
| RAG context × 1000 calls | $45.00 | $4.95 | 89% |
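The savings fall straight out of the pricing arithmetic. Here is a rough sketch, assuming Claude Sonnet's standard input rate ($3/MTok) with cache writes billed at 1.25× and cache reads at 0.1× that rate — check the current pricing for your model, as these multipliers are assumptions of this sketch:

```python
# Rough cost model for prompt caching (assumed Sonnet pricing).
INPUT_PER_MTOK = 3.00      # $ per million input tokens
CACHE_WRITE_MULT = 1.25    # first request writes the cache at a 25% premium
CACHE_READ_MULT = 0.10     # later requests read the cache at 10% of input price

def cost_without_caching(prompt_tokens: int, calls: int) -> float:
    return prompt_tokens * calls * INPUT_PER_MTOK / 1_000_000

def cost_with_caching(prompt_tokens: int, calls: int) -> float:
    # One cache write, then (calls - 1) cache reads.
    write = prompt_tokens * CACHE_WRITE_MULT * INPUT_PER_MTOK / 1_000_000
    reads = prompt_tokens * (calls - 1) * CACHE_READ_MULT * INPUT_PER_MTOK / 1_000_000
    return write + reads

# 4K system prompt × 100 calls, as in the first row of the table above
base = cost_without_caching(4_000, 100)   # $1.20
cached = cost_with_caching(4_000, 100)    # ≈ $0.13
print(f"${base:.2f} vs ${cached:.2f} ({1 - cached / base:.0%} saved)")
```

Run it and you reproduce the table's first row: $1.20 without caching versus about $0.13 with it, an 89% saving.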

How it works

  1. First request: Full processing at normal cost, plus a one-time cache-write premium (25% on the cached tokens)
  2. Subsequent requests: Cached tokens at 10% of input price
  3. Cache lifetime: ~5 minutes of inactivity
  4. Minimum size: 1,024+ cacheable tokens

Python implementation

Python
import anthropic

client = anthropic.Anthropic(
    api_key="izzi-YOUR_KEY",
    base_url="https://api.izziapi.com"
)

SYSTEM_PROMPT = """You are an expert code reviewer.
... (long system prompt with rules, examples, etc.)
"""

# The system prompt gets cached after the first call
def review_code(code: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }],
        messages=[{"role": "user", "content": f"Review this code:\n{code}"}]
    )
    return response.content[0].text

# First call: full price
review_code("def hello(): print('world')")

# All subsequent calls within the cache lifetime: 90% cheaper on the system prompt
for file in code_files:  # code_files: your iterable of source-code strings
    review_code(file)

Node.js implementation

JavaScript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'izzi-YOUR_KEY',
  baseURL: 'https://api.izziapi.com',
});

const SYSTEM_PROMPT = `You are an expert code reviewer...`;

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  system: [{
    type: 'text',
    text: SYSTEM_PROMPT,
    cache_control: { type: 'ephemeral' }
  }],
  messages: [{ role: 'user', content: 'Review this code...' }],
});

Best practices

  • 🎯 Long system prompts — put your detailed instructions in the system message
  • 🎯 RAG context — cache your retrieved documents
  • 🎯 Few-shot examples — provide examples in a cached block
  • 🎯 Keep cache warm — each cache hit resets the ~5-minute TTL, so steady traffic keeps the cache alive
  • 🎯 Check usage — monitor cache_creation_input_tokens vs cache_read_input_tokens

Models that support caching

  • ✅ Claude Opus 4
  • ✅ Claude Sonnet 4
  • ✅ Claude Haiku 4.5
  • ✅ All Claude models via Izzi API

Start caching today

If you're making repeated API calls with the same system prompt or context, prompt caching is the single biggest cost optimization you can implement. Sign up for Izzi API and start saving immediately.
