IzziAPI
Tutorial | Apr 11, 2026 | 7 min read

How to Stream AI Responses with SSE in Python

Build a real-time AI streaming API with Server-Sent Events in Python. FastAPI and Flask examples with production patterns.

Izzi API Team
Engineering & DevRel
Tags: streaming, sse, python, fastapi, real-time
Why streaming matters

Without streaming, users stare at a blank screen for 3-8 seconds while the full response is generated. With streaming, the first token typically appears within 200-500 ms. That difference often determines whether users stay or leave.

FastAPI streaming implementation

Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
import json

app = FastAPI()

client = OpenAI(
    api_key="izzi-YOUR_KEY_HERE",
    base_url="https://api.izziapi.com/v1"
)

@app.post("/chat/stream")
async def stream_chat(request: dict):
    message = request.get("message", "")
    model = request.get("model", "claude-sonnet-4-20250514")
    
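    # A plain (sync) generator: the OpenAI client used here is synchronous, so
    # iterating it inside an async generator would block the event loop.
    # StreamingResponse runs sync generators in a worker thread instead.
    def generate():
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": message}],
            stream=True
        )

        for chunk in stream:
            # Some chunks carry no choices (or no text); skip them.
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content or ""
            if content:
                yield f"data: {json.dumps({'content': content})}\n\n"

        # Sentinel that tells the frontend the stream is complete.
        yield "data: [DONE]\n\n"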
    
    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )
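
Each string the endpoint yields is one SSE frame: a `data:` line followed by a blank line. As a minimal sketch of that wire format, the framing can be pulled into a small helper. The names `sse_event` and `SSE_DONE` are hypothetical, and the `[DONE]` sentinel is this article's convention rather than part of the SSE format itself.

```python
import json

# Hypothetical sentinel frame, matching the convention used in this article.
SSE_DONE = "data: [DONE]\n\n"


def sse_event(payload, event=None):
    """Serialize one JSON-compatible payload as a single SSE frame."""
    lines = []
    if event is not None:
        # Optional named event; browsers dispatch it under this event type.
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the payload always
    # fits on one data line and cannot terminate the frame early.
    lines.append(f"data: {json.dumps(payload)}")
    # A blank line (two newlines in total) marks the end of the frame.
    return "\n".join(lines) + "\n\n"


frame = sse_event({"content": "Hello\nworld"})
named = sse_event({"code": 429}, event="error")
print(repr(frame))
print(repr(named))
```

Serializing through `json.dumps` is what makes it safe to stream model output that itself contains blank lines.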

Frontend: consuming the stream

TypeScript
async function streamChat(message: string) {
  const response = await fetch("/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });

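  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullResponse = "";
  let buffer = ""; // Holds a partial event until its terminating blank line arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact when they span two chunks
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // The last piece may be incomplete; keep it for the next read

    for (const event of events) {
      if (event === "data: [DONE]") return fullResponse;
      if (event.startsWith("data: ")) {
        const payload = JSON.parse(event.slice(6));
        if (payload.error) throw new Error(payload.error);
        fullResponse += payload.content ?? "";
        updateUI(fullResponse); // Update your UI here
      }
    }
  }
  return fullResponse;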
}
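
The same stream can be consumed from Python, for example in a test or a backend-to-backend call. The sketch below shows only the parsing step and assumes the byte chunks come from whatever HTTP client is in use. The function names and the simulated chunks are illustrative. It buffers until a blank line arrives, because a network chunk can end in the middle of an event or even in the middle of a multi-byte character.

```python
import codecs
import json


def iter_sse_data(byte_chunks):
    """Yield the text after 'data: ' for each complete SSE event."""
    # An incremental decoder tolerates a UTF-8 character split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in byte_chunks:
        buffer += decoder.decode(chunk)
        # Only text followed by a blank line is a complete event.
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            if event.startswith("data: "):
                yield event[len("data: "):]


def collect_response(byte_chunks):
    """Join the 'content' fields until the [DONE] sentinel is seen."""
    parts = []
    for data in iter_sse_data(byte_chunks):
        if data == "[DONE]":
            break
        parts.append(json.loads(data)["content"])
    return "".join(parts)


# Simulated network traffic: three events cut into arbitrary 7-byte chunks,
# one of which ends halfway through the two-byte character "\u00e9".
raw = (
    'data: {"content": "H\u00e9llo"}\n\n'
    'data: {"content": " world"}\n\n'
    "data: [DONE]\n\n"
).encode("utf-8")
chunks = [raw[i:i + 7] for i in range(0, len(raw), 7)]

result = collect_response(chunks)
print(result)
```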

Performance comparison

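| Metric              | Non-streaming | Streaming (SSE) |
|---------------------|---------------|-----------------|
| Time to first token | 3-8 seconds   | 200-500 ms      |
| Perceived latency   | Very slow     | Instant         |
| User engagement     | 40% drop-off  | 5% drop-off     |
| Token cost          | Same          | Same            |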
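
Time to first token depends on the model, prompt, and network, so it is worth measuring in your own setup. A minimal sketch, assuming any iterable of chunks: `measure_stream` and `fake_stream` are hypothetical helpers, and the fake stream only simulates delay so the example runs without an API key.

```python
import time


def measure_stream(chunks):
    """Return (seconds to first chunk, total seconds, chunk count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # Time to first token: the delay the user actually perceives.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_stream(n=5, delay=0.01):
    """Stand-in for a model stream: emits n chunks with a fixed delay."""
    for i in range(n):
        time.sleep(delay)
        yield f"token{i} "


ttft, total, count = measure_stream(fake_stream())
print(f"first chunk after {ttft:.3f}s, all {count} chunks after {total:.3f}s")
```

To measure a real request, pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream()`.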

Error handling in streams

Python
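def generate_with_error_handling(messages):
    # Sync generator, so the blocking client call runs in StreamingResponse's
    # worker thread; `messages` is the chat history passed in by the endpoint.
    try:
        stream = client.chat.completions.create(
            model="claude-sonnet-4-20250514",
            messages=messages,
            stream=True,
            timeout=30
        )
        for chunk in stream:
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content or ""
            if content:
                yield f"data: {json.dumps({'content': content})}\n\n"
    except Exception as e:
        # Report the failure in-band: the HTTP 200 status has already been sent.
        yield f"data: {json.dumps({'error': str(e)})}\n\n"
    # Yield the sentinel after the try block, not in `finally`: yielding while
    # the generator is being closed early (for example after a client
    # disconnect) raises RuntimeError.
    yield "data: [DONE]\n\n"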
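
To exercise the error path without calling the API, the framing logic can be separated from the client call. The sketch below is one way to do that. `frames_with_errors` and `flaky_pieces` are hypothetical names, and the simulated timeout stands in for whatever exception the upstream client might raise.

```python
import json


def frames_with_errors(pieces):
    """Wrap any iterator of text pieces in SSE frames, reporting failures in-band."""
    try:
        for piece in pieces:
            if piece:
                yield f"data: {json.dumps({'content': piece})}\n\n"
    except Exception as exc:
        # The client sees an error frame instead of a silently truncated stream.
        yield f"data: {json.dumps({'error': str(exc)})}\n\n"
    # Sent on both the success and the error path, but not on early close.
    yield "data: [DONE]\n\n"


def flaky_pieces():
    """Simulated upstream stream that fails after two pieces."""
    yield "Hel"
    yield "lo"
    raise TimeoutError("upstream timed out")


frames = list(frames_with_errors(flaky_pieces()))
for frame in frames:
    print(repr(frame))

# Closing the generator early (as happens when a consumer goes away)
# must not raise; this works because nothing is yielded during close.
early = frames_with_errors(iter(["a", "b"]))
first_frame = next(early)
early.close()
```

Returning `str(exc)` to the browser is convenient during development; in production you may prefer a generic message so internal details are not exposed.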

