Streaming

Low-latency output

Omixa supports Server-Sent Events for chat and compatible reasoning responses while keeping billing and usage capture consistent.

Server-Sent Events

Set `stream` to `true` in the request body, then read each `data:` frame until the `[DONE]` frame arrives. The event body stays compatible with OpenAI-style clients where the selected route supports it.

Example

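// Request a streamed chat completion; `stream: true` switches the response to SSE.
// The path is relative: outside a browser, prefix it with your Omixa base URL.
const response = await fetch('/api/v1/chat/completions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.OMIXA_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'gpt-5', stream: true, messages: [{ role: 'user', content: 'Stream this.' }] })
});
// Fail early on HTTP errors instead of trying to parse an error body as a stream.
if (!response.ok) throw new Error(`Streaming request failed with status ${response.status}`);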
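
The request above only opens the stream; the `data:` frames still have to be read and split. The following is a minimal sketch of one way to do that, not Omixa's official client. The helper names `parseFrames` and `readStream` are hypothetical, and the sketch assumes each `data:` line carries one complete JSON object, as OpenAI-style streams usually do.

```javascript
// Split buffered SSE text into complete `data:` payloads.
// Returns the parsed JSON payloads, whether `[DONE]` was seen,
// and the trailing text that does not yet form a complete event.
function parseFrames(buffer) {
  const payloads = [];
  let done = false;
  // Events are separated by a blank line; the last piece may be incomplete.
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop();
  for (const part of parts) {
    for (const line of part.split(/\r?\n/)) {
      if (!line.startsWith('data:')) continue; // skip comments and other SSE fields
      const data = line.slice(5).trim();
      if (data === '[DONE]') { done = true; break; }
      if (data) payloads.push(JSON.parse(data)); // assumption: one JSON object per line
    }
    if (done) break;
  }
  return { payloads, done, rest };
}

// Read a fetch() response body chunk by chunk and hand each parsed
// payload to `onPayload` until `[DONE]` arrives or the stream ends.
async function readStream(response, onPayload) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { value, done: eof } = await reader.read();
    if (eof) break;
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    buffer += decoder.decode(value, { stream: true });
    const { payloads, done, rest } = parseFrames(buffer);
    buffer = rest;
    payloads.forEach(onPayload);
    if (done) { await reader.cancel(); break; }
  }
}
```

With the `response` from the request above, `await readStream(response, (chunk) => console.log(chunk))` would log each parsed frame. Keeping the parser separate from the network loop makes it easy to test against recorded stream text.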

Reasoning streams

Reasoning-capable models may expose summaries or thinking deltas when enabled. Treat these fields as aids for developer tools and observability, and do not build end-user guarantees on them.

Use `reasoning_effort` only on model families that support it.
Use `show_thinking` when you need model-native reasoning summaries.
Always handle normal assistant text and reasoning chunks separately.
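
The points above can be sketched as a request body plus a small router that keeps the two kinds of output apart. This is an illustration under stated assumptions: the values given to `reasoning_effort` and `show_thinking` are guesses, `delta.content` follows the OpenAI-style chunk shape, and `delta.reasoning` is a hypothetical field name. Check the actual chunk shape for your route before relying on it.

```javascript
// Hypothetical request body. The values for `reasoning_effort` and
// `show_thinking` are assumptions, not confirmed Omixa values.
const requestBody = {
  model: 'gpt-5',
  stream: true,
  reasoning_effort: 'low', // assumed value; only for families that support it
  show_thinking: true,     // assumed boolean flag
  messages: [{ role: 'user', content: 'Stream this.' }]
};

// Append one parsed stream chunk to the matching accumulator so that
// assistant text and reasoning output never end up in the same string.
function routeChunk(chunk, state) {
  const delta = chunk.choices?.[0]?.delta ?? {};
  if (typeof delta.content === 'string') {
    state.text += delta.content; // normal assistant text (OpenAI-style field)
  }
  if (typeof delta.reasoning === 'string') {
    state.reasoning += delta.reasoning; // assumed field name for thinking deltas
  }
  return state;
}
```

A caller would start from `{ text: '', reasoning: '' }`, pass every parsed chunk through `routeChunk`, show `state.text` to the user, and send `state.reasoning` to logs or developer tooling only.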