Models Catalog

Developer documentation

Models Catalog

Every active Omixa model family, grouped by workflow, with endpoint shape, pricing context, fields, and copy-ready Markdown.

Model Reference

Chat and reasoning models

Language, code, reasoning, multimodal chat, tool calling, and streaming responses. Endpoint: http://omixa.cloud/api/v1/chat/completions

AQA

aqa

AQA for text generation, reasoning, tool calling, and live streaming responses.

Chat Context window: 7,168 tokens Max output: 1,024 tokens
minimum hold $0.010000
Integration docs

Antigravity Agent Preview

antigravity-agent-preview

Antigravity Agent Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

Azure Computer Use Preview

azure-computer-use-preview

Azure Computer Use Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 8,192 tokens Max output: 1,024 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

Claude Haiku 4.5

claude-haiku-4-5

Claude Haiku 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $5.000000
Integration docs

Claude Opus 4.1

claude-opus-4-1

Claude Opus 4.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 32,000 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $1.500000
output per 1m tokens $75.000000
Integration docs

Claude Opus 4.5

claude-opus-4-5

Claude Opus 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.6

claude-opus-4-6

Claude Opus 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.7

claude-opus-4-7

Claude Opus 4.7 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.8

claude-opus-4-8

Claude Opus 4.8 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Sonnet 4.5

claude-sonnet-4-5

Claude Sonnet 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Claude Sonnet 4.6

claude-sonnet-4-6

Claude Sonnet 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Codex Mini

codex-mini

Codex Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.500000
cached input per 1m tokens $0.375000
output per 1m tokens $6.000000
Integration docs

Cohere Command A

Cohere-command-a

Cohere Command A for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,182 tokens
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

Computer Use Preview

computer-use-preview

Computer Use Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

DeepSeek OCR

DeepSeek-OCR

DeepSeek OCR through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Context window: 32,768 tokens Max output: 8,192 tokens
input per 1m tokens $0.560000
output per 1m tokens $1.680000
minimum hold $0.010000
Integration docs

DeepSeek R1

DeepSeek-R1

DeepSeek R1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 163,840 tokens Max output: 163,840 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

DeepSeek R1 0528

DeepSeek-R1-0528

DeepSeek R1 0528 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 32,768 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

DeepSeek V3 0324

DeepSeek-V3-0324

DeepSeek V3 0324 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 131,072 tokens
input per 1m tokens $1.140000
output per 1m tokens $4.560000
minimum hold $0.010000
Integration docs

DeepSeek V3.1

DeepSeek-V3.1

DeepSeek V3.1 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 32,768 tokens
input per 1m tokens $1.230000
output per 1m tokens $4.940000
minimum hold $0.010000
Integration docs

DeepSeek V3.2

DeepSeek-V3.2

DeepSeek V3.2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 65,536 tokens
input per 1m tokens $0.560000
cached input per 1m tokens $0.056000
output per 1m tokens $1.680000
Integration docs

DeepSeek V3.2 Speciale

DeepSeek-V3.2-Speciale

DeepSeek V3.2 Speciale for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.580000
output per 1m tokens $1.680000
minimum hold $0.010000
Integration docs

DeepSeek V4 Flash

DeepSeek-V4-Flash

DeepSeek V4 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 1,000,000 tokens Max output: 384,000 tokens
input per 1m tokens $0.190000
output per 1m tokens $0.510000
minimum hold $0.010000
Integration docs

DeepSeek V4 Pro

DeepSeek-V4-Pro

DeepSeek V4 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 1,000,000 tokens Max output: 384,000 tokens
input per 1m tokens $1.740000
output per 1m tokens $3.480000
minimum hold $0.010000
Integration docs

GLM 4.7

glm-4.7

GLM 4.7 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GLM 5

glm-5

GLM 5 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GPT Chat Latest

gpt-chat-latest

GPT Chat Latest for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

GPT OSS 120B

gpt-oss-120b

GPT OSS 120B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

GPT OSS 20B

gpt-oss-20b

GPT OSS 20B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.070000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

GPT-4.1

gpt-4.1

GPT-4.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

GPT-4.1 Mini

gpt-4.1-mini

GPT-4.1 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $0.400000
cached input per 1m tokens $0.100000
output per 1m tokens $1.600000
Integration docs

GPT-4.1 Nano

gpt-4.1-nano

GPT-4.1 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $0.100000
cached input per 1m tokens $0.025000
output per 1m tokens $0.400000
Integration docs

GPT-4o

gpt-4o

GPT-4o for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 4,096 tokens
input per 1m tokens $2.500000
cached input per 1m tokens $1.250000
output per 1m tokens $10.000000
Integration docs

GPT-4o Mini

gpt-4o-mini

GPT-4o Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $0.150000
cached input per 1m tokens $0.075000
output per 1m tokens $0.600000
Integration docs

GPT-5

gpt-5

GPT-5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Chat

gpt-5-chat

GPT-5 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Codex

gpt-5-codex

GPT-5 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Mini

gpt-5-mini

GPT-5 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $2.000000
Integration docs

GPT-5 Nano

gpt-5-nano

GPT-5 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.050000
cached input per 1m tokens $0.005000
output per 1m tokens $0.400000
Integration docs

GPT-5 Pro

gpt-5-pro

GPT-5 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $15.000000
output per 1m tokens $120.000000
minimum hold $0.010000
Integration docs

GPT-5.1

gpt-5.1

GPT-5.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Chat

gpt-5.1-chat

GPT-5.1 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex

gpt-5.1-codex

GPT-5.1 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex Max

gpt-5.1-codex-max

GPT-5.1 Codex Max for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex Mini

gpt-5.1-codex-mini

GPT-5.1 Codex Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $2.000000
Integration docs

GPT-5.2

gpt-5.2

GPT-5.2 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.2 Chat

gpt-5.2-chat

GPT-5.2 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.2 Codex

gpt-5.2-codex

GPT-5.2 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.3 Chat

gpt-5.3-chat

GPT-5.3 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.3 Codex

gpt-5.3-codex

GPT-5.3 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.4

gpt-5.4

GPT-5.4 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $2.500000
cached input per 1m tokens $0.250000
output per 1m tokens $15.000000
Integration docs

GPT-5.4 Mini

gpt-5.4-mini

GPT-5.4 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 272,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.750000
cached input per 1m tokens $0.075000
output per 1m tokens $4.500000
Integration docs

GPT-5.4 Nano

gpt-5.4-nano

GPT-5.4 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 272,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.020000
output per 1m tokens $1.250000
Integration docs

GPT-5.4 Pro

gpt-5.4-pro

GPT-5.4 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,050,000 tokens Max output: 128,000 tokens
input per 1m tokens $30.000000
output per 1m tokens $180.000000
minimum hold $0.010000
Integration docs

GPT-5.5

gpt-5.5

GPT-5.5 for language generation, reasoning, tool calling, and streaming chat responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

Gemini 2.0 Flash

gemini-2.0-flash

Gemini 2.0 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

Gemini 2.0 Flash-Lite

gemini-2.0-flash-lite

Gemini 2.0 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.075000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

Gemini 2.5 Flash

gemini-2.5-flash

Gemini 2.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $2.500000
Integration docs

Gemini 2.5 Flash-Lite

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.100000
cached input per 1m tokens $0.010000
output per 1m tokens $0.400000
Integration docs

Gemini 2.5 Pro

gemini-2.5-pro

Gemini 2.5 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

Gemini 3 Flash Preview

gemini-3-flash-preview

Gemini 3 Flash Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini 3.1 Flash-Lite

gemini-3.1-flash-lite

Gemini 3.1 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $1.500000
Integration docs

Gemini 3.1 Pro Preview

gemini-3.1-pro-preview

Gemini 3.1 Pro Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.1 Pro Preview Custom Tools

gemini-3.1-pro-preview-customtools

Gemini 3.1 Pro Preview Custom Tools for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.5 Flash

gemini-3.5-flash

Gemini 3.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.500000
cached input per 1m tokens $0.150000
output per 1m tokens $9.000000
Integration docs

Gemini Deep Research Max Preview

gemini-deep-research-max-preview

Gemini Deep Research Max Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Deep Research Preview

gemini-deep-research-preview

Gemini Deep Research Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Flash Latest

gemini-flash-latest

Gemini Flash Latest for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini Robotics-ER 1.6 Preview

gemini-robotics-er-1.6-preview

Gemini Robotics-ER 1.6 Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.000000
output per 1m tokens $5.000000
minimum hold $0.010000
Integration docs

Grok 3

grok-3

Grok 3 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $3.000000
output per 1m tokens $15.000000
minimum hold $0.010000
Integration docs

Grok 3 Mini

grok-3-mini

Grok 3 Mini for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.250000
output per 1m tokens $1.270000
minimum hold $0.010000
Integration docs

Grok 4

grok-4

Grok 4 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $3.000000
output per 1m tokens $15.000000
minimum hold $0.010000
Integration docs

Grok 4 Fast Non Reasoning

grok-4-fast-non-reasoning

Grok 4 Fast Non Reasoning for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.500000
minimum hold $0.010000
Integration docs

Grok 4 Fast Reasoning

grok-4-fast-reasoning

Grok 4 Fast Reasoning for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.500000
minimum hold $0.010000
Integration docs

Grok 4.1 Fast (Non-Reasoning)

grok-4.1-fast-non-reasoning

Grok 4.1 Fast (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.1 Fast (Reasoning)

grok-4.1-fast-reasoning

Grok 4.1 Fast (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.20 (Non-Reasoning)

grok-4-20-non-reasoning

Grok 4.20 (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok 4.20 (Reasoning)

grok-4-20-reasoning

Grok 4.20 (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok 4.3

grok-4.3

Grok 4.3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok Code Fast 1

grok-code-fast-1

Grok Code Fast 1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 256,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $1.500000
minimum hold $0.010000
Integration docs

Kimi K2 Thinking

Kimi-K2-Thinking

Kimi K2 Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $1.045000
cached input per 1m tokens $0.176000
output per 1m tokens $4.400000
Integration docs

Kimi K2.5

Kimi-K2.5

Kimi K2.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,144 tokens Max output: 262,144 tokens
input per 1m tokens $0.660000
cached input per 1m tokens $0.110000
output per 1m tokens $3.300000
Integration docs

Kimi K2.6

Kimi-K2.6

Kimi K2.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,144 tokens Max output: 262,144 tokens
input per 1m tokens $1.045000
cached input per 1m tokens $0.176000
output per 1m tokens $4.400000
Integration docs

MAI DS R1

MAI-DS-R1

MAI DS R1 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Context window: 163,840 tokens Max output: 163,840 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

Meta Llama 3 405B Instruct

llama-3-405b-instruct

Meta Llama 3 405B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $2.700000
output per 1m tokens $2.700000
minimum hold $0.010000
Integration docs

Meta Llama 3 70B Instruct

llama-3-70b-instruct

Meta Llama 3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 3 8B Instruct

llama-3-8b-instruct

Meta Llama 3 8B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.200000
minimum hold $0.010000
Integration docs

Meta Llama 3.2 90B Instruct

llama-3.2-90b-instruct

Meta Llama 3.2 90B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.900000
output per 1m tokens $0.900000
minimum hold $0.010000
Integration docs

Meta Llama 3.3 70B Instruct

Llama-3.3-70B-Instruct

Meta Llama 3.3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 4 Maverick Instruct

Llama-4-Maverick-17B-128E-Instruct-FP8

Meta Llama 4 Maverick Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 524,288 tokens Max output: 8,192 tokens
input per 1m tokens $0.350000
output per 1m tokens $1.150000
minimum hold $0.010000
Integration docs

Meta Llama 4 Scout Instruct

llama-4-scout-17b-16e-instruct

Meta Llama 4 Scout Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.180000
output per 1m tokens $0.590000
minimum hold $0.010000
Integration docs

MiniMax M2

MiniMax-M2

MiniMax M2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 196,608 tokens Max output: 196,608 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $1.200000
Integration docs

Mistral Document AI 2505

mistral-document-ai-2505

Mistral Document AI 2505 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Streaming supported
minimum hold $0.010000
Integration docs

Mistral Document AI 2512

mistral-document-ai-2512

Mistral Document AI 2512 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Streaming supported
minimum hold $0.010000
Integration docs

Mistral Large 3

Mistral-Large-3

Mistral Large 3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $0.500000
output per 1m tokens $1.500000
minimum hold $0.010000
Integration docs

Model Router

model-router

Model Router for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

Qwen3 235B A22B Instruct 2507

qwen3-235b-a22b-instruct-2507

Qwen3 235B A22B Instruct 2507 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Coder 480B A35B Instruct

qwen3-coder-480b-a35b-instruct

Qwen3 Coder 480B A35B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Instruct

qwen3-next-80b-a3b-instruct

Qwen3 Next 80B A3B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Thinking

qwen3-next-80b-a3b-thinking

Qwen3 Next 80B A3B Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

o1

o1

o1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o1 Mini

o1-mini

o1 Mini for text generation, reasoning, tool calling, and streaming responses.

Chat Tools Context window: 128,000 tokens Max output: 65,536 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o1 Preview

o1-preview

o1 Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Tools Context window: 128,000 tokens Max output: 32,768 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o3

o3

o3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

o3 Mini

o3-mini

o3 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.100000
cached input per 1m tokens $0.550000
output per 1m tokens $4.400000
Integration docs

o3 Pro

o3-pro

o3 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $20.000000
output per 1m tokens $80.000000
minimum hold $0.010000
Integration docs

o4 Mini

o4-mini

o4 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.100000
cached input per 1m tokens $0.275000
output per 1m tokens $4.400000
Integration docs
Model Reference

Image generation and editing models

Prompt-to-image, image editing, image merging, upscaling, and background removal. Endpoint: http://omixa.cloud/api/v1/images/generations

FLUX 1.1 Pro

FLUX-1.1-pro

FLUX 1.1 Pro for image generation or editing workflows.

Image Streaming Context window: 5,000 tokens Streaming supported
image per unit $0.040000
minimum hold $0.010000
Integration docs

FLUX.1 Kontext Pro

FLUX.1-Kontext-pro

FLUX.1 Kontext Pro for image generation or editing workflows.

Image Streaming Context window: 5,000 tokens Streaming supported
image per unit $0.040000
minimum hold $0.010000
Integration docs

FLUX.2 Flex

FLUX.2-flex

FLUX.2 Flex for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
image per unit $0.050000
minimum hold $0.010000
Integration docs

FLUX.2 Pro

FLUX.2-pro

FLUX.2 Pro for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
image per unit $0.015000
minimum hold $0.010000
Integration docs

GPT Image 1

gpt-image-1

GPT Image 1 for prompt-based image generation through the Omixa resale gateway.

Image Streaming Streaming supported
input per 1m tokens $5.000000
cached input per 1m tokens $1.250000
output per 1m tokens $40.000000
Integration docs

GPT Image 1 Mini

gpt-image-1-mini

GPT Image 1 Mini for prompt-based image generation through the Omixa resale gateway.

Image Streaming Streaming supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $8.000000
Integration docs

GPT Image 1.5

gpt-image-1.5

GPT Image 1.5 for prompt-based image generation through the Omixa resale gateway.

Image Streaming Streaming supported
input per 1m tokens $5.000000
cached input per 1m tokens $1.250000
output per 1m tokens $32.000000
Integration docs

GPT Image 2

gpt-image-2

GPT Image 2 for prompt-based image generation through the Omixa resale gateway.

Image Streaming Streaming supported
input per 1m tokens $5.000000
cached input per 1m tokens $1.250000
output per 1m tokens $30.000000
Integration docs

Imagen 3 Capability

imagen-3.0-capability-001

Imagen 3 Capability for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.040000
minimum hold $0.010000
Integration docs

Imagen 3 Fast

imagen-3.0-fast-generate-001

Imagen 3 Fast for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.020000
minimum hold $0.010000
Integration docs

Imagen 3 Generate 001

imagen-3.0-generate-001

Imagen 3 Generate 001 for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.040000
minimum hold $0.010000
Integration docs

Imagen 3 Generate 002

imagen-3.0-generate-002

Imagen 3 Generate 002 for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.040000
minimum hold $0.010000
Integration docs

Imagen 4

imagen-4.0-generate-001

Imagen 4 for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.040000
minimum hold $0.010000
Integration docs

Imagen 4 Fast

imagen-4.0-fast-generate-001

Imagen 4 Fast for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.020000
minimum hold $0.010000
Integration docs

Imagen 4 Ultra

imagen-4.0-ultra-generate-001

Imagen 4 Ultra for prompt-based image generation through the Omixa resale gateway.

Image
image per unit $0.060000
minimum hold $0.010000
Integration docs

MAI Image 2

MAI-Image-2

MAI Image 2 for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
input per 1m tokens $5.000000
output per 1m tokens $33.000000
image per unit $0.033000
Integration docs

MAI Image 2.5

MAI-Image-2.5

MAI Image 2.5 for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
input per 1m tokens $5.000000
output per 1m tokens $33.000000
image per unit $0.033000
Integration docs

MAI Image 2.5 Flash

MAI-Image-2.5-Flash

MAI Image 2.5 Flash for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
input per 1m tokens $5.000000
output per 1m tokens $19.500000
image per unit $0.019500
Integration docs

MAI Image 2e

MAI-Image-2e

MAI Image 2e for image generation or editing workflows.

Image Streaming Context window: 32,000 tokens Streaming supported
input per 1m tokens $5.000000
output per 1m tokens $19.500000
image per unit $0.019500
Integration docs

Nano Banana (Gemini 2.5 Flash Image)

gemini-3.1-flash-image

Gemini 3.1 Flash Image for prompt-based image generation through the Omixa resale gateway.

Image Streaming Context window: 131,072 tokens Max output: 32,768 tokens
input per 1m tokens $0.500000
output per 1m tokens $3.000000
image per unit $0.067000
Integration docs

Nano Banana (Gemini 2.5 Flash Image)

gemini-2.5-flash-image

Gemini 2.5 Flash Image for prompt-based image generation through the Omixa resale gateway.

Image Streaming Context window: 32,768 tokens Max output: 32,768 tokens
input per 1m tokens $0.300000
image per unit $0.039000
minimum hold $0.010000
Integration docs

Nano Banana (Gemini 2.5 Flash Image)

gemini-2.0-flash-preview-image-generation

Gemini 2.0 Flash Preview Image Generation for prompt-based image generation through the Omixa resale gateway.

Image Streaming Context window: 32,768 tokens Max output: 8,192 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
image per unit $0.040000
Integration docs

Nano Banana Pro (Gemini 3 Pro Image)

gemini-3-pro-image

Gemini 3 Pro Image for image generation or editing workflows.

Image Streaming Context window: 65,536 tokens Max output: 32,768 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Stability Conservative Upscale

stability-upscale-conservative

Stability AI upscaling that preserves the original image with minimal reinterpretation.

Image
image per unit $0.400000
minimum hold $0.010000
Integration docs

Stability Creative Upscale

stability-upscale-creative

Prompt-guided Stability AI upscaling for highly degraded or low-resolution images.

Image
image per unit $0.600000
minimum hold $0.010000
Integration docs

Stability Fast Upscale

stability-upscale-fast

Low-cost Stability AI upscaling that increases image resolution quickly.

Image
image per unit $0.020000
minimum hold $0.010000
Integration docs

Stability Remove Background

stability-remove-background

Stability AI background removal that preserves the foreground subject.

Image
image per unit $0.050000
minimum hold $0.010000
Integration docs
Model Reference

Video generation models

Text-to-video, image-to-video, first/last frame control, reference images, extend, and remix jobs. Endpoint: http://omixa.cloud/api/v1/videos/jobs

Sora

sora

Sora for video generation jobs and long-running creative workflows.

Video
video per second $0.100000
minimum hold $0.010000
Integration docs

Sora 2

sora-2

Sora 2 for video generation jobs and long-running creative workflows.

Video
video per second $0.100000
minimum hold $0.010000
Integration docs

Veo 2 Generate 001

veo-2.0-generate-001

Veo 2 Generate 001 for video generation jobs and long-running creative workflows.

Video
video per second $0.500000
minimum hold $0.010000
Integration docs

Veo 2 Generate Experimental

veo-2.0-generate-exp

Veo 2 Generate Experimental for video generation jobs and long-running creative workflows.

Video
video per second $0.500000
minimum hold $0.010000
Integration docs

Veo 2 Generate Preview

veo-2.0-generate-preview

Veo 2 Generate Preview for video generation jobs and long-running creative workflows.

Video
video per second $0.500000
minimum hold $0.010000
Integration docs

Veo 3 Fast Generate 001

veo-3.0-fast-generate-001

Veo 3 Fast Generate 001 for video generation jobs and long-running creative workflows.

Video
video per second $0.100000
minimum hold $0.010000
Integration docs

Veo 3 Generate 001

veo-3.0-generate-001

Veo 3 Generate 001 for video generation jobs and long-running creative workflows.

Video
video per second $0.400000
minimum hold $0.010000
Integration docs

Veo 3.1

veo-3.1-generate-001

Veo 3.1 for Vertex Veo video generation, image-to-video, reference-guided video, and extension workflows.

Video
video per second $0.400000
minimum hold $0.010000
Integration docs

Veo 3.1 Fast

veo-3.1-fast-generate-001

Veo 3.1 Fast for Vertex Veo video generation, image-to-video, reference-guided video, and extension workflows.

Video
video per second $0.100000
minimum hold $0.010000
Integration docs

Veo 3.1 Fast Preview

veo-3.1-fast-generate-preview

Veo 3.1 Fast Preview for video generation jobs and long-running creative workflows.

Video
video per second $0.100000
minimum hold $0.010000
Integration docs

Veo 3.1 Lite

veo-3.1-lite-generate-001

Veo 3.1 Lite for Vertex Veo video generation, image-to-video, reference-guided video, and extension workflows.

Video
video per second $0.050000
minimum hold $0.010000
Integration docs

Veo 3.1 Lite Preview

veo-3.1-lite-generate-preview

Veo 3.1 Lite Preview for video generation jobs and long-running creative workflows.

Video
video per second $0.050000
minimum hold $0.010000
Integration docs

Veo 3.1 Preview

veo-3.1-generate-preview

Veo 3.1 Preview for video generation jobs and long-running creative workflows.

Video
video per second $0.400000
minimum hold $0.010000
Integration docs
Model Reference

Audio and speech models

Text-to-speech, GPT audio chat, realtime sessions, transcription-oriented catalog rows, and voice settings. Endpoint: http://omixa.cloud/api/v1/audio

Chirp

chirp

Chirp for speech, transcription, translation, or voice generation workflows.

Audio
minimum hold $0.010000
Integration docs

Chirp 2

chirp-2

Chirp 2 for speech, transcription, translation, or voice generation workflows.

Audio
minimum hold $0.010000
Integration docs

Chirp 3

chirp-3

Chirp 3 for speech, transcription, translation, or voice generation workflows.

Audio
minimum hold $0.010000
Integration docs

GPT Audio

gpt-audio

GPT Audio for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

GPT Audio 1.5

gpt-audio-1.5

GPT Audio 1.5 for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

GPT Audio Mini

gpt-audio-mini

GPT Audio Mini for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $0.600000
output per 1m tokens $2.400000
minimum hold $0.010000
Integration docs

GPT Realtime

gpt-realtime

GPT Realtime for speech, transcription, translation, or voice generation workflows.

Audio Context window: 32,000 tokens Max output: 4,096 tokens
input per 1m tokens $4.000000
cached input per 1m tokens $0.400000
output per 1m tokens $16.000000
Integration docs

GPT Realtime 1.5

gpt-realtime-1.5

GPT Realtime 1.5 for speech, transcription, translation, or voice generation workflows.

Audio Context window: 32,000 tokens Max output: 4,096 tokens
input per 1m tokens $4.000000
cached input per 1m tokens $0.400000
output per 1m tokens $16.000000
Integration docs

GPT Realtime 2

gpt-realtime-2

GPT Realtime 2 for speech, transcription, translation, or voice generation workflows.

Audio Context window: 32,000 tokens Max output: 4,096 tokens
input per 1m tokens $4.000000
cached input per 1m tokens $0.400000
output per 1m tokens $24.000000
Integration docs

GPT Realtime Mini

gpt-realtime-mini

GPT Realtime Mini for speech, transcription, translation, or voice generation workflows.

Audio Context window: 32,000 tokens Max output: 4,096 tokens
input per 1m tokens $0.600000
cached input per 1m tokens $0.060000
output per 1m tokens $2.400000
Integration docs

GPT-4o Audio Preview

gpt-4o-audio-preview

GPT-4o Audio Preview for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

GPT-4o Mini Audio Preview

gpt-4o-mini-audio-preview

GPT-4o Mini Audio Preview for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

GPT-4o Mini Realtime Preview

gpt-4o-mini-realtime-preview

GPT-4o Mini Realtime Preview for speech, transcription, translation, or voice generation workflows.

Audio Context window: 128,000 tokens Max output: 4,096 tokens
input per 1m tokens $0.600000
cached input per 1m tokens $0.300000
output per 1m tokens $2.400000
Integration docs

GPT-4o Mini TTS

gpt-4o-mini-tts

GPT-4o Mini TTS for speech, transcription, translation, or voice generation workflows.

Audio
input per 1m tokens $0.600000
output per 1m tokens $12.000000
audio per minute $0.020000
Integration docs

GPT-4o Mini Transcribe

gpt-4o-mini-transcribe

GPT-4o Mini Transcribe for speech, transcription, translation, or voice generation workflows.

Audio
input per 1m tokens $1.250000
output per 1m tokens $5.000000
audio per minute $0.003000
Integration docs

GPT-4o Realtime Preview

gpt-4o-realtime-preview

GPT-4o Realtime Preview for speech, transcription, translation, or voice generation workflows.

Audio Context window: 32,000 tokens Max output: 4,096 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $2.500000
output per 1m tokens $20.000000
Integration docs

GPT-4o Transcribe

gpt-4o-transcribe

GPT-4o Transcribe for speech, transcription, translation, or voice generation workflows.

Audio
input per 1m tokens $2.500000
output per 1m tokens $10.000000
audio per minute $0.006000
Integration docs

GPT-4o Transcribe Diarize

gpt-4o-transcribe-diarize

GPT-4o Transcribe Diarize for speech, transcription, translation, or voice generation workflows.

Audio
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

Gemini 2.0 Flash Live

gemini-2.0-flash-live-001

Gemini 2.0 Flash Live for speech, transcription, translation, or voice generation workflows.

Audio Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.500000
output per 1m tokens $2.000000
audio per minute $0.018000
Integration docs

Gemini 2.5 Flash Live Preview

gemini-2.5-flash-live-preview

Gemini 2.5 Flash Live Preview for speech, transcription, translation, or voice generation workflows.

Audio Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.500000
output per 1m tokens $2.000000
audio per minute $0.018000
Integration docs

Gemini 2.5 Flash TTS

gemini-2.5-flash-tts

Gemini 2.5 Flash TTS for speech, transcription, translation, or voice generation workflows.

Audio Streaming Streaming supported Reasoning controls: minimal, low, medium, high
input per 1m tokens $0.500000
output per 1m tokens $10.000000
audio per minute $0.015000
Integration docs

Gemini 2.5 Flash-Lite TTS Preview

gemini-2.5-flash-lite-preview-tts

Gemini 2.5 Flash-Lite TTS Preview for speech, transcription, translation, or voice generation workflows.

Audio Streaming Streaming supported Reasoning controls: minimal, low, medium, high
input per 1m tokens $0.500000
output per 1m tokens $10.000000
audio per minute $0.015000
Integration docs

Gemini 2.5 Pro TTS

gemini-2.5-pro-tts

Gemini 2.5 Pro TTS for speech, transcription, translation, or voice generation workflows.

Audio Streaming Streaming supported Reasoning controls: low, medium, high
input per 1m tokens $1.000000
output per 1m tokens $20.000000
audio per minute $0.030000
Integration docs

Gemini 3.1 Flash Live Preview

gemini-3.1-flash-live-preview

Gemini 3.1 Flash Live Preview for speech, transcription, translation, or voice generation workflows.

Audio Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.750000
output per 1m tokens $4.500000
audio per minute $0.018000
Integration docs

Gemini 3.1 Flash TTS Preview

gemini-3.1-flash-tts-preview

Gemini 3.1 Flash TTS Preview for speech, transcription, translation, or voice generation workflows.

Audio Streaming Streaming supported Reasoning controls: minimal, low, medium, high
input per 1m tokens $1.000000
output per 1m tokens $20.000000
audio per minute $0.030000
Integration docs

TTS

tts

TTS for speech, transcription, translation, or voice generation workflows.

Audio
audio per minute $0.020000
minimum hold $0.010000
Integration docs

TTS HD

tts-hd

TTS HD for speech, transcription, translation, or voice generation workflows.

Audio
audio per minute $0.020000
minimum hold $0.010000
Integration docs

Whisper

whisper

Whisper for speech, transcription, translation, or voice generation workflows.

Audio
audio per minute $0.020000
minimum hold $0.010000
Integration docs
Model Reference

Embedding models

Vector generation for semantic search, RAG, retrieval, clustering, ranking, and analytics. Endpoint: http://omixa.cloud/api/v1/embeddings

Cohere Embed v4.0

embed-v-4-0

Cohere Embed v4.0 for embeddings, retrieval, reranking, or vector analytics.

Embedding Context window: 512 tokens
input per 1m tokens $0.120000
minimum hold $0.010000
Integration docs

Cohere Rerank v4.0 Fast

Cohere-rerank-v4.0-fast

Cohere Rerank v4.0 Fast for embeddings, retrieval, reranking, or vector analytics.

Embedding
minimum hold $0.010000
Integration docs

Cohere Rerank v4.0 Pro

Cohere-rerank-v4.0-pro

Cohere Rerank v4.0 Pro for embeddings, retrieval, reranking, or vector analytics.

Embedding
minimum hold $0.010000
Integration docs

Embedding 001

embedding-001

Embedding 001 for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 2,048 tokens
input per 1m tokens $0.150000
minimum hold $0.010000
Integration docs

Gemini Embedding

gemini-embedding-001

Gemini Embedding for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 8,192 tokens
input per 1m tokens $0.150000
minimum hold $0.010000
Integration docs

Gemini Embedding 2

gemini-embedding-2

Gemini Embedding 2 for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 8,192 tokens
input per 1m tokens $0.200000
minimum hold $0.010000
Integration docs

Text Embedding 004

text-embedding-004

Text Embedding 004 for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 2,048 tokens
input per 1m tokens $0.150000
minimum hold $0.010000
Integration docs

Text Embedding 3 Large

text-embedding-3-large

Text Embedding 3 Large for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 8,191 tokens
input per 1m tokens $0.143000
minimum hold $0.010000
Integration docs

Text Embedding 3 Small

text-embedding-3-small

Text Embedding 3 Small for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 8,191 tokens
input per 1m tokens $0.022000
minimum hold $0.010000
Integration docs

Text Embedding Ada 002

text-embedding-ada-002

Text Embedding Ada 002 for semantic search, retrieval, ranking, and vector analytics.

Embedding Context window: 8,192 tokens
input per 1m tokens $0.110000
minimum hold $0.010000
Integration docs
Model Reference

Music generation models

Prompted music generation, clip creation, tempo control, and realtime music descriptors. Endpoint: http://omixa.cloud/api/v1/music/jobs

Lyria 2

lyria-002

Lyria 2 for music generation, clips, songs, or realtime music workflows.

Music
minimum hold $0.010000
Integration docs

Lyria 3 Clip Preview

lyria-3-clip-preview

Lyria 3 Clip Preview for music generation, clips, songs, or realtime music workflows.

Music Context window: 131,072 tokens
minimum hold $0.010000
Integration docs

Lyria 3 Pro Preview

lyria-3-pro-preview

Lyria 3 Pro Preview for music generation, clips, songs, or realtime music workflows.

Music Context window: 131,072 tokens
minimum hold $0.010000
Integration docs

Lyria Realtime Experimental

lyria-realtime-exp

Lyria Realtime Experimental for music generation, clips, songs, or realtime music workflows.

Music Streaming Streaming supported
minimum hold $0.010000
Integration docs
Copied Markdown