Chat and reasoning models

Developer documentation

Chat and reasoning models

Language, code, reasoning, multimodal chat, tool calling, and streaming responses.

Model Reference

Chat and reasoning models

Language, code, reasoning, multimodal chat, tool calling, and streaming responses. Endpoint: http://omixa.cloud/api/v1/chat/completions

AQA

aqa

AQA for text generation, reasoning, tool calling, and live streaming responses.

Chat Context window: 7,168 tokens Max output: 1,024 tokens
minimum hold $0.010000
Integration docs

Antigravity Agent Preview

antigravity-agent-preview

Antigravity Agent Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

Azure Computer Use Preview

azure-computer-use-preview

Azure Computer Use Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 8,192 tokens Max output: 1,024 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

Claude Haiku 4.5

claude-haiku-4-5

Claude Haiku 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $5.000000
Integration docs

Claude Opus 4.1

claude-opus-4-1

Claude Opus 4.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 32,000 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $1.500000
output per 1m tokens $75.000000
Integration docs

Claude Opus 4.5

claude-opus-4-5

Claude Opus 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.6

claude-opus-4-6

Claude Opus 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.7

claude-opus-4-7

Claude Opus 4.7 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.8

claude-opus-4-8

Claude Opus 4.8 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Sonnet 4.5

claude-sonnet-4-5

Claude Sonnet 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Claude Sonnet 4.6

claude-sonnet-4-6

Claude Sonnet 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Codex Mini

codex-mini

Codex Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.500000
cached input per 1m tokens $0.375000
output per 1m tokens $6.000000
Integration docs

Cohere Command A

Cohere-command-a

Cohere Command A for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,182 tokens
input per 1m tokens $2.500000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

Computer Use Preview

computer-use-preview

Computer Use Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

DeepSeek OCR

DeepSeek-OCR

DeepSeek OCR through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Context window: 32,768 tokens Max output: 8,192 tokens
input per 1m tokens $0.560000
output per 1m tokens $1.680000
minimum hold $0.010000
Integration docs

DeepSeek R1

DeepSeek-R1

DeepSeek R1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 163,840 tokens Max output: 163,840 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

DeepSeek R1 0528

DeepSeek-R1-0528

DeepSeek R1 0528 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 32,768 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

DeepSeek V3 0324

DeepSeek-V3-0324

DeepSeek V3 0324 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 131,072 tokens
input per 1m tokens $1.140000
output per 1m tokens $4.560000
minimum hold $0.010000
Integration docs

DeepSeek V3.1

DeepSeek-V3.1

DeepSeek V3.1 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 32,768 tokens
input per 1m tokens $1.230000
output per 1m tokens $4.940000
minimum hold $0.010000
Integration docs

DeepSeek V3.2

DeepSeek-V3.2

DeepSeek V3.2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 65,536 tokens
input per 1m tokens $0.560000
cached input per 1m tokens $0.056000
output per 1m tokens $1.680000
Integration docs

DeepSeek V3.2 Speciale

DeepSeek-V3.2-Speciale

DeepSeek V3.2 Speciale for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.580000
output per 1m tokens $1.680000
minimum hold $0.010000
Integration docs

DeepSeek V4 Flash

DeepSeek-V4-Flash

DeepSeek V4 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 1,000,000 tokens Max output: 384,000 tokens
input per 1m tokens $0.190000
output per 1m tokens $0.510000
minimum hold $0.010000
Integration docs

DeepSeek V4 Pro

DeepSeek-V4-Pro

DeepSeek V4 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Context window: 1,000,000 tokens Max output: 384,000 tokens
input per 1m tokens $1.740000
output per 1m tokens $3.480000
minimum hold $0.010000
Integration docs

GLM 4.7

glm-4.7

GLM 4.7 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GLM 5

glm-5

GLM 5 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GPT Chat Latest

gpt-chat-latest

GPT Chat Latest for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

GPT OSS 120B

gpt-oss-120b

GPT OSS 120B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

GPT OSS 20B

gpt-oss-20b

GPT OSS 20B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.070000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

GPT-4.1

gpt-4.1

GPT-4.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

GPT-4.1 Mini

gpt-4.1-mini

GPT-4.1 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $0.400000
cached input per 1m tokens $0.100000
output per 1m tokens $1.600000
Integration docs

GPT-4.1 Nano

gpt-4.1-nano

GPT-4.1 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,047,576 tokens Max output: 32,768 tokens
input per 1m tokens $0.100000
cached input per 1m tokens $0.025000
output per 1m tokens $0.400000
Integration docs

GPT-4o

gpt-4o

GPT-4o for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 4,096 tokens
input per 1m tokens $2.500000
cached input per 1m tokens $1.250000
output per 1m tokens $10.000000
Integration docs

GPT-4o Mini

gpt-4o-mini

GPT-4o Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $0.150000
cached input per 1m tokens $0.075000
output per 1m tokens $0.600000
Integration docs

GPT-5

gpt-5

GPT-5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Chat

gpt-5-chat

GPT-5 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Codex

gpt-5-codex

GPT-5 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5 Mini

gpt-5-mini

GPT-5 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $2.000000
Integration docs

GPT-5 Nano

gpt-5-nano

GPT-5 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.050000
cached input per 1m tokens $0.005000
output per 1m tokens $0.400000
Integration docs

GPT-5 Pro

gpt-5-pro

GPT-5 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $15.000000
output per 1m tokens $120.000000
minimum hold $0.010000
Integration docs

GPT-5.1

gpt-5.1

GPT-5.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Chat

gpt-5.1-chat

GPT-5.1 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex

gpt-5.1-codex

GPT-5.1 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex Max

gpt-5.1-codex-max

GPT-5.1 Codex Max for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1 Codex Mini

gpt-5.1-codex-mini

GPT-5.1 Codex Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $2.000000
Integration docs

GPT-5.2

gpt-5.2

GPT-5.2 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.2 Chat

gpt-5.2-chat

GPT-5.2 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.2 Codex

gpt-5.2-codex

GPT-5.2 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.3 Chat

gpt-5.3-chat

GPT-5.3 Chat for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.3 Codex

gpt-5.3-codex

GPT-5.3 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.4

gpt-5.4

GPT-5.4 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $2.500000
cached input per 1m tokens $0.250000
output per 1m tokens $15.000000
Integration docs

GPT-5.4 Mini

gpt-5.4-mini

GPT-5.4 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 272,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.750000
cached input per 1m tokens $0.075000
output per 1m tokens $4.500000
Integration docs

GPT-5.4 Nano

gpt-5.4-nano

GPT-5.4 Nano for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 272,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.020000
output per 1m tokens $1.250000
Integration docs

GPT-5.4 Pro

gpt-5.4-pro

GPT-5.4 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,050,000 tokens Max output: 128,000 tokens
input per 1m tokens $30.000000
output per 1m tokens $180.000000
minimum hold $0.010000
Integration docs

GPT-5.5

gpt-5.5

GPT-5.5 for language generation, reasoning, tool calling, and streaming chat responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

Gemini 2.0 Flash

gemini-2.0-flash

Gemini 2.0 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

Gemini 2.0 Flash-Lite

gemini-2.0-flash-lite

Gemini 2.0 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.075000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

Gemini 2.5 Flash

gemini-2.5-flash

Gemini 2.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $2.500000
Integration docs

Gemini 2.5 Flash-Lite

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.100000
cached input per 1m tokens $0.010000
output per 1m tokens $0.400000
Integration docs

Gemini 2.5 Pro

gemini-2.5-pro

Gemini 2.5 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

Gemini 3 Flash Preview

gemini-3-flash-preview

Gemini 3 Flash Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini 3.1 Flash-Lite

gemini-3.1-flash-lite

Gemini 3.1 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $1.500000
Integration docs

Gemini 3.1 Pro Preview

gemini-3.1-pro-preview

Gemini 3.1 Pro Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.1 Pro Preview Custom Tools

gemini-3.1-pro-preview-customtools

Gemini 3.1 Pro Preview Custom Tools for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.5 Flash

gemini-3.5-flash

Gemini 3.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.500000
cached input per 1m tokens $0.150000
output per 1m tokens $9.000000
Integration docs

Gemini Deep Research Max Preview

gemini-deep-research-max-preview

Gemini Deep Research Max Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Deep Research Preview

gemini-deep-research-preview

Gemini Deep Research Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Flash Latest

gemini-flash-latest

Gemini Flash Latest for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini Robotics-ER 1.6 Preview

gemini-robotics-er-1.6-preview

Gemini Robotics-ER 1.6 Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.000000
output per 1m tokens $5.000000
minimum hold $0.010000
Integration docs

Grok 3

grok-3

Grok 3 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $3.000000
output per 1m tokens $15.000000
minimum hold $0.010000
Integration docs

Grok 3 Mini

grok-3-mini

Grok 3 Mini for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.250000
output per 1m tokens $1.270000
minimum hold $0.010000
Integration docs

Grok 4

grok-4

Grok 4 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $3.000000
output per 1m tokens $15.000000
minimum hold $0.010000
Integration docs

Grok 4 Fast Non Reasoning

grok-4-fast-non-reasoning

Grok 4 Fast Non Reasoning for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.500000
minimum hold $0.010000
Integration docs

Grok 4 Fast Reasoning

grok-4-fast-reasoning

Grok 4 Fast Reasoning for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 262,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.500000
minimum hold $0.010000
Integration docs

Grok 4.1 Fast (Non-Reasoning)

grok-4.1-fast-non-reasoning

Grok 4.1 Fast (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.1 Fast (Reasoning)

grok-4.1-fast-reasoning

Grok 4.1 Fast (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.20 (Non-Reasoning)

grok-4-20-non-reasoning

Grok 4.20 (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok 4.20 (Reasoning)

grok-4-20-reasoning

Grok 4.20 (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok 4.3

grok-4.3

Grok 4.3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok Code Fast 1

grok-code-fast-1

Grok Code Fast 1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 256,000 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $1.500000
minimum hold $0.010000
Integration docs

Kimi K2 Thinking

Kimi-K2-Thinking

Kimi K2 Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $1.045000
cached input per 1m tokens $0.176000
output per 1m tokens $4.400000
Integration docs

Kimi K2.5

Kimi-K2.5

Kimi K2.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,144 tokens Max output: 262,144 tokens
input per 1m tokens $0.660000
cached input per 1m tokens $0.110000
output per 1m tokens $3.300000
Integration docs

Kimi K2.6

Kimi-K2.6

Kimi K2.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 262,144 tokens Max output: 262,144 tokens
input per 1m tokens $1.045000
cached input per 1m tokens $0.176000
output per 1m tokens $4.400000
Integration docs

MAI DS R1

MAI-DS-R1

MAI DS R1 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Context window: 163,840 tokens Max output: 163,840 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

Meta Llama 3 405B Instruct

llama-3-405b-instruct

Meta Llama 3 405B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $2.700000
output per 1m tokens $2.700000
minimum hold $0.010000
Integration docs

Meta Llama 3 70B Instruct

llama-3-70b-instruct

Meta Llama 3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 3 8B Instruct

llama-3-8b-instruct

Meta Llama 3 8B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.200000
minimum hold $0.010000
Integration docs

Meta Llama 3.2 90B Instruct

llama-3.2-90b-instruct

Meta Llama 3.2 90B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.900000
output per 1m tokens $0.900000
minimum hold $0.010000
Integration docs

Meta Llama 3.3 70B Instruct

Llama-3.3-70B-Instruct

Meta Llama 3.3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 4 Maverick Instruct

Llama-4-Maverick-17B-128E-Instruct-FP8

Meta Llama 4 Maverick Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 524,288 tokens Max output: 8,192 tokens
input per 1m tokens $0.350000
output per 1m tokens $1.150000
minimum hold $0.010000
Integration docs

Meta Llama 4 Scout Instruct

llama-4-scout-17b-16e-instruct

Meta Llama 4 Scout Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.180000
output per 1m tokens $0.590000
minimum hold $0.010000
Integration docs

MiniMax M2

MiniMax-M2

MiniMax M2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 196,608 tokens Max output: 196,608 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $1.200000
Integration docs

Mistral Document AI 2505

mistral-document-ai-2505

Mistral Document AI 2505 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Streaming supported
minimum hold $0.010000
Integration docs

Mistral Document AI 2512

mistral-document-ai-2512

Mistral Document AI 2512 for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Streaming supported
minimum hold $0.010000
Integration docs

Mistral Large 3

Mistral-Large-3

Mistral Large 3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $0.500000
output per 1m tokens $1.500000
minimum hold $0.010000
Integration docs

Model Router

model-router

Model Router for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

Qwen3 235B A22B Instruct 2507

qwen3-235b-a22b-instruct-2507

Qwen3 235B A22B Instruct 2507 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Coder 480B A35B Instruct

qwen3-coder-480b-a35b-instruct

Qwen3 Coder 480B A35B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Instruct

qwen3-next-80b-a3b-instruct

Qwen3 Next 80B A3B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Thinking

qwen3-next-80b-a3b-thinking

Qwen3 Next 80B A3B Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

o1

o1

o1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o1 Mini

o1-mini

o1 Mini for text generation, reasoning, tool calling, and streaming responses.

Chat Tools Context window: 128,000 tokens Max output: 65,536 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o1 Preview

o1-preview

o1 Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Tools Context window: 128,000 tokens Max output: 32,768 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $7.500000
output per 1m tokens $60.000000
Integration docs

o3

o3

o3 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.500000
output per 1m tokens $8.000000
Integration docs

o3 Mini

o3-mini

o3 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.100000
cached input per 1m tokens $0.550000
output per 1m tokens $4.400000
Integration docs

o3 Pro

o3-pro

o3 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $20.000000
output per 1m tokens $80.000000
minimum hold $0.010000
Integration docs

o4 Mini

o4-mini

o4 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 100,000 tokens
input per 1m tokens $1.100000
cached input per 1m tokens $0.275000
output per 1m tokens $4.400000
Integration docs
Copied Markdown