AQA
aqa
AQA for text generation, reasoning, tool calling, and live streaming responses.
Chat
Context window: 7,168 tokens
Max output: 1,024 tokens
Antigravity Agent Preview
antigravity-agent-preview
Antigravity Agent Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$1.250000
output per 1m tokens
$10.000000
minimum hold
$0.010000
Azure Computer Use Preview
azure-computer-use-preview
Azure Computer Use Preview for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 8,192 tokens
Max output: 1,024 tokens
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$8.000000
Claude Haiku 4.5
claude-haiku-4-5
Claude Haiku 4.5 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 64,000 tokens
input per 1m tokens
$1.000000
cached input per 1m tokens
$0.100000
output per 1m tokens
$5.000000
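As a rough illustration of how the per-1M-token prices in the Claude Haiku 4.5 entry above turn into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` and the token counts are hypothetical, and the billing rule (cached tokens charged at the cached rate instead of the regular input rate) is an assumption that this catalog does not state.

```python
from decimal import Decimal

# Per-1M-token prices in USD, copied from the Claude Haiku 4.5 entry above.
PRICES = {
    "input": Decimal("1.000000"),
    "cached_input": Decimal("0.100000"),
    "output": Decimal("5.000000"),
}


def estimate_cost(input_tokens, cached_input_tokens, output_tokens, prices=PRICES):
    """Return the estimated USD cost of one request.

    Assumption: input_tokens counts only the uncached part of the prompt,
    and cached tokens are billed at the cached rate instead.
    """
    per_million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * prices["input"]
        + Decimal(cached_input_tokens) * prices["cached_input"]
        + Decimal(output_tokens) * prices["output"]
    )
    return total / per_million


# Hypothetical request: 12,000 fresh input tokens, 8,000 cached, 1,500 output.
cost = estimate_cost(12_000, 8_000, 1_500)
print(f"${cost:.6f}")
```

Decimal is used instead of float so that six-decimal prices like the ones listed here do not pick up rounding noise when summed.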
Claude Opus 4.1
claude-opus-4-1
Claude Opus 4.1 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 32,000 tokens
input per 1m tokens
$15.000000
cached input per 1m tokens
$1.500000
output per 1m tokens
$75.000000
Claude Opus 4.5
claude-opus-4-5
Claude Opus 4.5 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 64,000 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$25.000000
Claude Opus 4.6
claude-opus-4-6
Claude Opus 4.6 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,000,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$25.000000
Claude Opus 4.7
claude-opus-4-7
Claude Opus 4.7 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,000,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$25.000000
Claude Opus 4.8
claude-opus-4-8
Claude Opus 4.8 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,000,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$25.000000
Claude Sonnet 4.5
claude-sonnet-4-5
Claude Sonnet 4.5 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 64,000 tokens
input per 1m tokens
$3.000000
cached input per 1m tokens
$0.300000
output per 1m tokens
$15.000000
Claude Sonnet 4.6
claude-sonnet-4-6
Claude Sonnet 4.6 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,000,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$3.000000
cached input per 1m tokens
$0.300000
output per 1m tokens
$15.000000
Codex Mini
codex-mini
Codex Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$1.500000
cached input per 1m tokens
$0.375000
output per 1m tokens
$6.000000
Cohere Command A
Cohere-command-a
Cohere Command A for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,182 tokens
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
minimum hold
$0.010000
Computer Use Preview
computer-use-preview
Computer Use Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$1.250000
output per 1m tokens
$10.000000
minimum hold
$0.010000
DeepSeek OCR
DeepSeek-OCR
DeepSeek OCR through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming
Context window: 32,768 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.560000
output per 1m tokens
$1.680000
minimum hold
$0.010000
DeepSeek R1
DeepSeek-R1
DeepSeek R1 for text generation, reasoning, and live streaming responses.
Chat
Streaming
Context window: 163,840 tokens
Max output: 163,840 tokens
input per 1m tokens
$1.350000
output per 1m tokens
$5.400000
minimum hold
$0.010000
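The DeepSeek R1 entry above lists the same figure for the context window and the maximum output, so the two limits cannot both be used in full on one request. The sketch below shows one way to work out how many completion tokens remain for a given prompt. It assumes the context window is shared between prompt and completion, which is common but not stated in this catalog, and the helper name `max_completion_tokens` is hypothetical.

```python
def max_completion_tokens(prompt_tokens, context_window=163_840, max_output=163_840):
    """Return the largest completion size that still fits.

    Defaults are the DeepSeek R1 limits listed above. Assumption: prompt
    and completion tokens share the same context window.
    """
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    # The completion is capped by whichever limit is tighter.
    return min(max_output, remaining)


# A 100,000-token prompt leaves 63,840 tokens for the completion.
print(max_completion_tokens(100_000))

# With the Claude Haiku 4.5 limits (200,000 / 64,000), the output cap binds first.
print(max_completion_tokens(10_000, context_window=200_000, max_output=64_000))
```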
DeepSeek R1 0528
DeepSeek-R1-0528
DeepSeek R1 0528 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 163,840 tokens
Max output: 32,768 tokens
input per 1m tokens
$1.350000
output per 1m tokens
$5.400000
minimum hold
$0.010000
DeepSeek V3 0324
DeepSeek-V3-0324
DeepSeek V3 0324 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 131,072 tokens
input per 1m tokens
$1.140000
output per 1m tokens
$4.560000
minimum hold
$0.010000
DeepSeek V3.1
DeepSeek-V3.1
DeepSeek V3.1 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 32,768 tokens
input per 1m tokens
$1.230000
output per 1m tokens
$4.940000
minimum hold
$0.010000
DeepSeek V3.2
DeepSeek-V3.2
DeepSeek V3.2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 163,840 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.560000
cached input per 1m tokens
$0.056000
output per 1m tokens
$1.680000
DeepSeek V3.2 Speciale
DeepSeek-V3.2-Speciale
DeepSeek V3.2 Speciale for text generation, reasoning, and live streaming responses.
Chat
Streaming
Context window: 128,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.580000
output per 1m tokens
$1.680000
minimum hold
$0.010000
DeepSeek V4 Flash
DeepSeek-V4-Flash
DeepSeek V4 Flash for text generation, reasoning, and live streaming responses.
Chat
Streaming
Context window: 1,000,000 tokens
Max output: 384,000 tokens
input per 1m tokens
$0.190000
output per 1m tokens
$0.510000
minimum hold
$0.010000
DeepSeek V4 Pro
DeepSeek-V4-Pro
DeepSeek V4 Pro for text generation, reasoning, and live streaming responses.
Chat
Streaming
Context window: 1,000,000 tokens
Max output: 384,000 tokens
input per 1m tokens
$1.740000
output per 1m tokens
$3.480000
minimum hold
$0.010000
GLM 4.7
glm-4.7
GLM 4.7 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.000000
cached input per 1m tokens
$0.100000
output per 1m tokens
$3.200000
GLM 5
glm-5
GLM 5 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.000000
cached input per 1m tokens
$0.100000
output per 1m tokens
$3.200000
GPT Chat Latest
gpt-chat-latest
GPT Chat Latest for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$30.000000
GPT OSS 120B
gpt-oss-120b
GPT OSS 120B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.150000
output per 1m tokens
$0.600000
minimum hold
$0.010000
GPT OSS 20B
gpt-oss-20b
GPT OSS 20B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.070000
output per 1m tokens
$0.300000
minimum hold
$0.010000
GPT-4.1
gpt-4.1
GPT-4.1 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,047,576 tokens
Max output: 32,768 tokens
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$8.000000
GPT-4.1 Mini
gpt-4.1-mini
GPT-4.1 Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,047,576 tokens
Max output: 32,768 tokens
input per 1m tokens
$0.400000
cached input per 1m tokens
$0.100000
output per 1m tokens
$1.600000
GPT-4.1 Nano
gpt-4.1-nano
GPT-4.1 Nano for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,047,576 tokens
Max output: 32,768 tokens
input per 1m tokens
$0.100000
cached input per 1m tokens
$0.025000
output per 1m tokens
$0.400000
GPT-4o
gpt-4o
GPT-4o for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$2.500000
cached input per 1m tokens
$1.250000
output per 1m tokens
$10.000000
GPT-4o Mini
gpt-4o-mini
GPT-4o Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$0.150000
cached input per 1m tokens
$0.075000
output per 1m tokens
$0.600000
GPT-5
gpt-5
GPT-5 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5 Chat
gpt-5-chat
GPT-5 Chat for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5 Codex
gpt-5-codex
GPT-5 Codex for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5 Mini
gpt-5-mini
GPT-5 Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.250000
cached input per 1m tokens
$0.025000
output per 1m tokens
$2.000000
GPT-5 Nano
gpt-5-nano
GPT-5 Nano for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.050000
cached input per 1m tokens
$0.005000
output per 1m tokens
$0.400000
GPT-5 Pro
gpt-5-pro
GPT-5 Pro for text generation, reasoning, and tool calling.
Chat
Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$15.000000
output per 1m tokens
$120.000000
minimum hold
$0.010000
GPT-5.1
gpt-5.1
GPT-5.1 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5.1 Chat
gpt-5.1-chat
GPT-5.1 Chat for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5.1 Codex
gpt-5.1-codex
GPT-5.1 Codex for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5.1 Codex Max
gpt-5.1-codex-max
GPT-5.1 Codex Max for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
GPT-5.1 Codex Mini
gpt-5.1-codex-mini
GPT-5.1 Codex Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.250000
cached input per 1m tokens
$0.025000
output per 1m tokens
$2.000000
GPT-5.2
gpt-5.2
GPT-5.2 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.750000
cached input per 1m tokens
$0.175000
output per 1m tokens
$14.000000
GPT-5.2 Chat
gpt-5.2-chat
GPT-5.2 Chat for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$1.750000
cached input per 1m tokens
$0.175000
output per 1m tokens
$14.000000
GPT-5.2 Codex
gpt-5.2-codex
GPT-5.2 Codex for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.750000
cached input per 1m tokens
$0.175000
output per 1m tokens
$14.000000
GPT-5.3 Chat
gpt-5.3-chat
GPT-5.3 Chat for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$1.750000
cached input per 1m tokens
$0.175000
output per 1m tokens
$14.000000
GPT-5.3 Codex
gpt-5.3-codex
GPT-5.3 Codex for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 400,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$1.750000
cached input per 1m tokens
$0.175000
output per 1m tokens
$14.000000
GPT-5.4
gpt-5.4
GPT-5.4 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 922,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$2.500000
cached input per 1m tokens
$0.250000
output per 1m tokens
$15.000000
GPT-5.4 Mini
gpt-5.4-mini
GPT-5.4 Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 272,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.750000
cached input per 1m tokens
$0.075000
output per 1m tokens
$4.500000
GPT-5.4 Nano
gpt-5.4-nano
GPT-5.4 Nano for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 272,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.200000
cached input per 1m tokens
$0.020000
output per 1m tokens
$1.250000
GPT-5.4 Pro
gpt-5.4-pro
GPT-5.4 Pro for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,050,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$30.000000
output per 1m tokens
$180.000000
minimum hold
$0.010000
GPT-5.5
gpt-5.5
GPT-5.5 for language generation, reasoning, tool calling, and streaming chat responses.
Chat
Streaming Tools
Context window: 922,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$30.000000
Gemini 2.0 Flash
gemini-2.0-flash
Gemini 2.0 Flash for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.150000
output per 1m tokens
$0.600000
minimum hold
$0.010000
Gemini 2.0 Flash-Lite
gemini-2.0-flash-lite
Gemini 2.0 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.075000
output per 1m tokens
$0.300000
minimum hold
$0.010000
Gemini 2.5 Flash
gemini-2.5-flash
Gemini 2.5 Flash for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.300000
cached input per 1m tokens
$0.030000
output per 1m tokens
$2.500000
Gemini 2.5 Flash-Lite
gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.100000
cached input per 1m tokens
$0.010000
output per 1m tokens
$0.400000
Gemini 2.5 Pro
gemini-2.5-pro
Gemini 2.5 Pro for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.125000
output per 1m tokens
$10.000000
Gemini 3 Flash Preview
gemini-3-flash-preview
Gemini 3 Flash Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.500000
cached input per 1m tokens
$0.050000
output per 1m tokens
$3.000000
Gemini 3.1 Flash-Lite
gemini-3.1-flash-lite
Gemini 3.1 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.250000
cached input per 1m tokens
$0.025000
output per 1m tokens
$1.500000
Gemini 3.1 Pro Preview
gemini-3.1-pro-preview
Gemini 3.1 Pro Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.200000
output per 1m tokens
$12.000000
Gemini 3.1 Pro Preview Custom Tools
gemini-3.1-pro-preview-customtools
Gemini 3.1 Pro Preview Custom Tools for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.200000
output per 1m tokens
$12.000000
Gemini 3.5 Flash
gemini-3.5-flash
Gemini 3.5 Flash for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$1.500000
cached input per 1m tokens
$0.150000
output per 1m tokens
$9.000000
Gemini Deep Research Max Preview
gemini-deep-research-max-preview
Gemini Deep Research Max Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.200000
output per 1m tokens
$12.000000
Gemini Deep Research Preview
gemini-deep-research-preview
Gemini Deep Research Preview for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.200000
output per 1m tokens
$12.000000
Gemini Flash Latest
gemini-flash-latest
Gemini Flash Latest for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.500000
cached input per 1m tokens
$0.050000
output per 1m tokens
$3.000000
Gemini Robotics-ER 1.6 Preview
gemini-robotics-er-1.6-preview
Gemini Robotics-ER 1.6 Preview for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$1.000000
output per 1m tokens
$5.000000
minimum hold
$0.010000
Grok 3
grok-3
Grok 3 for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,192 tokens
input per 1m tokens
$3.000000
output per 1m tokens
$15.000000
minimum hold
$0.010000
Grok 3 Mini
grok-3-mini
Grok 3 Mini for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.250000
output per 1m tokens
$1.270000
minimum hold
$0.010000
Grok 4
grok-4
Grok 4 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 262,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$3.000000
output per 1m tokens
$15.000000
minimum hold
$0.010000
Grok 4 Fast Non Reasoning
grok-4-fast-non-reasoning
Grok 4 Fast Non Reasoning for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 262,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.200000
output per 1m tokens
$0.500000
minimum hold
$0.010000
Grok 4 Fast Reasoning
grok-4-fast-reasoning
Grok 4 Fast Reasoning for text generation, reasoning, tool calling, and streaming responses.
Chat
Streaming Tools
Context window: 262,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.200000
output per 1m tokens
$0.500000
minimum hold
$0.010000
Grok 4.1 Fast (Non-Reasoning)
grok-4.1-fast-non-reasoning
Grok 4.1 Fast (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.200000
cached input per 1m tokens
$0.050000
output per 1m tokens
$0.500000
Grok 4.1 Fast (Reasoning)
grok-4.1-fast-reasoning
Grok 4.1 Fast (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 128,000 tokens
Max output: 128,000 tokens
input per 1m tokens
$0.200000
cached input per 1m tokens
$0.050000
output per 1m tokens
$0.500000
Grok 4.20 (Non-Reasoning)
grok-4-20-non-reasoning
Grok 4.20 (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 2,000,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.200000
output per 1m tokens
$2.500000
Grok 4.20 (Reasoning)
grok-4-20-reasoning
Grok 4.20 (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 2,000,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.200000
output per 1m tokens
$2.500000
Grok 4.3
grok-4.3
Grok 4.3 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$1.250000
cached input per 1m tokens
$0.200000
output per 1m tokens
$2.500000
Grok Code Fast 1
grok-code-fast-1
Grok Code Fast 1 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 256,000 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.200000
output per 1m tokens
$1.500000
minimum hold
$0.010000
Kimi K2 Thinking
Kimi-K2-Thinking
Kimi K2 Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 65,536 tokens
input per 1m tokens
$1.045000
cached input per 1m tokens
$0.176000
output per 1m tokens
$4.400000
Kimi K2.5
Kimi-K2.5
Kimi K2.5 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 262,144 tokens
input per 1m tokens
$0.660000
cached input per 1m tokens
$0.110000
output per 1m tokens
$3.300000
Kimi K2.6
Kimi-K2.6
Kimi K2.6 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 262,144 tokens
input per 1m tokens
$1.045000
cached input per 1m tokens
$0.176000
output per 1m tokens
$4.400000
MAI DS R1
MAI-DS-R1
MAI DS R1 for text generation, reasoning, and streaming responses.
Chat
Streaming
Context window: 163,840 tokens
Max output: 163,840 tokens
input per 1m tokens
$1.350000
output per 1m tokens
$5.400000
minimum hold
$0.010000
Meta Llama 3 405B Instruct
llama-3-405b-instruct
Meta Llama 3 405B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,192 tokens
input per 1m tokens
$2.700000
output per 1m tokens
$2.700000
minimum hold
$0.010000
Meta Llama 3 70B Instruct
llama-3-70b-instruct
Meta Llama 3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 8,192 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.710000
output per 1m tokens
$0.710000
minimum hold
$0.010000
Meta Llama 3 8B Instruct
llama-3-8b-instruct
Meta Llama 3 8B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 8,192 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.200000
output per 1m tokens
$0.200000
minimum hold
$0.010000
Meta Llama 3.2 90B Instruct
llama-3.2-90b-instruct
Meta Llama 3.2 90B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.900000
output per 1m tokens
$0.900000
minimum hold
$0.010000
Meta Llama 3.3 70B Instruct
Llama-3.3-70B-Instruct
Meta Llama 3.3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 131,072 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.710000
output per 1m tokens
$0.710000
minimum hold
$0.010000
Meta Llama 4 Maverick Instruct
Llama-4-Maverick-17B-128E-Instruct-FP8
Meta Llama 4 Maverick Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 524,288 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.350000
output per 1m tokens
$1.150000
minimum hold
$0.010000
Meta Llama 4 Scout Instruct
llama-4-scout-17b-16e-instruct
Meta Llama 4 Scout Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.180000
output per 1m tokens
$0.590000
minimum hold
$0.010000
MiniMax M2
MiniMax-M2
MiniMax M2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 196,608 tokens
Max output: 196,608 tokens
input per 1m tokens
$0.300000
cached input per 1m tokens
$0.030000
output per 1m tokens
$1.200000
Mistral Document AI 2505
mistral-document-ai-2505
Mistral Document AI 2505 for text generation, reasoning, and streaming responses.
Chat
Streaming
Streaming supported
Mistral Document AI 2512
mistral-document-ai-2512
Mistral Document AI 2512 for text generation, reasoning, and streaming responses.
Chat
Streaming
Streaming supported
Mistral Large 3
Mistral-Large-3
Mistral Large 3 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$0.500000
output per 1m tokens
$1.500000
minimum hold
$0.010000
Model Router
model-router
Model Router for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Streaming supported
Tool/function calling supported
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$8.000000
Qwen3 235B A22B Instruct 2507
qwen3-235b-a22b-instruct-2507
Qwen3 235B A22B Instruct 2507 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.220000
cached input per 1m tokens
$0.022000
output per 1m tokens
$1.800000
Qwen3 Coder 480B A35B Instruct
qwen3-coder-480b-a35b-instruct
Qwen3 Coder 480B A35B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.220000
cached input per 1m tokens
$0.022000
output per 1m tokens
$1.800000
Qwen3 Next 80B A3B Instruct
qwen3-next-80b-a3b-instruct
Qwen3 Next 80B A3B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.220000
cached input per 1m tokens
$0.022000
output per 1m tokens
$1.800000
Qwen3 Next 80B A3B Thinking
qwen3-next-80b-a3b-thinking
Qwen3 Next 80B A3B Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.
Chat
Streaming Tools
Context window: 262,144 tokens
Max output: 65,536 tokens
input per 1m tokens
$0.220000
cached input per 1m tokens
$0.022000
output per 1m tokens
$1.800000
o1
o1
o1 for text generation, reasoning, and tool calling.
Chat
Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$15.000000
cached input per 1m tokens
$7.500000
output per 1m tokens
$60.000000
o1 Mini
o1-mini
o1 Mini for text generation, reasoning, and tool calling.
Chat
Tools
Context window: 128,000 tokens
Max output: 65,536 tokens
input per 1m tokens
$15.000000
cached input per 1m tokens
$7.500000
output per 1m tokens
$60.000000
o1 Preview
o1-preview
o1 Preview for text generation, reasoning, and tool calling.
Chat
Tools
Context window: 128,000 tokens
Max output: 32,768 tokens
input per 1m tokens
$15.000000
cached input per 1m tokens
$7.500000
output per 1m tokens
$60.000000
o3
o3
o3 for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$2.000000
cached input per 1m tokens
$0.500000
output per 1m tokens
$8.000000
o3 Mini
o3-mini
o3 Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$1.100000
cached input per 1m tokens
$0.550000
output per 1m tokens
$4.400000
o3 Pro
o3-pro
o3 Pro for text generation, reasoning, and tool calling.
Chat
Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$20.000000
output per 1m tokens
$80.000000
minimum hold
$0.010000
o4 Mini
o4-mini
o4 Mini for text generation, reasoning, tool calling, and live streaming responses.
Chat
Streaming Tools
Context window: 200,000 tokens
Max output: 100,000 tokens
input per 1m tokens
$1.100000
cached input per 1m tokens
$0.275000
output per 1m tokens
$4.400000
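Every entry in this catalog lists its prices as a label line followed by a dollar amount on the next line. For readers who want to load those figures into a script, the following is a minimal sketch of a parser for that layout. The function name `parse_prices` is hypothetical, and the sketch assumes each amount line starts with "$" and immediately follows its label, as in the o4 Mini entry above.

```python
from decimal import Decimal


def parse_prices(lines):
    """Map each price label to its Decimal value.

    Assumes the layout used in this catalog: a label line followed
    immediately by a line that starts with "$".
    """
    prices = {}
    # Walk over consecutive line pairs and keep (label, "$amount") pairs.
    for label, value in zip(lines, lines[1:]):
        if value.startswith("$") and not label.startswith("$"):
            prices[label] = Decimal(value[1:])
    return prices


# Price lines copied from the o4 Mini entry above.
o4_mini_lines = [
    "input per 1m tokens",
    "$1.100000",
    "cached input per 1m tokens",
    "$0.275000",
    "output per 1m tokens",
    "$4.400000",
]

prices = parse_prices(o4_mini_lines)
for label, value in prices.items():
    print(f"{label}: {value}")
```

Entries that carry a "minimum hold" line parse the same way, since that field follows the same label-then-amount pattern.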