Chirp
chirp
Chirp for speech, transcription, translation, or voice generation workflows.
Audio
Chirp 2
chirp-2
Chirp 2 for speech, transcription, translation, or voice generation workflows.
Audio
Chirp 3
chirp-3
Chirp 3 for speech, transcription, translation, or voice generation workflows.
Audio
GPT Audio
gpt-audio
GPT Audio for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
minimum hold
$0.010000
GPT Audio 1.5
gpt-audio-1.5
GPT Audio 1.5 for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
minimum hold
$0.010000
GPT Audio Mini
gpt-audio-mini
GPT Audio Mini for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$0.600000
output per 1m tokens
$2.400000
minimum hold
$0.010000
GPT Realtime
gpt-realtime
GPT Realtime for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 32,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$4.000000
cached input per 1m tokens
$0.400000
output per 1m tokens
$16.000000
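The per-token rates in the GPT Realtime entry above can be turned into a rough per-request estimate. This is a minimal sketch, not an official billing formula: the function name `estimate_token_cost` and the token counts are hypothetical, and it assumes cached input tokens are billed at the cached rate instead of the regular input rate, which this listing does not state.

```python
from decimal import Decimal

# Listed prices are quoted per 1M tokens.
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_token_cost(input_tokens, output_tokens, input_rate, output_rate,
                        cached_tokens=0, cached_rate=None):
    """Estimate the USD cost of one request from per-1M-token rates.

    Assumption (not confirmed by the listing): cached tokens are a subset
    of input_tokens and are billed at cached_rate instead of input_rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and cached_rate is None:
        raise ValueError("cached_rate is required when cached_tokens > 0")

    uncached_tokens = input_tokens - cached_tokens
    cost = Decimal(uncached_tokens) * input_rate
    cost += Decimal(output_tokens) * output_rate
    if cached_tokens:
        cost += Decimal(cached_tokens) * cached_rate
    return cost / TOKENS_PER_UNIT


# Hypothetical request priced with the gpt-realtime rates listed above.
example_cost = estimate_token_cost(
    input_tokens=10_000,
    output_tokens=2_000,
    input_rate=Decimal("4.00"),
    output_rate=Decimal("16.00"),
    cached_tokens=6_000,
    cached_rate=Decimal("0.40"),
)
print(f"Estimated cost: ${example_cost}")
```

Decimal is used rather than float so that small per-token amounts do not pick up binary rounding error. For the hypothetical request above the estimate comes to about $0.0504.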
GPT Realtime 1.5
gpt-realtime-1.5
GPT Realtime 1.5 for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 32,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$4.000000
cached input per 1m tokens
$0.400000
output per 1m tokens
$16.000000
GPT Realtime 2
gpt-realtime-2
GPT Realtime 2 for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 32,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$4.000000
cached input per 1m tokens
$0.400000
output per 1m tokens
$24.000000
GPT Realtime Mini
gpt-realtime-mini
GPT Realtime Mini for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 32,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$0.600000
cached input per 1m tokens
$0.060000
output per 1m tokens
$2.400000
GPT-4o Audio Preview
gpt-4o-audio-preview
GPT-4o Audio Preview for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
minimum hold
$0.010000
GPT-4o Mini Audio Preview
gpt-4o-mini-audio-preview
GPT-4o Mini Audio Preview for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 16,384 tokens
input per 1m tokens
$0.150000
output per 1m tokens
$0.600000
minimum hold
$0.010000
GPT-4o Mini Realtime Preview
gpt-4o-mini-realtime-preview
GPT-4o Mini Realtime Preview for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 128,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$0.600000
cached input per 1m tokens
$0.300000
output per 1m tokens
$2.400000
GPT-4o Mini TTS
gpt-4o-mini-tts
GPT-4o Mini TTS for speech, transcription, translation, or voice generation workflows.
Audio
input per 1m tokens
$0.600000
output per 1m tokens
$12.000000
audio per minute
$0.020000
GPT-4o Mini Transcribe
gpt-4o-mini-transcribe
GPT-4o Mini Transcribe for speech, transcription, translation, or voice generation workflows.
Audio
input per 1m tokens
$1.250000
output per 1m tokens
$5.000000
audio per minute
$0.003000
GPT-4o Realtime Preview
gpt-4o-realtime-preview
GPT-4o Realtime Preview for speech, transcription, translation, or voice generation workflows.
Audio
Context window: 32,000 tokens
Max output: 4,096 tokens
input per 1m tokens
$5.000000
cached input per 1m tokens
$2.500000
output per 1m tokens
$20.000000
GPT-4o Transcribe
gpt-4o-transcribe
GPT-4o Transcribe for speech, transcription, translation, or voice generation workflows.
Audio
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
audio per minute
$0.006000
GPT-4o Transcribe Diarize
gpt-4o-transcribe-diarize
GPT-4o Transcribe Diarize for speech, transcription, translation, or voice generation workflows.
Audio
input per 1m tokens
$2.500000
output per 1m tokens
$10.000000
minimum hold
$0.010000
Gemini 2.0 Flash Live
gemini-2.0-flash-live-001
Gemini 2.0 Flash Live for speech, transcription, translation, or voice generation workflows.
Audio
Streaming, Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.500000
output per 1m tokens
$2.000000
audio per minute
$0.018000
Gemini 2.5 Flash Live Preview
gemini-2.5-flash-live-preview
Gemini 2.5 Flash Live Preview for speech, transcription, translation, or voice generation workflows.
Audio
Streaming, Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.500000
output per 1m tokens
$2.000000
audio per minute
$0.018000
Gemini 2.5 Flash TTS
gemini-2.5-flash-tts
Gemini 2.5 Flash TTS for speech, transcription, translation, or voice generation workflows.
Audio
Streaming supported
Reasoning controls: minimal, low, medium, high
input per 1m tokens
$0.500000
output per 1m tokens
$10.000000
audio per minute
$0.015000
Gemini 2.5 Flash-Lite TTS Preview
gemini-2.5-flash-lite-preview-tts
Gemini 2.5 Flash-Lite TTS Preview for speech, transcription, translation, or voice generation workflows.
Audio
Streaming supported
Reasoning controls: minimal, low, medium, high
input per 1m tokens
$0.500000
output per 1m tokens
$10.000000
audio per minute
$0.015000
Gemini 2.5 Pro TTS
gemini-2.5-pro-tts
Gemini 2.5 Pro TTS for speech, transcription, translation, or voice generation workflows.
Audio
Streaming supported
Reasoning controls: low, medium, high
input per 1m tokens
$1.000000
output per 1m tokens
$20.000000
audio per minute
$0.030000
Gemini 3.1 Flash Live Preview
gemini-3.1-flash-live-preview
Gemini 3.1 Flash Live Preview for speech, transcription, translation, or voice generation workflows.
Audio
Streaming, Tools
Context window: 1,048,576 tokens
Max output: 8,192 tokens
input per 1m tokens
$0.750000
output per 1m tokens
$4.500000
audio per minute
$0.018000
Gemini 3.1 Flash TTS Preview
gemini-3.1-flash-tts-preview
Gemini 3.1 Flash TTS Preview for speech, transcription, translation, or voice generation workflows.
Audio
Streaming supported
Reasoning controls: minimal, low, medium, high
input per 1m tokens
$1.000000
output per 1m tokens
$20.000000
audio per minute
$0.030000
TTS
tts
TTS for speech, transcription, translation, or voice generation workflows.
Audio
audio per minute
$0.020000
minimum hold
$0.010000
TTS HD
tts-hd
TTS HD for speech, transcription, translation, or voice generation workflows.
Audio
audio per minute
$0.020000
minimum hold
$0.010000
Whisper
whisper
Whisper for speech, transcription, translation, or voice generation workflows.
Audio
audio per minute
$0.020000
minimum hold
$0.010000
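For entries priced per minute of audio, such as Whisper above, the charge scales with clip length. The sketch below is illustrative only: `estimate_audio_cost` is a hypothetical helper, it assumes billing is pro-rated by the second with no rounding, and it does not model the "minimum hold" amount because this listing does not define how that figure is applied.

```python
from decimal import Decimal

SECONDS_PER_MINUTE = Decimal(60)


def estimate_audio_cost(duration_seconds, rate_per_minute):
    """Estimate the USD cost of an audio clip from a per-minute rate.

    Assumption (not confirmed by the listing): the rate is pro-rated by
    the second, with no rounding up to whole minutes.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(duration_seconds) / SECONDS_PER_MINUTE
    return minutes * rate_per_minute


# A hypothetical 90-second clip at the whisper rate listed above.
clip_cost = estimate_audio_cost(90, Decimal("0.020000"))
print(f"Estimated cost: ${clip_cost}")
```

Under these assumptions a 90-second clip at $0.02 per minute comes to $0.03. If the provider rounds durations up or applies a minimum charge, the actual figure would be higher.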