Grok Voice · Available in API

Voice agents that feel human.

Deploy intelligent speech-to-speech voice agents for customer support, sales, and more. Enterprise-grade text-to-speech and speech-to-text APIs.


#1 Tau Voice Leaderboard · Sub-second latency · 25+ languages · $0.05 / min

Voice Agent

Build real-time voice agents with tool use, search, and multi-turn conversation.

Full-duplex real-time conversations with sub-second latency
Built-in reasoning for complex, multi-step requests
Orchestrate dozens of tools in ambiguous real-world workflows


Voices: Ara, Eve, Leo, Rex, Sal

Text to Speech

Natural speech from text with multiple voices and audio formats. Built for telephony and web.

80+ natural voices across 25+ languages
Speech tags for tone, pauses, whisper, and laughter
PCM, MP3, Opus, FLAC, and WAV outputs


Speech to Text

Enterprise-grade transcription for phone calls, meetings, videos, and podcasts.

Entity recognition across medicine, law, and finance
Inverse text normalization for numbers, currencies, and more
Streaming and batch endpoints from one API
Speaker diarization for multi-speaker audio


Supported input formats: MP3, WAV, OGG, Opus, FLAC, AAC, MP4, M4A, MKV, MOV, WebM

Custom voices

Clone a voice from a short recording and use it instantly across Grok Text to Speech and Voice Agent APIs.

Clone from under a minute of natural speech
Two-stage verification: passphrase + speaker embedding match
Inherits every TTS capability — speech tags, multilingual, streaming



The full voice stack

Everything you need to build production voice experiences — from realtime agents to batch transcription.

Realtime voice agents

Full-duplex conversations with sub-second latency

Text-to-speech

Natural speech from text across 80+ voices

Speech-to-text

Accurate transcription with speaker diarization

Tool calling

Call APIs and take actions mid-conversation

Custom voices

Clone or create voices for your brand

25+ languages

Multilingual with natural intonation per locale

Sub-second latency

Fast enough for real conversations at scale

Speech tags

Control whisper, laughter, pauses, and tone

Speaker diarization

Identify who said what in multi-speaker audio

Streaming & batch

Realtime WebSocket or async batch processing

Multiple audio formats

PCM, MP3, Opus, FLAC, WAV, and more

Session control

Dynamic instructions, context, and tool updates

Enterprise ready

SOC 2, HIPAA eligible, and GDPR compliant

Text normalization

Proper formatting of numbers, dates, addresses

Interruption handling

Natural turn-taking with barge-in support

80+ voices across 25+ languages

Multilingual voices with natural intonation. Preview any voice instantly.

Pricing

Simple, transparent pricing

Straightforward usage-based pricing with no hidden fees, minimums, or forced upgrades.


Realtime

Real-time voice conversations over WebSocket

$0.05 / min · $3.00 / hr

Text to Speech

Convert text to natural speech

$15.00 / 1M characters

Speech to Text

Transcribe audio files and live streams

$0.10 / hr · $0.20 / hr (streaming)
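
To see how the three rates combine, here is a minimal estimation sketch in Python. It rests on assumptions the page does not state: billing is strictly linear with no rounding to billing increments, and the $0.10 / hr rate applies to non-streaming transcription. The function and constant names are hypothetical and are not part of any xAI SDK.

```python
from decimal import Decimal

# Rates copied from the pricing list above (USD).
REALTIME_PER_MIN = Decimal("0.05")
TTS_PER_MILLION_CHARS = Decimal("15.00")
STT_PER_HOUR = Decimal("0.10")            # assumed: non-streaming transcription
STT_STREAMING_PER_HOUR = Decimal("0.20")


def estimate_cost(realtime_minutes=0, tts_characters=0,
                  stt_hours=0, stt_streaming_hours=0) -> Decimal:
    """Return an estimated bill in USD, assuming purely linear usage pricing."""
    # Convert through str() so float inputs do not carry binary rounding noise.
    realtime = REALTIME_PER_MIN * Decimal(str(realtime_minutes))
    tts = TTS_PER_MILLION_CHARS * Decimal(str(tts_characters)) / Decimal(1_000_000)
    stt = STT_PER_HOUR * Decimal(str(stt_hours))
    stt_streaming = STT_STREAMING_PER_HOUR * Decimal(str(stt_streaming_hours))
    total = realtime + tts + stt + stt_streaming
    # Round to whole cents for display.
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 1,000 realtime minutes, 2M TTS characters, 100 h of file transcription,
    # and 10 h of streaming transcription.
    print(estimate_cost(realtime_minutes=1000, tts_characters=2_000_000,
                        stt_hours=100, stt_streaming_hours=10))
```

Sixty realtime minutes come to $3.00 under this sketch, which matches the per-hour figure quoted above. Actual invoices may differ if usage is metered in coarser increments.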

Need higher limits or rollout help?

Talk with xAI about onboarding, custom limits, and enterprise deployment.


Enterprise

Trust, controls, and deployment support

Enterprise-ready controls, compliance, security, and scale.


SOC 2 Type II

Audited controls for security, availability, and confidentiality.

HIPAA eligible

BAA available for healthcare applications handling protected health information.

GDPR and DPA support

Data processing agreements and EU data residency options.

High availability

Multi-region infrastructure for enterprise workloads.

Custom rate limits

Concurrent session and request limits scaled to your traffic.

SSO and audit controls

SAML SSO, role-based access, and audit logging for your team.

Zero Data Retention

Enable zero data retention for your deployments.

Ready to build with voice?

Get an API key and start building in minutes, or talk to our team about enterprise deployment.

