Jan 28, 2026

Grok Imagine API

State-of-the-art video generation across quality, cost, and latency.

A world-class video generation model.

A breakthrough video editing model.

We're excited to unveil the Grok Imagine API, a unified bundle of powerful APIs designed for end-to-end creative workflows. Grok Imagine is our most powerful video-audio generative model yet. Bring an image to life, start from a simple text prompt, or even refine a complex cinematic sequence.

Performance and Benchmarks

We engineered these models to deliver high-quality, native video-audio generation on par with today's top providers, while also optimizing latency, concurrency, and cost—refined through multiple rounds of close partner feedback. One message came through consistently: quality alone is not enough if latency and cost make iteration painful. By improving speed and economics, we enable developers, creative teams, and enterprise workflows to explore multiple directions in parallel—converging faster through rapid, cost-effective experimentation.

Artificial Analysis: Text-to-Video Rankings

Model           Rank
Grok Imagine    1
Veo 3.1 Fast    4
Veo 3           5
Sora 2 Pro      9
Sora 2          12

[Figure: Artificial Analysis T2V Score vs Latency. Axes: AA T2V Score against latency (seconds). Higher score with lower latency is better.]

[Figure: Artificial Analysis T2V Score vs Price. Axes: AA T2V Score against price ($/sec). Higher score with lower price is better.]

[Figure: LMArena T2V Score vs Latency. Axes: LMArena T2V Score against latency (seconds). Higher score with lower latency is better.]

[Figure: LMArena T2V Score vs Price. Axes: LMArena T2V Score against price ($/sec). Higher score with lower price is better.]
* Source: Artificial Analysis official website as of 2026-01-28 0:00 PST, LMArena official website as of 2026-02-04 16:30 PST
** We evaluated latency using 10 identical prompts, with each prompt tested 10 times per model to account for variability in processing times. P50 latency in seconds is reported for videos generated at 720p resolution with an 8-second duration. We measured end-to-end API latency with a 1-second polling interval, using fal.ai for all models except Veo (measured using the Vertex API), Sora (measured using the OpenAI API), and Grok Imagine (measured using the xAI API).
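The measurement procedure in the footnote above can be sketched roughly as follows. This is a minimal illustration, not the harness used for the published numbers: `submit` and `is_done` are hypothetical callables standing in for whichever provider client is being timed, and the injectable `clock` and `sleep` parameters are an assumption added so the logic can be exercised without real waiting.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(
    submit: Callable[[], str],
    is_done: Callable[[str], bool],
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Seconds from just before submission until the first poll that reports completion.

    `submit` and `is_done` are hypothetical stand-ins for a provider client.
    """
    start = clock()
    job_id = submit()
    # Poll at a fixed interval; the result is only as precise as the interval.
    while not is_done(job_id):
        sleep(poll_interval)
    return clock() - start


def p50_latency(
    run_once: Callable[[str], float],
    prompts: List[str],
    repeats: int = 10,
) -> float:
    """Median (P50) latency over every prompt, each run `repeats` times."""
    samples = [run_once(prompt) for prompt in prompts for _ in range(repeats)]
    return statistics.median(samples)


if __name__ == "__main__":
    # Simulated backend: every job "finishes" 0.25 s after submission.
    def run_once(prompt: str) -> float:
        ready_at = time.monotonic() + 0.25
        return measure_latency(
            submit=lambda: "simulated-job",
            is_done=lambda _job_id: time.monotonic() >= ready_at,
            poll_interval=0.1,
        )

    print(f"P50: {p50_latency(run_once, ['prompt A', 'prompt B'], repeats=2):.2f} s")
```

Because completion is only observed at poll time, a fixed polling interval can add up to one interval to each sample, which is worth keeping in mind when comparing small latency differences.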

Video Editing Benchmark

Grok Imagine consistently outperforms competitors across key evaluation metrics.

Grok Imagine vs. Kling o1

Metric                   Grok Imagine    Kling o1
Overall                  57%             43%
Instruction Following    53.1%           46.9%
Consistency              60.6%           39.4%

Grok Imagine vs. Runway Aleph

Metric                   Grok Imagine    Runway Aleph
Overall                  64.1%           35.9%
Instruction Following    57.4%           42.6%
Consistency              63.1%           36.9%
* Human raters conducted side-by-side comparisons on IVEBench, which comprises a diverse database of high-quality source videos spanning seven semantic dimensions and covering varying video lengths. Each comparison included a prompt and an input video and evaluated a single generated video. All comparisons were done at 1280 x 720 resolution.
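For readers unfamiliar with side-by-side evaluation, percentages like those in the tables above are typically preference shares: the fraction of decisive votes each model received. The sketch below shows that arithmetic under stated assumptions. The post does not say how ties were handled, so dropping them here is an assumption, and the vote counts in the usage example are made up purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def preference_shares(votes: Iterable[str]) -> Dict[str, float]:
    """Percentage of decisive votes for model "A" and model "B".

    Any vote other than "A" or "B" (for example "tie") is ignored, which is
    an assumption of this sketch rather than a documented rule.
    """
    counts = Counter(v for v in votes if v in ("A", "B"))
    total = counts["A"] + counts["B"]
    if total == 0:
        raise ValueError("no decisive votes to score")
    return {
        "A": round(100 * counts["A"] / total, 1),
        "B": round(100 * counts["B"] / total, 1),
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the benchmark.
    example_votes = ["A"] * 57 + ["B"] * 43 + ["tie"] * 5
    print(preference_shares(example_votes))
```

With ties excluded, the two shares always sum to 100%, which matches the shape of the reported results.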

Video Generation

Transform static images or text into dynamic, high-quality video sequences.

Cinematic motion understanding

Transform your photos into cinematic videos with realistic motion, object interactions, and visual continuity.

Flexible styles & formats

Support for portrait, landscape, and platform-ready aspect ratios—across flexible clip lengths.

Imagine for everyone

Whether you're a creator, educator, influencer, designer, or parent, Grok Imagine makes it easy to bring ideas to life—fast.

"I have a story in my head—and I want to bring it to life without a full production team."

Turn ideas into cinematic visuals instantly. Prototype, iterate, and publish faster while staying fully in control of your creative vision.

Video Editing

Add, Remove, Swap Objects

Add an object, remove an unwanted element, or swap out a prop with high precision and consistency.

Add Performance

Animate any character with your own performance.

Scene Control

Effortlessly transform any scene—switch from golden sunshine to autumn, winter, fog, sunset, or cloudy settings in seconds.

Object Control

Edit colors and objects with precision and control for your product showcase.

Restyle

Instantly reinvent your footage with any visual style.

Sketches to Life

From static black-and-white lines to vibrant, living animations — instantly.

What our partners are saying

The Grok Imagine API is available on our partner platforms, alongside our own API.

Fal

"We are thrilled to partner with xAI. The Grok Imagine API delivers outstanding video quality with native audio-video generation, combining photorealism, strong creative style, and an impressive level of control."

ComfyUI

"The ComfyUI team gravitates strongly toward retro anime and cyberpunk aesthetics, and Grok performs exceptionally well in both text-to-image and image-to-video generation for these styles."

Invideo

"Grok is awesome because it helps bring the craziest ideas to life. At a video level, Grok brings visuals to life with accuracy and fluidity in the actions that are prompted, and it can create visuals that are creatively restrictive to make using other models. The addition of sound and dialogue further helps in bringing visuals to life."

Flora

"Imagine API is great at fast, high-quality visual ideation. It’s particularly strong at capturing scene-level style, mood, and physical realism, making it a great fit for early creative exploration and rapid iteration!"

HeyGen

"We've been integrating Grok into our Video Agent at HeyGen. With Grok's latest video model, you can just prompt edits directly for quick tweaks, empowering our users to iterate on clips in minutes instead of completely leaving it to luck. Excited to see more capabilities unlock from Grok."

Get Started

Start building with Grok Imagine API today! We offer:

  • APIs and SDKs for seamless integration
  • Tutorials to jumpstart Grok Imagine projects
  • Hosted console for visual creation and experimentation