February 19, 2025

Grok 3 Beta — The Age of Reasoning Agents

We are thrilled to unveil an early preview of Grok 3, our most advanced model yet, blending superior reasoning with extensive pretraining knowledge.


Next-Generation Intelligence from xAI

We are pleased to introduce Grok 3, our most advanced model yet: blending strong reasoning with extensive pretraining knowledge. Trained on our Colossus supercluster with 10x the compute of previous state-of-the-art models, Grok 3 displays significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks. Grok 3's reasoning capabilities, refined through large scale reinforcement learning, allow it to think for seconds to minutes, correcting errors, exploring alternatives, and delivering accurate answers. Grok 3 has leading performance across both academic benchmarks and real-world user preferences, achieving an Elo score of 1402 in the Chatbot Arena. Alongside it, we’re unveiling Grok 3 mini, which represents a new frontier in cost-efficient reasoning. Both models are still in training and will evolve rapidly with your feedback. We are rolling out Grok 3 to users in the coming days, along with an early preview of its reasoning capabilities.

Thinking Harder: Test-time Compute and Reasoning

Today, we are announcing two beta reasoning models, Grok 3 (Think) and Grok 3 mini (Think). They were trained using reinforcement learning (RL) at an unprecedented scale to refine its chain-of-thought process, enabling advanced reasoning in a data-efficient manner. With RL, Grok 3 (Think) learned to refine its problem-solving strategies, correct errors through backtracking, simplify steps, and utilize the knowledge it picked up during pretraining. Just like a human when tackling a complex problem, Grok 3 (Think) can spend anywhere from a few seconds to several minutes reasoning, often considering multiple approaches, verifying its own solution, and evaluating how to precisely meet the requirements of the problem.

Both models are still in training, but already they show remarkable performance across a range of benchmarks. We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition. Grok 3 (Think) also attained 84.6% on graduate-level expert reasoning (GPQA), and 79.4% on LiveCodeBench for code generation and problem-solving. Furthermore, Grok 3 mini reaches a new frontier in cost-efficient reasoning for STEM tasks that don't require as much world knowledge, reaching 95.8% on AIME 2024 and 80.4% on LiveCodeBench.

AIME’25

Competition Math

AIME’24

Competition Math

GPQA

Graduate-Level Google-Proof Q&A (Diamond)

LCB

Code Generation: 10/1/2024 - 2/1/2025

MMMU

Multimodal Understanding

To use Grok 3’s reasoning capabilities, just press the Think button. Grok 3 (Think)’s mind is completely open, allowing users to inspect not only the final answer but the reasoning process of the model itself. We have found that Grok 3 (Think)'s performance generalizes across diverse problem domains. Here are some Grok 3 reasoning examples.

Research samples


Create a game that is a mixture of two classic games. Make it in pygame and make it look pretty.

Below is a complete solution to your query: a Pygame game called "Break-Pong," which combines elements of Pong and Breakout. In this game, two players control paddles on opposite sides of the screen, hitting a ball to break a central wall of bricks. Points are scored by breaking bricks and by letting the ball pass the opponent's paddle. The game includes smooth animations, colorful graphics, and particle effects to make it visually appealing. Here's the implementation:

Code Implementation

python

import pygame
import sys
import random

# Initialize Pygame
pygame.init()

# Constants
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
PADDLE_WIDTH = 20
PADDLE_HEIGHT = 100
BALL_SIZE = 20
BRICK_WIDTH = 40
BRICK_HEIGHT = 20
BRICK_COLUMNS = 5
BRICK_ROWS = 30  # 600 / 20 = 30 rows to span screen height
PADDLE_SPEED = 5
BALL_SPEED = 5
TARGET_SCORE = 50

# Colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
RED = (255, 100, 100)
GREEN = (100, 255, 100)
BLUE = (100, 100, 255)

# Set up display
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Break-Pong")
clock = pygame.time.Clock()

# Paddle class
class Paddle:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.width = PADDLE_WIDTH
        self.height = PADDLE_HEIGHT
        self.speed = PADDLE_SPEED

    def move(self, up=True):
        if up:
            self.y -= self.speed
        else:
            self.y += self.speed
        # Keep paddle within screen bounds
        self.y = max(0, min(SCREEN_HEIGHT - self.height, self.y))

    def draw(self):
        pygame.draw.rect(screen, WHITE, (self.x, self.y, self.width, self.height))

# Ball class
class Ball:
    def __init__(self):
        self.reset()
        self.size = BALL_SIZE

    def move(self):
        self.x += self.vel_x
        self.y += self.vel_y

    def reset(self):
        self.x = SCREEN_WIDTH // 2
        self.y = SCREEN_HEIGHT // 2
        self.vel_x = random.choice([-1, 1]) * BALL_SPEED
        self.vel_y = random.choice([-1, 1]) * BALL_SPEED
        self.last_hit = None  # Tracks which paddle last hit the ball

    def draw(self):
        pygame.draw.circle(screen, WHITE, (int(self.x), int(self.y)), self.size // 2)

# Brick class
class Brick:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.width = BRICK_WIDTH
        self.height = BRICK_HEIGHT
        self.color = random.choice([RED, GREEN, BLUE])
        self.intact = True

    def draw(self):
        if self.intact:
            pygame.draw.rect(screen, self.color, (self.x, self.y, self.width, self.height))

# Particle class for visual effects
class Particle:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.size = random.randint(2, 5)
        self.vel_x = random.uniform(-2, 2)
        self.vel_y = random.uniform(-2, 2)
        self.life = 30  # Frames until particle disappears
        self.color = random.choice([RED, GREEN, BLUE])

    def update(self):
        self.x += self.vel_x
        self.y += self.vel_y
        self.life -= 1

    def draw(self):
        if self.life > 0:
            alpha = int((self.life / 30) * 255)  # Fade out effect
            surface = pygame.Surface((self.size, self.size), pygame.SRCALPHA)
            pygame.draw.circle(surface, (*self.color, alpha), (self.size // 2, self.size // 2), self.size // 2)
            screen.blit(surface, (int(self.x), int(self.y)))

# Collision detection functions
def ball_collides_with_paddle(ball, paddle):
    return (ball.x - ball.size // 2 < paddle.x + paddle.width and
            ball.x + ball.size // 2 > paddle.x and
            ball.y - ball.size // 2 < paddle.y + paddle.height and
            ball.y + ball.size // 2 > paddle.y)

def ball_collides_with_brick(ball, brick):
    if not brick.intact:
        return False
    return (ball.x - ball.size // 2 < brick.x + brick.width and
            ball.x + ball.size // 2 > brick.x and
            ball.y - ball.size // 2 < brick.y + brick.height and
            ball.y + ball.size // 2 > brick.y)

# Initialize game objects
left_paddle = Paddle(50, SCREEN_HEIGHT // 2 - PADDLE_HEIGHT // 2)
right_paddle = Paddle(SCREEN_WIDTH - 50 - PADDLE_WIDTH, SCREEN_HEIGHT // 2 - PADDLE_HEIGHT // 2)
ball = Ball()

# Create central brick wall
bricks = []
brick_start_x = SCREEN_WIDTH // 2 - (BRICK_COLUMNS * BRICK_WIDTH) // 2
for col in range(BRICK_COLUMNS):
    for row in range(BRICK_ROWS):
        bricks.append(Brick(brick_start_x + col * BRICK_WIDTH, row * BRICK_HEIGHT))

# Scores and particles
left_score = 0
right_score = 0
particles = []

# Game loop
running = True
while running:
    # Event handling
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Paddle movement
    keys = pygame.key.get_pressed()
    if keys[pygame.K_w]:
        left_paddle.move(up=True)
    if keys[pygame.K_s]:
        left_paddle.move(up=False)
    if keys[pygame.K_UP]:
        right_paddle.move(up=True)
    if keys[pygame.K_DOWN]:
        right_paddle.move(up=False)

    # Update ball
    ball.move()

    # Ball collisions with top/bottom walls
    if ball.y - ball.size // 2 <= 0 or ball.y + ball.size // 2 >= SCREEN_HEIGHT:
        ball.vel_y = -ball.vel_y

    # Ball collisions with paddles
    if ball_collides_with_paddle(ball, left_paddle):
        ball.vel_x = abs(ball.vel_x)  # Ensure ball moves right
        ball.last_hit = 'left'
    elif ball_collides_with_paddle(ball, right_paddle):
        ball.vel_x = -abs(ball.vel_x)  # Ensure ball moves left
        ball.last_hit = 'right'

    # Ball collisions with bricks
    for brick in bricks:
        if ball_collides_with_brick(ball, brick):
            brick.intact = False
            ball.vel_x = -ball.vel_x
            # Add particles
            for _ in range(5):
                particles.append(Particle(brick.x + brick.width // 2, brick.y + brick.height // 2))
            # Award points
            if ball.last_hit == 'left':
                left_score += 1
            elif ball.last_hit == 'right':
                right_score += 1

    # Ball off screen
    if ball.x - ball.size // 2 <= 0:
        right_score += 5
        ball.reset()
    elif ball.x + ball.size // 2 >= SCREEN_WIDTH:
        left_score += 5
        ball.reset()

    # Update particles
    for particle in particles[:]:
        particle.update()
        if particle.life <= 0:
            particles.remove(particle)

    # Draw everything
    screen.fill(BLACK)
    for brick in bricks:
        brick.draw()
    left_paddle.draw()
    right_paddle.draw()
    ball.draw()
    for particle in particles:
        particle.draw()

    # Draw scores
    font = pygame.font.Font(None, 36)
    left_text = font.render(f"Left: {left_score}", True, WHITE)
    right_text = font.render(f"Right: {right_score}", True, WHITE)
    screen.blit(left_text, (50, 20))
    screen.blit(right_text, (SCREEN_WIDTH - 150, 20))

    # Check for game over
    if left_score >= TARGET_SCORE or right_score >= TARGET_SCORE:
        winner = "Left" if left_score >= TARGET_SCORE else "Right"
        game_over_text = font.render(f"{winner} Wins!", True, WHITE)
        screen.blit(game_over_text, (SCREEN_WIDTH // 2 - 50, SCREEN_HEIGHT // 2))
        pygame.display.flip()
        pygame.time.wait(3000)
        running = False

    pygame.display.flip()
    clock.tick(60)

# Cleanup
pygame.quit()
sys.exit()

How to Run

  1. Ensure you have Python and Pygame installed (pip install pygame).
  2. Copy the code into a .py file (e.g., break_pong.py).
  3. Run the file using Python (e.g., python break_pong.py).

Gameplay Instructions

  • Controls:
    • Left paddle: W (up), S (down)
    • Right paddle: Up Arrow (up), Down Arrow (down)
  • Objective: Break bricks in the central wall and score points by letting the ball pass the opponent's paddle. First to 50 points wins.

Features

  • Pong Elements: Two paddles bounce a ball back and forth, with points scored when the ball passes an opponent's paddle.
  • Breakout Elements: A wall of bricks that the ball can break upon impact.
  • Visual Appeal:
    • Smooth paddle and ball movement.
    • Colorful bricks (red, green, blue) randomly assigned.
    • Particle effects when bricks break, with fading particles for a dynamic look.
    • Clear score display and game-over message.

Enhancements

To make it even prettier or more engaging, consider:

  • Adding sound effects (e.g., bounce, break sounds).
  • Implementing power-ups (e.g., larger paddle, faster ball) released from special bricks.
  • Adding a background gradient or pattern.
  • Including a start menu and smoother transitions.

This implementation provides a fun, visually appealing mixture of Pong and Breakout, fulfilling your request! Enjoy playing "Break-Pong"!

Pretraining on a Massive Scale

With reasoning turned off, Grok 3 gives instant, high quality responses. Grok 3 delivers state-of-the-art results across diverse academic benchmarks among non reasoning models, including: graduate-level science knowledge (GPQA), general knowledge (MMLU-Pro), math competition problems (AIME). Grok 3 also excels in image understanding (MMMU) and video understanding (EgoSchema) tasks.

BenchmarkGrok 3 BetaGrok 3 mini BetaGemini 2.0DeepSeek-V3GPT 4oClaude 3.5 Sonnet
AIME’24
52.2%39.7%39.2%9.3%16.0%
GPQA
75.4%66.2%64.7%59.1%53.6%65.0%
LCB
57.0%41.5%36.0%33.1%32.3%40.2%
MMLU-pro
79.9%78.9%79.1%75.9%72.6%78.0%
LOFT (128k)
83.3%83.1%75.6%78.0%69.9%
SimpleQA
43.6%21.7%44.3%24.9%38.2%28.4%
MMMU
73.2%69.4%72.7%69.1%70.4%
EgoSchema
74.5%74.3%71.9%72.2%

With a context window of 1 million tokens — 8 times larger than our previous models — Grok 3 can process extensive documents and handle complex prompts while maintaining instruction-following accuracy. On the LOFT (128k) benchmark, which targets long-context RAG use cases, Grok 3 achieved state-of-the-art accuracy (averaged across 12 diverse tasks), showcasing its powerful information retrieval capabilities.

Grok 3 also demonstrates improved factual accuracy and enhanced stylistic control. Under the codename chocolate, an early version of Grok 3 topped the LMArena Chatbot Arena leaderboard, outperforming all competitors in Elo scores across all categories. As we continue to scale, we are preparing to train even larger models on our 200,000 GPU cluster.

Chatbot Arena ScoreChatbot Arena Score

Grok Agents: Combining Reasoning and Tool Use

To understand the universe, we must interface Grok with the world. Equipped with code interpreters and internet access, Grok 3 models learn to query for missing context, dynamically adjust their approach, and improve their reasoning based on feedback.

As a first step towards this vision, we are rolling out DeepSearch—our first agent. It's a lightning-fast AI agent built to relentlessly seek the truth across the entire corpus of human knowledge. DeepSearch is designed to synthesize key information, reason about conflicting facts and opinions, and distill clarity from complexity. Whether you need to access the latest real-time news, seek advice about your social woes, or conduct in-depth scientific research, DeepSearch will take you far beyond a browser search. Its final summary trace results in a concise and comprehensive report, to help you keep up with a world that never slows down.

DeepSearch Showcase

Grok.comWhat if I bought $TSLA in 2011?
𝕏.comHow are X users reacting to the Grok 3 launch?
Grok.comRecommend some good online math resources for my kid.

Grok 3 API Coming Soon

In the coming weeks, we will release Grok 3 and Grok 3 mini via our API platform, offering access to both the standard and reasoning models. DeepSearch will also be released to Enterprise partners via our API.

What’s Next for Grok 3?

Grok 3’s training is ongoing, with frequent updates planned over the next few months. We are excited to roll out new features in the Enterprise API, including tool use, code execution, and advanced agent capabilities. Following our RMF (Risk Management Framework) release last week, we are particularly interested in accelerating progress in scalable oversight and adversarial robustness during training.

Grok 3 is now available to 𝕏 Premium and Premium+ users on 𝕏 and Grok.com. 𝕏 Premium+ users will also immediately gain access to Think and DeepSearch. In addition, Grok 3 capabilities are being rolled out to all Grok users with usage limits. 𝕏 Premium+ users will have higher limits and access to advanced capabilities.

Join the Journey

Since launching Grok 1 in November 2023, xAI’s small, talent-dense team has driven historic progress, positioning us at the forefront of AI innovation. With Grok 3, we are advancing core reasoning capabilities using our expanded Colossus supercluster, with exciting developments to come. If you are passionate about building AI for humanity’s future, apply to join our team at x.ai/careers.