Built in 122 days, outpacing every estimate, Colossus was the most powerful AI training system yet. Then we doubled it in 92 days to 200K GPUs. This is just the beginning.
The build
We were told it would take 24 months to build. So we took the project into our own hands, questioned everything, removed whatever was unnecessary, and accomplished our goal in four months.

By the numbers
We doubled our compute at an unprecedented rate, with a roadmap to 1M GPUs. Progress in AI is driven by compute, and no one has come close to building at this magnitude and speed.
NVIDIA H100 GPUs in a single interconnected cluster
Aggregate memory bandwidth across the full system
Per-server network bandwidth for distributed training
Total storage for training data and checkpoints
Timeline
Doubled to 200K GPUs.