Inside Google's TPU V8 strategy, delivering two chips for two crucial tasks at incredible scale — network scales up to 1 million TPUs per cluster, an advantage over Nvidia AI accelerators

Image: The Google TPU 8i and 8t (credit: Google)

Google announced its eighth-generation Tensor Processing Units (TPUs) at Cloud Next on April 22. For the first time in the TPU program's decade-long history, the generation comprises two distinct chip designs aimed at different workloads: TPU 8t targets large-scale model training, while TPU 8i is built for low-latency inference and reasoning.

The split also extends to the supply chain. In December, MediaTek joined Broadcom as a silicon design partner for the eighth-generation program, ending the exclusive role in TPU development that Broadcom had held since 2015. Both chips are fabricated on TSMC's N3 process family, use HBM3E memory, and will be available to Google Cloud customers later this year.

Google TPU 8 specs

| Spec | TPU 8t | TPU 8i |
|---|---|---|
| Workload | Large-scale pre-training | Sampling, serving, and reasoning |
| Network topology | 3D Torus | Boardfly |
| Specialized chip features | SparseCore (embeddings) & LLM Decoder Engine | CAE (Collectives Acceleration Engine) |
| HBM capacity | 216 GB | 288 GB |
| On-chip SRAM | 128 MB | 384 MB |
| Peak FP4 compute | 12.6 PFLOPS | 10.1 PFLOPS |
| HBM bandwidth | 6,528 GB/s | 8,601 GB/s |
| Host CPU | Arm Axion | Arm Axion |
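The table lists only peak figures, but dividing them gives a rough sense of how each chip is balanced. The sketch below computes two derived numbers from the table alone: the roofline-model "ridge point" (peak compute divided by memory bandwidth, i.e. the arithmetic intensity at which a chip stops being bandwidth-bound) and the time needed to read the full HBM once at peak bandwidth. It assumes decimal units (1 PFLOPS = 10^15 FLOP/s, 1 GB/s = 10^9 bytes/s) and vendor peak rates; the `TpuSpec` class and its method names are hypothetical, and the outputs are arithmetic, not measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpuSpec:
    """Hypothetical container for the peak figures in the spec table."""
    name: str
    peak_fp4_pflops: float    # peak FP4 compute, PFLOPS (from the table)
    hbm_bandwidth_gbs: float  # HBM bandwidth, GB/s (from the table)
    hbm_capacity_gb: float    # HBM capacity, GB (from the table)

    def ridge_point(self) -> float:
        """FLOPs per byte of HBM traffic at which peak compute and peak
        bandwidth balance (assumes decimal units: 1e15 FLOP/s, 1e9 B/s)."""
        return (self.peak_fp4_pflops * 1e15) / (self.hbm_bandwidth_gbs * 1e9)

    def hbm_sweep_ms(self) -> float:
        """Milliseconds to read the entire HBM once at peak bandwidth."""
        return self.hbm_capacity_gb / self.hbm_bandwidth_gbs * 1e3


TPU_8T = TpuSpec("TPU 8t", 12.6, 6528, 216)
TPU_8I = TpuSpec("TPU 8i", 10.1, 8601, 288)

if __name__ == "__main__":
    for chip in (TPU_8T, TPU_8I):
        print(f"{chip.name}: ridge ~{chip.ridge_point():.0f} FLOP/byte, "
              f"full-HBM read ~{chip.hbm_sweep_ms():.1f} ms")
```

Run as written, it reports a ridge point of roughly 1,930 FLOP/byte for TPU 8t and roughly 1,174 FLOP/byte for TPU 8i, with both chips able to read their entire HBM in about 33 ms. A lower ridge point means a chip reaches its compute ceiling at lower arithmetic intensity, which is consistent with TPU 8i's focus on bandwidth-hungry serving and decoding, though real throughput also depends on on-chip SRAM, interconnect, and software.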

Luke James
Contributor

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.