Meta's new MTIA lineup joins hyperscalers' unified push for dedicated inferencing chips: companies diversify their AI silicon to reduce sole reliance on Nvidia


Meta announced four successive generations of its custom Meta Training and Inference Accelerator (MTIA) chips on March 11: the MTIA 300, 400, 450, and 500, all scheduled for deployment over the next two years. Meta described the chips as progressively optimized for AI inference workloads, on the premise that HBM bandwidth is the binding constraint on inference.
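The bandwidth premise can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a simplified decode model in which every weight is read from HBM once per generated token, and it ignores KV-cache traffic and batching. The function name `max_decode_tokens_per_s` and the 70-billion-parameter, 1-byte-per-weight model are hypothetical choices for illustration, not figures from Meta. The bandwidth values are the 9.2 TB/s and 27.6 TB/s Meta lists for the MTIA 400 and MTIA 500.

```python
def max_decode_tokens_per_s(hbm_tb_per_s, params_billions, bytes_per_param):
    """Upper bound on single-stream decode rate if every weight is read once per token."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return hbm_tb_per_s * 1e12 / bytes_per_token


# Hypothetical 70B-parameter model with 1-byte (FP8) weights; not a Meta figure.
for name, bandwidth in [("MTIA 400", 9.2), ("MTIA 500", 27.6)]:
    bound = max_decode_tokens_per_s(bandwidth, 70, 1)
    print(f"{name}: about {bound:.0f} tokens/s upper bound")
```

Under these assumptions the bound scales linearly with HBM bandwidth and does not depend on peak FLOPS at all, which is the sense in which bandwidth is the "binding constraint."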

Coming two weeks after Meta disclosed a long-term AI infrastructure agreement with AMD, the announcement puts Meta alongside Google, AWS, and Microsoft, each of which has spent the last few years building and scaling custom silicon programs for AI-accelerated workloads. Will this emerging class of chips put a dent in Nvidia's stranglehold on the AI chip industry?


According to Meta, the chips use a modular chiplet architecture that allows the MTIA 400, 450, and 500 to share the same chassis, rack, and network infrastructure. That compatibility means each new chip generation drops into the existing physical footprint without requiring new data center buildouts, which is the mechanism Meta cited for its roughly six-month development cadence, considerably faster than the industry's typical one-to-two-year cycle. “More importantly, we have deployed hundreds of thousands of MTIA chips in production, onboarded numerous internal production models, and tested MTIA with large language models (LLMs) like Llama,” Meta said.


MTIA chips
Specification	MTIA 300	MTIA 400	MTIA 450	MTIA 500
Workload Focus	R&R Training	General	AI Inference	AI Inference
Module TDP	800 W	1,200 W	1,400 W	1,700 W
HBM Bandwidth	6.1 TB/s	9.2 TB/s	18.4 TB/s	27.6 TB/s
HBM Capacity	216 GB	288 GB	288 GB	384-512 GB
MX4 Performance	-	12 PFLOPS	21 PFLOPS	30 PFLOPS
FP8/MX8 Performance	1.2 PFLOPS	6 PFLOPS	7 PFLOPS	10 PFLOPS
BF16 Performance	0.6 PFLOPS	3 PFLOPS	3.5 PFLOPS	5 PFLOPS
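One way to read the table is to compare how fast memory bandwidth grows against how fast compute grows. The sketch below uses only the HBM bandwidth and FP8/MX8 figures from the table. The names `CHIPS`, `flops_per_byte`, and `growth` are illustrative, and framing the ratio as a roofline-style "ridge point" is a standard analysis technique rather than anything Meta stated.

```python
# HBM bandwidth (TB/s) and FP8/MX8 peak (PFLOPS), copied from the table above.
CHIPS = {
    "MTIA 300": {"hbm_tbps": 6.1, "fp8_pflops": 1.2},
    "MTIA 400": {"hbm_tbps": 9.2, "fp8_pflops": 6.0},
    "MTIA 450": {"hbm_tbps": 18.4, "fp8_pflops": 7.0},
    "MTIA 500": {"hbm_tbps": 27.6, "fp8_pflops": 10.0},
}


def flops_per_byte(chip):
    """Peak FP8 FLOPs available per byte of HBM traffic (roofline ridge point)."""
    spec = CHIPS[chip]
    return (spec["fp8_pflops"] * 1e15) / (spec["hbm_tbps"] * 1e12)


def growth(metric, old, new):
    """Ratio of a metric between two chip generations."""
    return CHIPS[new][metric] / CHIPS[old][metric]


for name in CHIPS:
    print(f"{name}: {flops_per_byte(name):.0f} FP8 FLOPs per HBM byte")

print(f"400 -> 500 bandwidth growth: {growth('hbm_tbps', 'MTIA 400', 'MTIA 500'):.2f}x")
print(f"400 -> 500 FP8 growth: {growth('fp8_pflops', 'MTIA 400', 'MTIA 500'):.2f}x")
```

From the MTIA 400 to the MTIA 500, bandwidth roughly triples while FP8 compute grows by about 1.7x, so the FLOPs available per byte of memory traffic fall. That is consistent with Meta's description of the later chips as tuned for bandwidth-bound inference rather than compute-bound training.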

Google announced Ironwood, its seventh-generation TPU, at Google Cloud Next in April 2025; the company described it as the first TPU purpose-built for inference and the beginning of an “age of inference,” distinct from the training-first era that preceded it. Ironwood delivers 192 GB of HBM3E per chip at 7.37 TB/s of memory bandwidth, per Google's published specifications, and scales to configurations of up to 9,216 AI accelerators.

Then, in December at re:Invent, AWS announced Trainium3, a 3nm chip with 144 GB of HBM3E and 4.9 TB/s of memory bandwidth, with a single Trainium3 UltraServer connecting 144 chips. AWS has also maintained a separate Inferentia product line, dedicated exclusively to inference, since 2019. Meanwhile, Microsoft introduced its Maia 200, an inference chip built on TSMC 3nm, which it called its “most efficient inference system.”

Broadcom is what’s connecting the dots across many of these programs, having had a hand in building both Google’s TPUs (as the company’s silicon integrator) and Meta’s MTIA family. Meta described the MTIA chips as being developed “in close partnership with” Broadcom, and said that the company “has remained and will continue” to be a key partner of Meta’s AI infrastructure strategy.

The convergence extends to software stacks. Meta is building MTIA natively on PyTorch, vLLM, and Triton; Google has added TPU support for vLLM in beta; and AWS runs its Neuron SDK across PyTorch, TensorFlow, and JAX. These shared inference-serving frameworks ultimately determine how easily production workloads can port between chips, and that portability is what would make switching away from CUDA-locked Nvidia GPUs as the default economically credible at scale.

None of this changes Nvidia’s position in large-scale pre-training. Frontier model development still overwhelmingly runs on high-end GPU clusters, and Nvidia’s Blackwell is the current standard for that workload. Meta itself operates large Nvidia GPU clusters alongside MTIA deployments, and its February 2026 AMD agreement adds further GPU capacity to a portfolio that already spans multiple silicon vendors.


Luke James is a freelance writer and journalist. Although his background is in legal, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.

