The custom AI ASIC state of play (May 2026) — Broadcom deals, Google TPUs, Meta MTIA & beyond

Google's AlphaChip TPU (Image credit: Google DeepMind)

Nvidia still holds approximately 70% of the AI chip market, but that share is projected to erode as Google, Amazon, Meta, Microsoft, and OpenAI invest billions in purpose-built chips designed for their specific workloads. ASIC-based AI server shipments are projected to reach 27.8% of the market in 2026, the highest share since 2023. The same forecast has custom ASIC shipments growing 44.6% year over year in 2026, nearly triple the 16.1% growth rate projected for merchant GPUs.

This is being enabled almost entirely by TSMC, which fabricates chips for all five hyperscalers and for Broadcom, the dominant custom AI chip architect. Broadcom alone carries a $73 billion AI backlog and is targeting $100 billion in annual AI chip revenue by 2027.


Broadcom, which has arguably emerged as the core enabler of the AI ASIC ecosystem, reported $8.4 billion in AI semiconductor revenue for Q1 FY2026 (ending February 2026), a 106% year-over-year increase, and guided to $10.7 billion in Q2. CEO Hock Tan told investors the company has "line of sight to achieve AI revenue from chips in excess of $100 billion in 2027," backed by a disclosed $73 billion AI backlog.

Broadcom has confirmed six major XPU customers. Google remains the longest-standing partner, with seven generations of co-designed TPUs since 2014. OpenAI signed a multi-year collaboration in October 2025 for 10 gigawatts of custom accelerators, with first deployment targeting the second half of 2026 using both 3nm and 2nm designs. That deal came after OpenAI was widely reported to be behind a separate $10 billion order. However, Broadcom semiconductor president Charlie Kawwas joked on CNBC that OpenAI "has not given me that PO yet," leaving the identity of the mystery customer officially unconfirmed.

The technology behind this growth is Broadcom's 3.5D XDSiP platform, which uses face-to-face 3D stacking via TSMC's SoIC process combined with 2.5D CoWoS integration. The platform enables packages exceeding 6,000 mm² of silicon with up to 12 HBM stacks, far beyond the roughly 2,500 mm² limit of conventional 2.5D designs. In February, Broadcom announced it had begun shipping the industry's first 2nm compute SoC built on this platform, integrating four N2 compute dies, one I/O die, and six HBM modules.

Google's TPU program is the most mature custom AI silicon effort among the hyperscalers, and its latest generation represents a significant architectural leap. The TPU v7, codenamed Ironwood, was announced at Cloud Next in April 2025 and entered preview in November. Each chip delivers 4,614 FP8 TFLOPS with 192 GB of HBM3E memory at 7.37 TB/s bandwidth. It’s manufactured on TSMC's N3P process in a dual-chiplet design co-developed with Broadcom and MediaTek, and features two TensorCores with doubled 256x256 MXU arrays plus four SparseCores.

The 9,216-chip superpod configuration delivers 42.5 FP8 exaflops with 1.77 PB of aggregate HBM. Per-chip, Ironwood's 4,614 TFLOPS sits close to Blackwell's approximately 5,000 FP8 TFLOPS, but SemiAnalysis estimates that TPUs achieve higher sustained model FLOP utilization of roughly 90% for transformers versus 70% to 80% for GPUs, narrowing or erasing the real-world performance gap. Google claims that the total cost of ownership (TCO) per Ironwood chip is roughly 44% lower than a GB200 server from its own procurement perspective.
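The superpod totals and the utilization argument above are simple arithmetic on the quoted per-chip figures. The sketch below reproduces them under stated assumptions: decimal prefixes (1 PB = 1e6 GB), perfectly linear scaling across chips, and the SemiAnalysis utilization estimates applied directly to peak FP8 throughput. The function names are illustrative, not part of any vendor tool.

```python
# Per-chip figures quoted in the article for TPU v7 (Ironwood).
CHIPS_PER_POD = 9_216
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GB_PER_CHIP = 192


def pod_totals(chips, tflops_per_chip, hbm_gb_per_chip):
    """Return (peak exaflops, HBM petabytes), assuming linear scaling."""
    exaflops = chips * tflops_per_chip / 1e6  # 1 EFLOPS = 1e6 TFLOPS
    hbm_pb = chips * hbm_gb_per_chip / 1e6    # 1 PB = 1e6 GB (decimal)
    return exaflops, hbm_pb


def sustained_tflops(peak_tflops, utilization):
    """Peak throughput scaled by an estimated model FLOP utilization."""
    return peak_tflops * utilization


pod_exaflops, pod_hbm_pb = pod_totals(
    CHIPS_PER_POD, FP8_TFLOPS_PER_CHIP, HBM_GB_PER_CHIP
)
print(f"Pod: {pod_exaflops:.1f} EFLOPS FP8, {pod_hbm_pb:.2f} PB HBM")

# Utilization values are the estimates quoted above, not measurements.
ironwood = sustained_tflops(4_614, 0.90)
blackwell_low = sustained_tflops(5_000, 0.70)
blackwell_high = sustained_tflops(5_000, 0.80)
print(f"Ironwood sustained:  {ironwood:.0f} TFLOPS")
print(f"Blackwell sustained: {blackwell_low:.0f} to {blackwell_high:.0f} TFLOPS")
```

On these estimates, the roughly 8% peak deficit per chip turns into a sustained lead, which is the sense in which the gap is "narrowed or erased."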

Google is now selling TPU access aggressively beyond its own services. Anthropic committed to up to one million TPUs in the largest deal in Google Cloud history back in October, while Meta entered talks for multi-billion-dollar TPU deployments in February this year. The current-generation TPU v6e Trillium remains widely available on Google Cloud at $2.70 per chip-hour on demand, delivering roughly four times better price-performance than H100 instances for LLM workloads, according to Google's own benchmarks. Google's Axion Arm CPU, based on Neoverse V2 and reportedly manufactured on TSMC 3nm according to TrendForce, complements TPUs for general-purpose cloud workloads.

AWS has matched Google's pace with an aggressive custom silicon roadmap developed by Annapurna Labs, the Israeli chip design house acquired by Amazon in 2015. Trainium3, which went generally available at re:Invent in December, is AWS's first 3nm chip. Each Trainium3 delivers 2.517 PFLOPS FP8 with 144 GB HBM3E at 4.9 TB/s bandwidth, roughly double the compute and 1.5 times the memory of its predecessor. The new Trn3 UltraServer packs 144 chips delivering 362 FP8 petaflops with 20.7 TB of memory, a 4.4 times improvement over Trn2 UltraServers.
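The UltraServer totals follow directly from the per-chip Trainium3 figures. A quick check, assuming decimal units (1 TB = 1,000 GB) and no scaling losses; the variable names are illustrative:

```python
# Per-chip Trainium3 figures quoted in the article.
CHIPS_PER_ULTRASERVER = 144
FP8_PFLOPS_PER_CHIP = 2.517
HBM_GB_PER_CHIP = 144

# Aggregate peak compute and memory for one Trn3 UltraServer.
ultraserver_pflops = CHIPS_PER_ULTRASERVER * FP8_PFLOPS_PER_CHIP
ultraserver_hbm_tb = CHIPS_PER_ULTRASERVER * HBM_GB_PER_CHIP / 1_000

print(f"{ultraserver_pflops:.0f} PFLOPS FP8, {ultraserver_hbm_tb:.1f} TB HBM")
```

Both results round to the quoted 362 petaflops and 20.7 TB.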

AWS CEO Matt Garman said at re:Invent 2025 that the company had "already deployed more than 1 million Trainium processors" and was selling them as fast as production allowed. Amazon CEO Andy Jassy called it "already a multibillion-dollar business." The Project Rainier facility in Indiana, an $11 billion, 2.2 GW campus, had roughly 500,000 Trainium2 chips running for Anthropic by October 2025, and AWS also confirmed an OpenAI deal to supply 2 GW of Trainium computing capacity.

Trainium4 was announced in December 2025 for late 2026 or early 2027 availability, promising three times FP8 performance, six times FP4 throughput, and four times memory bandwidth over Trainium3, with an estimated 288 GB of memory. One notable feature is support for Nvidia NVLink Fusion, enabling hybrid clusters that mix Trainium and Nvidia GPUs. AWS's Graviton5 ARM CPU (192 cores, TSMC 3nm, Neoverse V3) was also announced at re:Invent 2025.

Meta's MTIA 400 delivers 6 PFLOPS FP8 and 18 PFLOPS MX4 with 288 GB HBM at 9.2 TB/s bandwidth in a 1,200W envelope. The MTIA 500, scheduled for 2027 mass deployment, scales to 10 PFLOPS FP8 and 30 PFLOPS MX4 with up to 512 GB HBM at 27.6 TB/s in a 2x2 chiplet configuration, consuming 1,700W. From the MTIA 300 to the 500, HBM bandwidth increases 4.5 times and compute scales 25 times, with a new chip roughly every six months.
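The generational figures above can be turned into efficiency and scaling ratios. The sketch below is a back-of-envelope comparison only: it uses peak FP8 numbers and package power as quoted, the `Accelerator` class is a hypothetical helper, and the MTIA 300 bandwidth is inferred from the stated 4.5 times increase rather than taken from a spec sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    fp8_pflops: float  # peak FP8, as quoted
    hbm_bw: float      # HBM bandwidth in the article's unit; only ratios are used
    watts: float       # package power envelope


MTIA_400 = Accelerator("MTIA 400", fp8_pflops=6.0, hbm_bw=9.2, watts=1_200)
MTIA_500 = Accelerator("MTIA 500", fp8_pflops=10.0, hbm_bw=27.6, watts=1_700)


def fp8_tflops_per_watt(chip):
    """Peak FP8 TFLOPS per watt (1 PFLOPS = 1,000 TFLOPS)."""
    return chip.fp8_pflops * 1_000 / chip.watts


for chip in (MTIA_400, MTIA_500):
    print(f"{chip.name}: {fp8_tflops_per_watt(chip):.2f} TFLOPS/W peak FP8")

# Generation-over-generation ratios, independent of the bandwidth unit.
bw_ratio_400_to_500 = MTIA_500.hbm_bw / MTIA_400.hbm_bw
fp8_ratio_400_to_500 = MTIA_500.fp8_pflops / MTIA_400.fp8_pflops

# Inferred, not quoted: MTIA 300 bandwidth implied by the 4.5x claim.
implied_mtia_300_bw = MTIA_500.hbm_bw / 4.5

print(f"400 -> 500: {bw_ratio_400_to_500:.1f}x bandwidth, "
      f"{fp8_ratio_400_to_500:.2f}x FP8")
print(f"Implied MTIA 300 bandwidth: {implied_mtia_300_bw:.2f}")
```

On these figures, bandwidth triples between the 400 and the 500 while peak FP8 grows by about two-thirds, which suggests most of the 25 times compute scaling from the MTIA 300 is already in place by the 400.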

Microsoft's custom silicon program took a significant step forward in January with the deployment of Maia 200, manufactured on TSMC 3nm with over 140 billion transistors. The chip delivers more than 10 PFLOPS FP4 and 5 PFLOPS FP8 with 216 GB HBM3E at 7 TB/s bandwidth in a 750W envelope. Microsoft claims it offers 30% better performance per dollar than the best hardware in its existing fleet and calls it "the most performant first-party silicon from any hyperscaler." Maia 200 currently serves GPT-5.2 models for OpenAI and powers Microsoft 365 Copilot workloads from its Des Moines data center.

The path to Maia 200 was far from smooth, though. The original Maia 100, built on TSMC 5nm, was reportedly designed more for image processing than generative AI and never powered production AI services at scale. Maia 200 was delayed roughly six months due to design changes requested by OpenAI that caused simulation instability, plus chip team turnover. CEO Satya Nadella has emphasized that Microsoft will continue purchasing Nvidia and AMD chips alongside Maia. Microsoft's Cobalt 200 Arm CPU (TSMC 3nm, 132 Neoverse V3 cores) was announced at Ignite 2025 and is now live in Azure data centers.

Tesla’s Dojo project, meanwhile, met a very different fate. Despite years of development and an innovative D1 chip (TSMC 7nm, 50 billion transistors, 362 TFLOPS BF16, with a unique 354-core mesh architecture), Tesla disbanded the Dojo team in August. Lead architect Peter Bannon departed, and roughly 20 engineers left to found DensityAI. Elon Musk explained that "once it became clear that all paths converged to AI6, I had to shut down Dojo." Tesla is now focusing on AI5 and AI6 inference chips, with AI6 backed by a $16.5 billion Samsung fabrication deal, while relying on Nvidia hardware for current training needs.

Among other contenders, Intel's Gaudi 3 has struggled with software maturity and missed targets. Shipment goals were cut by more than 30% in 2024, and the Habana Labs brand is being absorbed into Intel's broader accelerator efforts under CEO Lip-Bu Tan. In China, Huawei's Ascend 910C (SMIC 7nm, roughly 800 TFLOPS FP16, 128 GB HBM) targets 600,000 units in 2026 but is held back by yields of around 20%. Cambricon, meanwhile, plans to triple output to 500,000 chips.

TSMC's CoWoS advanced packaging capacity is scaling from roughly 65,000-75,000 wafers per month in 2025 to a target of 120,000-130,000 wafers per month in 2026, and capital expenditure of up to $56 billion is planned for the year. The 2nm node entered mass production at the end of last year, with capacity fully booked and a target of over 60,000 wafers per month by the end of 2026. Nvidia has secured roughly 60% of CoWoS allocation (c. 595,000 wafers), Broadcom about 15% (c. 150,000 wafers), and AMD approximately 11% (c. 105,000 wafers). Every custom ASIC in this article depends on CoWoS or its CoWoS-L variant for HBM integration, and TSMC's packaging capacity is now a more binding constraint than wafer fabrication itself.
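As a consistency check on the allocation figures, each share and wafer count implies a total CoWoS pool. The sketch below derives those totals and the share left for everyone else. Both inputs are rounded estimates, so the implied totals are indicative only, and the dictionary layout is an illustrative choice.

```python
# (share of CoWoS allocation, approximate wafers), as quoted in the article.
allocations = {
    "Nvidia": (0.60, 595_000),
    "Broadcom": (0.15, 150_000),
    "AMD": (0.11, 105_000),
}

# Total pool implied by each vendor's share and wafer count.
implied_totals = {
    vendor: wafers / share for vendor, (share, wafers) in allocations.items()
}
for vendor, total in implied_totals.items():
    print(f"{vendor}: implies about {total:,.0f} wafers in total")

# Share and wafers accounted for by the three named vendors.
named_share = sum(share for share, _ in allocations.values())
named_wafers = sum(wafers for _, wafers in allocations.values())
remaining_share = 1.0 - named_share

print(f"Named vendors: {named_share:.0%} of allocation, {named_wafers:,} wafers")
print(f"Left for all other customers: {remaining_share:.0%}")
```

The three implied totals land between roughly 950,000 and 1,000,000 wafers, so the quoted shares and counts are mutually consistent to within rounding, and about 14% of the allocation is left for every other customer.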

With custom silicon offering a TCO advantage of up to 65% over conventional GPUs for inference at production scale, it’s easy to see why so many hyperscalers are pursuing custom ASICs. Broadcom and Marvell together control roughly 95% of the ASIC co-design market, so the question is no longer whether custom silicon will take share from Nvidia, but how quickly it erodes Nvidia's pricing power as these programs reach full production scale.

Luke James is a freelance writer and journalist. Although his background is in legal, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.

