- Memory bandwidth — not FLOPs — is the binding constraint on frontier model training and inference economics through at least 2027.
- Demand for High-Bandwidth Memory (HBM) is projected to exceed supply roughly 3× through 2026, with SK Hynix, Micron, and Samsung running near 100% utilization on every advanced node.
- Six distinct solution paths are in flight — from HBM4 and advanced packaging to CXL pooling, custom silicon, algorithmic compression, and silicon photonics. Two are shipping; four aren't ready.
- The winners and losers won't be obvious from compute headlines. Packaging capacity, KV-cache efficiency, and interconnect topology now matter more than peak TFLOPs.
The bottleneck, in one paragraph.
Every GPU shipped in 2026 will spend more cycles waiting for data than computing on it. That is the entire story — everything else is consequence.
For two years the public conversation about AI infrastructure has fixated on compute — TFLOPs, GPU counts, training runs measured in megawatts. That framing is increasingly wrong. The frontier-model bottleneck is now memory bandwidth: how quickly model weights and activations can be moved from memory to the compute cores that need them. By every internal benchmark from the major labs, modern transformer inference is memory-bound 70–85% of the time, and training is bandwidth-limited in roughly the same proportion at the scales now common.
The math is unforgiving. A 70B-parameter model in FP16 occupies ~140 GB. To generate a single token at batch size 1, the entire weight set must be read from memory. An H200 with 4.8 TB/s of HBM3E bandwidth can therefore complete at most ~34 full reads of that model per second (4.8 TB/s ÷ 140 GB), which is the ceiling on tokens per second per GPU for a single sequence before any other factor enters. Compute is rarely the wall. Bandwidth is.
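The arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming a single sequence (batch size 1), that every weight is read exactly once per generated token, and that peak bandwidth is fully achieved; the function name `decode_ceiling_tokens_per_s` is illustrative and not taken from any library.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bytes_per_param: float,
                                bandwidth_tb_s: float) -> float:
    """Upper bound on tokens/s when each token requires one full read of the weights.

    Assumes batch size 1 and ideal (peak) memory bandwidth; real systems land lower.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_per_s / weight_bytes


# 70B parameters, FP16 (2 bytes per parameter), 4.8 TB/s of HBM3E bandwidth
ceiling = decode_ceiling_tokens_per_s(70, 2, 4.8)
print(f"{ceiling:.1f} tokens/s")  # 34.3 tokens/s
```

Note that compute throughput never appears in the formula: under these assumptions, only bytes moved and bandwidth set the ceiling.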
"We used to design accelerators around peak FLOPs. We now design them around bytes per FLOP, and the answer is always 'more, faster, closer.'"
Sandra Mehta, lead architect, anonymous hyperscaler · April 2026
This reframing matters because the supply chain for bandwidth looks nothing like the supply chain for compute. There is functionally one company on Earth, TSMC, that can do advanced packaging at the volumes needed. There are three companies that can manufacture leading-edge HBM stacks. There is a multi-quarter queue for both.
Why HBM, specifically.
High-Bandwidth Memory isn't "fast DRAM." It's a wholly different manufacturing problem — and the world has roughly three companies capable of solving it.
HBM is what you get when you stack 8 to 16 DRAM dies vertically, drill thousands of through-silicon vias (TSVs) through them, and bond the stack to a silicon interposer right beside the GPU die. The result is a bandwidth uplift of more than an order of magnitude versus DDR: the H200's HBM3E delivers 4.8 TB/s in aggregate, while the fastest server DDR5 manages ~70 GB/s per channel.
But HBM stacks are catastrophically harder to produce than commodity DRAM. Yields fall with each additional die layer; thermal management between layers is a research-grade problem; TSV pitch is now under 40µm and shrinking; and every stack must be co-packaged with the GPU via TSMC's CoWoS or equivalent. The packaging line, not the memory fab, is the binding constraint today.
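To see why yield falls with stack height, consider a deliberately simplified model in which every layer (die plus bonding step) succeeds independently with the same probability, so stack yield is that probability raised to the number of layers. Both the independence assumption and the 0.98 per-layer figure below are hypothetical illustrations, not vendor data.

```python
def stack_yield(per_layer_yield: float, layers: int) -> float:
    """Compound yield of a stack if each layer succeeds independently.

    Simplified model: one defective layer scraps the whole stack, and every
    layer has the same success probability. Real yield models are more involved.
    """
    return per_layer_yield ** layers


# Hypothetical 98% per-layer yield across common stack heights
for layers in (8, 12, 16):
    print(f"{layers}-Hi: {stack_yield(0.98, layers):.1%}")
```

Under this toy model, the loss compounds: doubling the stack height squares the yield, which is why taller stacks are disproportionately expensive to produce.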
Three implications follow from the supply gap. First, HBM allocation is now a strategic asset: NVIDIA, AMD, and the hyperscalers negotiate multi-year supply agreements that are functionally equivalent to capacity ownership. Second, CoWoS capacity at TSMC is a harder constraint than HBM itself through 2026. Third, alternative paths (see the next section) are no longer fringe research. They are board-level priorities at every relevant company.
[Figure: HBM supply vs. demand, 2022–2027 (projected)]
Industry solutions, six paths in flight.
There is no single answer to the bandwidth wall. We track six in parallel — three hardware, three not — with status updated every Monday at 09:00 UTC.
HBM4 & HBM4E
Next-gen 12- and 16-Hi stacks delivering 1.5–2× per-stack bandwidth and 1.8× capacity. JEDEC published the HBM4 standard in April 2025; risk production underway at SK Hynix, qualification at Samsung and Micron.
Advanced packaging
CoWoS-L, SoIC, and Intel's EMIB-T are pushing interposer area and HBM-per-package counts higher. TSMC has tripled CoWoS capacity since 2023 — still oversubscribed through 2027.
CXL memory pooling
CXL 3.0 lets multiple accelerators share a pooled DRAM tier — trading some bandwidth for vastly more capacity. Useful for KV-cache offload in inference; less clearly useful for training.
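A rough sizing exercise shows why KV-cache offload is the natural fit for a pooled tier. The sketch below assumes a 70B-class transformer with grouped-query attention (80 layers, 8 KV heads, head dimension 128) storing keys and values in FP16; these architecture numbers are illustrative assumptions, and the function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int, context_tokens: int, batch: int) -> int:
    """Total KV-cache size: one key and one value vector per layer, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return per_token * context_tokens * batch


# Assumed 70B-class model, FP16 cache, 32K-token context, 16 concurrent sequences
total = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                       bytes_per_value=2, context_tokens=32_768, batch=16)
print(f"{total / 2**30:.0f} GiB")  # 160 GiB
```

Under these assumptions the cache alone is larger than the ~140 GB the FP16 weights occupy, so a slower but much larger tier becomes attractive for the portions of the cache that are not read on every step.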
Custom inference silicon
Groq, Cerebras, SambaNova, plus hyperscaler ASICs (TPU, Trainium, MTIA, Maia) re-architect around bandwidth instead of FLOPs — SRAM-heavy, dataflow, or wafer-scale designs.
Algorithmic compression
FP8, INT4, mixture-of-experts, speculative decoding, and KV-cache compression reduce the bytes moved per generated token by 4–8×, while paged attention cuts wasted KV-cache memory. Free bandwidth via software: the most underrated lever.
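As a hedged illustration of where a 4–8× range can come from, the sketch below counts only weight bytes read per token and combines two levers: lower precision and a mixture-of-experts model that activates a fraction of its parameters per token. The 25% active fraction is a hypothetical value, and the model ignores quantization metadata, KV-cache traffic, and routing overhead.

```python
def weight_gb_read_per_token(params_billion: float,
                             bytes_per_param: float,
                             active_fraction: float = 1.0) -> float:
    """Approximate GB of weights read to produce one token at batch size 1."""
    return params_billion * bytes_per_param * active_fraction


baseline = weight_gb_read_per_token(70, 2.0)                      # dense, FP16
int4_dense = weight_gb_read_per_token(70, 0.5)                    # dense, INT4
fp8_moe = weight_gb_read_per_token(70, 1.0, active_fraction=0.25) # FP8, 25% active (assumed)

print(baseline / int4_dense)  # 4.0
print(baseline / fp8_moe)     # 8.0
```

Because the decode ceiling scales inversely with bytes read per token, each of these reductions translates directly into a higher tokens-per-second bound on the same hardware.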
Silicon photonics
Co-packaged optics promise per-pin bandwidth far beyond electrical limits, with lower energy per bit. Lightmatter and Ayar Labs have first products; meaningful volumes are a 2027+ story.
How the story evolved.
A continuously updated log of every material development we've tracked since this pillar opened in January 2025. New entries every week.
Key players.
Who actually moves the needle on memory supply, packaging, and the alternatives. Market share data refreshed quarterly.
HBM Memory · 94% concentrated
SK Hynix
First to qualify HBM3E with NVIDIA. First to sample HBM4. The single most pivotal company in the entire AI supply chain right now.
Micron
Fastest HBM3E ramp of any vendor; sold out through 2027. The "American HBM" trade is now structural, not speculative.
Samsung
Hardest hit by qualification delays through 2025. The May 2026 NVIDIA HBM3E pass changes the trajectory, but the gap with SK Hynix is real.
TSMC
CoWoS is the actual bottleneck — not HBM. Capacity expanding faster than ever, still booked through 2027 across every advanced node.
NVIDIA
Effectively pre-buys most leading-edge HBM and CoWoS capacity. Sets the demand floor; partly insulated from supply swings.
AMD
MI350/MI400 roadmap leans heavily on per-package HBM density. Allocation more constrained than NVIDIA's; share gain hinges on supply.
Outlook · what to watch.
Six catalysts that will tell you whether the bottleneck is closing, holding, or worsening. Mid-2026 through end of 2027.
HBM4 risk-production yield
If SK Hynix reports >70% functional yield on 12-Hi HBM4, the 2027 supply curve bends sharply upward. Below 50%, the supply gap persists into 2028.
Samsung re-entry
Will Samsung's HBM3E (and eventually HBM4) close the gap with SK Hynix, or remain a perpetual #3? Determines whether the duopoly becomes a true triopoly.
CoWoS capacity vs HBM ramp
HBM4 supply may ramp faster than CoWoS capacity. If so, packaging, not memory, continues to define the GPU production ceiling in 2027.
Custom-silicon share
Hyperscaler ASICs — TPU v6, Trainium 3, MTIA, Maia — could absorb 25–35% of inference workloads, blunting HBM demand growth at the margin.
Photonics first volumes
Co-packaged optics in real systems, not demos. Lightmatter and Ayar Labs partner roadmaps suggest meaningful, but still niche, deployment by end-2027.
Algorithmic surprises
The biggest free wins to date have been software-side (FP8, paged attention, MoE). The next 5× could come from anywhere — and ship in a weekend.