The AI memory bottleneck.

Memory bandwidth — not FLOPs — is the binding constraint on frontier model training and inference economics through at least 2027.
High-Bandwidth Memory (HBM) will be undersupplied roughly 3× through 2026, with SK Hynix, Micron, and Samsung running near 100% utilization on every advanced node.
Six distinct solution paths are in flight — from HBM4 and advanced packaging to CXL pooling, custom silicon, algorithmic compression, and silicon photonics. Two are shipping; four aren't ready.
The winners and losers won't be obvious from compute headlines. Packaging capacity, KV-cache efficiency, and interconnect topology now matter more than peak TFLOPs.

§ 01 · Problem framing

The bottleneck, in one paragraph.

Every GPU shipped in 2026 will spend more cycles waiting for data than computing on it. That is the entire story — everything else is consequence.

For two years the public conversation about AI infrastructure has fixated on compute — TFLOPs, GPU counts, training runs measured in megawatts. That framing is increasingly wrong. The frontier-model bottleneck is now memory bandwidth: how quickly model weights and activations can be moved from memory to the compute cores that need them. By every internal benchmark from the major labs, modern transformer inference is memory-bound 70–85% of the time, and training is bandwidth-limited in roughly the same proportion at the scales now common.

The math is unforgiving. A 70B-parameter model in FP16 occupies ~140 GB. To generate a single token in unbatched decoding, the entire weight set must be read from memory and processed. An H200 with 4.8 TB/s of HBM3E bandwidth can theoretically complete ~34 full reads of that model per second (4.8 TB/s ÷ 140 GB), which is your ceiling for tokens-per-second per GPU before any other factor enters. Compute is rarely the wall. Bandwidth is.
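The ceiling quoted above can be reproduced with a few lines of arithmetic. The sketch below is a simplified bandwidth bound, assuming unbatched decoding of a dense model in which every weight is read exactly once per token, and ignoring KV-cache and activation traffic. The function name and parameters are illustrative, not taken from any library.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bytes_per_param: float,
                                bandwidth_tb_per_s: float) -> float:
    """Upper bound on single-stream decode rate if each token reads all weights once."""
    weight_bytes = params_billion * 1e9 * bytes_per_param   # total model size in bytes
    bandwidth_bytes = bandwidth_tb_per_s * 1e12              # memory bandwidth in bytes/s
    return bandwidth_bytes / weight_bytes


# Figures from the text: 70B parameters, FP16 (2 bytes each), 4.8 TB/s.
fp16_ceiling = decode_ceiling_tokens_per_s(70, 2.0, 4.8)
# Same model with 1-byte weights (e.g. FP8), to show how precision moves the bound.
fp8_ceiling = decode_ceiling_tokens_per_s(70, 1.0, 4.8)

print(f"FP16 ceiling: {fp16_ceiling:.1f} tokens/s")
print(f"FP8 ceiling:  {fp8_ceiling:.1f} tokens/s")
```

Under these assumptions, halving bytes per parameter doubles the bound, which is why precision and batching, rather than peak FLOPs, dominate the arithmetic here.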

We used to design accelerators around peak FLOPs. We now design them around bytes per FLOP, and the answer is always "more, faster, closer."

Sandra Mehta, lead architect, anonymous hyperscaler · April 2026

This reframing matters because the supply chain for bandwidth looks nothing like the supply chain for compute. There is functionally one company on Earth — TSMC — that can do advanced packaging at the volumes needed. There are three companies that can manufacture leading-edge HBM stacks. There is a multi-quarter queue for both.

§ 02 · Constraint mechanism

Why HBM, specifically.

High-Bandwidth Memory isn't "fast DRAM." It's a wholly different manufacturing problem — and the world has roughly three companies capable of solving it.

HBM is what you get when you stack 8 to 16 DRAM dies vertically, drill thousands of through-silicon vias (TSVs) through them, and bond the stack directly to a GPU's silicon interposer. The result is a bandwidth uplift of well over an order of magnitude versus DDR: H200 HBM3E delivers 4.8 TB/s in aggregate; the fastest server DDR5 manages ~70 GB/s per channel.
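Both bandwidth figures fall out of the same formula: interface width times per-pin transfer rate. The sketch below is a back-of-envelope check, not a vendor specification. It assumes a 64-bit DDR5 channel at 8800 MT/s, and six 1024-bit HBM3E stacks at 6.25 Gb/s per pin; the stack count and pin rate are assumptions back-derived from the 4.8 TB/s aggregate figure, and the function name is hypothetical.

```python
def interface_bandwidth_gb_s(width_bits: int, rate_gt_s: float, count: int = 1) -> float:
    """Peak bandwidth in GB/s: width (bits) x transfer rate (GT/s) / 8 bits per byte, x devices."""
    return width_bits * rate_gt_s / 8 * count


# Assumption: one 64-bit DDR5 channel running at 8800 MT/s.
ddr5 = interface_bandwidth_gb_s(64, 8.8)

# Assumption: six HBM3E stacks, each 1024 bits wide, at 6.25 Gb/s per pin.
hbm = interface_bandwidth_gb_s(1024, 6.25, count=6)

print(f"DDR5 channel:    {ddr5:.1f} GB/s")
print(f"HBM3E aggregate: {hbm / 1000:.1f} TB/s")
print(f"Ratio:           {hbm / ddr5:.0f}x")
```

The per-pin rates are in the same range for both technologies; the uplift comes almost entirely from interface width and stack count, which is what the stacking and interposer buy.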

But HBM stacks are catastrophically harder to produce than commodity DRAM. Yields fall with each additional die layer; thermal management between layers is a research-grade problem; TSV pitch is now under 40µm and shrinking; and every stack must be co-packaged with the GPU via TSMC's CoWoS or equivalent. The packaging line, not the memory fab, is the binding constraint today.
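Why yield falls with stack height can be illustrated with a simple compounding model. The sketch below assumes every die and every die-to-die bond fails independently, with no known-good-die screening or repair, so it is pessimistic and deliberately crude. The yield values are hypothetical placeholders, not vendor data.

```python
def stack_yield(dies: int, die_yield: float, bond_yield: float) -> float:
    """Probability that a stack works if all dies and all inter-die bonds must be good."""
    if dies < 1:
        raise ValueError("a stack needs at least one die")
    # n dies must all be good, and the n - 1 bonds between them must all succeed.
    return die_yield ** dies * bond_yield ** (dies - 1)


# Hypothetical per-die and per-bond yields, chosen only for illustration.
DIE_YIELD = 0.98
BOND_YIELD = 0.99

for height in (8, 12, 16):
    y = stack_yield(height, DIE_YIELD, BOND_YIELD)
    print(f"{height:>2}-Hi stack: {y:.1%} functional")
```

Even with high per-step yields, the product shrinks steadily as layers are added, which is the mechanism behind the sensitivity of 12-Hi and 16-Hi output to small process improvements.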

Three implications follow from this supply gap. First, HBM allocation is now a strategic asset: NVIDIA, AMD, and the hyperscalers negotiate multi-year supply agreements that are functionally equivalent to capacity ownership. Second, CoWoS capacity at TSMC is a harder constraint than HBM itself through 2026. Third, alternative paths (see the next section) are no longer fringe research. They are board-level priorities at every relevant company.

Figure: HBM supply vs. demand, 2022–2027 (projected). Units: Gb-equivalents per quarter. Series: supply, demand, gap. Sources: SK Hynix, Micron, TrendForce, internal modeling.

Read: Supply (black) tracked demand closely until late 2023, when LLM training capex broke the curve. The gap (shaded) peaks at ~3.2× in mid-2026 and only closes meaningfully when SK Hynix's and Micron's HBM4 fabs hit full ramp. The implication: bandwidth, not GPU count, sets the pace of frontier AI through 2027.

§ 03 · Tracker · Updated weekly

Industry solutions, six paths in flight.

There is no single answer to the bandwidth wall. We track six in parallel, five in hardware and one in software, with status updated every Monday at 09:00 UTC.

SOL — 01

HBM4 & HBM4E

Ramping

Next-gen 12- and 16-Hi stacks delivering 1.5–2× per-stack bandwidth and 1.8× capacity. JEDEC published the standard (JESD270-4) in April 2025; risk production is underway at SK Hynix, with qualification in progress at Samsung and Micron.

SOL — 02

Advanced packaging

Shipping

CoWoS-L, SoIC, and Intel's EMIB-T are pushing interposer area and HBM-per-package counts higher. TSMC has tripled CoWoS capacity since 2023 — still oversubscribed through 2027.

SOL — 03

CXL memory pooling

In development

CXL 3.0 lets multiple accelerators share a pooled DRAM tier — trading some bandwidth for vastly more capacity. Useful for KV-cache offload in inference; less clearly useful for training.

SOL — 04

Custom inference silicon

Shipping

Groq, Cerebras, SambaNova, plus hyperscaler ASICs (TPU, Trainium, MTIA, Maia) re-architect around bandwidth instead of FLOPs — SRAM-heavy, dataflow, or wafer-scale designs.

SOL — 05

Algorithmic compression

Ramping

FP8 and INT4 quantization, mixture-of-experts, speculative decoding, KV-cache compression, and paged attention together reduce bytes-per-token by 4–8×. This is bandwidth gained purely through software, and it is the most underrated lever of the six.

SOL — 06

Silicon photonics

Research

Co-packaged optics promise per-pin bandwidth far beyond electrical limits, with lower energy per bit. Lightmatter and Ayar Labs have first products; meaningful volumes are a 2027+ story.
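For the algorithmic compression path (SOL 05), the KV cache is a useful place to see where the bytes go. The sketch below sizes the cache for a hypothetical 70B-class configuration with grouped-query attention (80 layers, 8 KV heads, head dimension 128, 32,768-token context). These values are assumptions chosen for illustration, not figures from the text, and the function is not part of any library.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float, batch: int = 1) -> float:
    """Bytes needed to hold keys and values for every layer and every cached token."""
    # Factor of 2: one key vector and one value vector per KV head, per layer, per token.
    elems_per_token = 2 * layers * kv_heads * head_dim
    return elems_per_token * seq_len * batch * bytes_per_elem


GIB = 2 ** 30

# Hypothetical 70B-class configuration with grouped-query attention.
config = dict(layers=80, kv_heads=8, head_dim=128, seq_len=32_768)

fp16 = kv_cache_bytes(**config, bytes_per_elem=2.0)   # 16-bit cache entries
int4 = kv_cache_bytes(**config, bytes_per_elem=0.5)   # 4-bit cache entries

print(f"FP16 KV cache: {fp16 / GIB:.1f} GiB per sequence")
print(f"INT4 KV cache: {int4 / GIB:.1f} GiB per sequence")
print(f"Reduction:     {fp16 / int4:.0f}x")
```

Because the cache must be re-read on every decode step, any reduction in its size translates directly into fewer bytes moved per token, on top of whatever weight quantization saves.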

§ 04 · Timeline · Live

How the story evolved.

A continuously-updated log of every material development we've tracked since this pillar opened in Jan 2025. New entries every week.

Week of May 11, 2026 · 6 entries

May 12 · SUPPLY · SK Hynix confirms HBM4 12-Hi qualification on schedule for Q3; risk production samples shipping to NVIDIA (Reuters)

May 11 · DEMAND · Microsoft signs multi-year HBM allocation deal with Micron, terms not disclosed (Bloomberg)

May 09 · PACKAGING · TSMC commits additional $3.8B for CoWoS expansion in Arizona and Kaohsiung (DigiTimes)

May 08 · ALT · Cerebras WSE-4 reaches general availability; benchmarks show 4× tokens/sec on 70B inference (The Information)

May 07 · RESEARCH · Stanford paper demonstrates 6× KV-cache compression with <1% quality regression (arXiv)

May 06 · POLICY · Commerce Dept. clarifies HBM3E export rules; HBM4 ambiguity remains

Week of May 04, 2026 · 9 entries

May 04 · SUPPLY · Samsung lands HBM3E qualification with NVIDIA after 18-month delay; volume ship targets Q4 (Korea Economic Daily)

May 02 · DEMAND · Meta raises 2026 capex guide by $5B, citing "accelerator and memory availability" (10-Q)

May 01 · PACKAGING · Intel Foundry secures first external HBM packaging customer (rumored: AWS) (SemiAnalysis)

Week of April 27, 2026 · 7 entries

Apr 30 · ALT · Lightmatter ships co-packaged optics dev kits to "three top-tier hyperscalers" (Press release)

Apr 28 · RESEARCH · DeepSeek V4 paper: 2× throughput at iso-quality using novel attention sharding (arXiv)

Week of April 20, 2026 · 11 entries

§ 05 · Map

Key players.

Who actually moves the needle on memory supply, packaging, and the alternatives. Market share data refreshed quarterly.

HBM Memory · 94% concentrated

SK Hynix · 54%

Micron · 24%

Samsung · 16%

Other · ~6% (remainder)

Memory · KR

SK Hynix

000660.KS · ▲ +2.1%

First to qualify HBM3E with NVIDIA. First to sample HBM4. The single most pivotal company in the entire AI supply chain right now.

Memory · US

Micron

MU · ▲ +1.4%

Fastest HBM3E ramp of any vendor; sold out through 2027. The "American HBM" trade is now structural, not speculative.

Memory · KR

Samsung

005930.KS · ▼ −0.6%

Hardest-hit by qualification delays through 2025. May-2026 NVIDIA HBM3E pass changes the trajectory, but the gap with SK Hynix is real.

Packaging · TW

TSMC

TSM · ▲ +0.9%

CoWoS is the actual bottleneck — not HBM. Capacity expanding faster than ever, still booked through 2027 across every advanced node.

Accelerator · US

NVIDIA

NVDA · ▲ +1.7%

Effectively pre-buys most leading-edge HBM and CoWoS capacity. Sets the demand floor; partly insulated from supply swings.

Accelerator · US

AMD

AMD · ▲ +0.4%

MI350/MI400 roadmap leans heavily on per-package HBM density. Allocation more constrained than NVIDIA's; share gain hinges on supply.


§ 06 · Forward look

Outlook · what to watch.

Six catalysts that will tell you whether the bottleneck is closing, holding, or worsening. Mid-2026 through end of 2027.

2026 · Q3

HBM4 risk-production yield

If SK Hynix reports >70% functional yield on 12-Hi HBM4, the 2027 supply curve bends sharply. Below 50% and the gap persists into 2028.

Watch: SK Hynix Q3 earnings call · Oct 24, 2026

2026 · Q4

Samsung re-entry

Will Samsung's HBM3E (and eventually HBM4) close the gap with SK Hynix, or remain a perpetual #3? Determines whether the duopoly becomes a true triopoly.

Watch: NVIDIA HBM allocation disclosure, Q1 '27 guide

2027 · H1

CoWoS capacity vs HBM ramp

HBM4 supply may begin to outpace packaging capacity outright. If so, packaging, not memory, defines the 2027 GPU production ceiling.

Watch: TSMC capex cadence, Samsung & Intel external packaging customers

2027 · ongoing

Custom-silicon share

Hyperscaler ASICs — TPU v6, Trainium 3, MTIA, Maia — could absorb 25–35% of inference workloads, blunting HBM demand growth at the margin.

Watch: Hyperscaler GPU spend as % of total accel. capex

2027 · H2

Photonics first volumes

Co-packaged optics in real systems, not demos. Lightmatter and Ayar Labs partner roadmaps suggest meaningful, but still niche, deployment by end-2027.

Watch: NVIDIA platform roadmap · GTC 2027

2026–2027

Algorithmic surprises

The biggest free wins to date have been software-side (FP8, paged attention, MoE). The next 5× could come from anywhere — and ship in a weekend.

Watch: arXiv categories cs.AR & cs.PF · weekly digest
