bitdepth.co
LIVE · HBM3E demand +47% q/q · NVDA inventory lead time −9d · SK Hynix HBM4 risk prod 2026Q3 · TSMC CoWoS capex +$3.8B
Active pillar · Updated weekly · Edition 14

The AI memory bottleneck.

The constraint that's quietly reshaping the trillion-dollar AI compute build-out — explained, tracked, and updated as it evolves.

First published: Jan 22, 2025
Last revised: May 12, 2026 · 02:14 UTC
Reading time: 22 min · 9,400 words
Tracked sources: 12
  • Memory bandwidth — not FLOPs — is the binding constraint on frontier model training and inference economics through at least 2027.
  • Demand for High-Bandwidth Memory (HBM) will exceed supply by roughly 3× through 2026, with SK Hynix, Micron, and Samsung running near 100% utilization on every advanced node.
  • Six distinct solution paths are in flight — from HBM4 and advanced packaging to CXL pooling, custom silicon, algorithmic compression, and silicon photonics. Two are shipping; four aren't ready.
  • The winners and losers won't be obvious from compute headlines. Packaging capacity, KV-cache efficiency, and interconnect topology now matter more than peak TFLOPs.
§ 01 · Problem framing

The bottleneck, in one paragraph.

Every GPU shipped in 2026 will spend more cycles waiting for data than computing on it. That is the entire story — everything else is consequence.

For two years the public conversation about AI infrastructure has fixated on compute — TFLOPs, GPU counts, training runs measured in megawatts. That framing is increasingly wrong. The frontier-model bottleneck is now memory bandwidth: how quickly model weights and activations can be moved from memory to the compute cores that need them. By every internal benchmark from the major labs, modern transformer inference is memory-bound 70–85% of the time, and training is bandwidth-limited in roughly the same proportion at the scales now common.

The math is unforgiving. A 70B-parameter model in FP16 occupies ~140 GB. To generate a single token, the entire weight set must be read from memory and processed. An H200 with 4.8 TB/s of HBM3E bandwidth can theoretically push ~34 reads/second of that model — which is your ceiling for tokens-per-second per GPU before any other factor enters. Compute is rarely the wall. Bandwidth is.
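The ceiling arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming batch size 1, that every weight is read exactly once per generated token, and that KV-cache and activation traffic are ignored. The function name `decode_ceiling_tokens_per_s` is illustrative and not taken from any library.

```python
def decode_ceiling_tokens_per_s(params_billions: float,
                                bytes_per_param: float,
                                bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    weight_bytes = params_billions * 1e9 * bytes_per_param  # 70B x 2 bytes = 140 GB
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12           # TB/s to bytes/s
    return bandwidth_bytes_per_s / weight_bytes             # full weight reads per second


# Figures from the text: 70B parameters, FP16 (2 bytes each), 4.8 TB/s of HBM3E.
ceiling = decode_ceiling_tokens_per_s(70, 2, 4.8)
print(f"{ceiling:.1f} tokens/s per GPU")  # prints "34.3 tokens/s per GPU"
```

Batching several requests lets one weight read serve more than one token, so aggregate throughput can exceed this per-stream figure. The ceiling for a single stream does not move.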

We used to design accelerators around peak FLOPs. We now design them around bytes per FLOP, and the answer is always "more, faster, closer."
Sandra Mehta, lead architect, anonymous hyperscaler · April 2026

This reframing matters because the supply chain for bandwidth looks nothing like the supply chain for compute. There is functionally one company on Earth, TSMC, that can do advanced packaging at the volumes needed. There are three companies that can manufacture leading-edge HBM stacks. There is a multi-quarter queue for both.

§ 02 · Constraint mechanism

Why HBM, specifically.

High-Bandwidth Memory isn't "fast DRAM." It's a wholly different manufacturing problem — and the world has roughly three companies capable of solving it.

HBM is what you get when you stack 8 to 16 DRAM dies vertically, drill thousands of through-silicon vias (TSVs) through them, and bond the stack onto a silicon interposer beside the GPU die. The result is a bandwidth uplift of more than an order of magnitude versus DDR: an H200's HBM3E delivers 4.8 TB/s in aggregate, while the fastest server DDR5 manages ~70 GB/s per channel.
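Most of that gap comes from interface width rather than per-pin speed. The sketch below computes peak bandwidth as bus width times per-pin data rate. The inputs are assumptions, not figures from this article: a 64-bit DDR5 channel at 8.8 Gb/s per pin, and six 1024-bit HBM3E stacks on an H200. The 6.25 Gb/s HBM pin rate is back-solved from the 4.8 TB/s total rather than quoted from a datasheet.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Peak bandwidth in GB/s: data pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * gbit_per_s_per_pin / 8


# Assumed: one DDR5 channel, 64 data bits at 8.8 Gb/s per pin.
ddr5_channel = peak_bandwidth_gb_s(64, 8.8)

# Assumed: six HBM3E stacks of 1024 bits each; pin rate back-solved from 4.8 TB/s.
h200_hbm = peak_bandwidth_gb_s(6 * 1024, 6.25)

print(f"DDR5 channel: {ddr5_channel:.1f} GB/s")
print(f"H200 HBM3E:   {h200_hbm:.0f} GB/s")
print(f"Ratio:        {h200_hbm / ddr5_channel:.0f}x")
```

Under these assumptions the HBM pins run slower than the DDR5 pins. The advantage is the 6,144-bit combined interface, which is only practical when the memory sits on the same interposer as the GPU.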

But HBM stacks are catastrophically harder to produce than commodity DRAM. Yields fall with each additional die layer; thermal management between layers is a research-grade problem; TSV pitch is now under 40µm and shrinking; and every stack must be co-packaged with the GPU via TSMC's CoWoS or equivalent. The packaging line, not the memory fab, is the binding constraint today.
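The claim that yields fall with each added layer can be illustrated with a simple compounding model. This is a sketch, assuming every die-plus-bond step succeeds independently with the same probability. The 98% per-layer figure is hypothetical, and real yield behavior (known-good-die testing, repair, correlated defects) is more complicated.

```python
def stack_yield(per_layer_yield: float, layers: int) -> float:
    """Probability that all layers in a stack are good, assuming independent layers."""
    return per_layer_yield ** layers


def required_per_layer_yield(target_stack_yield: float, layers: int) -> float:
    """Per-layer yield needed to reach a target stack yield under the same assumption."""
    return target_stack_yield ** (1.0 / layers)


# Hypothetical 98% per-layer yield across common stack heights.
for layers in (8, 12, 16):
    print(f"{layers}-Hi stack yield: {stack_yield(0.98, layers):.1%}")

# Per-layer yield that a 70% yield on a 12-Hi stack would imply.
print(f"Needed per layer: {required_per_layer_yield(0.70, 12):.2%}")
```

Under this model a small per-layer loss compounds quickly, which is one way to read why 16-Hi stacks are so much harder to ship than 8-Hi ones.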

Three implications follow. First, HBM allocation is now a strategic asset: NVIDIA, AMD, and the hyperscalers negotiate multi-year supply agreements that are functionally equivalent to capacity ownership. Second, CoWoS capacity at TSMC is a harder constraint than HBM itself through 2026. Third, alternative paths (see the next section) are no longer fringe research. They are board-level priorities at every relevant company.

HBM supply vs. demand, 2022–2027 (projected)

Gb-equivalents per quarter · sources: SK Hynix, Micron, TrendForce, internal modeling
[Chart: supply, demand, and gap series by year, 2022–2027, with "today" and "projected" markers]
Read: Supply (black) tracked demand closely until late 2023, when LLM training capex broke the curve. The gap (shaded) peaks at ~3.2× in mid-2026 and only closes meaningfully when SK Hynix and Micron's HBM4 fabs hit full ramp. The implication: bandwidth — not GPUs — sets the pace of frontier AI through next year.
§ 03 · Tracker · Updated weekly

Industry solutions, six paths in flight.

There is no single answer to the bandwidth wall. We track six in parallel, five rooted in hardware and one in software, with status updated every Monday at 09:00 UTC.

SOL — 01

HBM4 & HBM4E

Ramping

Next-gen 12- and 16-Hi stacks delivering 1.5–2× per-stack bandwidth and 1.8× capacity. JEDEC published the HBM4 standard (JESD270-4) in April 2025; risk production underway at SK Hynix, qualification at Samsung and Micron.

SOL — 02

Advanced packaging

Shipping

CoWoS-L, SoIC, and Intel's EMIB-T are pushing interposer area and HBM-per-package counts higher. TSMC has tripled CoWoS capacity since 2023 — still oversubscribed through 2027.

SOL — 03

CXL memory pooling

In development

CXL 3.0 lets multiple accelerators share a pooled DRAM tier — trading some bandwidth for vastly more capacity. Useful for KV-cache offload in inference; less clearly useful for training.

SOL — 04

Custom inference silicon

Shipping

Groq, Cerebras, SambaNova, plus hyperscaler ASICs (TPU, Trainium, MTIA, Maia) re-architect around bandwidth instead of FLOPs — SRAM-heavy, dataflow, or wafer-scale designs.

SOL — 05

Algorithmic compression

Ramping

FP8, INT4, mixture-of-experts, speculative decoding, KV-cache compression and paged attention reduce bytes-per-token by 4–8×. Free bandwidth via software — most underrated lever.

SOL — 06

Silicon photonics

Research

Co-packaged optics promise per-pin bandwidth far beyond electrical limits, with lower energy per bit. Lightmatter and Ayar Labs have first products; meaningful volumes are a 2027+ story.
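Two of the cards above, CXL pooling for KV-cache offload and algorithmic compression, come down to byte counts that are easy to sketch. The example below assumes standard storage widths per weight format and ignores quantization metadata such as scales and zero points. The model shape passed to `kv_cache_gb` (80 layers, 8 KV heads, head dimension 128) is hypothetical and chosen only for illustration.

```python
# Bytes per parameter for each weight format (metadata overhead ignored).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(params_billions: float, precision: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: float = 2.0) -> float:
    """KV-cache footprint in GB for one sequence; the leading 2 covers keys and values."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


# Weight bytes that must be streamed per decoded token for a 70B model.
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_gb(70, precision):.0f} GB of weights read per token")

# Hypothetical model shape at a 128k-token context, FP16 cache entries.
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_tokens=131_072)
print(f"KV cache at 128k context: {kv:.1f} GB per sequence")
```

Moving weights from FP16 to INT4 accounts for a 4× cut on its own, the low end of the 4–8× range quoted above. The per-sequence KV-cache figure suggests why a larger, slower pooled tier is attractive for long-context inference.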

§ 04 · Timeline · Live

How the story evolved.

A continuously updated log of every material development we've tracked since this pillar opened in Jan 2025. New entries every week.

Week of May 11, 2026 · NEW · 6 entries
May 12 · SUPPLY · SK Hynix confirms HBM4 12-Hi qualification on schedule for Q3; risk production samples shipping to NVIDIA (Reuters)
May 11 · DEMAND · Microsoft signs multi-year HBM allocation deal with Micron, terms not disclosed (Bloomberg)
May 09 · PACKAGING · TSMC commits additional $3.8B for CoWoS expansion in Arizona and Kaohsiung (DigiTimes)
May 08 · ALT · Cerebras WSE-4 reaches general availability; benchmarks show 4× tokens/sec on 70B inference (The Information)
May 07 · RESEARCH · Stanford paper demonstrates 6× KV-cache compression with <1% quality regression (arXiv)
May 06 · POLICY · Commerce Dept. clarifies HBM3E export rules; HBM4 ambiguity remains (FT)
Week of May 04, 2026 · 9 entries
May 04 · SUPPLY · Samsung lands HBM3E qualification with NVIDIA after 18-month delay; volume ship targets Q4 (Korea Economic Daily)
May 02 · DEMAND · Meta raises 2026 capex guide by $5B, citing "accelerator and memory availability" (10-Q)
May 01 · PACKAGING · Intel Foundry secures first external HBM packaging customer (rumored: AWS) (SemiAnalysis)
Week of April 27, 2026 · 7 entries
Apr 30 · ALT · Lightmatter ships co-packaged optics dev kits to "three top-tier hyperscalers" (Press release)
Apr 28 · RESEARCH · DeepSeek V4 paper: 2× throughput at iso-quality using novel attention sharding (arXiv)
Week of April 20, 2026 · 11 entries
§ 05 · Map

Key players.

Who actually moves the needle on memory supply, packaging, and the alternatives. Market share data refreshed quarterly.

HBM Memory · 94% concentrated

SK Hynix: 54%
Micron: 24%
Samsung: 16%
Other: 6%
Memory · KR

SK Hynix

000660.KS · ▲ +2.1%

First to qualify HBM3E with NVIDIA. First to sample HBM4. The single most pivotal company in the entire AI supply chain right now.

Memory · US

Micron

MU · ▲ +1.4%

Fastest HBM3E ramp of any vendor; sold out through 2027. The "American HBM" trade is now structural, not speculative.

Memory · KR

Samsung

005930.KS · ▼ −0.6%

Hardest-hit by qualification delays through 2025. May-2026 NVIDIA HBM3E pass changes the trajectory, but the gap with SK Hynix is real.

Packaging · TW

TSMC

TSM · ▲ +0.9%

CoWoS is the actual bottleneck — not HBM. Capacity expanding faster than ever, still booked through 2027 across every advanced node.

Accelerator · US

NVIDIA

NVDA · ▲ +1.7%

Effectively pre-buys most leading-edge HBM and CoWoS capacity. Sets the demand floor; partly insulated from supply swings.

Accelerator · US

AMD

AMD · ▲ +0.4%

MI350/MI400 roadmap leans heavily on per-package HBM density. Allocation more constrained than NVIDIA's; share gain hinges on supply.

§ 06 · Forward look

Outlook · what to watch.

Six catalysts that will tell you whether the bottleneck is closing, holding, or worsening. Mid-2026 through end of 2027.

2026 · Q3

HBM4 risk-production yield

If SK Hynix reports >70% functional yield on 12-Hi HBM4, the 2027 supply curve bends sharply. Below 50% and the gap persists into 2028.

Watch: SK Hynix Q3 earnings call · Oct 24, 2026
2026 · Q4

Samsung re-entry

Will Samsung's HBM3E (and eventually HBM4) close the gap with SK Hynix, or remain a perpetual #3? Determines whether the duopoly becomes a true triopoly.

Watch: NVIDIA HBM allocation disclosure, Q1 '27 guide
2027 · H1

CoWoS capacity vs HBM ramp

HBM4 output may outrun CoWoS packaging capacity as the memory ramp accelerates. If it does, packaging, not memory, defines the 2027 GPU production ceiling, extending the constraint that already binds today.

Watch: TSMC capex cadence, Samsung & Intel external packaging customers
2027 · ongoing

Custom-silicon share

Hyperscaler ASICs — TPU v6, Trainium 3, MTIA, Maia — could absorb 25–35% of inference workloads, blunting HBM demand growth at the margin.

Watch: Hyperscaler GPU spend as % of total accel. capex
2027 · H2

Photonics first volumes

Co-packaged optics in real systems, not demos. Lightmatter and Ayar Labs partner roadmaps suggest meaningful, but still niche, deployment by end-2027.

Watch: NVIDIA platform roadmap · GTC 2027
2026—2027

Algorithmic surprises

The biggest free wins to date have been software-side (FP8, paged attention, MoE). The next 5× could come from anywhere — and ship in a weekend.

Watch: arXiv categories cs.AR & cs.PF · weekly digest
