Published Apr 15 · 8 min read

The FP8 → INT4 quantization roadmap.

Inference vendors are racing from FP8 to INT4 as the next lever on compute efficiency. We map the roadmap across frameworks, the accuracy tradeoffs that actually matter in production, and which memory vendors benefit most.

Published: Apr 15, 2026

Length: 2,000 words · 8 min

↳ Part of pillar

This is one entry in The AI memory bottleneck.

Thirty more sub-articles, six tracked solution paths, a weekly-updated timeline, and a live aggregated feed — all on the pillar page.


More in The AI memory bottleneck

Samsung's HBM3E qualification, finally. The full timeline and what it means for NVIDIA's 2026 allocation strategy.

Why every AI training run is now a packaging negotiation.

Cerebras WSE-4 is generally available. We ran the benchmarks. The numbers are real.
