The Memory Battle Intensifying Alongside GPUs — How Memory Evolved from Commodity to the Lifeline of AI Infrastructure
When discussing AI infrastructure, Nvidia tends to take center stage.
Nvidia remains the defining company of the AI era, with FY2026 revenue of $215.9 billion — up 65% year-over-year — and a GAAP gross margin of 71.1%. It is, by any measure, in a class of its own.
But when you look at what is actually happening on the ground where AI infrastructure operates, the components that support the GPU are also growing in importance.
What is truly becoming a bottleneck, and at the same time being re-evaluated, is the entire memory hierarchy — including HBM (High Bandwidth Memory, a type of memory directly attached to the GPU for high-speed data transfer), DRAM, and NAND.
I see a fairly bright outlook for the memory industry. There are six reasons.
First, the growth rates and profitability of major memory companies have already become something quite different from the “low-margin, commodity memory business” of the past.
Second, the shift to long-term contracts is dulling the cyclicality that has historically defined the memory industry.
Third, the market's reading of Google's KV cache compression technique "TurboQuant," and of CXL, as memory demand destruction is technically sloppy.
Fourth, not just HBM but also DRAM and NAND are seeing their supply-demand dynamics linked as part of AI infrastructure.
Fifth, actual pricing data and contracting practices have already entered a phase of “securing supply over chasing price.”
Sixth, new supply capacity is slow to come online, and the strategic value of memory remains high over the medium term.
I will walk through each of these below.
1. Major Memory Companies Are No Longer the “Memory Companies of the Past”
Let me start with the financial performance of the major memory companies.
SK hynix recorded full-year 2025 revenue of 97.1467 trillion won, operating profit of 47.2063 trillion won, and net income of 42.9479 trillion won, reaching an operating margin of 49% and a net margin of 44% (The Korea Times).
Micron also saw FY2025 revenue grow about 50% year-over-year to $37.4 billion, and in fiscal Q2 2026 posted revenue of $23.86 billion, GAAP EPS of $12.07, and non-GAAP EPS of $12.20 (Micron earnings report).
Samsung’s preliminary Q1 2026 results showed revenue of 133 trillion won and operating profit of 57.2 trillion won — an eight-fold increase in operating profit year-over-year (Reuters).
At least in terms of profit structure, this shows that the memory industry has already begun to leave behind its old identity as a “low-margin sector.”
The Information laid out projected 2026 revenue growth rates: 159% for SK hynix, 191% for Micron, and 71% for Nvidia.
In other words, in terms of single-year revenue growth, memory companies have already entered a phase where they surpass Nvidia. The same article also noted SK hynix’s gross margin of 60% and operating margin of 49%, suggesting that on some profitability metrics, memory companies are already approaching Nvidia-level territory.
I think it is important not to dismiss this as a temporary trend.
In the past, memory companies were typical cyclical stocks, heavily dependent on PC and smartphone sales cycles.
When demand weakened, prices collapsed. When supply increased, profits shrank quickly. As a result, investors found it difficult to assign high valuations to memory companies, viewing them as inherently risky.
But now, memory is becoming a core asset that determines the performance and supply of entire AI clusters.
The center of demand has shifted from PCs and smartphones to big tech companies expanding AI infrastructure on multi-year timelines. Moreover, HBM and high-performance DRAM are no longer simple capacity products. They have become high-value-added components defined by bandwidth, power efficiency, advanced packaging, and yield.
In short, viewing today’s memory companies through the same lens of “cyclical commodity business” risks overlooking a real structural change.
2. Long-Term Contracts Are Starting to Change the Cyclicality of the Memory Industry
As I mentioned earlier, I believe the current strength in the memory market is not just a temporary upturn.
The reason memory stocks have historically been valued cheaply was straightforward. They depended on PC and smartphone demand — when demand weakened, prices collapsed, and when supply grew, profits fell. Memory was a textbook cyclical sector.
But now that premise is breaking down. According to Reuters, Samsung has signaled a shift from the traditional one-year contract cycle to multi-year contracts of 3 to 5 years with major customers.
Micron also announced its first-ever five-year SCA (Strategic Customer Agreement) in March 2026, as confirmed in the company's earnings release. TrendForce has likewise reported that Samsung and SK hynix are redesigning their big tech contracts from annual deals into 3-to-5-year LTAs (long-term agreements).
The key point here is that memory is no longer “a component you buy from whoever is cheapest at the moment.” It is becoming a strategic asset where you need to secure your seat years in advance.
And behind this shift is something bigger than spot market dynamics.
AI chip roadmaps, data center construction plans, and cloud revenue projections themselves are now tied to memory supply commitments.
Today’s memory is not merely a supporting component swayed by economic cycles — it is becoming the supply bottleneck of AI infrastructure itself.
3. Google’s “TurboQuant” Is Closer to Memory Hierarchy Optimization Than Demand Destruction
Here, I want to address TurboQuant, which was recently raised as a concern for the memory industry.
In March 2026, Google Research introduced TurboQuant, a technique that compresses the LLM KV cache by roughly 6x or more and speeds up attention on H100 GPUs by up to 8x.
The market reacted by thinking, “If AI needs only one-sixth the memory, that must be bad news for memory companies.” Share prices of major memory stocks — Samsung Electronics, SK hynix, Micron — dropped sharply.
However, this reaction may have been somewhat hasty.
In economics, there is a well-known phenomenon called Jevons’ paradox: when a resource becomes more efficient to use, total consumption of that resource can actually increase.
If KV cache compression lowers the cost of long-context processing, companies will naturally think, “Let’s use longer contexts,” “Let’s increase concurrent connections,” or “Let’s run multiple agents at once.”
In other words, even if the per-request memory requirement goes down, total usage across a data center could actually go up.
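The Jevons-paradox argument is easy to make concrete with arithmetic. The sketch below uses entirely hypothetical numbers (per-token KV footprint, context lengths, concurrency levels are all assumptions for illustration): even with a 6x reduction per request, plausible operator responses push total memory use up, not down.

```python
# Illustrative sketch with hypothetical numbers: per-request KV cache memory
# drops 6x, but operators respond with longer contexts and more concurrency.

def kv_bytes(seq_len, per_token_kb):
    """KV cache footprint in bytes for one request (assumed per-token cost)."""
    return seq_len * per_token_kb * 1024

BEFORE_PER_TOKEN_KB = 160                      # assumed uncompressed KV cost per token
AFTER_PER_TOKEN_KB = BEFORE_PER_TOKEN_KB / 6   # ~6x compression

# Before: 8k-token contexts, 100 concurrent requests (assumed workload)
before_total = 100 * kv_bytes(8_000, BEFORE_PER_TOKEN_KB)

# After: compression makes 32k contexts and 400 concurrent requests affordable
after_total = 400 * kv_bytes(32_000, AFTER_PER_TOKEN_KB)

print(f"total KV memory ratio after/before: {after_total / before_total:.2f}x")
```

Under these assumed numbers the ratio comes out above 2x, despite the 6x per-request saving: efficiency gains get reinvested into more ambitious workloads.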
Let me explain in more detail below.
Compression Targets Only a Part of Memory, Not the Whole Model
What TurboQuant primarily compresses is not the model weights themselves, but rather the KV cache that grows during inference.
This means the story is not “all the memory AI needs is cut to one-sixth.”
To use an analogy: this is like organizing the notes and sticky pads spread across your desk to save space — not reducing the bookshelf itself. The value of the bookshelf does not disappear.
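To put rough numbers on the bookshelf analogy, here is a back-of-the-envelope sketch. All model parameters are hypothetical (a generic 70B-parameter transformer with grouped-query attention); the point is only that weights are a fixed cost KV compression never touches, while the KV cache is the part that scales with context length.

```python
# Back-of-the-envelope sketch (all parameters hypothetical): model weights
# vs. the KV cache that TurboQuant-style compression actually targets.

def weight_bytes(n_params, bytes_per_param=2):
    """Weights in fp16/bf16 — a fixed cost, untouched by KV compression."""
    return n_params * bytes_per_param

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Standard KV cache size: 2 (K and V) per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

GB = 1024**3
weights = weight_bytes(70e9)   # hypothetical 70B-parameter model
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128_000)

print(f"weights:        ~{weights / GB:.0f} GiB")
print(f"KV cache:       ~{kv / GB:.0f} GiB per 128k-token request")
print(f"compressed 6x:  ~{kv / 6 / GB:.1f} GiB")
```

Under these assumptions the KV cache is a meaningful but minority share of a single request's footprint; compressing it shrinks the "notes on the desk," not the bookshelf of weights.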
Capacity Reduction and Bandwidth Demand Are Separate Issues
Next, it is important to recognize that capacity reduction and bandwidth demand are different things.
Yes, lighter KV cache may reduce some memory capacity requirements. But in AI systems, “how much capacity you have” is only part of the equation — “how fast you can read and write” matters just as much.
For example, even if a warehouse has plenty of floor space, logistics break down if loading and unloading are slow. The value of HBM lies not just in raw capacity, but in this high-speed throughput.
NVIDIA's Rubin generation emphasizes coherent memory between CPU and GPU along with high-bandwidth design, a sign that AI system value depends on bandwidth and system integration, not capacity alone. So even as KV cache compression advances, the value of HBM and advanced packaging does not simply disappear (NVIDIA Developer Blog).
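The capacity-versus-bandwidth distinction can be made concrete with a rough decode-throughput bound. The figures below are approximations and assumptions (an H100-class ~3.35 TB/s HBM bandwidth, a 70B-parameter fp16 model): at batch size 1, every generated token must stream the weights out of HBM at least once, so bandwidth, not capacity, caps tokens per second.

```python
# Rough sketch (approximate, assumed figures): why HBM bandwidth bounds
# decode throughput even when capacity is ample.

hbm_bandwidth_gbs = 3350    # ~H100-class HBM bandwidth in GB/s (approximation)
bytes_per_token_gb = 140    # fp16 weights of a 70B model streamed per token

# Upper bound on single-request decode speed: bandwidth / bytes moved per token
tokens_per_sec = hbm_bandwidth_gbs / bytes_per_token_gb
print(f"~{tokens_per_sec:.0f} tokens/s upper bound at batch size 1")
```

Shrinking the KV cache does not loosen this bound; only faster memory does, which is why bandwidth remains central to HBM's value.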
Inference Optimization and Training Demand Are Separate Markets
Furthermore, inference optimization and training demand are separate.
TurboQuant primarily improves memory efficiency during inference. But looking at AI servers as a whole, a large portion of HBM demand is also driven by training, post-training, parallelization of large models, and GPU-to-GPU communication.
In other words, just because chat responses become slightly more efficient does not mean memory demand across the entire AI data center shrinks accordingly.
Whether TurboQuant Immediately Becomes an Industry Standard Is a Separate Question
There are already multiple approaches to KV compression, and they are competing on hardware compatibility, kernel support, accuracy degradation, and integration with existing stacks.
It would be premature to conclude that a single paper from Google will immediately undermine overall memory demand.
The HBM Concern Versus DRAM and NAND Being Sold Off Together
Finally, I think it is worth separating out which types of memory are actually vulnerable to this news.
The market’s first association was HBM. HBM processes large volumes of data at high bandwidth close to the GPU — critical for both LLM training and inference. For inference with long contexts in particular, HBM supports the expanding KV cache. Since Google’s TurboQuant was introduced specifically as a technique to compress KV cache significantly, it is understandable that the market worried about reduced HBM demand.
However, DRAM and NAND being sold off across the board is a different matter. What TurboQuant directly compresses is specifically the KV cache during LLM inference. It does not necessarily affect general DRAM demand, let alone NAND and storage demand, to the same degree. In March, the sell-off extended beyond Samsung and SK hynix to Micron, Western Digital, and SanDisk, with the market essentially dumping “memory stocks as a group” (Bloomberg, Taipei Times).
In short, while this news has some relevance to HBM, I do not think it is fair to say DRAM and NAND face the same level of headwind.
4. CXL Expands DRAM Demand Around HBM
I also want to briefly touch on CXL.
CXL is a standard that connects CPUs, GPUs, memory, and accelerators at high speed while maintaining coherence, making it easier to share and expand memory as needed.
In simple terms, it is a technology that enlarges the total pool of memory available to AI — not just the ultra-fast memory near the GPU, but also DRAM and other memory beyond it.
The key point is that CXL is not a technology that makes HBM unnecessary.
On the contrary, it is a technology that keeps HBM at the center while attaching DRAM and other memory around it to increase the total memory available to the AI system.
NVIDIA Rubin also promotes the use of LPDDR5X and coherent memory alongside HBM4, indicating that AI systems are moving toward combining multiple memory tiers rather than relying on a single type (NVIDIA Developer Blog).
What is happening here, I think, is that the role of DRAM and external memory is expanding around HBM, not replacing it.
As AI handles longer contexts, supports more concurrent connections, and runs multiple models and agents simultaneously, the ultra-fast region directly connected to the GPU will not be enough. The DRAM supporting the outer layer, and storage beyond that, will also grow in importance — the entire memory hierarchy becomes more critical.
Seen this way, the CXL trend points not to shrinking memory demand but to expanding total demand across the hierarchy, with HBM at the top and DRAM and peripheral memory underneath.
For the topic at hand, this is not a story of “some memory becoming unnecessary” — it is a story of “the entire memory hierarchy becoming even more critical in the AI era.”
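The tiering idea can be sketched with a toy node configuration. All capacities and bandwidths below are hypothetical round numbers, chosen only to show the shape of the argument: CXL attaches larger, slower tiers beneath HBM and grows the total addressable pool rather than replacing anything.

```python
# Sketch of a tiered memory pool (all capacities and bandwidths hypothetical):
# CXL does not displace HBM; it adds tiers beneath it, enlarging the total
# memory an AI node can address.

tiers = [
    # (name, capacity_gib, approx_bandwidth_gbs)
    ("HBM on GPU",        192, 3350),
    ("Local DDR/LPDDR",   512,  400),
    ("CXL-attached DRAM", 2048, 100),
]

total = sum(cap for _, cap, _ in tiers)
for name, cap, bw in tiers:
    print(f"{name:>18}: {cap:5d} GiB @ ~{bw} GB/s")
print(f"{'total pool':>18}: {total:5d} GiB")
```

In this toy layout the pool grows more than tenfold over HBM alone, while HBM keeps the bandwidth crown — each tier's demand expands rather than cannibalizing the others.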
5. In the Real Market, the Supply Race Has Spread Beyond HBM to DRAM and NAND
Demand is rising not only for DRAM but also for NAND.
TrendForce reported that conventional DRAM contract prices rose 90–95% quarter-over-quarter in Q1 2026, with a further 58–63% increase expected in Q2. NAND Flash contract prices are also projected to rise 70–75% in Q2.
Moreover, DRAM makers are shifting capacity allocation toward higher-margin server and HBM products, and the ripple effects are spreading not only to PC and consumer markets but also to enterprise storage.
In short, the supply tightness is not limited to HBM alone — it has already spread across the entire memory hierarchy (Reuters).
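The scale of these price moves is easier to see compounded. Taking the midpoints of the TrendForce ranges above, and assuming the Q2 increase applies on top of the Q1 level:

```python
# Quick compounding check on the TrendForce DRAM contract-price figures,
# using range midpoints (an assumption): Q1 +90-95% QoQ, Q2 +58-63% on top.

q1_mult = 1 + 0.925   # midpoint of 90-95%
q2_mult = 1 + 0.605   # midpoint of 58-63%

cumulative = q1_mult * q2_mult
print(f"~{cumulative:.1f}x vs. the pre-Q1 baseline over two quarters")
```

That is roughly a tripling of contract prices in half a year, which is the kind of move that reshapes capacity-allocation decisions across the whole stack.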
Samsung’s projected Q1 2026 operating profit of 57.2 trillion won is a result of this structure. Reuters attributed this to rising DRAM prices and supply tightness driven by AI data center demand.
For SK hynix as well, following Samsung’s strong outlook, shares jumped 15% on April 8, and analysts raised their full-year 2026 operating profit forecast to 216 trillion won (Reuters).
This is not a world where “only HBM is strong and everything else is normal.” The AI wave is flowing through the entire memory stack — DRAM and NAND included.
6. Supply Capacity Is Slow to Ramp, and Strategic Value Will Persist Through 2027–2028
Finally, supply. For investors, this may be the most important point.
Reuters reported that the AI boom is creating a new memory supply crisis, and that SK hynix expects HBM shortages to persist at least through the second half of 2027.
On top of this, the shift to long-term contracts compounds the situation. In other words, today's memory industry has entered a cycle in which AI demand is strong, contracts are getting longer, and new supply is slow to arrive — a cycle that can sustain profits over a much longer horizon than the "short-cycle industry" of the past.
This is because it is no longer just price that supports margins — the time gap between contract commitment and supply availability is now doing so as well.
Conclusion
My bullish view on the memory industry is not simply because it is temporarily benefiting from the AI boom.
The growth rates and margins of major memory companies have already far exceeded the levels of the old memory business. The shift to long-term contracts is driving structural changes that cannot be captured by traditional short market cycles. The TurboQuant shock and the CXL trend are better understood as memory hierarchy optimization and expansion of total usage, rather than demand destruction. And looking at actual supply and demand, tightness extends beyond HBM to DRAM and NAND, with supply expansion still taking time.
In short, today’s memory industry can no longer be captured by the old framing of “a component business swayed by economic cycles.” As a core layer supporting AI infrastructure, it has already begun entering a different phase.
That is why I believe one of the most promising candidates to follow Nvidia and other AI semiconductors is the memory industry.
This layer next to the GPU is not just a supporting component.
What gets re-evaluated in the next era will not be the GPU alone — it will be the surrounding layers that support the GPU’s performance.
The memory industry is a leading candidate at the center of that story.
Join the conversation on LinkedIn — share your thoughts and comments.