Foundry nodes, custom chip economics and supply bottlenecks in GPUs, CPUs, HBM and advanced packaging
Chip Capacity, Foundries, and Shortages
The semiconductor industry is navigating a complex landscape shaped by rapid capacity expansion, technological advances, and persistent supply bottlenecks, especially in areas critical to AI hardware. Despite aggressive investment by leading foundries, major chip designers, and regional players, fundamental constraints remain that threaten to slow the pace of AI infrastructure growth.
Capacity and Node Developments at TSMC, Intel, and Custom Chip Houses
Leading foundries such as TSMC and Samsung are pushing forward with advanced process nodes to meet surging demand for AI accelerators and high-performance computing (HPC) chips. TSMC is ramping its 3nm-class nodes (N3 and the enhanced N3P), supported by EUV lithography, and is building out large-scale fab capacity in Taiwan aimed at AI demand, including its mega-factory there. These efforts aim to increase capacity while improving transistor performance and power efficiency.
Samsung is also ramping up production at 5nm and 3nm, with a focus on 2.5D and 3D stacking technologies. These stacking innovations are critical for improving thermal management and latency, enabling more powerful and dense AI chips.
Intel, meanwhile, is positioning its 18A node as a strategic alternative for external foundry customers, emphasizing its process innovations. While Intel touts its latest nodes as suitable for high-end AI chips, industry watchers remain cautious about whether its manufacturing capacity can keep pace with explosive demand, especially from giants like Nvidia.
Custom chip houses and domestic manufacturers, particularly in China, are also investing heavily in their own fabrication capabilities to reduce dependence on offshore foundries. Huawei's Atlas 950, which integrates thousands of chips into a single large-scale AI system built domestically, exemplifies these self-reliance efforts, backed by significant investment in home-grown fabrication technology.
Supply Lockups, GPU/CPU Shortages, and Financial Impacts
Despite these capacity expansions, supply chain constraints continue to hinder the industry:
- Memory shortages, particularly in HBM and DRAM, persist due to the complexities of stacking and advanced packaging. Both Samsung and SK Hynix are struggling to meet the exponential growth in demand for high-bandwidth memory essential for AI workloads, leading to delays and increased prices.
- Advanced packaging and foundry capacity is also being locked up by large customers. Broadcom, for example, has reportedly secured long-term TSMC capacity commitments through 2028. These lockups create supply bottlenecks for other GPU and AI accelerator vendors, limiting how quickly they can scale deployments.
- Interconnect bottlenecks are increasingly constraining data movement at scale. Companies such as Ayar Labs and Lightmatter are pioneering photonic interconnects capable of Tb/s data rates, aiming to replace traditional copper links that are nearing their physical and economic limits around 2028. These breakthroughs are seen as essential for supporting exascale AI training clusters.
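The memory-bandwidth pressure behind the HBM bullet above can be made concrete with a back-of-envelope calculation. A minimal sketch, using assumed HBM3-class figures (1024-bit interface, 6.4 Gb/s per pin) rather than any vendor's actual spec:

```python
# Back-of-envelope HBM bandwidth arithmetic (assumed, illustrative figures).
# An HBM stack uses a wide, relatively slow interface: per-stack bandwidth
# is interface width (bits) x per-pin data rate (Gb/s) / 8.

def hbm_stack_bandwidth_gbps(interface_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return interface_bits * pin_rate_gbps / 8

# Assumed HBM3-class numbers: 1024-bit interface, 6.4 Gb/s per pin.
per_stack = hbm_stack_bandwidth_gbps(1024, 6.4)   # ~819 GB/s per stack

# A hypothetical accelerator targeting ~5 TB/s of memory bandwidth needs
# several stacks, which is why stacking and packaging capacity is the
# gating resource, not raw DRAM bit supply.
stacks_needed = -(-5000 // int(per_stack))        # ceiling division

print(f"per stack: {per_stack:.0f} GB/s, stacks for 5 TB/s: {stacks_needed}")
```

The point of the arithmetic: each added stack multiplies the advanced-packaging work per device, so demand for bandwidth translates directly into demand for the packaging capacity that is in shortest supply.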
The thermal and power management challenges associated with next-generation AI chips are also intensifying. Chips exceeding 700W are becoming common as AI models grow larger and more complex. Innovative cooling solutions—such as diamond cooling and microchannel heat exchangers—are being developed to dissipate heat efficiently, supporting the trend toward higher power densities.
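The thermal challenge above is easy to quantify. A rough sketch, using the 700 W figure from the text and an assumed reticle-scale die area (not any specific product's dimensions):

```python
# Illustrative heat-flux arithmetic for a high-power AI package
# (assumed numbers, not measurements of any specific chip).

def heat_flux_w_per_cm2(power_w: float, die_area_mm2: float) -> float:
    """Average heat flux across the die in W/cm^2."""
    return power_w / (die_area_mm2 / 100)  # 100 mm^2 = 1 cm^2

# A 700 W accelerator on an assumed reticle-limited ~800 mm^2 die:
flux = heat_flux_w_per_cm2(700, 800)  # 87.5 W/cm^2 average

# Local hotspots can run several times the die-average flux, which is
# what pushes designs toward microchannel and other liquid-cooling paths.
print(f"average heat flux: {flux:.1f} W/cm^2")
```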
On the geopolitical front, regional diversification strategies are gaining momentum. The US is investing heavily in Arizona-based fabs to foster strategic independence, while China is pursuing self-reliance by developing domestic lithography ecosystems and power infrastructure, reflected in record transformer exports tied to data-center growth.
Financial and Industry-Wide Impacts
These supply constraints have tangible impacts on vendors’ financial performance. Companies like Nvidia, AMD, and Intel face delays and increased costs, which can squeeze margins and slow product rollouts. The lockup of capacity and materials—notably HBM and advanced packaging—limits the ability of vendors to meet the demand surge driven by AI workloads.
Moreover, the industry is witnessing a strategic shift toward photonic interconnects and advanced thermal solutions to address these bottlenecks. As copper interconnects approach their physical limits around 2028, the industry’s transition to high-speed optical interconnects will be crucial for scaling AI infrastructure efficiently.
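A rough energy-per-bit comparison shows why this copper-to-optical transition matters at cluster scale. The pJ/bit figures and the aggregate bandwidth below are assumed order-of-magnitude values for illustration, not measured numbers for any product:

```python
# Back-of-envelope link power: W = (bits/s) x (J/bit), and conveniently
# (Tb/s) x (pJ/bit) = W, since the 1e12 factors cancel.

def link_power_kw(bandwidth_tbps: float, energy_pj_per_bit: float) -> float:
    """Power in kW to move bandwidth_tbps at energy_pj_per_bit."""
    return bandwidth_tbps * energy_pj_per_bit / 1000

# Assumed aggregate fabric bandwidth for a large training cluster: 10 Pb/s.
aggregate_tbps = 10_000

copper_kw = link_power_kw(aggregate_tbps, 10.0)  # assumed ~10 pJ/bit long-reach SerDes
optical_kw = link_power_kw(aggregate_tbps, 2.0)  # assumed ~2 pJ/bit optical I/O

print(f"copper: {copper_kw:.0f} kW, optical: {optical_kw:.0f} kW")
```

Under these assumptions the interconnect alone swings from roughly 100 kW to 20 kW, and the gap widens as fabric bandwidth scales, which is the economic case for optical I/O even before reach limits are hit.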
In summary, while capacity expansions at TSMC, Intel, and regional players are vital steps toward meeting AI hardware demands, fundamental bottlenecks remain. The industry must innovate in memory technology, interconnects, thermal management, and manufacturing capacity to sustain the rapid growth of AI. Geopolitical strategies and technological breakthroughs are intertwined in shaping the future supply landscape, making the next few years critical for overcoming these physical and infrastructural constraints.