China's Domestic AI Chip Advancement: A Technical Assessment

The context of stringent US export controls on AI chips and semiconductor technology has served as a significant catalyst for domestic development within China. While NVIDIA maintains global dominance in AI GPUs, Chinese entities, notably Huawei, are actively cultivating a parallel ecosystem specifically tailored for the Chinese market. Huawei's recent release of a new AI GPU and the accompanying CloudMatrix system represents a notable stride in this effort. Positioned as a potential Chinese counterpart to NVIDIA, these developments warrant close examination.

Huawei's Ascend 910C GPU

At the core of Huawei's recent advancements is the Ascend 910C GPU. This processor is positioned as Huawei's direct answer to NVIDIA's Blackwell-generation GB200. Reflecting an industry trend driven by the exponential growth in AI model size and data processing demands, Huawei has adopted a strategy similar to NVIDIA's by shifting towards larger GPUs.

The Ascend 910C utilizes a double-die design, echoing NVIDIA's approach with its Blackwell architecture. This design incorporates two GPU dies interconnected by a bridge, with each die surrounded by four memory modules. This configuration effectively doubles the compute and memory capacity per GPU compared to a single-die approach.

According to official specifications, the new Huawei GPU delivers 800 teraFLOPS (tFLOPS) of compute performance at 16-bit precision. For context within the Chinese market, where export controls limit access to cutting-edge foreign technology, that is roughly four times the performance of NVIDIA's H20, currently the most advanced chip NVIDIA is permitted to sell in China. Against its intended competitor, however, the picture reverses: the 910C reportedly delivers only about a third of the GB200's performance. The GB200 also features state-of-the-art memory, offering higher memory bandwidth, and is at least twice as efficient in terms of performance per watt.
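The stated ratios can be sanity-checked with a few lines of arithmetic. Note that only the 800 tFLOPS figure comes from the specifications quoted above; the H20 and GB200 numbers below are back-derived from the article's "four times" and "one-third" claims, not taken from official datasheets.

```python
# Back-of-the-envelope check of the stated performance ratios.
# Only the 910C figure (800 tFLOPS @ FP16) is taken from the article;
# the other two values are implied by its ratio claims.
ascend_910c_tflops = 800.0

h20_tflops_implied = ascend_910c_tflops / 4    # "four times the H20"
gb200_tflops_implied = ascend_910c_tflops * 3  # "one-third of the GB200"

print(f"Implied H20 FP16 throughput:   {h20_tflops_implied:.0f} tFLOPS")
print(f"Implied GB200 FP16 throughput: {gb200_tflops_implied:.0f} tFLOPS")
```

Treat these implied figures as order-of-magnitude context only, since the article's ratios are themselves approximate.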

Regarding manufacturing, the new Huawei GPU is reportedly fabricated by TSMC at the 7 nm node. While hard confirmation is difficult to obtain, reports suggest the 910C GPU dies are manufactured abroad and then brought into China. This underscores that manufacturing remains one of China's most significant challenges, with continued heavy reliance on the US and Europe for critical technologies and tools. Despite struggles to achieve high yields domestically, the sources suggest that with time, funding, and perseverance, this hurdle may be overcome.

The CloudMatrix 384 System

Huawei has leveraged the Ascend 910C to build its new CloudMatrix 384 data center solution. The system comprises 384 910C GPUs and is explicitly positioned as China's domestically developed alternative to NVIDIA's NVL72.

While the individual Ascend 910C may lag behind NVIDIA's most advanced silicon in raw performance, the CloudMatrix 384's system-level architecture appears to be Huawei's strategic workaround. With 384 GPUs, the CloudMatrix packs more than five times as many processors as NVIDIA's 72-GPU NVL72. By deploying this far larger number of processors, Huawei has managed to nearly double the performance of NVIDIA's system at the cluster level. The sources suggest there may be further room to scale the design.
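The "nearly double" claim follows directly from the counts and ratios already given: 384 GPUs at one-third the per-GPU performance versus 72 stronger GPUs. The per-GPU GB200 figure below is implied by the article's 3x ratio, not an official specification.

```python
# Cluster-level throughput under the article's per-GPU ratio:
# one GB200 is roughly 3x a 910C, but the CloudMatrix packs 384 GPUs
# against the NVL72's 72. All figures are illustrative, derived from
# the article's 800 tFLOPS (910C) and 3x (GB200) numbers.
gpus_cloudmatrix, gpus_nvl72 = 384, 72
tflops_910c = 800.0
tflops_gb200 = 3 * tflops_910c  # implied ratio, not a datasheet value

cloudmatrix_pflops = gpus_cloudmatrix * tflops_910c / 1000
nvl72_pflops = gpus_nvl72 * tflops_gb200 / 1000

print(f"CloudMatrix 384: {cloudmatrix_pflops:.1f} PFLOPS")
print(f"NVL72:           {nvl72_pflops:.1f} PFLOPS")
print(f"Ratio:           {cloudmatrix_pflops / nvl72_pflops:.2f}x")
```

Under these assumptions the CloudMatrix lands at roughly 1.8x the NVL72's aggregate throughput, consistent with "almost double" at the cluster level.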

Architectural Divergence: Optical vs. Electrical Interconnects

A critical distinction, and a key factor behind the CloudMatrix's system performance, lies in its interconnect architecture. Both NVIDIA's NVL72 and Huawei's CloudMatrix employ a flat, all-to-all topology, meaning every GPU can in principle communicate with every other GPU. The key difference is that Huawei relies heavily on optical links, not just between racks but directly between GPUs. In contrast, NVIDIA's NVL72 primarily uses electrical links (copper cables), connecting its 72 Blackwell GPUs and switches within the rack through 36 NVLink switches and 1,500 copper cables.

Huawei's decision to go "fully optical" means connecting each of the 384 GPUs to the network through multiple optical transceivers. The design delivers enormous bandwidth, allowing large volumes of data to move simultaneously. It also carries significant drawbacks: thousands of optical transceivers drive up power consumption sharply and add complexity, making the system harder to maintain and more prone to failures. The sources note that NVIDIA's electrical cabling approach is significantly simpler, six times cheaper, and far more power-efficient than optics.
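A rough model shows why the transceiver count matters so much for power. Both parameters below are illustrative assumptions for the sketch: neither the transceivers-per-GPU count nor the per-module wattage is a reported CloudMatrix specification.

```python
# Rough model of aggregate optical-transceiver power in a fully optical
# fabric. Both parameters are assumptions chosen for illustration, not
# reported CloudMatrix figures.
num_gpus = 384
transceivers_per_gpu = 8      # assumed
watts_per_transceiver = 15.0  # assumed; plausible order for high-speed optics

total_modules = num_gpus * transceivers_per_gpu
optical_power_kw = total_modules * watts_per_transceiver / 1000

print(f"{total_modules} transceivers -> ~{optical_power_kw:.0f} kW for optics alone")
```

Even under these conservative assumptions, the optics alone account for tens of kilowatts, a budget that copper cabling largely avoids.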

This fundamental architectural difference is reflected in the system-level power consumption. The CloudMatrix 384 consumes approximately 600 kW, whereas NVIDIA's NVL72 consumes around 145 kW. This means Huawei's solution consumes roughly four times more power. While the less efficient silicon contributes to this, the primary driver of the power difference is the system-level architecture, particularly the extensive use of power-hungry optical transceivers. This is why NVIDIA is reportedly moving towards more integrated solutions like silicon photonics to save power.
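Combining the stated system power figures with the cluster throughput implied earlier gives a performance-per-watt comparison. The power numbers (600 kW and 145 kW) come from the article; the throughput figures are derived from its 800 tFLOPS and "3x per GPU" claims.

```python
# System-level efficiency implied by the article's numbers.
# Throughput: 384 x 800 tFLOPS (CloudMatrix) vs 72 x 2,400 tFLOPS
# (NVL72, via the implied 3x-per-GPU ratio). Power: stated 600 kW vs 145 kW.
cm_tflops, cm_kw = 384 * 800, 600
nvl_tflops, nvl_kw = 72 * 2400, 145

cm_eff = cm_tflops / (cm_kw * 1000)   # tFLOPS per watt
nvl_eff = nvl_tflops / (nvl_kw * 1000)

print(f"CloudMatrix: {cm_eff:.2f} tFLOPS/W")
print(f"NVL72:       {nvl_eff:.2f} tFLOPS/W")
print(f"NVL72 is ~{nvl_eff / cm_eff:.1f}x more efficient per watt")
```

Under these assumptions the NVL72 comes out a bit over twice as efficient per watt, in line with the "at least twice as efficient" characterization of NVIDIA's silicon above.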

Power Consumption and Infrastructure Context

Despite the CloudMatrix's significantly higher power draw, the sources suggest Chinese customers may be less concerned about power limitations than those in the US and Europe, owing to the cheaper cost of energy in China. China has also invested heavily in extending its power grid and in renewable sources such as solar, hydro, wind, and nuclear, although coal and oil still account for a large portion (50%) of its energy mix. Access to abundant, cheap energy is thus a strategic advantage China can leverage in building out power-intensive AI data centers.

The long-term cost of intelligence is seen as converging towards the cost of energy, making energy infrastructure (renewables, storage, grid resilience) a critical sector for investment alongside chip technology itself. Water supply is also highlighted as a critical resource for data centers, with a typical 100-megawatt data center consuming roughly 2 million liters per day, much of which is lost to evaporation.
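The water figure above can be normalized to a per-kilowatt-hour rate, which is the simplifying assumption here that the facility runs flat out at its nameplate 100 MW.

```python
# Putting the water figure in per-kWh terms, assuming the 100 MW
# facility runs continuously at full load (a simplification).
power_mw = 100
liters_per_day = 2_000_000

kwh_per_day = power_mw * 1000 * 24  # 2.4 million kWh/day at full load
liters_per_kwh = liters_per_day / kwh_per_day

print(f"~{liters_per_kwh:.2f} L of water per kWh")
```

That works out to a little under one liter of water per kilowatt-hour delivered, much of it lost to evaporation as the text notes.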

Software and System-Level Innovation

The sources emphasize that success in modern AI computing is increasingly about building comprehensive infrastructure and systems, not just the most powerful individual chip. NVIDIA's CEO is cited as stating that NVIDIA is now an infrastructure company. Huawei appears to have internalized this, finding workarounds for its silicon limitations at the level of the system, the networking, and the software stack. China is generally considered strong in software development.

The CloudMatrix runs on Huawei's proprietary CANN software stack. CANN plays a role similar to NVIDIA's CUDA, but is built for Huawei's hardware and optimized for its Neural Processing Units (NPUs), specialized silicon designed to accelerate AI workloads such as matrix multiplication. The stack handles compilation, graph optimization, and workload distribution across the hardware, and, given the CloudMatrix's complexity and potential for failures, it is crucial to keeping the system running smoothly. This underscores that China is investing heavily in the full vertical stack needed for AI dominance.

Challenges and Future Directions

While the CloudMatrix represents a significant step, it is acknowledged that it is not a direct one-to-one replacement for NVIDIA's state-of-the-art system. It is, however, deemed good enough to replace NVIDIA's H20 GPU and capable of beating NVIDIA's system performance at the cluster level, albeit with lower power efficiency.

Manufacturing remains a key hurdle. Future Huawei GPUs, such as the Ascend 910D and 920, are reported to be in production, suggesting continued development.

Beyond traditional data center design, China is also exploring innovative solutions to resource constraints like water and energy consumption. Experiments such as the underwater AI data center off the coast of Sanya, Hainan, aim to reduce power consumption and cooling costs through direct water cooling. Such approaches introduce significant challenges of their own, however: maintenance is difficult because hardware must be retrieved from the sea, which can mean long downtimes, and there are concerns about negative environmental effects on marine ecosystems.

Conclusion

Huawei's development of the Ascend 910C GPU and the CloudMatrix 384 system signifies a considerable advancement in China's domestic AI chip capabilities. While still potentially lagging behind NVIDIA in raw silicon performance and power efficiency at the individual component level, China is demonstrating strength in system-level integration, networking (particularly the aggressive adoption of optical interconnects), and proprietary software stacks. This approach, combined with potentially lower constraints on power consumption within China's domestic market and significant investment in energy infrastructure, allows them to achieve competitive performance at the cluster level.

The US export controls appear to be effectively stimulating these domestic developments and the creation of a parallel AI ecosystem in China. While manufacturing dependencies on foreign technology persist, the long-term trajectory indicates a determined effort to build a self-sufficient AI computing infrastructure. The strategic focus extends beyond silicon to encompass energy, water, and innovative data center designs, reflecting a holistic approach to supporting future AI growth. These developments underscore the complex and multi-faceted nature of the global AI race.