KADC 2026 Series Analysis · Part 3 · Agent Infrastructure / CPU+GPU Convergence
On May 22, 2026, at the Kunpeng Ascend Developer Conference (KADC 2026) in Beijing's Zhongguancun International Innovation Center, Huawei raised a question few people were discussing: What kind of infrastructure do agents actually need?
Not "what compute does AI need?"—that question already has no shortage of answers. The question was more specific: When an LLM evolves from single-turn dialogue into a multi-step, tool-calling, state-managing agent, what has to change in the OS, sandboxing, networking, security, and memory services running underneath it?
Huawei's answer: "Agent Infra"—an infrastructure category they believe is distinct from AI Infra. Meanwhile, in a separate session, openEuler architects proposed something even more ambitious: Agent POSIX—portability primitives purpose-built for Agentic AI.
Is this marketing repackaging, or a genuine problem? Let's work backwards from the agent's actual workload.
Why Agents Need New Infrastructure
Let's get one thing straight: an agent is not a faster chatbot.
A chatbot's execution model is single request-response. A user sends a message, the LLM generates a reply, done. Infrastructure only needs two things: inference acceleration (GPU side) and request routing (API layer).
An agent's execution model is fundamentally different. It is a stateful, multi-step control flow: task decomposition → tool selection → tool execution → observation → state update → next decision. This loop can execute dozens of times, and each iteration may invoke external tools, mutate environment state, or spawn subtasks.
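The loop described above can be sketched in a few lines. This is a minimal illustration, not any framework's actual API: `TOOLS`, `AgentState`, `decide`, and `run_agent` are hypothetical names, and `decide` stands in for the LLM inference call with a fixed two-step plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tool registry: tool name -> function from argument to observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"contents of {arg}",
    "run_tests": lambda arg: "2 passed, 0 failed",
}


@dataclass
class AgentState:
    goal: str
    # Each entry is (tool, argument, observation).
    history: list[tuple[str, str, str]] = field(default_factory=list)
    done: bool = False


def decide(state: AgentState) -> Optional[tuple[str, str]]:
    """Stand-in for the LLM call: return the next (tool, argument), or None to stop."""
    plan = [("read_file", "app.py"), ("run_tests", "tests/")]
    step = len(state.history)
    return plan[step] if step < len(plan) else None


def run_agent(goal: str, max_steps: int = 20) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = decide(state)                 # inference (accelerator side)
        if action is None:
            state.done = True
            break
        tool, arg = action                     # tool selection
        observation = TOOLS[tool](arg)         # tool execution (CPU, I/O, network)
        state.history.append((tool, arg, observation))  # observation + state update
    return state


state = run_agent("fix the bug")
print(state.done, len(state.history))
```

Everything in the loop body except `decide` runs outside the model, which is the part of the workload the rest of this article is concerned with.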
What does this mean for infrastructure? Huawei offered some data points:
- Token consumption grew 6x in six months, not because models got larger, but because multi-turn agent interactions multiply the tokens consumed per task.
- Agent invocation frequency surged 50–100x—a single complex task can trigger hundreds of tool calls.
- Control flow complexity is exploding—tokenization, context assembly, memory management, and data copy/sync operations are running continuously.
Even more consequential are the shifts in network topology and security boundaries:
Network topology is moving from hub-and-spoke (human-machine interaction, one user per request) to a full mesh (dense, concurrent communication among agents and between agents and tools). Traditional infrastructure was designed for the former and is not optimized for the latter.
Security boundaries are shifting from static API calls to dynamic call chains—an agent at runtime dynamically selects and invokes untrusted external tools, and each call has different permission requirements, data flow patterns, and isolation needs. This is not a scenario that WAFs and API gateways were built to handle.
The direct consequence: high latency, throughput ceilings, and excess energy consumption. Traditional IT infrastructure designed for web and microservices is systematically mismatched against the characteristics of agent workloads.
This is not Huawei's observation alone. In the DeltaBox paper from Shanghai Jiao Tong University and Huawei (arXiv 2605.22781), researchers systematically quantified the agent sandbox workload: in SWE-bench MCTS (Monte Carlo Tree Search) agent scenarios, state management overhead accounted for 47%–77% of total trajectory time. In other words, agents were spending roughly half to three quarters of their time waiting for sandbox checkpoints and rollbacks, rather than doing useful inference.
That number is the anchor for the entire Agent Infra thesis.
The CPU's Role Transition in the Agent Era
For the past three years, the dominant theme in infrastructure has been "GPU is king." Training needs GPUs, inference needs GPUs, everything revolves around GPUs. CPUs are supporting players—handling I/O scheduling, data movement, and network protocol processing.
Agents change this equation. Not because GPUs matter less, but because agent control flows inherently run on CPUs, and the complexity of those control flows is exploding.
What Is Agent Control Flow
When an agent executes a task like "fix this bug," it roughly follows these steps:
- Task decomposition: Breaking "fix the bug" into subtasks (read code → locate issue → generate patch → run tests → analyze failures → modify patch → retry)
- Tool selection: At each step, deciding which tool to use (file reader, code search, terminal execution, LSP call)
- Context management: Maintaining conversation history, tool execution results, and intermediate state, all within a token budget—trimming and reassembling as needed
- State machine management: Tracking task progress, managing subtask dependencies, handling timeouts and retries
Each step is CPU-intensive. Tokenization is fundamentally string processing. Context assembly is memory copy and sorting. Memory management involves graph database queries and vector retrieval. Tool invocation involves API orchestration and JSON serialization/deserialization.
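As one concrete example of this CPU-side work, the sketch below trims conversation history to fit a token budget. It is an assumption-laden illustration: `count_tokens` is a crude whitespace counter standing in for a real tokenizer, and the keep-newest-first policy is just one possible trimming strategy.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def assemble_context(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the system prompt, then as many of the newest messages as fit the budget."""
    remaining = budget - count_tokens(system_prompt)
    kept: list[str] = []
    for message in reversed(messages):          # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break                               # older messages are dropped
        kept.append(message)
        remaining -= cost
    return [system_prompt] + kept[::-1]         # restore chronological order


context = assemble_context(
    "You are a coding agent.",
    ["read app.py", "found a null check bug", "tests: 1 failed", "patched line 42"],
    budget=12,
)
print(context)
```

Even in this toy form, every agent step pays for string splitting, list copies, and reordering before the next inference call can start.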
Why These Operations Concentrate on the CPU
The tools an agent invokes—database queries, file operations, API requests, code execution—naturally run on CPUs, networks, and storage. The GPU's role in this process is inference calls (the LLM generating the next decision), but everything between two inference calls executes on the CPU.
As agent task complexity grows, the intervals between inference calls get longer, and cumulative CPU overhead becomes the dominant component of end-to-end latency.
Huawei's Liu Linchao, Director of the Kunpeng Computing Product Department, made a notable assertion at KADC 2026: "The shift from 'intelligent compute as the center' to 'general compute and intelligent compute in coordination' is the most significant architectural paradigm change."
This is more than marketing language. It describes a real shift: as agent workloads consume a larger share of compute resources, the relationship between CPU and GPU moves from "CPU serves GPU" to "CPU and GPU execute in concert."
A Quantitative Illustration
Consider an agent executing a complete bug fix workflow:
- 10 LLM inference calls, averaging 2 seconds each (GPU side) = 20 seconds
- Between each inference call: tokenization (50ms) + context assembly (100ms) + tool execution (500ms) + state management (200ms) = 850ms × 10 = 8.5 seconds (CPU side)
GPU share ≈ 70%, CPU share ≈ 30%. GPU still looks dominant.
But if the agent uses MCTS for multi-path exploration, a technique common in search-based agents, each inference step may fork into 5–10 parallel sandboxes:
- 10 inference calls × 2 seconds = 20 seconds (GPU)
- 10 inference calls × 5 parallel sandboxes × (sandbox fork 50ms + tool execution 500ms + state management 200ms) = 37.5 seconds (CPU)
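Aggregate CPU time now exceeds GPU time (37.5 seconds versus 20 seconds), even though parallel sandboxes can overlap in wall-clock terms when enough cores are available. This is the engineering basis for "general compute + intelligent compute coordination."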
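The arithmetic in this illustration can be reproduced directly. The sketch below uses only the per-step figures quoted above; the function name `cpu_seconds` and the variable names are illustrative. The totals are summed CPU time, so wall-clock impact depends on how many sandboxes actually run concurrently.

```python
def cpu_seconds(calls: int, per_step_ms: dict[str, int], sandboxes: int = 1) -> float:
    """Total CPU-side time in seconds: calls x sandboxes x per-step cost."""
    return calls * sandboxes * sum(per_step_ms.values()) / 1000


gpu = 10 * 2.0  # 10 inference calls at 2 seconds each

linear = cpu_seconds(
    10,
    {"tokenization": 50, "context_assembly": 100, "tool_execution": 500, "state_management": 200},
)
mcts = cpu_seconds(
    10,
    {"sandbox_fork": 50, "tool_execution": 500, "state_management": 200},
    sandboxes=5,
)

linear_cpu_share = linear / (gpu + linear)
mcts_cpu_share = mcts / (gpu + mcts)

print(f"linear: CPU {linear:.1f}s, share {linear_cpu_share:.0%}")
print(f"MCTS:   CPU {mcts:.1f}s, share {mcts_cpu_share:.0%}")
```

With five sandboxes per step, the CPU share moves from about 30% to about 65% of the summed time.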
Kunpeng Supernode: A Three-Layer Agent Architecture
What Huawei presented at KADC 2026 was not a single product but a three-layer architecture spanning from chip to OS to runtime. This is the most important technical foundation for Agent Infra as a standalone category.
Bottom Layer: Kunpeng Supernode + Lingqu Interconnect
Core specifications of the Kunpeng Supernode:
- TB-scale interconnect bandwidth
- ~100ns latency (compared to microsecond-scale traditional Ethernet)
- 24TB unified memory pool—globally addressed, all CPU cores sharing the same address space
- Heterogeneous fusion with Ascend Supernode—intelligent compute (GPU) and general compute (CPU) sharing the same interconnect protocol
What do these numbers mean?
In a traditional server cluster, memory access between two machines traverses the network protocol stack, with latency in the microsecond-to-millisecond range. The Kunpeng Supernode uses the Lingqu (UnifiedBus) protocol to compress inter-node memory access latency to the 100-nanosecond range, and unified addressing means software doesn't need to distinguish between local and remote memory.
This isn't incremental optimization. It changes the fundamental model of how CPUs collaborate—from "communicate over the network" to "operate as if sharing memory."
Three key technologies in the Lingqu protocol amplify this advantage:
| Technology | Effect | Engineering Implication |
|---|---|---|
| SGL (Scatter-Gather List) | Communication latency −20% | Reduces memory copy operations; transmits multiple non-contiguous memory segments in one pass |
| Transparent UBSocket | Zero application modification, latency further reduced by 40% | Application code doesn't need rewritten socket calls; kernel layer automatically routes through Lingqu fast path |
| Shared TP (Transport) | Communication memory footprint −90% | Multiple connections multiplex the same transport layer, dramatically reducing memory overhead |
The notable figure is the 40% latency reduction from "Transparent UBSocket"—it means existing applications gain performance improvements without code changes, which is a prerequisite for large-scale migration.
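For readers unfamiliar with scatter-gather, the general idea behind the SGL row can be shown with the POSIX `writev` call, which Python exposes as `os.writev` on Unix-like systems. This is only an analogy for the concept of sending several non-contiguous buffers in one operation; it says nothing about how Lingqu implements SGL.

```python
import os

# Three separate buffers that would otherwise be concatenated before sending.
header = b"HDR:"
payload = b"tool-result"
trailer = b":END"

read_fd, write_fd = os.pipe()
try:
    # Gather write: one system call transmits all three buffers in order,
    # without first copying them into a single contiguous bytes object.
    written = os.writev(write_fd, [header, payload, trailer])
    received = os.read(read_fd, written)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(written, received)
```

The saving comes from skipping the intermediate copy, which is the same class of overhead the table attributes to SGL.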
Middle Layer: openEuler Heterogeneous Fusion OS
If Lingqu is the hardware-layer glue, openEuler is the software-layer glue.
At KADC 2026, openEuler demonstrated three key capabilities of its supernode OS:
- Global resource abstraction: All CPUs, memory, and accelerators within the supernode are abstracted into a single resource pool. Upper-layer applications see a "big machine" rather than a cluster.
- Heterogeneous compute peer interconnect: Breaking down CPU-GPU connectivity barriers, using the Lingqu protocol for memory pooling and compute integration.
- Efficient, compatible scheduling interfaces: Maintaining POSIX compatibility while providing supernode-aware scheduling capabilities.
The upcoming openEuler 24.03 LTS SP4 optimizes three dimensions: one-click deployment (usability), cross-node failure detection compressed to under 500ms (reliability), and large-scale container low-latency communication via UMDK (low latency).
The joint development with OpenCloudOS is also worth noting: very fast VM live migration, extra-large cloud instances, and high-performance memory-semantic communication. These capabilities all point to the same goal: giving cloud-based agent runtimes near-bare-metal performance.
Top Layer: Agent Infra
This is the most interesting part of the entire architecture. Huawei defines Agent Infra as three core services:
Lightweight Sandboxing: Agent tool execution requires isolated environments. Traditional isolation starts in hundreds of milliseconds to seconds (Docker cold start 275ms to several seconds, Firecracker microVM ~125ms), while agent fork demands are in the millisecond range. The Kunpeng Supernode, through multi-level cache sharing + incremental snapshots + rapid forking from any state, pushes rollback performance into the ~10ms range. Huawei's reported result: agent task success rate improved by over 10%.
Memory Services: Agent long-term memory requires persistent storage with near-memory access latency. The Kunpeng Supernode's approach includes: shared memory buffer pool warm loading / fast loading (hot data resident in memory), distributed global graph indexing (multimodal retrieval performance doubled), and context caching to reduce redundant injection (token overhead −50%).
Full-Chain Security: a confidential agent solution built on CCA (Confidential Compute Architecture, ARM's confidential computing standard), eBPF container-level trusted authorization, and built-in cryptographic modules + openGauss sub-second recovery. These three defense layers cover runtime isolation, policy enforcement, and data protection respectively.
The combination of these three services constitutes what Huawei calls "Agent Infra." The key point is not the technical specs of individual components, but that this is the first time any vendor has designed agent-specific infrastructure across the full stack from chip to OS to runtime.
Sandboxing: The First-Principle Requirement for the Agent Era
Why is sandboxing so critical for agents? It deserves a deeper look.
The Nature of the Problem
An agent's core capability is "using tools." But tool execution is inherently dangerous—an agent may generate and execute untrusted code, modify critical files, or make unexpected network requests. Every tool execution needs to happen in an isolated environment.
Even more critically, agents frequently need to "explore and roll back": try an approach, and if it fails, revert to a prior state and try another. In search strategies like MCTS, this fork-explore-rollback pattern occurs at extremely high frequency.
Bottlenecks of Existing Solutions
The DeltaBox paper, jointly published by Shanghai Jiao Tong University and Huawei, provides detailed quantitative data:
| Solution | Checkpoint Latency | Restore Latency | Bottleneck |
|---|---|---|---|
| E2B | ~4 seconds / 1GiB RAM | — | Full state replication |
| Docker commit | Several seconds | Several seconds | Layer copying |
| CRIU | Seconds (multi-GiB) | Seconds | Full process state dump |
| Firecracker snapshot | Hundreds of ms to seconds | Hundreds of ms | Guest memory pre-touch |
| DeltaBox | ~14ms | ≤6ms (P95) | Incremental deltas |
DeltaBox's key insight: the state change between consecutive agent checkpoints is minimal—perhaps a few modified files and a handful of changed memory pages. Rather than replicating everything, just record the delta.
This idea isn't new (OverlayFS, CRIU incremental dumps use similar thinking), but DeltaBox systematically applies it to the agent scenario and achieves millisecond-level C/R through two OS-level mechanisms:
- DeltaFS: Runtime hot-layer switching based on OverlayFS—freezing the current layer and inserting a new one without unmounting, reducing file operations to CoW (Copy-on-Write)
- DeltaCR: Incremental CRIU dumping + template fork() recovery, bypassing traditional pipes to fork directly from a frozen template process
Result: In the SWE-bench MCTS scenario, state management overhead dropped from 47%–77% to 3%–6%. Agents get to spend more time on useful inference instead of waiting for sandboxes.
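The delta idea can be illustrated with a toy in-memory model. This is not DeltaBox's implementation: `DeltaSandbox` is a hypothetical class that uses Python's `collections.ChainMap` to mimic layered copy-on-write state, where a checkpoint freezes the current layer and a rollback discards the layers above it. File deletion (which OverlayFS handles with whiteouts) is deliberately left out.

```python
from collections import ChainMap


class DeltaSandbox:
    """Toy sandbox state: a stack of layers, with writes landing only in the top one."""

    def __init__(self, base_files: dict[str, str]):
        # Writable top layer over a base layer that is never modified.
        self._state = ChainMap({}, base_files)

    def write(self, path: str, data: str) -> None:
        self._state[path] = data            # ChainMap writes go to the top layer only

    def read(self, path: str) -> str:
        return self._state[path]            # lookups fall through to lower layers

    def delta_size(self) -> int:
        """Number of entries changed since the last checkpoint."""
        return len(self._state.maps[0])

    def checkpoint(self) -> int:
        """Freeze the current layer and push a new one; cost does not grow with state size."""
        self._state = self._state.new_child()
        return len(self._state.maps) - 1    # count of frozen layers, used as checkpoint id

    def rollback(self, checkpoint_id: int) -> None:
        """Drop every layer written after the given checkpoint."""
        frozen = self._state.maps[-checkpoint_id:]
        self._state = ChainMap({}, *frozen)


box = DeltaSandbox({"app.py": "v1"})
cp = box.checkpoint()
box.write("app.py", "v2-broken")
box.write("notes.txt", "tried approach A")
delta_before = box.delta_size()             # only the two changed entries are held
box.rollback(cp)
restored = box.read("app.py")
print(delta_before, restored)
```

Neither `checkpoint` nor `rollback` copies the base state, which is why the cost tracks the size of the change rather than the size of the sandbox.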
The Kunpeng Supernode's sandboxing approach follows the same philosophy as DeltaBox: multi-level cache sharing + incremental snapshots + fast forking, with rollback in the ~10ms range. The difference is that Kunpeng further accelerates cross-node sandbox distribution through the hardware layer (Lingqu interconnect + unified memory addressing).
Why 10ms-Level Rollback Matters This Much
Consider an agent using Best-of-N sampling, with N=10:
- If each fork + rollback takes 1 second: state management overhead for 10 parallel trajectories = 10 seconds per iteration → severely impacts throughput
- If each fork + rollback takes 10ms: state management overhead for 10 parallel trajectories = 100ms per iteration → essentially negligible
This isn't a "somewhat faster" difference. It moves deep search from "not feasible" to "feasible." The improvement of over 10% in agent task success rate that Huawei reports may well understate the gains achievable on deeper search trees.
The Architectural Challenge of Memory Services
Agent memory isn't simply "save it and use it next time." It must simultaneously satisfy three mutually contradictory requirements:
- Low-latency access: Agents need to retrieve memory in real time during inference. Latency must approach in-memory access speeds (microseconds rather than milliseconds).
- Persistent storage: Memory cannot be lost when an agent restarts. It needs to be durably stored.
- Semantic retrieval: Agents need to retrieve memory by semantic similarity (not exact match), requiring vector and graph indices.
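To make the tension between these three requirements concrete, the sketch below combines them in the simplest possible way: a SQLite table for persistence, an in-process dictionary as the hot copy, and brute-force cosine similarity for semantic lookup. `AgentMemory` and its methods are hypothetical, the three-dimensional vectors are hand-made stand-ins for real embeddings, and the default `:memory:` database is not durable (pass a file path for that).

```python
import json
import math
import sqlite3


class AgentMemory:
    def __init__(self, db_path: str = ":memory:"):
        self._db = sqlite3.connect(db_path)      # persistence layer
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS memory (text TEXT PRIMARY KEY, vec TEXT)"
        )
        # Low-latency copy, reloaded from the database on startup.
        self._hot: dict[str, list[float]] = {
            text: json.loads(vec)
            for text, vec in self._db.execute("SELECT text, vec FROM memory")
        }

    def remember(self, text: str, vec: list[float]) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO memory VALUES (?, ?)", (text, json.dumps(vec))
        )
        self._db.commit()                        # durable write on every update
        self._hot[text] = vec

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def recall(self, query: list[float], k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query vector."""
        ranked = sorted(
            self._hot, key=lambda t: self._cosine(query, self._hot[t]), reverse=True
        )
        return ranked[:k]


memory = AgentMemory()
memory.remember("user prefers pytest", [1.0, 0.0, 0.1])
memory.remember("repo uses Python 3.11", [0.0, 1.0, 0.0])
best = memory.recall([0.9, 0.1, 0.0])
print(best)
```

The trade-offs show up immediately: every write pays for a commit, the hot copy doubles the memory footprint, and the linear scan in `recall` does not scale, which is where dedicated indices come in.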
The Contradictions in Traditional Approaches
Vector databases (Pinecone, Milvus) solve semantic retrieval but operate at millisecond to tens-of-millisecond latency, which is poorly suited to the high-frequency real-time access patterns of agent runtimes.
In-memory databases like Redis solve the latency problem but lack semantic retrieval capabilities, and the memory footprint of large-scale agent memories can balloon quickly.
Application-layer memory solutions like LangChain Memory, Zep, and Mem0 are essentially glue code at the framework level—organizing LLM conversation history, tool execution results, and user preferences into context that gets injected into prompts. They don't solve the infrastructure-layer performance problem.
Kunpeng Supernode's Memory Architecture
Kunpeng's approach is designed from the hardware layer up:
Shared Memory Buffer Pool: Leveraging the supernode's 24TB unified memory pool to keep agent hot memory resident. The buffer pool supports warm loading (automatically loading frequently-used memory when a new agent starts) and fast loading (pulling from remote memory on demand via the Lingqu protocol at 100-nanosecond latency).
Distributed Global Graph Index: An agent's knowledge graph may span multiple nodes (e.g., multiple agents across an enterprise sharing a knowledge base). The global graph index enables multimodal retrieval without hopping between nodes. Huawei claims multimodal retrieval performance is doubled.
Context Caching: This is the most intuitively measurable optimization. Agents frequently inject the same system prompts, tool descriptions, and historical context across multiple interaction turns. By caching these repeated elements, the system avoids re-tokenization and retransmission every time. Measured result: token overhead reduced by 50%.
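The re-tokenization half of this optimization is easy to demonstrate. The sketch below memoizes tokenization of repeated prompt segments with `functools.lru_cache`; `tokenize_cached` and `build_prompt` are hypothetical names, whitespace splitting stands in for a real tokenizer, and nothing here reflects how Kunpeng's context cache is actually built.

```python
from functools import lru_cache

tokenize_calls = 0  # counts how often real tokenization work is performed


@lru_cache(maxsize=128)
def tokenize_cached(segment: str) -> tuple[str, ...]:
    """Tokenize a prompt segment once; repeated segments are served from the cache."""
    global tokenize_calls
    tokenize_calls += 1
    return tuple(segment.split())   # crude stand-in for a real tokenizer


def build_prompt(system_prompt: str, tool_descriptions: str, turn_input: str) -> tuple[str, ...]:
    return (
        tokenize_cached(system_prompt)
        + tokenize_cached(tool_descriptions)
        + tokenize_cached(turn_input)
    )


SYSTEM = "You are a coding agent."
TOOLS_DOC = "read_file(path) run_tests(dir)"
for turn in ["open app.py", "run the tests", "apply the patch"]:
    build_prompt(SYSTEM, TOOLS_DOC, turn)

print(tokenize_calls, tokenize_cached.cache_info().hits)
```

Three turns would need nine tokenization passes without the cache; with it, the system prompt and tool descriptions are processed once and only the per-turn input is new work.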
Real-world data from China Telecom's Tianyi Cloud AgentDesk corroborates the value of memory services: LoCoMo long-sequence task accuracy +37%, input token consumption −68%. Accuracy improvements and cost reductions happening simultaneously—this means memory isn't "nice to have"; it's a force multiplier for agent capability.
Security: From API Gateways to Dynamic Call Chain Protection
The fundamental difference between agent security and traditional application security: agent call chains are dynamically generated, not predefined.
A traditional web application's database access pattern is fixed—ORM generates SQL → connection pool → database → result returned. Security policies can be designed around this fixed pattern.
An agent might execute: read file → discover a missing dependency → pip install → discover the dependency has a vulnerability → search for alternatives → download replacement package → run tests → discover test failure → modify configuration → retry…
Each step has different permission requirements, data flows, and security boundaries. And these steps are decided by the agent at runtime—the developer cannot predict the complete call chain in advance.
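One way to picture per-call enforcement on a dynamic chain is a deny-by-default check applied to each tool call as it is generated. The sketch below is illustrative only: `ToolCall`, `POLICY`, `authorize`, and the capability strings are hypothetical, and a real system would enforce this below the application layer rather than in the agent's own process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    needs: frozenset[str]   # capabilities this specific call requires


# Hypothetical per-tool grants; any tool or capability not listed is denied.
POLICY: dict[str, frozenset[str]] = {
    "read_file": frozenset({"fs:read"}),
    "run_tests": frozenset({"fs:read", "proc:exec"}),
    "pip_install": frozenset({"fs:write", "net:pypi"}),
}


def authorize(call: ToolCall) -> bool:
    """Allow the call only if every required capability is granted to that tool."""
    granted = POLICY.get(call.tool, frozenset())
    return call.needs <= granted


# A call chain decided at runtime, checked one call at a time.
chain = [
    ToolCall("read_file", frozenset({"fs:read"})),
    ToolCall("pip_install", frozenset({"fs:write", "net:pypi"})),
    ToolCall("pip_install", frozenset({"fs:write", "net:any"})),  # asks for an ungranted host
    ToolCall("curl", frozenset({"net:any"})),                      # tool not in the policy
]
decisions = [authorize(call) for call in chain]
print(decisions)
```

Because the chain is not known in advance, the check has to run per call; a static allowlist of whole workflows would not cover the third and fourth cases.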
Kunpeng's Security Approach
The agent security solution Kunpeng presented at KADC 2026 covers three phases:
Pre-execution: Prompt injection detection + a hardware root of trust that binds intent to behavior. This verifies that the agent is a legitimate instance before it begins executing.
During execution: Three-tier dynamic sandboxing (Conch/Session/Tool sandboxes) + eBPF container-level trusted authorization + CCA confidential computing architecture. Each sandbox tier corresponds to a different granularity of isolation: Conch-level isolates the entire agent session, Session-level isolates a single interaction, and Tool-level isolates an individual tool call. eBPF (extended Berkeley Packet Filter, a Linux kernel facility for running verified programs at kernel hook points) enables dynamic security policy definition without modifying the kernel, while CCA (Confidential Compute Architecture, ARM's confidential computing standard) provides hardware-level memory protection to guard against memory dump attacks.
Post-execution: Built-in cryptographic modules + openGauss sub-second recovery. Agent critical data is encrypted at rest, and even if compromised, the system can rapidly restore to a known-safe state.
These technology choices are not arbitrary. CCA adopts ARM's open standard (rather than a proprietary solution). eBPF builds on the Linux community's mainstream technology stack (rather than custom kernel modules). This signals that Huawei has chosen ecosystem compatibility over technological lock-in at the security layer.
Comparison with the Overseas Agent Ecosystem
To understand where Kunpeng Agent Infra stands, we need to place it on the map of the global agent ecosystem.
Ecosystem Layer Comparison
| Layer | Huawei (Kunpeng) | Overseas Counterpart | Key Difference |
|---|---|---|---|
| Orchestration | None (relies on upper-layer frameworks) | LangGraph / CrewAI / AutoGen / Dify | Huawei doesn't build orchestration; focuses on infrastructure |
| Tool Protocol | None (relies on upper-layer) | Anthropic MCP / Google A2A / OpenAI Function Calling | Huawei is not participating in tool protocol standardization |
| Runtime | openEuler + Lingqu + Sandboxing | Docker / Firecracker / WASM / E2B | Huawei has hardware co-design advantages |
| Memory | Shared memory + openGauss + Graph index | LangChain Memory / Zep / Mem0 | Huawei designs from the memory architecture layer |
| Security | CCA + eBPF + Confidential computing | Each vendor builds their own (E2B uses Firecracker; OpenAI reportedly uses gVisor) | Huawei chose open standards |
| Interconnect | Lingqu protocol (TB-scale bandwidth, ~100ns latency) | UALink / NVLink (primarily serving GPUs) / CXL | Lingqu is presented as the only interconnect designed for CPU agent workloads |
Key Gaps and Opportunities
Gap 1: Empty orchestration layer. Huawei has no agent orchestration framework of its own, nor is it participating in the standardization of tool protocols like MCP or A2A. This means Kunpeng's Agent Infra depends on upper-layer frameworks like LangGraph and Dify to "discover" it. If these frameworks don't proactively adapt to Kunpeng, the value of Agent Infra will be hard for developers to perceive.
Gap 2: The overseas ecosystem's path dependency. The overseas agent ecosystem (LangGraph, CrewAI, OpenAI Agents SDK) focuses on orchestration logic and developer experience, with infrastructure thinking that rarely extends beyond "give me a fast enough Docker" or "give me a Firecracker sandbox." No one is systematically rethinking agent runtimes from the perspective of CPU interconnect, OS scheduling, and memory architecture.
Opportunity: The unique value of full-stack design. Huawei is the only vendor designing Agent Infra across the full stack from chip to OS to runtime. The overseas ecosystem is vibrant at the orchestration layer but nearly empty at the infrastructure layer. If Kunpeng can demonstrate that "LangGraph running on Agent POSIX is 3x faster than on vanilla Linux," the ecosystem adaptation problem solves itself.
A signal worth watching: a systemic MCP security vulnerability was disclosed in early 2026. Research from the CSA (Cloud Security Alliance) identified a STDIO design flaw in the MCP SDK that affected approximately 200,000 instances, involving over 150 million package downloads. This is evidence that the agent tool protocol layer needs infrastructure-grade security support; application-layer protocols alone aren't sufficient. Kunpeng's CCA + eBPF approach has a structural advantage on this problem.
"The New POSIX": Ambition and Risk in the Standards Battle
openEuler introduced an ambitious concept at KADC 2026: Agent POSIX.
In openEuler's architecture, the Agent Infra software stack is divided into three layers:
- Bottom layer, Agent Kernel: Native scheduling and security implemented within the supernode kernel
- Middle layer, Agent Service: Abstracting security, memory, and sandboxing into Agent POSIX primitives, turning development from "building wheels" into "assembling building blocks"
- Top layer: Supporting various agent applications
The name "Agent POSIX" has a clear point of reference. POSIX (Portable Operating System Interface) defined the portability standard for Unix/Linux applications and has been a foundation of the software ecosystem for nearly 40 years. A program written against POSIX interfaces can be built and run on any POSIX-compatible operating system with little or no change.
Huawei wants to do something similar for the agent era: define a portability standard for agent runtimes. If an agent framework (say, LangGraph) calls Agent POSIX primitives instead of directly manipulating Docker APIs or the filesystem, it can run on any Agent POSIX-compliant infrastructure—whether that's a Kunpeng Supernode, AWS, or a local cluster.
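What "calling primitives instead of a specific backend" could look like is sketched below. No Agent POSIX interface has been published in the material covered here, so every name is invented for illustration: `SandboxPrimitives` is a hypothetical primitive set, `InProcessSandbox` is a toy backend that records commands instead of running them, and `explore` plays the role of framework code.

```python
from typing import Protocol


class SandboxPrimitives(Protocol):
    """Hypothetical primitive set; not taken from any published Agent POSIX spec."""

    def fork(self) -> "SandboxPrimitives": ...
    def exec(self, command: str) -> str: ...
    def checkpoint(self) -> int: ...
    def rollback(self, checkpoint_id: int) -> None: ...


class InProcessSandbox:
    """Toy backend: keeps a command log rather than executing anything."""

    def __init__(self, log: "list[str] | None" = None):
        self.log: list[str] = list(log or [])
        self._checkpoints: list[int] = []

    def fork(self) -> "InProcessSandbox":
        return InProcessSandbox(self.log)        # child starts from a copy of the log

    def exec(self, command: str) -> str:
        self.log.append(command)
        return f"ran: {command}"

    def checkpoint(self) -> int:
        self._checkpoints.append(len(self.log))
        return len(self._checkpoints) - 1

    def rollback(self, checkpoint_id: int) -> None:
        del self.log[self._checkpoints[checkpoint_id]:]


def explore(sandbox: SandboxPrimitives, candidates: list[str]) -> list[str]:
    """Framework-side code: depends only on the primitives, never on the backend."""
    results = []
    for patch in candidates:
        child = sandbox.fork()                   # each candidate gets its own fork
        results.append(child.exec(f"apply {patch} && run tests"))
    return results


base = InProcessSandbox()
base.exec("git clone repo")
results = explore(base, ["patch-a", "patch-b"])
print(results, base.log)
```

Swapping `InProcessSandbox` for a container-backed or supernode-backed implementation would leave `explore` unchanged, which is the portability property the POSIX analogy is pointing at.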
If It Succeeds
All agent frameworks could run on Agent POSIX-compliant infrastructure without adapting to specific hardware or OS combinations. Developer agent code could seamlessly migrate across Kunpeng, AWS, and Alibaba Cloud.
This would be the "Linux moment" for the agent era—an ecosystem explosion driven by operating system standardization.
If It Fails
The result is just another Huawei proprietary standard. Historically, Huawei has made multiple attempts to define industry standards (HarmonyOS's distributed capabilities, HMS Core), but cases of genuine, broad industry adoption are rare. Standard formation requires multi-party participation; one company cannot dictate it.
Assessment
Kunpeng's 4.15 million developers, 7,000+ ecosystem partners, and openEuler's open-source foundation (2,100+ enterprises, 27,000 contributors, 16 million installed instances) provide a plausible base. But the decisive factors are two:
- Whether significant non-Huawei participants join. If Agent POSIX only circulates within the Kunpeng ecosystem, it is functionally a proprietary standard. Only when Alibaba Cloud, Tencent Cloud, or even overseas vendors build their own Agent Infra on this standard can it become a genuine industry standard.
- Compatibility with existing protocols like MCP/A2A. Agent POSIX must not compete with existing tool protocols; it should serve as their underlying support layer. If Agent POSIX and MCP are in a replacement rather than complementary relationship, developers won't buy in.
As it stands today, Agent POSIX reads more like an architectural vision from openEuler than a mature standardization proposal. But the question it raises is the right one: agents need a standardized runtime interface layer, just as Unix applications needed POSIX 40 years ago.
Overall Assessment
Conditions for Agent Infra to Be Established as a Standalone Category
Three conditions, evaluated one by one:
1. Agent workloads continue to grow. Nearly certain. From ChatGPT's function calling to OpenAI's Codex, from Anthropic's computer use to Google's A2A protocol, the industry direction is clear: the next step for LLMs is agents. The 47%–77% state management overhead reported in the DeltaBox paper indicates that the infrastructure pressure from agents is real.
2. CPU becomes the agent bottleneck. Already happening. Agent control flows are densely concentrated on CPU execution, and multi-path exploration can push CPU overhead past GPU. The Kunpeng Supernode's unified memory addressing and 100-nanosecond interconnect directly address this bottleneck.
3. General-purpose infrastructure cannot efficiently support the workload. Data already supports this. Docker cold starts in hundreds of milliseconds, E2B checkpoints in seconds, Firecracker snapshots in hundreds of milliseconds—these solutions struggle against the millisecond-level fork demands of agents. Agent-specific sandboxing (like DeltaBox's 14ms checkpoint / 6ms restore) is what's needed.
Conditions 2 and 3 are already supported by data; condition 1 has a clear trajectory. Agent Infra's establishment as a standalone infrastructure category is a matter of when, not if.
Core Strengths of the Kunpeng Approach
The only solution systematically designed for Agent Infra from the chip layer. Overseas solutions either focus on orchestration (LangGraph, CrewAI) or single-point optimization (E2B for sandboxing, Zep for memory). Nobody else is rethinking the agent runtime from a full-stack perspective spanning CPU interconnect, OS scheduling, memory architecture, and security isolation.
The general compute + intelligent compute coordination architecture has a natural advantage in agent scenarios. The 24TB unified memory pool means agent memory services don't have to trade off between persistence and low latency. The Lingqu protocol's 100-nanosecond interconnect makes cross-node sandbox forking nearly as cheap as a local operation.
Core Risks
Whether "the new POSIX" can achieve industry consensus. This is the biggest uncertainty. A standards battle is not a technology battle; it's an ecosystem battle. Kunpeng has 4.15 million developers, but developer count in the Chinese market does not equal legitimacy for a global standard.
Whether the overseas agent ecosystem will adapt to Kunpeng. MCP has 97 million downloads; A2A has 50+ partners. These protocols' runtime assumptions are "standard Linux + Docker/Firecracker." Getting them to adapt to Agent POSIX requires compelling performance evidence and strong economic incentives.
Delivery capability. KADC 2026 presented an architectural blueprint with some measured results. Between blueprint and large-scale commercial deployment lies a long path of engineering implementation, ecosystem adaptation, and customer education. Huawei's delivery capability in enterprise markets has been battle-tested (telecommunications, financial services), but its delivery velocity in the nascent agent market remains to be seen.
A Deeper Observation
The Agent Infra thesis is, at its core, a unique window of opportunity for China's AI infrastructure industry.
In the GPU/AI Infra layer, NVIDIA's CUDA ecosystem moat is extraordinarily deep. Chinese vendors have been chasing it for five years and are still catching up. But in the Agent Infra layer, there is no NVIDIA-scale monopolist anywhere in the world: E2B is still an early-stage startup, LangGraph is a Python framework, MCP is just a protocol. The "operating system" position in this space is vacant.
Huawei is the first player seriously contesting that position. The Kunpeng Supernode's full-stack design, openEuler's open-source foundation, and the Agent POSIX standardization ambition all point toward the same goal: becoming the Linux of the agent era.
Whether it can pull this off depends on two things: the openness of the ecosystem, and the speed of delivery. Not technical capability—Huawei's technical chops have been proven repeatedly—but the ability to make non-Huawei developers and vendors feel this ecosystem is worth joining.
That is the core of the Agent Infra battle, and the hardest part.
This article is Part 3 of the KADC 2026 Series Analysis. Data sources include KADC 2026 official releases, openEuler community documentation, the DeltaBox paper (Shanghai Jiao Tong University & Huawei, arXiv 2605.22781), the CSA MCP security research report, and official documentation from various agent frameworks.
