
Inside Jalapeño: What Happens When an AI Company Builds Its Own Heart

A deep technical dissection of OpenAI's first in-house AI inference chip.

2026-06-27 · Thinking · 89 min read


On June 24, 2026, OpenAI and Broadcom jointly released Jalapeño, OpenAI's first in-house AI inference chip. From architecture to tape-out in just 9 months, with inference costs projected to drop ~50% and performance matching NVIDIA Blackwell and Google TPU. First deployment by end of 2026, scaling to 10GW of compute capacity by 2029. This is not just a chip. It is the moment OpenAI pivots from "AI company" to "AI full-stack company", and the moment the AI industry's supply chain logic starts to be rewritten.


Introduction: Three and a Half Years from ChatGPT to Jalapeño

Rewind to November 30, 2022—the day ChatGPT launched. Back then, OpenAI was still a "model company." Its compute lived in NVIDIA GPU clusters inside Microsoft Azure, and the trained models were exposed to the world through the OpenAI API. Nobody imagined that the same company would, three and a half years later, stand in a Broadcom office in Silicon Valley and receive a wafer it designed itself from Hock Tan's hands.

But the story of Jalapeño isn't "an AI company suddenly gained chip-making ability." It's the final landing of a series of decisions:

  • End of 2023: Richard Ho (a core engineer from Google's Cloud TPU team) joined OpenAI and began building a chip team
  • Mid-2024: OpenAI and Broadcom started secret joint development (the "pre-phase" of the 18-month cycle)
  • October 2025: The partnership was publicly disclosed—tens of billions of dollars, 10GW deployment target
  • June 24, 2026: Jalapeño officially released; engineering samples already running GPT-5.3-Codex-Spark

The chip's name, Jalapeño (Mexican pepper), continues Google's TPU tradition of naming chips after snacks (Boardwalk, Bristlecone, Trillium), with OpenAI's signature "seemingly casual, actually precise" branding. The naming style makes it impossible to take as seriously as "B100" or "MI300X," but its strategic significance far exceeds what the name conveys.

This article will dissect Jalapeño's:

  1. Designers: Google TPU soul + 40-person minimalist team + Broadcom's ASIC engineering power
  2. Manufacturing: TSMC 3nm + Broadcom silicon implementation + Celestica system integration
  3. Positioning: Why "inference-only, not training" is the only commercially correct choice
  4. Architecture and Specifications: Systolic array, HBM, Arm CPU, Tomahawk interconnect
  5. Process & Compute: vs Google TPU v7/v8 and Blackwell compute, efficiency, HBM
  6. Physical Analysis: 3nm, die size, transistor count, packaging, thermal, memory
  7. Software Adaptation: Vertical integration advantages and ecosystem lock-in costs
  8. Horizontal Comparison: vs NVIDIA / Google TPU / AWS / Microsoft
  9. China Vendor Panorama: Alibaba, ByteDance, DeepSeek, Zhipu
  10. Conclusion: The next form of an AI company

1. Designers: The Soul Out of Google TPU

Jalapeño full-stack collaboration chain: OpenAI / Broadcom / TSMC / Celestica

The Core Engineer: Richard Ho

The soul of Jalapeño is OpenAI's hardware project lead, Richard Ho. He previously spent nearly nine years at Google, where he was a core architect on the Cloud TPU project, leading multiple TPU generations from concept to mass production.

Richard Ho's key background:

  • 2014–2023: Google TPU team, designer of systolic array architecture, memory hierarchy, on-chip network, and other critical modules
  • Worked on the TPU line from TPU v1 (2016) onward, the lineage that later led to TPU v7 Ironwood (2025)
  • Has the most direct engineering understanding of why the ASIC route works, why Tomahawk-class interconnect is necessary, why HBM selection is so critical

At the end of 2023, Sam Altman began aggressively recruiting from Google's TPU team. Richard Ho joined OpenAI during this period and led the formation of the Jalapeño team.

The signal from this personnel flow is extremely clear: OpenAI knew the ASIC route was viable, but needed people who had already learned all the lessons from TPU, not explorers starting from scratch. Richard Ho knows which pitfalls to avoid.

The OpenAI Chip Team: 40 People, Minimalist Architecture

At the time of release, OpenAI's chip team numbered approximately 40 people. To put this in context:

Team | Size | Notes
OpenAI Jalapeño team | ~40 people | Architecture design
Google TPU team | Thousands | Includes compiler, software, verification
NVIDIA GPU design team | Tens of thousands | Spans multiple product lines
Apple Silicon team | Thousands | Includes SoC, GPU, Neural Engine

Why can 40 people build a chip?

Because OpenAI's 40 people do only one thing—architecture definition and algorithm-hardware co-design. All RTL-to-GDS (from synthesizable code to physical layout) silicon implementation is handled by Broadcom's hundreds-strong team. This "OpenAI defines 'what to do', Broadcom implements 'how to do it'" division of labor is the most efficient chip-making model in the current AI chip industry.

Broadcom: The Real Silicon Implementation Power

Broadcom is the true giant of ASIC design services:

  • Existing clients: Google, Amazon, Meta, ByteDance, Anthropic
  • AI ASIC design capability: From Tomahawk networking chips to TPU physical implementation, all Broadcom
  • October 2025: Strategic partnership signed with OpenAI, 10GW deployment target
  • CEO involvement: Hock Tan personally oversaw the project and, at the launch event, handed the first wafers to Sam Altman himself

Broadcom's role is not just "foundry" but an extension of OpenAI's chip team. Everything from RTL, verification, synthesis, place-and-route, timing sign-off, to Tomahawk network chip integration is handled by Broadcom.

Celestica

The Canadian electronics manufacturing services provider handles board, rack, and server system industrialization. Celestica's role is underestimated—going from a chip to a deployable server requires power management, thermal design, PCIe interfaces, chassis structure, and a lot more engineering. Celestica is the critical bridge from "successful tape-out" to "data center ready" for OpenAI.


2. Manufacturing: TSMC 3nm + Broadcom Silicon Implementation

Jalapeño's manufacturing division is a typical AI-era ASIC collaboration network:

Link | Owner | Notes
Architecture design | OpenAI | Blank-slate design, built from zero around LLM inference
Silicon implementation (RTL-to-GDS) | Broadcom | Physical design, timing sign-off, DFT
Wafer foundry | TSMC | 3nm (N3) advanced process
Network interconnect | Broadcom | Tomahawk networking chips (rack/cluster interconnect)
Board/rack | Celestica | Industrialization from chip to server system
HBM supply | SK Hynix / Samsung | High-bandwidth memory (presumed HBM3E or HBM4)

What TSMC 3nm Really Means

The industry needs to be honest about what "3nm" means. TSMC N3's actual physical gate length no longer shrinks; "3nm" is mostly a marketing name. But there are some objectively comparable numbers:

  • TSMC N3 vs N5: logic density +60-70%, same-frequency power -25-30%
  • N3 capacity at TSMC is extremely tight: Apple, AMD, NVIDIA, MediaTek, and Broadcom clients are all fighting for it
  • TSMC has notified 5-10% price increases for 3-5nm customers

For OpenAI, getting a tape-out slot on N3 is itself a hidden resource brought by Broadcom's long-term customer relationships.

9 Months to Tape-Out: "The Fastest ASIC Development Cycle in History" (OpenAI's Claim)

Jalapeño took only 9 months from architecture design to tape-out. OpenAI claims this is "the fastest ASIC development cycle in the history of high-performance advanced semiconductors." Comparisons:

Comparable | Development Cycle
OpenAI Jalapeño | 9 months (architecture to tape-out)
Google TPU v1 | ~15 months (start to deployment)
Google TPU v2-v4 | 12-18 months/generation
AWS Trainium v1 → v2 | ~24 months
Apple M-series | 24-36 months
Traditional chip company comparable projects | 24-36 months

⚠️ Analysts on Zhihu point out that "9 months" may be counted from some intermediate node. OpenAI and Broadcom's joint development actually started in mid-2024 (described by media at the time as "18 months of secret R&D"), and actual architecture exploration may have begun earlier. Compressing "18 months" into "9 months" is a PR choice, but even so, it's extremely fast.

The Real Substance Behind "9 Months"

Breaking down the time savings:

Time-saving Source | Estimated Contribution
18 months of pre-architecture exploration and Broadcom collaboration groundwork | 3-4 months saved (counted into "formal R&D")
AI-accelerated PPA optimization (DSO.ai / Cerebrus-class tools) | 2-3 months saved
Broadcom IP reuse (Tomahawk, SerDes, HBM PHY) | 2-3 months saved
AI-accelerated verification, thermal simulation, sign-off | 1-2 months saved
Traditional circuit design and tape-out prep | 2-3 months (still required)

OpenAI didn't perform a miracle. They pushed every available tool in the industry to the extreme and leaned on Broadcom's ASIC engineering reuse to compress the timeline to 9 months.
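As a rough sanity check on the breakdown above, the savings ranges can be summed and subtracted from a baseline schedule. This is only a sketch: the 18-month baseline is an assumption borrowed from the "18 months of secret R&D" figure, not an official number, and the names `SAVINGS`, `BASELINE_MONTHS`, and `remaining_schedule` are illustrative.

```python
# Estimated savings in months (low, high), taken from the breakdown above.
SAVINGS = {
    "pre-architecture groundwork": (3, 4),
    "AI-accelerated PPA optimization": (2, 3),
    "Broadcom IP reuse": (2, 3),
    "AI-accelerated verification and sign-off": (1, 2),
}

# Assumption: an 18-month baseline schedule (not an official figure).
BASELINE_MONTHS = 18


def remaining_schedule(baseline, savings):
    """Return (shortest, longest) schedule after subtracting the savings."""
    low_saved = sum(low for low, _ in savings.values())
    high_saved = sum(high for _, high in savings.values())
    return baseline - high_saved, baseline - low_saved


shortest, longest = remaining_schedule(BASELINE_MONTHS, SAVINGS)
print(f"Estimated schedule: {shortest}-{longest} months")
```

Under that assumed baseline the result is a 6-10 month window, which brackets the claimed 9 months. With a 24-36 month baseline the same savings would not get close, which is consistent with the caveat that the clock was probably started at an intermediate node.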


3. Positioning: Inference ASIC, Not Training Chip

"Intelligence Processor"—OpenAI's Name for Its Own Chip

This is not marketing language; it defines the essential difference from a GPU.

Jalapeño does not do training, only inference.

Dimension | Training | Inference
Purpose | Make the model learn | Make the model answer questions
Characteristics | One-time, high precision, crazy compute | Continuous operation, latency-sensitive, cost-king
Cost nature | Capital expenditure (CAPEX) | Operating expenditure (OPEX)
OpenAI 2025 H1 inference spend | - | $5.02 billion
Industry estimate of inference share | - | Inference ~60-70% of total AI operating cost (industry-standard estimate)

Why 2026 Is the Right Time

Launching an inference chip in 2023 would have been too risky: the model architecture was iterating rapidly (from pure Transformer to MoE to hybrid architectures), and an ASIC could be obsolete by the time it shipped. By 2026, Transformer + MoE has become the dominant paradigm, architecture has stabilized, and the commercial risk of custom ASICs has dropped to an acceptable range.

Three strategic considerations:

  1. Technology maturity window: Model architecture is stable; the ASIC won't fall behind
  2. Commercial pressure tipping point: Inference costs crossed a critical threshold in 2025; a cost-reduction solution is needed
  3. Supply chain diversification "floor": Break free from single dependence on NVIDIA

Business Logic: Breaking NVIDIA Dependence

OpenAI compute infrastructure history:

  • 2023–2024: ~20,000 H100 (GPT-4 training + inference)
  • 2025: H200 + Grace Hopper (GPT-5 training)
  • 2026: B200/GB200 (multimodal model training)

Cost structure: In 2025, OpenAI's total spending was $34 billion, revenue $13 billion, net loss $20.9 billion. H1 inference spend was $5.02 billion (annualized ~$10B), accounting for ~30% of total spending (OpenAI's own figures). The industry-standard estimate is that inference is 60-70% of total AI operating cost; the gap between the two figures comes down to whether "AI operating cost" includes training. CFO Sarah Friar has warned: if revenue growth doesn't keep up with data center contract expansion, compute costs could soar to $85 billion by 2028.

Jalapeño directly targets the most frequent and most expensive segment: inference. If Hock Tan's "50% cost reduction" claim holds, it means:

  • Per-token cost halved → ChatGPT free user service capacity doubles
  • Same budget serves more users → directly affects OpenAI's break-even point
  • Same conditions free up budget for training (still buying NVIDIA GPUs) → overall compute expansion accelerates
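The arithmetic behind these bullets can be sketched in a few lines. The ~$10B annualized inference spend and the $34B total come from the figures above; the per-million-token price is a purely hypothetical placeholder, since the ratios do not depend on it.

```python
ANNUAL_INFERENCE_SPEND = 10e9   # ~$10B annualized, from the H1 figure above
TOTAL_SPEND = 34e9              # 2025 total spending
BASE_COST_PER_M_TOKENS = 2.0    # hypothetical $/million tokens (assumption)


def tokens_served(budget_usd, cost_per_million_tokens):
    """Tokens that a budget buys at a given per-million-token cost."""
    return budget_usd / cost_per_million_tokens * 1_000_000


baseline = tokens_served(ANNUAL_INFERENCE_SPEND, BASE_COST_PER_M_TOKENS)
halved = tokens_served(ANNUAL_INFERENCE_SPEND, BASE_COST_PER_M_TOKENS / 2)

capacity_gain = halved / baseline            # same budget, half the unit cost
freed_budget = ANNUAL_INFERENCE_SPEND / 2    # same traffic, half the unit cost
inference_share = ANNUAL_INFERENCE_SPEND / TOTAL_SPEND

print(f"Capacity at the same budget: {capacity_gain:.1f}x")
print(f"Budget freed at the same traffic: ${freed_budget / 1e9:.1f}B")
print(f"Inference share of total spend: {inference_share:.0%}")
```

The two readings are alternatives rather than additive: a halved unit cost either doubles served tokens at a fixed budget or frees about half the inference budget at fixed traffic.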

4. Architecture and Specifications: Systolic Array, HBM, Arm CPU, Tomahawk Interconnect

Core Architecture: Systolic Array

The systolic array is a decades-old design (first described in the late 1970s) that Google's TPU v1 (2016) popularized as an AI accelerator architecture. The name comes from cardiac systole: data flows rhythmically between processing elements (PEs) like blood, with each PE repeatedly doing the same thing (typically multiply-accumulate).

OpenAI's choice of systolic array is doubling down on a paradigm already validated by Google, not adventurously experimenting.

Benefits of Systolic Array for Dense Model Inference

Systolic array vs GPU data flow: 70-90% vs 30-40% utilization

Dense models (like Llama-3-70B style) have all parameters participating in computation during every inference.

  1. Data Movement Minimization
  • Traditional GPU: each CUDA core requires independent register read/write
  • Systolic array: weights flow in from the left, "passing" through the array without "re-reading," only interacting with HBM at the boundary
  • Less data movement = direct power savings = improved performance per watt
  2. Extreme Compute Density
  • A 128×128 systolic array = 16,384 MAC units, all working simultaneously
  • Doesn't need a very high frequency: at a modest 100 MHz, 16,384 MACs × 2 ops per MAC already gives ~3.3 trillion INT8 ops/sec, and the figure scales linearly with clock speed
  • Perfect for the Transformer "matrix multiply → matrix multiply → matrix multiply" pattern
  3. Regular Data Flow
  • Batch size and sequence length are relatively stable during inference
  • Doesn't need the GPU's flexibility to "run any computation pattern"
  • Hardware utilization can stay stable at 70-90% (far above GPU's 30-40%)
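To make the data flow concrete, here is a minimal cycle-level sketch of a systolic array in Python with NumPy. It models the simpler output-stationary variant (each PE accumulates one output element) rather than the weight-stationary flow described above, and it says nothing about Jalapeño's actual design, which is not public. The function names and the clock frequencies passed to `peak_ops` are assumptions for illustration.

```python
import numpy as np


def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array.

    PE (i, j) accumulates C[i, j]. Row i of A enters from the left delayed
    by i cycles and column j of B enters from the top delayed by j cycles,
    so at cycle t the PE sees the operand pair with index k = t - i - j.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    acc = np.zeros((M, N), dtype=np.int64)
    cycles = M + N + K - 2  # the last operand pair reaches PE (M-1, N-1)
    busy = 0
    for t in range(cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j
                if 0 <= k < K:
                    acc[i, j] += int(A[i, k]) * int(B[k, j])
                    busy += 1
    utilization = busy / (cycles * M * N)  # fraction of PE-cycles doing a MAC
    return acc, cycles, utilization


def peak_ops(rows, cols, clock_hz):
    """Peak ops/sec, counting each multiply-accumulate as 2 operations."""
    return rows * cols * 2 * clock_hz


rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(4, 6))
B = rng.integers(-128, 128, size=(6, 5))
C, cycles, utilization = systolic_matmul(A, B)
assert np.array_equal(C, A @ B)
print(f"cycles={cycles}, utilization={utilization:.2f}")
print(f"128x128 at 100 MHz: {peak_ops(128, 128, 100e6) / 1e12:.2f} trillion ops/s")
print(f"128x128 at 1 GHz:   {peak_ops(128, 128, 1e9) / 1e12:.2f} trillion ops/s")
```

In this model utilization works out to K / (M + N + K - 2), so long inner dimensions relative to the array size keep the PEs busy, which is the regime that large Transformer matrix multiplies sit in.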

Systolic Array for MoE: Matches and Conflicts

MoE inference on systolic array: routing → sub-array allocation → matmul → KV share

OpenAI's GPT-4 / GPT-5 / GPT-5.3 are all reportedly based on the MoE architecture. MoE has two key features:

  • Sparse activation: 1.8 trillion total parameters, but only 2% (~36 billion) activated per token
  • Expert routing: Router dynamically selects Top-K experts for each token (typical K=2~8)

Matches

  1. Local Matrix Multiplication Still Dominates Compute
  • Even with MoE, the FFN computation inside activated experts is still huge matrix multiplication
  • DeepSeek-R1 activates ~37B of its 671B parameters per token, so the matmul work scales roughly as activated params × batch × seq_len
  • This part is fully suited to systolic array's high-density matrix multiplication
  2. Intra-Expert Computation Is Regular
  • Once the Router decides which experts to activate, the subsequent matrix multiply path is determined
  • Software/hardware can "pre-warm" the corresponding systolic array regions in advance
  • This predictability + local regularity is the sweet spot for systolic arrays
  3. KV Cache Reuse Friendly
  • In MoE inference, the attention layer's KV Cache is shared across all experts
  • The systolic array data flow model is well-suited to "weight resident, KV data flow" attention computation

Conflicts

  1. Dynamic Expert Routing
  • Different tokens go to different experts → array utilization may fluctuate wildly
  • If K=2 and, say, 8 experts each own a region of the array, perhaps only 25% of the array is busy
  • Response: hardware can reserve an "expert broadcast bus" so array sections load different expert weights, or tokens can be sorted by expert for batch scheduling
  2. Increased Communication Overhead
  • MoE's "Expert Parallelism" inherently requires cross-card communication
  • After routing, the activated experts may be on different chips
  • This is why OpenAI must use Broadcom Tomahawk (high-speed interconnect) for large-scale clusters
  3. Load Imbalance
  • Router probability distribution may cause certain experts to be selected more often
  • Requires "hard truncation + re-routing" mechanisms
  • Systolic arrays are sensitive to this imbalance because the PE array is physically fixed
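The routing and imbalance effects above can be illustrated with a toy Top-K router. This is a sketch under stated assumptions, not a description of Jalapeño's scheduler: it assumes one expert per sub-array and that a step lasts as long as the busiest sub-array needs, and all function names are hypothetical.

```python
import numpy as np


def route_top_k(logits, k):
    """Indices of the k highest-scoring experts for each token."""
    return np.argsort(-logits, axis=1)[:, :k]


def expert_load(assignments, num_experts):
    """Number of tokens routed to each expert."""
    return np.bincount(assignments.ravel(), minlength=num_experts)


def group_tokens_by_expert(assignments, num_experts):
    """Token sorting for batch scheduling: token ids grouped per expert."""
    return {e: np.nonzero((assignments == e).any(axis=1))[0]
            for e in range(num_experts)}


def subarray_utilization(load):
    """Useful work divided by (busiest sub-array time x number of sub-arrays).

    Assumption: one expert per sub-array, and the step takes as long as the
    most heavily loaded sub-array needs while the others sit idle.
    """
    if load.max() == 0:
        return 0.0
    return load.sum() / (load.max() * len(load))


# Three tokens, four experts, K=2 (router scores are made up).
logits = np.array([[3.0, 1.0, 2.0, 0.0],
                   [0.0, 3.0, 2.0, 1.0],
                   [3.0, 2.0, 1.0, 0.0]])
assignments = route_top_k(logits, k=2)
load = expert_load(assignments, num_experts=4)
print("tokens per expert:", load.tolist())
print("utilization:", subarray_utilization(load))
print("batches:", {e: t.tolist() for e, t in group_tokens_by_expert(assignments, 4).items()})
```

A single token with K=2 spread over 8 expert sub-arrays gives 2/8 = 25% utilization in this model, which is where the "only 25% busy" intuition comes from; larger batches and balanced routing push the ratio back up.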

Inferred Jalapeño Response Strategy

Although no ISA documentation is public, from OpenAI's emphasis on "designed from zero for LLM inference" we can infer:

  • Array scale may not be a single oversized array, but multiple medium-scale systolic arrays (e.g., 8-16 sub-arrays of 64×64)
  • Can flexibly assign experts to different sub-arrays based on routing results
  • Tomahawk network chips support large-scale expert parallelism, extending "hardware sparsity" to "cluster sparsity"

Confirmed Technical Details

Item | Estimated/Confirmed
Process | TSMC N3
Core architecture | Systolic Array
Memory | HBM3E or HBM4 (estimated 80-144GB)
Task scheduling CPU | Arm custom design
Network interconnect | Broadcom Tomahawk series switches
Theoretical compute (est.) | ~10-13 PFLOPS (INT8/FP8)
Per-token energy | ~30% lower than Blackwell
Inference cost reduction | ~50% (Hock Tan)

Four Optimization Dimensions (Richard Ho)

OpenAI's hardware lead emphasized Jalapeño has comprehensive optimization across four dimensions:

  1. Kernel: Hardware-level hardcoding for LLM core operators (matrix multiply, attention)
  2. Memory Movement: Reducing data shuttling between HBM and compute units
  3. Network: Tomahawk enables low-latency interconnect for large-scale clusters
  4. Service Model: Co-optimization for online inference batching, KV Cache residency, etc.

These four optimizations let the chip's real-world utilization approach theoretical peak, unlike GPUs that have "high peak, low utilization (30-40%)."


5. Process and Compute Estimation: vs Google TPU v7/v8

Process Estimation Basics

TSMC N3 vs N5: logic density +60-70%, same-frequency power -25-30%. This is an objective benchmark, applicable to all N3 customers.

Google TPU Known Key Data

Chip | Release | Process | FP8 Compute/Chip | HBM | Cluster Scale | Notes
TPU v6 Trillium | 2024.5 | 5nm | ~1 PFLOPS | 32GB | 256 cards | 4.7x v5e performance
TPU v7 Ironwood | 2025.11 | TSMC 3nm | 4.6 PFLOPS | 192GB HBM3E | 9,216 cards (42.5 EFLOPS) | 100% liquid cooling, ~980W
TPU 8t "Sunfish" | Released 2026.4 | Est. 3nm | Est. ~12 PFLOPS | Est. 192GB+ | Same | Training-only, Broadcom design
TPU 8i "Zebrafish" | Released 2026.4 | Est. 3nm | Est. 3-4 PFLOPS | 384MB SRAM×3 | Same | Inference-only, MediaTek design

Jalapeño Compute Estimation

OpenAI didn't disclose die size, HBM count, or transistor count. But there are public clues:

  • Process: TSMC N3 (same generation as TPU v7 Ironwood)
  • Network: Broadcom Tomahawk
  • Hock Tan claims "performance comparable to Blackwell"

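Take TPU v7 Ironwood's 4.6 PFLOPS as the baseline. NVIDIA's Blackwell B200 is quoted at 20 PFLOPS (a headline figure usually attached to FP4 with sparsity; this article uses it as the FP8 comparison number throughout), but that's a dual-die, 208-billion-transistor, 1600mm², 1200W TDP monster.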

Jalapeño's compute estimate must be done in two steps.

(a) FP8 peak compute estimate (direct Blackwell comparison):

  • Reason 1: ASIC architecture is more efficient than GPU; same transistor budget yields more equivalent compute
  • Reason 2: Hock Tan says "comparable to Blackwell"; this refers to effective throughput (see b), not direct FP8 peak comparison
  • Reason 3: Lei Tech's "10 PFLOPS" speculation falls in a reasonable range
  • Reason 4: With FP4/FP6 quantization, could reach 20+ PFLOPS (at or above Blackwell Ultra's FP4 15 PFLOPS)
  • FP8 peak compute estimate: 8-13 PFLOPS

(b) Effective inference throughput comparison (what Hock Tan actually said):

  • ASIC achieves 70-90% utilization on LLM inference workloads; GPUs typically 30-40%
  • For the same "FP8 peak compute", ASIC's effective throughput can be 2x that of a GPU
  • Hock Tan's "comparable to Blackwell" refers to this effective throughput, not peak FP8 equivalence
  • Note: the specific context of Hock Tan's original quote is not cited in public reporting; this interpretation is my own inference
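The "effective throughput" argument reduces to peak compute multiplied by utilization. The sketch below plugs in the article's own estimates (10-13 PFLOPS at 70-90% for Jalapeño, 20 PFLOPS at 30-40% for B200); none of these are measured values, and `effective_pflops` is an illustrative helper.

```python
def effective_pflops(peak_pflops, utilization):
    """Useful throughput = peak compute x fraction of time the units are busy."""
    return peak_pflops * utilization


# Article estimates: (low, high) bounds for peak compute and utilization.
jalapeno_low = effective_pflops(10, 0.70)
jalapeno_high = effective_pflops(13, 0.90)
b200_low = effective_pflops(20, 0.30)
b200_high = effective_pflops(20, 0.40)

print(f"Jalapeño (est.): {jalapeno_low:.1f}-{jalapeno_high:.1f} effective PFLOPS")
print(f"B200 (est.):     {b200_low:.1f}-{b200_high:.1f} effective PFLOPS")
```

Under these assumptions the two ranges overlap (roughly 7-12 versus 6-8 effective PFLOPS), which is one way to read "comparable to Blackwell" without requiring peak FP8 parity.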

Direct Compute Comparison (Estimated)

Performance vs efficiency per watt scatter: Jalapeño sits in inference efficiency sweet spot
Chip | Process | FP8 Compute | Memory | Single-Card Power | Interconnect
OpenAI Jalapeño (est.) | TSMC 3nm | ~10-13 PFLOPS | 80-144GB HBM3E/4 | 300-500W (est.) | Broadcom Tomahawk
Google TPU v7 Ironwood | TSMC 3nm | 4.6 PFLOPS | 192GB HBM3E | ~980W | OCS Jupiter 32Tbps
Google TPU 8i (inference) | Est. 3nm | Est. 3-4 PFLOPS | 1.15GB on-chip SRAM (main HBM undisclosed) | Est. 600W | Same
Google TPU 8t (training) | Est. 3nm | Est. 12 PFLOPS | Est. 192GB+ HBM3E | Est. 980W | Same
NVIDIA B200 | TSMC 4NP | 20 PFLOPS | 192GB HBM3E | 1200W | NVLink 5.0 1.8TB/s
NVIDIA GB300 | TSMC 4NP | ~7-8 PFLOPS (FP8 est.) / 15 PFLOPS FP4 | 288GB HBM3E | ~1400W | NVLink 5.0

Key Observations

  1. Jalapeño Single-Card Compute May Exceed TPU v7 Ironwood

If the estimate is correct (10-13 PFLOPS FP8), Jalapeño's single-card compute is 2-3x Ironwood's. But an ASIC-to-ASIC comparison needs a discount: the TPU also has to accommodate a VPU (vector processing unit) and other general-purpose blocks, so its die area is not entirely systolic array.

  2. HBM Is Jalapeño's Potential Weakness

TPU Ironwood has 192GB HBM3E, 7.4 TB/s bandwidth. If Jalapeño has 80-144GB, 3-4 TB/s bandwidth:

  • Large model inference (70B-200B parameters) requires HBM to hold weights + KV Cache
  • 144GB just fits 70B FP16 weights (140GB), with little KV space
  • This means Jalapeño must combine with model sharding or quantization to serve large models
  • Speculation: Jalapeño serves GPT-5.3-style mixed-precision + MoE expert-sharded models, with single cards holding only partial experts

  3. Single-Card Power Is Jalapeño's Clear Advantage
  • ASIC energy efficiency is 2-3x that of a GPU; this is the industry consensus
  • 300-500W TDP allows simpler cooling (air cooling or light liquid cooling)
  • This is likely the main source of Jalapeño's 50% inference cost reduction: not just cheaper chips, but also electricity savings

  4. Cluster Scalability Is Jalapeño's Open Question

TPU Ironwood clusters support 9,216 cards, 42.5 EFLOPS aggregate compute. How far can Jalapeño's Tomahawk network scale? OpenAI hasn't disclosed, but the 10GW data center target implies needing millions of cards.
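The cluster-scale numbers can be checked with simple arithmetic. In this sketch the card count for 10GW is derived from the estimated 300-500W per card; the `overhead` factor for hosts, networking, and cooling is a hypothetical knob, not a disclosed figure, so the chip-only result is an upper bound.

```python
def pod_exaflops(cards, pflops_per_card):
    """Aggregate peak compute of a pod, in EFLOPS (1 EFLOPS = 1000 PFLOPS)."""
    return cards * pflops_per_card / 1000


def cards_for_power(total_watts, watts_per_card, overhead=1.0):
    """Cards that fit in a power budget; `overhead` scales per-card power
    for non-chip consumers (hypothetical factor, 1.0 = chips only)."""
    return int(total_watts // (watts_per_card * overhead))


ironwood_pod = pod_exaflops(9216, 4.6)
upper_bound = cards_for_power(10e9, 300)          # chips only, lowest TDP estimate
lower_tdp_case = cards_for_power(10e9, 500)       # chips only, highest TDP estimate
with_overhead = cards_for_power(10e9, 500, 1.5)   # assumed 50% facility overhead

print(f"Ironwood pod: {ironwood_pod:.1f} EFLOPS")
print(f"10GW at 300W/card: {upper_bound:,} cards")
print(f"10GW at 500W/card: {lower_tdp_case:,} cards")
print(f"10GW at 500W/card with 1.5x overhead: {with_overhead:,} cards")
```

The Ironwood figure lands at about 42.4 EFLOPS, matching the quoted 42.5 within rounding, and every 10GW scenario is in the tens of millions of cards, far beyond a single 9,216-card pod.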

Performance Per Watt: The Real Battle

Dimension | OpenAI Jalapeño | Google TPU v7 Ironwood | NVIDIA B200
Single-card FP8 compute | ~10-13 PFLOPS | 4.6 PFLOPS | 20 PFLOPS
Single-card power | 300-500W | 980W | 1200W
FP8 compute per watt (TFLOPS/W) | ~29 | ~4.7 | ~17
Inference cost (relative) | 50% | 44% TCO advantage | Baseline

If Jalapeño truly achieves ~29 TFLOPS/W (theoretical range 20-43 TFLOPS/W, depending on actual power and compute), it will be the most energy-efficient AI accelerator in history. But this is single-card paper numbers; real "watts per useful throughput" in actual inference workloads will be discounted.
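The per-watt figures follow directly from the compute and power estimates in the table. A minimal sketch, using the article's numbers and taking the midpoints (11.5 PFLOPS, 400W) for Jalapeño's headline value:

```python
def tflops_per_watt(pflops, watts):
    """Peak compute per watt, converting PFLOPS to TFLOPS."""
    return pflops * 1000 / watts


jalapeno_mid = tflops_per_watt(11.5, 400)    # midpoint of both estimates
jalapeno_worst = tflops_per_watt(10, 500)    # lowest compute, highest power
jalapeno_best = tflops_per_watt(13, 300)     # highest compute, lowest power
ironwood = tflops_per_watt(4.6, 980)
b200 = tflops_per_watt(20, 1200)

print(f"Jalapeño (est.): {jalapeno_mid:.1f} "
      f"(range {jalapeno_worst:.0f}-{jalapeno_best:.0f}) TFLOPS/W")
print(f"TPU v7 Ironwood: {ironwood:.1f} TFLOPS/W")
print(f"NVIDIA B200:     {b200:.1f} TFLOPS/W")
```

The spread between the worst and best case is more than 2x, which shows how sensitive the headline number is to the undisclosed power and compute figures.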


6. Physical Chip Analysis: 3nm, Die Size, Transistor Count

Die Size and Transistor Count

Although OpenAI hasn't released die photos or detailed physical dimensions, combining known parameters we can infer:

  • Process: TSMC N3 (same generation as Apple M4)
  • Transistor count: Not disclosed. Considering 3nm density and small team design capability, estimated at 50-150 billion transistors (NVIDIA Blackwell B200 is ~208 billion, dual-die)
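  • Die size estimate: ~600-900mm² (single die, much smaller than B200's 1600mm²); the top of this range sits above the roughly 858mm² single-reticle limit, so a monolithic die would have to land in the lower part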
  • Packaging: Speculated to use CoWoS-S or CoWoS-L advanced packaging, with HBM and compute die connected through silicon interposer

Thermal Design

Combined with 3nm process energy efficiency and ASIC architecture's low power characteristics:

  • Estimated TDP: 300-500W (far below Blackwell's 1200W)
  • Cooling solution: Possibly air cooling + partial liquid cooling, or full liquid cooling
  • This means OpenAI's data center cooling costs may be only 1/2-1/3 of Blackwell clusters

HBM and Memory Hierarchy

  • HBM3E (estimated 6-8 stacks)
  • Total capacity estimate: 80-144GB
  • Bandwidth estimate: 3-4 TB/s
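These capacity and bandwidth estimates translate into two quick bounds: whether a model's weights fit, and how fast single-stream decoding can go when every generated token has to read all weights from HBM once. This is a deliberately simplified roofline-style sketch; it ignores KV Cache traffic, batching, and MoE sparsity, and the helper names are illustrative.

```python
def weight_footprint_gb(params_billion, bytes_per_param):
    """Weight memory in GB: billions of parameters x bytes per parameter."""
    return params_billion * bytes_per_param


def decode_tokens_per_s(bandwidth_tb_s, weights_gb):
    """Upper bound on batch-1 decode speed when weight reads dominate."""
    return bandwidth_tb_s * 1000 / weights_gb


fp16_gb = weight_footprint_gb(70, 2)   # 70B dense model in FP16
fp8_gb = weight_footprint_gb(70, 1)    # same model quantized to 8 bits

for label, gb in (("FP16", fp16_gb), ("FP8", fp8_gb)):
    fits = "fits" if gb <= 144 else "does not fit"
    low = decode_tokens_per_s(3, gb)
    high = decode_tokens_per_s(4, gb)
    print(f"70B {label}: {gb}GB, {fits} in 144GB, "
          f"~{low:.0f}-{high:.0f} tokens/s per stream (upper bound)")
```

Halving the bytes per parameter both frees room for KV Cache and doubles the bandwidth-bound decode rate, which is why quantization and expert sharding matter so much at this HBM size.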

Critical Physical Bottleneck: HBM accounts for 30-40% of AI ASIC total cost. Hock Tan has publicly stated that HBM squeezes Broadcom's custom AI chip margins below those of other product lines like network switches.

"AI Designs AI Chips" . The Real Substance

OpenAI's public narrative is "GPT models participated in chip design," but this is PR framing that needs to be split into two levels:

  • ✅ Reality: AI participated in logic circuit routing optimization, thermal performance management, power prediction and other EDA steps. These tasks traditionally require weeks of engineering team iteration; AI can complete them in hours
  • ❌ Misleading framing: This doesn't mean "AI designed the chip." The most critical front-end architecture definition (systolic array choice, memory hierarchy design, ISA definition) is still done by Richard Ho's team

Industry Progress on AI-Assisted Chip Design

AI-assisted chip design 9-month tape-out flow: human + AI division

This part is very clear by 2026: AI-assisted chip design is not OpenAI's invention but a paradigm shift the entire EDA industry has been undergoing for the past two years:

  1. Synopsys DSO.ai (Design Space Optimization AI)
  • Released 2020, the industry's earliest AI-assisted EDA tool
  • Uses reinforcement learning to search for the PPA optimum in chip synthesis and place-and-route
  • Has helped Samsung, Renesas, and NVIDIA complete hundreds of tape-outs
  • Measured: compared to manual work, compresses engineering time from weeks to hours and achieves 10-20% additional optimization in power, performance, and area
  2. Cadence Cerebrus
  • Released 2022, competing with DSO.ai
  • Automatically runs the Innovus physical implementation flow in parallel while AI adjusts parameters
  • Key data: As of June 2026, used in over 2,000 chip tape-outs
    • Compute efficiency: ~4x higher
    • Turnaround time: ~2x shorter
    • PPA improvement: ~15%
  3. Cadence ChipStack (released 2026, billed as the industry's first L5 autonomous chip design AI agent)
  • Released June 2026, described as the world's first "virtual IC design engineer"
  • EDA autonomy classification (analogous to autonomous driving L1-L5):
    • L1-L3: AI as tool assistance (current state)
    • L4: AI can understand goals, autonomously call tools
    • L5: AI can independently complete RTL generation, verification planning, formal verification, debugging, convergence
  • Cadence's "Mental Model" addresses LLM hallucination problems in chip design
  4. Siemens Celus + Cadence Allegro X AI
  • Focus on PCB-level design
  • AI automatically generates schematic drafts, place-and-route drafts
  5. China EDA Progress
  • Shanghai Peifeng Tunan Semiconductor (headquarters moved to Zhangjiang in 2026)
  • Empyrean Technology, Primarius Technologies (listed companies) accelerating AI-native EDA

Conclusion: The "real substance" of OpenAI's 9-month tape-out is "3-4 months of pre-architecture exploration by the OpenAI + Broadcom team + 2-3 months of AI-accelerated PPA optimization + 2-3 months of Broadcom IP reuse + 1-2 months of AI-accelerated verification and thermal simulation", not "AI designed the chip from start to finish."


7. Software Adaptation: Vertical Integration Advantages and Ecosystem Lock-in Costs

Advantage: OpenAI Is Making a Chip for Itself

Jalapeño's biggest software advantage is that it has no ecosystem migration problem. It doesn't need to be sold externally; it only has to adapt to OpenAI's own models:

  • Model adaptation range: Verified with GPT-5.3-Codex-Spark
  • Forward compatibility with model iterations: Richard Ho states "it will adapt well to all future versions of LLMs"
  • Inference framework: OpenAI's own inference stack (likely Triton-based open-source compiler or proprietary engine)
  • MRC protocol groundwork: In 2025-2026, OpenAI introduced MRC (Multi-Path Reliable Connection) protocol at the model layer, working with AMD, Broadcom, Intel, NVIDIA, Microsoft on multi-chip high-speed communication optimization

The Gap with NVIDIA's Ecosystem

NVIDIA's CUDA moat is not just the programming language, but:

  • 500+ optimized libraries (cuBLAS, cuDNN, TensorRT, TensorRT-LLM)
  • An ecosystem of millions of developers
  • Complete training → inference → deployment toolchain

In contrast, Jalapeño's software stack is a vertical stack customized for OpenAI: an advantage (extreme optimization) and a disadvantage (not general-purpose, cannot capture external ecosystem dividends).

The Real Bet on Software Adaptation

Jalapeño doesn't need global developers writing CUDA code, but it must solve:

  • Toolchain for migrating models from H100 to Jalapeño
  • Inference optimization (quantization, KV Cache reuse, continuous batching)
  • Integration with OpenAI's existing inference infrastructure
  • Cluster scheduling (Tomahawk interconnect low-latency routing)
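Of the optimizations listed, continuous batching is the easiest to show in miniature. The toy scheduler below compares it with static batching by counting decode steps; it is a sketch of the general technique, not OpenAI's inference stack, and it assumes every request needs at least one output token and that each step produces one token per active request.

```python
from collections import deque


def static_batching_steps(lengths, slots):
    """Static batching: each batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), slots):
        steps += max(lengths[start:start + slots])
    return steps


def continuous_batching_steps(lengths, slots):
    """Continuous batching: a freed slot is refilled on the very next step."""
    queue = deque(lengths)
    active = []  # remaining output tokens for each running request
    steps = 0
    while queue or active:
        while queue and len(active) < slots:
            active.append(queue.popleft())
        steps += 1  # one decode step emits one token per active request
        active = [left - 1 for left in active if left - 1 > 0]
    return steps


# One long request followed by three short ones, two hardware slots.
output_lengths = [8, 1, 1, 1]
static_steps = static_batching_steps(output_lengths, slots=2)
continuous_steps = continuous_batching_steps(output_lengths, slots=2)
print(f"static: {static_steps} steps, continuous: {continuous_steps} steps")
```

In the static case the short requests in the second batch wait for the long one to drain; in the continuous case they slip into the idle slot, so total time is set by the longest request alone.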

This is Jalapeño's real risk: the hardware taped out on time, but can the software be polished before large-scale deployment by end of 2026? Historically, hardware delivery is the easier part; software ecosystem maturity takes 2-3 years.


8. Horizontal Comparison: AI ASIC Battlefield

Flagship ASIC Specifications Comparison

Dimension | OpenAI Jalapeño | Google TPU 8i | Google TPU 8t | AWS Trainium3 | AWS Inferentia2 | NVIDIA B200 | Microsoft Maia 200
Positioning | Inference ASIC | Inference ASIC | Training ASIC | Training ASIC | Inference ASIC | General GPU | General inference
Process | TSMC 3nm | Est. 3nm | Est. 3nm | TSMC 3nm | 5nm | TSMC 4NP | Est. 5nm
Compute (est./conf.) | ~10-13 PFLOPS | ~3-4 PFLOPS | ~12 PFLOPS | ~2.5 PFLOPS/card (source "2.52 EFLOPS" likely PFLOPS, per 144-chip cluster arithmetic) | Lower | 20 PFLOPS | Lower
Memory | HBM3E/4 (80-144GB) | 1.15GB on-chip SRAM (main HBM undisclosed) | Est. 192GB+ HBM3E | 144GB HBM3E | Lower | 192GB HBM3E | Higher
Interconnect | Tomahawk | OCS Jupiter 32Tbps | Same | Custom | Custom | NVLink 5.0 (~1.8TB/s) | Est. Ethernet
Power | 300-500W | Est. 600W | Est. 980W | Higher | Lower | ~1200W | Higher
Cost advantage | ~50% vs GPU | 60-70% vs GPU | 40-50% vs GPU | ~50% vs H100 | ~80% vs H100 | Baseline | Est. 30%
Availability | OpenAI internal only | External sales | External sales | AWS cloud | AWS cloud | Industry-wide | Azure internal
Training capability | | | ✅ Primary | ✅ Primary | | |
Developer ecosystem | None (proprietary) | Open (JAX/XLA) | Open | Neuron SDK | Neuron SDK | CUDA ecosystem | Est. closed

Key Judgments

  1. Google TPU: Most Mature ASIC Paradigm

Google has accumulated 10 years of ASIC experience since TPU v1 (2016). TPU 8i (inference) / 8t (training) correspond to OpenAI's current path and future direction.

Key differences:

  • Ecosystem openness: In 2026, Google announced external sales of TPU, partnering with Blackstone to establish a $25 billion AI cloud computing company
  • Software maturity: JAX + XLA compiler + TensorFlow deep coupling
  • Cluster capability: TPU v7 single Pod 4,096 cards, v7 Ironwood cluster can reach 9,216 cards, 42.5 EFLOPS
  • Power cost: TPU v7 single-chip 980W
  • Inference cost advantage: TCO 44% lower than GB200

  2. AWS Trainium: Largest-Scale ASIC Deployment on Cloud
  • Trainium3 FP8 compute est. ~2.5 PFLOPS/card (source "2.52 EFLOPS" likely PFLOPS, per 144-chip cluster arithmetic), 144GB HBM3e
  • Trainium3 vs Trainium2: performance +30-40%, power -40%
  • AWS's biggest advantage: bundled with cloud services, so developers don't buy chips, they rent them
  • Disadvantage: High Neuron SDK migration cost; Cohere/Stability AI report Trainium1/2 underperforming H100 in some scenarios

  3. Microsoft Maia 200: Most Underrated Competitor

Maia 200 was released in early 2026, two-plus years after Maia 100. Microsoft's special position:

  • Simultaneously OpenAI's largest investor
  • Azure also massively deploying NVIDIA GPUs
  • Dual identity: both chip player and OpenAI's compute supplier

  4. NVIDIA Blackwell: Defensive Posture, but Ample Ammunition
  • MLPerf 6.0 training: all 7 categories #1
  • GB300 NVL72 vs GB200: training speed +1.6x
  • Single-chip power 1200W, next-gen Rubin estimated 2300W
  • Biggest moat: CUDA ecosystem + full-stack software (TensorRT-LLM, NeMo, Megatron)
  • 8,192-GPU cluster has verified DeepSeek-V3 671B training

  5. OpenAI Jalapeño's Unique Positioning

Jalapeño is the only vertical ASIC in the table that from day one serves only one model family (GPT) and one workload (LLM inference). This is its biggest advantage (extreme verticality) and its biggest limitation (no external sales, so R&D costs cannot be amortized across customers).


9. Chinese Vendor Self-Developed Chip Panorama

China's AI chip battlefield in 2026 shows three parallel paths: "national team + internet giants + model companies."

1. Alibaba Pingtouge: The Only "Full-Stack Self-Development" Aligned with OpenAI's Path

Latest release: Zhenwu M890 (May 20, 2026, at Apsara Conference)

| Spec | Zhenwu 810E (2024) | Zhenwu M890 (May 2026) | Zhenwu V900 (2027Q3 roadmap) |
| --- | --- | --- | --- |
| Process | Undisclosed | Est. 5nm | Est. 3nm |
| HBM | 96GB HBM2e | 144GB HBM | 216GB |
| Inter-chip interconnect | 700 GB/s | 800 GB/s | 1200 GB/s |
| Relative performance | 1x (baseline) | 3x | 9x |
| Use case | Train-inference integrated | Train-inference integrated + Agent optimized | Train-inference integrated |

Key Facts:

  • Manufacturing: SMIC
  • Cumulative shipments: 560,000 chips (as of May 2026)
  • Financial industry deployment exceeded 100,000 cards, covering 150+ institutions
  • Self-developed ICN Switch 1.0 interconnect chip, can build 64-128 card super-nodes
  • Bundled with Alibaba Xuantie RISC-V CPU, Yitian Arm server, Zhenyue SSD controller, Panmai SmartNIC
  • Roadmap: "One generation per year" iteration (same as OpenAI's plan)

Comparison with OpenAI's Path:

| Dimension | OpenAI Jalapeño | Alibaba Pingtouge Zhenwu |
| --- | --- | --- |
| Positioning | Pure inference | Train-inference integrated |
| External sales | ❌ OpenAI internal only | ✅ Alibaba Cloud + enterprise customers |
| Foundry | TSMC 3nm | SMIC (est. 5nm) |
| Ecosystem | Proprietary (self-use only) | Pingtouge + Alibaba Cloud |
| Process gap | 3nm (leading) | Est. 5nm (one generation behind) |

2. ByteDance: Buy-Buy-Buy + Some Self-Development

ByteDance's approach is not relying on self-development, but large-scale procurement of domestic chips:

The 50,000-chip procurement in June 2026:

| Supplier | Quantity | Use Case | Process |
| --- | --- | --- | --- |
| T-Head Intelligence Zhikai MR-V100/MR-V100x | ~32,000 | Inference main force | 7nm |
| Baidu Kunlunxin P800 | 15,000 | Video understanding, recommendation | Est. 7nm |
| Hygon DCU K100ai | 3,000 | Edge moderation | - |

Why isn't ByteDance buying H100/H200?

  • H20 backdoor incident (end of 2025) increased compliance risk
  • H100/H200 fall outside the export-control exemptions that apply to China
  • Domestic inference chips are now "usable," and unit price is only 1/3 of H20 (Zhikai 20,000 yuan vs H20 ~60,000 yuan)
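To put the unit-price gap in order-level terms, the sketch below applies the quoted prices to the ~32,000-unit Zhikai order from the procurement table. It assumes the quoted Zhikai price holds across the whole order and that one H20 would replace one Zhikai card. Neither is confirmed by the figures above, so this shows scale only.

```python
# Illustrative order-level comparison from the unit prices quoted above.
ZHIKAI_PRICE_YUAN = 20_000
H20_PRICE_YUAN = 60_000       # approximate, per the text
ZHIKAI_ORDER_UNITS = 32_000   # approximate, from the procurement table


def order_cost_yuan(units: int, unit_price: int) -> int:
    """Total cost of an order at a flat unit price."""
    return units * unit_price


price_ratio = ZHIKAI_PRICE_YUAN / H20_PRICE_YUAN
zhikai_cost = order_cost_yuan(ZHIKAI_ORDER_UNITS, ZHIKAI_PRICE_YUAN)
h20_cost = order_cost_yuan(ZHIKAI_ORDER_UNITS, H20_PRICE_YUAN)  # hypothetical 1:1 swap

print(f"price ratio: {price_ratio:.2f}")
print(f"Zhikai order: {zhikai_cost / 1e9:.2f} billion yuan")
print(f"same count of H20: {h20_cost / 1e9:.2f} billion yuan")
```

Under these assumptions the Zhikai portion costs about 0.64 billion yuan, against about 1.92 billion yuan for the same number of H20 cards.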

ByteDance's Compute Bill:

  • 2026 AI infrastructure capex: 200-700 billion yuan (estimates vary with the accounting method)
  • Doubao MAU: 345-368 million
  • Daily token calls: 120 trillion
  • Daily compute cost: tens of millions of yuan
  • ByteDance 2025 net profit shrank over 70% YoY

3. DeepSeek: Model + Domestic Chip Closed Loop (Not the OpenAI Path)

Latest milestone: DeepSeek V4 Pro (April 2026)

  • Total parameters: 1.6 trillion (MoE)
  • Actually activated: ~5-8%
  • World's first trillion-parameter large model with full-stack adaptation to Huawei Ascend chips
  • Inference speed reaches 35x the initial Ascend migration baseline (DeepSeek's claim; the baseline is the early Ascend port, not H100 or any other model)
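The MoE figures above translate into a concrete range of active parameters per token. The sketch below multiplies the quoted total by the quoted activation range. The function name is illustrative.

```python
# Active-parameter range implied by the MoE figures quoted above.
TOTAL_PARAMS = 1.6e12                 # 1.6 trillion total parameters
ACTIVE_FRACTION_RANGE = (0.05, 0.08)  # "~5-8%" activated


def active_params(total: float, fraction: float) -> float:
    """Parameters that take part in computing a single token."""
    return total * fraction


low, high = (active_params(TOTAL_PARAMS, f) for f in ACTIVE_FRACTION_RANGE)
print(f"~{low / 1e9:.0f}B to ~{high / 1e9:.0f}B parameters active per token")
```

This gives roughly 80B to 128B active parameters per token. Compute per token scales with the active count, while memory capacity still has to hold all 1.6 trillion parameters, since any expert may be selected.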

DeepSeek's Distinctiveness:

DeepSeek doesn't develop its own chips; instead it adapts its models closely to domestic chips. This route is smarter than self-developing chips:

  • The model company directly controls the full training-inference stack
  • Retains the choice of which operators to hardcode in an ASIC and which to leave to general-purpose compute
  • Collaborates with Huawei Ascend, Cambricon, Hygon, Moore Threads, MetaX, etc.

DeepSeek V4 on Huawei Ascend key data:

  • Achieved by rewriting the PTX-level operator layer and optimizing the communication layer
  • API pricing is only 1/10 of OpenAI

Valuation:

  • June 2026: completed first external financing round of 50 billion yuan (~$7.4 billion)
  • Post-money valuation over $50 billion (soared from $10 billion to $50 billion in two months)
  • National Big Fund led investment, Tencent, Alibaba, CATL, JD, NetEase followed
  • Founder Liang Wenfeng personally invested 20 billion yuan

4. Zhipu AI: Deep Adaptation with Domestic Chip Clusters

Latest action: GLM-5 + Full Domestic Chip Adaptation (Feb 2026)

| Domestic Chip | Adapted Performance |
| --- | --- |
| Huawei Ascend | First implementation of W4A8 mixed-precision quantization, single Atlas 800TA3 machine matches H100 dual-card |
| Moore Threads | - |
| Cambricon | - |
| Kunlunxin | - |
| MetaX | - |
| Enflame | - |
| Hygon | - |

Zhipu's Special Contribution:

  • First implementation of W4A8 quantization (weights 4-bit, activations 8-bit) on Ascend; this is an industry breakthrough
  • In long-sequence, low-latency scenarios, deployment cost reduced 50%
  • Zhipu itself doesn't self-develop chips, but as the "model layer," it promotes "model-chip" co-design
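To make W4A8 concrete, here is a minimal sketch of the idea: weights are stored as 4-bit integers, activations as 8-bit integers, the dot product is accumulated in integers, and the result is rescaled once. Symmetric scaling, per-row weight scales, and a single activation scale are assumptions chosen for illustration. This is not Zhipu's or Ascend's actual scheme, and the function names are hypothetical.

```python
# Minimal W4A8 sketch: 4-bit weights, 8-bit activations, integer accumulate.
# Symmetric scaling is an illustrative assumption, not a published scheme.

def quantize(values, bits):
    """Symmetric quantization: returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit, 127 for 8-bit
    peak = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = peak / qmax
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def w4a8_matvec(weight_rows, activations):
    """y = W @ x with per-row 4-bit weights and 8-bit activations."""
    x_codes, x_scale = quantize(activations, bits=8)
    out = []
    for row in weight_rows:
        w_codes, w_scale = quantize(row, bits=4)
        acc = sum(w * x for w, x in zip(w_codes, x_codes))  # integer accumulate
        out.append(acc * w_scale * x_scale)                 # dequantize once
    return out


W = [[0.7, -0.3, 0.1], [0.05, 0.2, -0.4]]
x = [1.0, -2.0, 0.5]
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
approx = w4a8_matvec(W, x)
print("exact :", exact)
print("w4a8  :", approx)
```

The quantized result tracks the exact one with a small error, while weights take a quarter of the storage of 16-bit values. That is where the memory and bandwidth savings in long-sequence inference come from.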

5. Overall Domestic Chip Landscape

Domestic AI chip shipments (IDC 2025 stats, includes NVIDIA China shipments + Huawei/Alibaba partial self-use estimates):

  • China AI accelerator annual shipments: 4 million cards
  • Domestic combined: 1.65 million cards, market share 41%
  • Morgan Stanley predicts 2030 domestic share 76%
  • Note: 1.65M domestic includes Huawei self-use Ascend + Alibaba self-use Zhenwu; excluding self-use, pure commercial shipments are ~1.2M
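The share figures depend on which shipments are counted. The sketch below reproduces the 41% headline and shows how the share moves once self-use is excluded. Whether self-use chips should also leave the denominator is not specified above, so both readings are shown as assumptions.

```python
# Market-share arithmetic from the shipment figures quoted above.
TOTAL_SHIPMENTS = 4_000_000
DOMESTIC_INCL_SELF_USE = 1_650_000
DOMESTIC_COMMERCIAL = 1_200_000   # approximate, self-use excluded


def share_pct(part: float, whole: float) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


headline = share_pct(DOMESTIC_INCL_SELF_USE, TOTAL_SHIPMENTS)

# Reading 1: drop self-use from the numerator only.
commercial_same_base = share_pct(DOMESTIC_COMMERCIAL, TOTAL_SHIPMENTS)

# Reading 2: drop self-use from numerator and denominator.
self_use = DOMESTIC_INCL_SELF_USE - DOMESTIC_COMMERCIAL
commercial_reduced_base = share_pct(DOMESTIC_COMMERCIAL, TOTAL_SHIPMENTS - self_use)

print(f"headline share: {headline:.2f}%")
print(f"commercial only, same base: {commercial_same_base:.2f}%")
print(f"commercial only, reduced base: {commercial_reduced_base:.2f}%")
```

The headline comes out at 41.25%, matching the quoted 41%. On a commercial-only basis the domestic share falls to roughly 30-34%, depending on the reading.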

9 Domestic AI Chips Receiving National Security Certification Level I (May 26, 2026):

  • Huawei HiSilicon Ascend
  • Hygon DCU
  • Biren Technology
  • Vimicro Technology (Starlight Smart 5)
  • Pingtouge (Zhenwu series)
  • Enflame Technology
  • Moore Threads
  • MetaX
  • Kunlunxin

6. ByteDance vs Alibaba vs DeepSeek vs Zhipu: Path Comparison

China's four AI chip vendor paths: self-dev / procurement / model self-adapt / ecosystem adapt
| Dimension | ByteDance | Alibaba | DeepSeek | Zhipu |
| --- | --- | --- | --- | --- |
| Self-developed chip | ❌ (rumors only) | ✅ Zhenwu series (2 generations) | ❌ | ❌ |
| Main strategy | Procurement + integration | Full-stack self-development | Model adapts to domestic | Model adapts to domestic |
| Process | 7nm domestic | Est. 5nm | Domestic | Domestic |
| Train-inference integrated | ✅ (buy GPU) | ✅ Zhenwu | ✅ Ascend | ✅ Ascend etc. |
| Direct OpenAI equivalent | Not directly | Full-stack path similar | Model path similar | Model path similar |
| OpenAI Jalapeño equivalent degree | Low | Medium (path similar, process behind) | High (path complementary) | High (path complementary) |
| Current compute scale (different calibers) | 50K+ this round | 560K cumulative shipments | Est. 10K cards | Est. 1K-10K cards |

7. Three Routes' Methodological Differences

Abstracting from the four domestic strategies, three completely different methodologies emerge:

Route A: Self-Developed Chips (Alibaba Pingtouge)

  • Closest to OpenAI's path
  • Advantages: Long-term cost control, technology autonomy
  • Disadvantages: Long R&D cycle, one generation behind at start, ecosystem building difficult

Route B: Models Adapt to Domestic Chips (DeepSeek, Zhipu)

  • Don't touch hardware, let models adapt actively
  • Advantages: Short path, high flexibility, not locked to single supplier
  • Disadvantages: Depends on domestic chip performance ceiling, training side still needs breakthrough

Route C: Large-Scale Procurement + Limited Self-Development (ByteDance)

  • Treat compute as commodity procurement
  • Advantages: Rapid scaling, keeps engineering effort focused
  • Disadvantages: Weak bargaining power, long-term cost uncontrollable

Key Judgments:

  1. Alibaba Pingtouge is the only domestic player comparable to OpenAI's full-stack self-development path, but it is still one process generation behind
  2. DeepSeek + Zhipu represent the "model layer descent" path: more realistic than building chips outright, and rapidly forming a "domestic model × domestic chip" closed-loop ecosystem
  3. ByteDance follows the "NVIDIA path": large-scale procurement + limited self-developed special-purpose chips
  4. Core contradiction of the domestic battlefield: Training side still depends on NVIDIA H100/H200, inference side is being replaced by domestic chips
  5. OpenAI Jalapeño's "extreme vertical integration of software and hardware" play: domestically, only Alibaba Pingtouge is attempting it, but shipment scale, customer diversity, and technology maturity are still an order of magnitude behind

10. Conclusion: The Next Form of an AI Company Is an "AI Full-Stack Company"

Jalapeño's Coordinates in Industry History

Looking at Jalapeño in historical context, the question "do AI companies need self-developed chips?" has traced a clear curve over the past three years:

  • 2023: OpenAI, like most AI labs, only cared about models. Compute was a procurement target.
  • 2024: Compute prices and supply began to bother model companies. Anthropic and Meta internally began discussing the feasibility of self-developed chips.
  • 2025: Google began selling TPU externally, turning "self-developed chips" into a commercial product.
  • June 2026: OpenAI closed this loop with Jalapeño. The first AI lab with a valuation over $100B took the field personally.

This is not an endpoint. It is one coordinate point on the timeline.

Five Hard Metrics for Jalapeño's Success

To assess whether an inference ASIC truly "works," there are five observable hard metrics to watch over the next 18 months:

  1. First deployment timing: Does OpenAI actually launch by end of 2026, and can scale reach GW level
  2. Energy efficiency delivery: Is Hock Tan's "50% inference cost reduction" verified in 2027 financial reports
  3. Software ecosystem maturity: Is native PyTorch / Triton support for Jalapeño demo-level only, or has it entered daily training pipelines
  4. HBM and capacity: Can HBM4 and TSMC N3 capacity stably supply OpenAI's 10GW roadmap in 2027-2028
  5. Second-generation iteration pace: Can OpenAI's promised "one generation per year" be delivered? If no second generation appears by 2028, the self-developed chip project becomes a capital arbitrage rather than technological evolution

Two Observations on Industry Evolution

Observation One: Self-developed chips have shifted from "differentiated advantage" to "infrastructure necessity." Google TPU's 10-year accumulation, AWS Trainium's 3 generations, Microsoft Maia, Meta MTIA, and now OpenAI Jalapeño: the major AI companies have all entered. Those still on the sidelines (like Anthropic) will be constrained by supply security and cost structure.

Observation Two: Model-chip coupling will go deeper. Jalapeño, as a vertical ASIC "designed only for the GPT series," essentially bets that OpenAI's model architecture will remain in the Transformer + MoE paradigm. If the Transformer is replaced by a new architecture (e.g., Mamba / SSM class), Jalapeño's architecture-specific optimizations could lose most of their value. This is the biggest hidden bet in OpenAI's self-developed path: a bet on the stability of its own model architecture.

Indirect Significance for the Chinese AI Industry

Jalapeño is a stress test case. It proves "AI companies can build chips" is engineering-feasible, but at a high cost for small companies: 40-person core team + Broadcom's hundreds-strong team + hundreds of millions in tape-out costs.

China's most realistic response is not to imitate this path (except Alibaba Pingtouge), but the opposite: let models adapt to a range of domestic chips, rather than build a single chip around one model family. DeepSeek V4 + Ascend and Zhipu GLM-5 + domestic chips are walking this path. This is China's realistic answer to export controls.

Summary

OpenAI made Jalapeño in 9 months. Treating it as a defining moment when an AI company challenges NVIDIA is an overreach; treating it as the latest coordinate point in the "model + self-developed chip" full-stack paradigm is closer to the truth.

The real value of Jalapeño is not in its compute numbers, but in confirming a trend: the competitive unit of the AI industry is upgrading from a single model to the entire stack of model + chip + data center + interconnect + scheduling. Once this upgrade completes, the reshaping of industry structure will not be subject to any single company's will.

Appendix A: Key Timeline

  • End of 2023: Richard Ho joins OpenAI, forms chip team
  • Mid-2024: OpenAI and Broadcom begin secret joint development
  • October 2025: Partnership publicly disclosed, 10GW deployment target
  • June 24, 2026: Jalapeño officially released
  • End of 2026: First deployment online
  • 2027: Scaled deployment
  • First half of 2028: Full deployment, 1-3GW single cluster
  • 2028: Next-generation Jalapeño released
  • 2029: 10GW compute cluster completed

Appendix B: Glossary

  • ASIC (Application-Specific Integrated Circuit): A chip customized for a specific use case
  • HBM (High Bandwidth Memory): 3D-stacked DRAM
  • MoE (Mixture of Experts): A model architecture with large total parameters but only a fraction activated per token
  • PFLOPS / EFLOPS: PetaFLOPS / ExaFLOPS (10^15 / 10^18 floating-point operations per second)
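  • Systolic Array: A grid of processing elements through which data is pumped in lockstep, like blood through the heart; commonly used for matrix multiplication
  • TDP (Thermal Design Power): The sustained heat output a chip is designed to dissipate; determines cooling requirements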
  • Tape-out: Submitting chip layout data to the foundry for manufacturing
  • W4A8: Mixed precision with 4-bit weights and 8-bit activations

Appendix C: References

(See the actual published version for full references)