Inside Jalapeño: What Happens When an AI Company Builds Its Own Heart

On June 24, 2026, OpenAI and Broadcom jointly released Jalapeño, OpenAI's first in-house AI inference chip. It went from architecture to tape-out in just 9 months, with inference costs projected to drop ~50% and performance claimed to match NVIDIA Blackwell and Google TPU. First deployment is planned for the end of 2026, scaling to 10GW of compute capacity by 2029. This is not just a chip. It is the moment OpenAI pivots from "AI company" to "AI full-stack company", and the moment the AI industry's supply chain logic starts to be rewritten.

Introduction: Three and a Half Years from ChatGPT to Jalapeño

Rewind to November 30, 2022—the day ChatGPT launched. Back then, OpenAI was still a "model company." Its compute lived in NVIDIA GPU clusters inside Microsoft Azure, and the trained models were exposed to the world through the OpenAI API. Nobody imagined that the same company would, three and a half years later, stand in a Broadcom office in Silicon Valley and receive a wafer it designed itself from Hock Tan's hands.

But the story of Jalapeño isn't "an AI company suddenly gained chip-making ability." It's the final landing of a series of decisions:

End of 2023: Richard Ho (a core engineer from Google's Cloud TPU team) joined OpenAI and began building a chip team
Mid-2024: OpenAI and Broadcom started secret joint development (the "pre-phase" of the 18-month cycle)
October 2025: The partnership was publicly disclosed—tens of billions of dollars, 10GW deployment target
June 24, 2026: Jalapeño officially released; engineering samples already running GPT-5.3-Codex-Spark

The chip's name, Jalapeño (Mexican pepper), continues Google's TPU tradition of naming chips after snacks (Boardwalk, Bristlecone, Trillium), with OpenAI's signature "seemingly casual, actually precise" branding. The naming style makes it impossible to take as seriously as "B100" or "MI300X," but its strategic significance far exceeds what the name conveys.

This article will dissect Jalapeño's:

Designers: Google TPU soul + 40-person minimalist team + Broadcom's ASIC engineering power
Manufacturing: TSMC 3nm + Broadcom silicon implementation + Celestica system integration
Positioning: Why "inference-only, not training" is the only commercially correct choice
Architecture and Specifications: Systolic array, HBM, Arm CPU, Tomahawk interconnect
Process & Compute: Compute, efficiency, and HBM versus Google TPU v7/v8 and Blackwell
Physical Analysis: 3nm, die size, transistor count, packaging, thermal, memory
Software Adaptation: Vertical integration advantages and ecosystem lock-in costs
Horizontal Comparison: Versus NVIDIA / Google TPU / AWS / Microsoft
China Vendor Panorama: Alibaba, ByteDance, DeepSeek, Zhipu
Conclusion: The next form of an AI company

1. Designers: The Soul Out of Google TPU

Jalapeño full-stack collaboration chain: OpenAI / Broadcom / TSMC / Celestica

The Core Engineer: Richard Ho

The soul of Jalapeño is OpenAI's hardware project lead, Richard Ho. He previously spent nearly nine years at Google, where he was a core architect on the Cloud TPU project, leading multiple TPU generations from concept to mass production.

Richard Ho's key background:

2014–2023: Google TPU team, designer of systolic array architecture, memory hierarchy, on-chip network, and other critical modules
Witnessed the full evolution from TPU v1 (2016) to TPU v7 Ironwood (2025)
Has the most direct engineering understanding of why the ASIC route works, why Tomahawk-class interconnect is necessary, why HBM selection is so critical

At the end of 2023, Sam Altman began aggressively recruiting from Google's TPU team. Richard Ho joined OpenAI during this period and led the formation of the Jalapeño team.

The signal from this personnel flow is extremely clear: OpenAI knew the ASIC route was viable, but needed people who had already learned all the lessons from TPU, not explorers starting from scratch. Richard Ho knows which pitfalls to avoid.

The OpenAI Chip Team: 40 People, Minimalist Architecture

At the time of release, OpenAI's chip team numbered approximately 40 people. To put this in context:

Team	Size	Notes
OpenAI Jalapeño team	~40 people	Architecture design
Google TPU team	Thousands	Includes compiler, software, verification
NVIDIA GPU design team	Tens of thousands	Spans multiple product lines
Apple Silicon team	Thousands	Includes SoC, GPU, Neural Engine

Why can 40 people build a chip?

Because OpenAI's 40 people do only one thing—architecture definition and algorithm-hardware co-design. All RTL-to-GDS (from synthesizable code to physical layout) silicon implementation is handled by Broadcom's hundreds-strong team. This "OpenAI defines 'what to do', Broadcom implements 'how to do it'" division of labor is the most efficient chip-making model in the current AI chip industry.

Broadcom: The Real Silicon Implementation Power

Broadcom is the true giant of ASIC design services:

Existing clients: Google, Amazon, Meta, ByteDance, Anthropic
AI ASIC design capability: From Tomahawk networking chips to TPU physical implementation, all Broadcom
October 2025: Strategic partnership signed with OpenAI, 10GW deployment target
CEO Hock Tan's personal involvement: At the launch event, he handed the first wafers to Sam Altman himself

Broadcom's role is not just "foundry" but an extension of OpenAI's chip team. Everything from RTL, verification, synthesis, place-and-route, timing sign-off, to Tomahawk network chip integration is handled by Broadcom.

Celestica

The Canadian electronics manufacturing services provider handles board, rack, and server system industrialization. Celestica's role is underestimated—going from a chip to a deployable server requires power management, thermal design, PCIe interfaces, chassis structure, and a lot more engineering. Celestica is the critical bridge from "successful tape-out" to "data center ready" for OpenAI.

2. Manufacturing: TSMC 3nm + Broadcom Silicon Implementation

Jalapeño's manufacturing division is a typical AI-era ASIC collaboration network:

Link	Owner	Notes
Architecture design	OpenAI	Blank-slate design, built from zero around LLM inference
Silicon implementation (RTL-to-GDS)	Broadcom	Physical design, timing sign-off, DFT
Wafer foundry	TSMC	3nm (N3) advanced process
Network interconnect	Broadcom	Tomahawk networking chips (rack/cluster interconnect)
Board/rack	Celestica	Industrialization from chip to server system
HBM supply	SK Hynix / Samsung	High-bandwidth memory (presumed HBM3E or HBM4)

What TSMC 3nm Really Means

The industry needs to be honest about what "3nm" means. TSMC N3's actual physical gate length no longer shrinks; "3nm" is mostly a marketing name. But there are some objectively comparable numbers:

TSMC N3 vs N5: logic density +60-70%, same-frequency power -25-30%
N3 capacity at TSMC is extremely tight: Apple, AMD, NVIDIA, MediaTek, and Broadcom's clients are all fighting for it
TSMC has notified customers of 5-10% price increases on its 3-5nm nodes

For OpenAI, getting a tape-out slot on N3 is itself a hidden resource brought by Broadcom's long-term customer relationships.

9 Months to Tape-Out: "The Fastest ASIC Development Cycle in History" (OpenAI's Claim)

Jalapeño took only 9 months from architecture design to tape-out. OpenAI claims this is "the fastest ASIC development cycle in the history of high-performance advanced semiconductors." Comparisons:

Comparable	Development Cycle
OpenAI Jalapeño	9 months (architecture to tape-out)
Google TPU v1	~15 months (start to deployment)
Google TPU v2-v4	12-18 months/generation
AWS Trainium v1 → v2	~24 months
Apple M-series	24-36 months
Traditional chip company comparable projects	24-36 months

⚠️ Analysts on Zhihu point out that "9 months" may be counted from some intermediate node. OpenAI and Broadcom's joint development actually started in mid-2024 (described by media at the time as "18 months of secret R&D"), and actual architecture exploration may have begun earlier. Compressing "18 months" into "9 months" is a PR choice, but even so, it's extremely fast.

The Real Substance Behind "9 Months"

Breaking down the time savings:

Time-saving Source	Estimated Contribution
18 months of pre-architecture exploration and Broadcom collaboration groundwork	3-4 months saved (counted into "formal R&D")
AI-accelerated PPA optimization (DSO.ai / Cerebrus-class tools)	2-3 months saved
Broadcom IP reuse (Tomahawk, SerDes, HBM PHY)	2-3 months saved
AI-accelerated verification, thermal simulation, sign-off	1-2 months saved
Traditional circuit design and tape-out prep	2-3 months (still required)

OpenAI didn't perform a miracle: they pushed every available tool in the industry to the extreme and leaned on Broadcom's ASIC engineering reuse to compress the timeline to 9 months.

3. Positioning: Inference ASIC, Not Training Chip

"Intelligence Processor"—OpenAI's Name for Its Own Chip

This is not marketing language; it defines the essential difference from a GPU.

Jalapeño does not do training, only inference.

Dimension	Training	Inference
Purpose	Make the model learn	Make the model answer questions
Characteristics	One-time, high precision, compute-intensive	Continuous operation, latency-sensitive, cost-dominated
Cost nature	Capital expenditure (CAPEX)	Operating expenditure (OPEX)
OpenAI 2025 H1 inference spend	-	$5.02 billion
Industry estimate of inference share	-	Inference ~60-70% of total AI operating cost (industry-standard estimate)

Why 2026 Is the Right Time

Launching an inference chip in 2023 would have been too risky: the model architecture was iterating rapidly (from pure Transformer to MoE to hybrid architectures), and an ASIC could be obsolete by the time it shipped. By 2026, Transformer + MoE has become the dominant paradigm, architecture has stabilized, and the commercial risk of custom ASICs has dropped to an acceptable range.

Three strategic considerations:

Technology maturity window: Model architecture is stable; the ASIC won't fall behind
Commercial pressure tipping point: Inference costs crossed a critical threshold in 2025; a cost-reduction solution is needed
Supply chain diversification "floor": Break free from single dependence on NVIDIA

Business Logic: Breaking NVIDIA Dependence

OpenAI compute infrastructure history:

2023–2024: ~20,000 H100 (GPT-4 training + inference)
2025: H200 + Grace Hopper (GPT-5 training)
2026: B200/GB200 (multimodal model training)

Cost structure: In 2025, OpenAI's total spending was $34 billion, revenue $13 billion, net loss $20.9 billion. H1 inference spend was $5.02 billion (annualized ~$10B), accounting for ~30% of total spending (OpenAI's own figures). The industry-standard estimate is that inference is 60-70% of total AI operating cost; the gap between the two figures comes down to whether "AI operating cost" includes training. CFO Sarah Friar has warned: if revenue growth doesn't keep up with data center contract expansion, compute costs could soar to $85 billion by 2028.

Jalapeño directly targets the most frequent and most expensive segment: inference. If Hock Tan's "50% cost reduction" claim holds, it means:

Per-token cost halved → ChatGPT free user service capacity doubles
Same budget serves more users → directly affects OpenAI's break-even point
Same conditions free up budget for training (still buying NVIDIA GPUs) → overall compute expansion accelerates
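
The arithmetic behind these claims can be checked in a few lines. The sketch below uses only figures quoted above ($5.02 billion H1 inference spend, $34 billion total spending, a 50% per-token cost reduction). The function names are illustrative, and annualizing by doubling the H1 figure is the same simplification the text makes.

```python
# Figures quoted in the article (USD billions).
H1_INFERENCE_SPEND = 5.02
TOTAL_SPEND_2025 = 34.0

def annualized_share(h1_spend, total_spend):
    """Annualize a half-year figure by doubling it, then take its share of total spend."""
    return (2 * h1_spend) / total_spend

def capacity_multiplier(cost_reduction):
    """Tokens served per dollar, relative to today, after a fractional per-token cost cut."""
    return 1.0 / (1.0 - cost_reduction)

share = annualized_share(H1_INFERENCE_SPEND, TOTAL_SPEND_2025)
print(f"Inference share of 2025 spending: {share:.1%}")
print(f"Capacity at the same budget after a 50% cut: {capacity_multiplier(0.5):.1f}x")
```

A smaller reduction changes the picture quickly: a 30% cut would raise capacity at a fixed budget by about 1.4x rather than 2x.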

4. Architecture and Specifications: Systolic Array, HBM, Arm CPU, Tomahawk Interconnect

Core Architecture: Systolic Array

The Systolic Array is the AI accelerator architecture pioneered by Google's TPU v1 (2016). The name comes from cardiac systole—data flows regularly between processing elements (PEs) like blood, with each PE repeatedly doing the same thing (typically multiply-accumulate).

OpenAI's choice of a systolic array doubles down on a paradigm already validated by Google rather than experimenting with something new.

Benefits of Systolic Array for Dense Model Inference

Systolic array vs GPU data flow: 70-90% vs 30-40% utilization

Dense models (like Llama-3-70B style) have all parameters participating in computation during every inference.

Data Movement Minimization

Traditional GPU: each CUDA core requires independent register read/write
Systolic array: weights flow in from the left, "passing" through the array without "re-reading," only interacting with HBM at the boundary
Less data movement = direct power savings = improved performance per watt
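
To make the data-flow idea concrete, here is a minimal cycle-by-cycle simulation of a systolic array in plain Python. It is a sketch under stated assumptions, not Jalapeño's design: it models the output-stationary variant (each PE keeps one output element while operands stream past), whereas the description above is closer to a weight-stationary layout. The function name `systolic_matmul` and the `boundary_reads` counter are illustrative.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    PE (i, j) accumulates C[i][j]. Rows of A enter from the left edge and
    columns of B from the top edge, each skewed by one cycle per row/column.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(M)]  # A operands, moving left to right
    b_reg = [[0] * N for _ in range(M)]  # B operands, moving top to bottom
    boundary_reads = 0                   # operand fetches from outside the array
    for t in range(M + N + K - 2):
        # Each PE takes its A operand from the left neighbour, B from above.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject the skewed inputs at the array boundary only.
        for i in range(M):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
            if 0 <= k < K:
                boundary_reads += 1
        for j in range(N):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
            if 0 <= k < K:
                boundary_reads += 1
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(M):
            for j in range(N):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C, boundary_reads

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C, reads = systolic_matmul(A, B)
print(C, reads)
```

Each input element is fetched from outside the array exactly once (M×K + K×N reads), while a naive loop would fetch two operands for every one of the M×N×K multiply-accumulates. That gap is the data-movement saving the list above describes.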

Extreme Compute Density

A 128×128 systolic array = 16,384 MAC units, all working simultaneously
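Doesn't need a very high clock: at roughly 100 MHz, counting each multiply-accumulate as two operations, the array already completes ~3.3 trillion INT8 ops/sec, and throughput scales linearly with frequency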
Perfect for the Transformer "matrix multiply → matrix multiply → matrix multiply" pattern
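
The compute-density figures above follow from a one-line formula: peak ops/sec = rows × columns × ops per MAC × clock. The sketch below evaluates it at a few clock speeds. The clock values are assumptions for illustration only (no clock is disclosed), and counting a multiply-accumulate as two operations is the usual convention rather than something the source specifies.

```python
def peak_ops_per_second(rows, cols, clock_hz, ops_per_mac=2):
    """Peak throughput of a systolic array: every PE retires one MAC per cycle."""
    return rows * cols * ops_per_mac * clock_hz

mac_units = 128 * 128
for clock_mhz in (100, 500, 1000):  # hypothetical clock speeds
    tops = peak_ops_per_second(128, 128, clock_mhz * 1e6) / 1e12
    print(f"{mac_units} MACs at {clock_mhz} MHz: {tops:.2f} TOPS")
```

Under these assumptions the ~3.3 trillion ops/sec figure corresponds to a clock of about 100 MHz; at 1 GHz the same array would peak at roughly ten times that.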

Regular Data Flow

Batch size and sequence length are relatively stable during inference
Doesn't need the GPU's flexibility to "run any computation pattern"
Hardware utilization can stay stable at 70-90% (far above GPU's 30-40%)

Systolic Array for MoE: Matches and Conflicts

MoE inference on systolic array: routing → sub-array allocation → matmul → KV share

OpenAI's GPT-4 / GPT-5 / GPT-5.3 are all based on the MoE architecture. MoE has two key features:

Sparse activation: 1.8 trillion total parameters, but only 2% (~36 billion) activated per token
Expert routing: Router dynamically selects Top-K experts for each token (typical K=2~8)
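
A minimal sketch of Top-K routing, for readers who have not seen it written down: the router scores every expert for a token, keeps the K best, and renormalizes their weights. The logits below are made-up values and `top_k_route` is an illustrative name; real routers differ in details such as noise terms and auxiliary losses.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    peak = max(router_logits)
    exp_scores = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    total = sum(exp_scores)
    probs = [s / total for s in exp_scores]
    chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    weight_sum = sum(probs[e] for e in chosen)
    return [(e, probs[e] / weight_sum) for e in chosen]

# Hypothetical router output for one token over 8 experts.
logits = [0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.5]
routes = top_k_route(logits, k=2)
print(routes)

# Sparse-activation arithmetic from the text: 2% of 1.8 trillion parameters.
active_params = 1.8e12 * 0.02
print(f"{active_params / 1e9:.0f}B parameters active per token")
```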

Matches

Local Matrix Multiplication Still Dominates Compute

Even with MoE, the FFN computation inside activated experts is still huge matrix multiplication
DeepSeek-R1's activated expert FFN: ~50B params × batch × seq_len
This part is fully suited to systolic array's high-density matrix multiplication

Intra-Expert Computation Is Regular

Once the Router decides which experts to activate, the subsequent matrix multiply path is determined
Software/hardware can "pre-warm" the corresponding systolic array regions in advance
This predictability + local regularity is the sweet spot for systolic arrays

KV Cache Reuse Friendly

In MoE inference, the attention layer's KV Cache is shared across all experts
The systolic array data flow model is well-suited to "weight resident, KV data flow" attention computation

Conflicts

Dynamic Expert Routing

Different tokens go to different experts → array utilization may fluctuate wildly
If K=2 and, say, a batch's tokens all land on the same two of eight resident experts, perhaps only 25% of the array is busy
Response: Hardware reserves "expert broadcast bus" so array sections can load different expert weights; or token sorting for batch scheduling
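
The token-sorting response can be sketched as a grouping step: collect the tokens bound for each expert so that every expert runs one dense batch instead of many scattered single-token multiplies. Everything here is hypothetical, including the assumption of one resident expert per sub-array. It is meant only to show how a 25% busy figure can arise.

```python
from collections import defaultdict

def group_tokens_by_expert(token_experts):
    """Sort a batch so tokens bound for the same expert are processed together."""
    groups = defaultdict(list)
    for token_id, experts in enumerate(token_experts):
        for expert in experts:
            groups[expert].append(token_id)
    return dict(groups)

def busy_fraction(groups, num_sub_arrays):
    """Fraction of sub-arrays with work, assuming one resident expert per sub-array."""
    return min(len(groups), num_sub_arrays) / num_sub_arrays

# Hypothetical batch: each token lists its Top-2 experts out of 8.
batch = [(0, 3), (0, 3), (3, 0), (0, 3)]
groups = group_tokens_by_expert(batch)
print(groups)
print(f"busy sub-arrays: {busy_fraction(groups, 8):.0%}")
```

With a more varied batch the groups spread over more experts and the busy fraction rises, which is why scheduling across large batches matters for MoE on fixed arrays.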

Increased Communication Overhead

MoE's "Expert Parallelism" inherently requires cross-card communication
After routing, the activated experts may be on different chips
This is why OpenAI must use Broadcom Tomahawk (high-speed interconnect) for large-scale clusters

Load Imbalance

Router probability distribution may cause certain experts to be selected more often
Requires "hard truncation + re-routing" mechanisms
Systolic arrays are sensitive to this imbalance because the PE array is physically fixed
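
One plausible reading of "hard truncation + re-routing" is a per-expert capacity cap: once an expert is full, later tokens fall through to their next-ranked choice. The sketch below is a greedy, token-order version of that idea. It is an illustration only, with hypothetical names and data, and is not drawn from any disclosed OpenAI mechanism.

```python
def route_with_capacity(preferences, num_experts, capacity):
    """Assign each token to its best-ranked expert that still has room.

    preferences[t] lists expert ids for token t, best first. A token whose
    preferred experts are all full is dropped (recorded as None).
    """
    load = [0] * num_experts
    assignment = []
    for ranked in preferences:
        target = next((e for e in ranked if load[e] < capacity), None)
        if target is not None:
            load[target] += 1
        assignment.append(target)
    return assignment, load

# Hypothetical skewed batch: five of six tokens prefer expert 0.
prefs = [[0, 1], [0, 2], [0, 1], [0, 3], [2, 0], [0, 1]]
assignment, load = route_with_capacity(prefs, num_experts=4, capacity=2)
print(assignment, load)
```

The cap keeps any one expert from exceeding the work its fixed PE region can absorb, at the cost of sending some tokens to a second-choice expert.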

Inferred Jalapeño Response Strategy

Although no ISA documentation is public, from OpenAI's emphasis on "designed from zero for LLM inference" we can infer:

Array scale may not be a single oversized array, but multiple medium-scale systolic arrays (e.g., 8-16 sub-arrays of 64×64)
Can flexibly assign experts to different sub-arrays based on routing results
Tomahawk network chips support large-scale expert parallelism, extending "hardware sparsity" to "cluster sparsity"

Confirmed Technical Details

Item	Estimated/Confirmed
Process	TSMC N3
Core architecture	Systolic Array
Memory	HBM3E or HBM4 (estimated 80-144GB)
Task scheduling CPU	Arm custom design
Network interconnect	Broadcom Tomahawk series switches
Theoretical compute (est.)	~10-13 PFLOPS (INT8/FP8)
Per-token energy	~30% lower than Blackwell
Inference cost reduction	~50% (Hock Tan)

Four Optimization Dimensions (Richard Ho)

OpenAI's hardware lead emphasized Jalapeño has comprehensive optimization across four dimensions:

Kernel: Hardware-level hardcoding for LLM core operators (matrix multiply, attention)
Memory Movement: Reducing data shuttling between HBM and compute units
Network: Tomahawk enables low-latency interconnect for large-scale clusters
Service Model: Co-optimization for online inference batching, KV Cache residency, etc.

These four optimizations let the chip's real-world utilization approach theoretical peak, unlike GPUs that have "high peak, low utilization (30-40%)."

5. Process and Compute Estimation: vs Google TPU v7/v8

Process Estimation Basics

TSMC N3 vs N5: logic density +60-70%, same-frequency power -25-30%. This is an objective benchmark, applicable to all N3 customers.

Google TPU Known Key Data

Chip	Release	Process	FP8 Compute/Chip	HBM	Cluster Scale	Notes
TPU v6 Trillium	2024.5	5nm	~1 PFLOPS	32GB	256 cards	4.7x v5e performance
TPU v7 Ironwood	2025.11	TSMC 3nm	4.6 PFLOPS	192GB HBM3E	9,216 cards (42.5 EFLOPS)	100% liquid cooling, ~980W
TPU 8t "Sunfish"	Released 2026.4	Est. 3nm	Est. ~12 PFLOPS	Est. 192GB+	Same	Training-only, Broadcom design
TPU 8i "Zebrafish"	Released 2026.4	Est. 3nm	Est. 3-4 PFLOPS	384MB SRAM×3	Same	Inference-only, MediaTek design

Jalapeño Compute Estimation

OpenAI didn't disclose die size, HBM count, or transistor count. But there are public clues:

Process: TSMC N3 (same generation as TPU v7 Ironwood)
Network: Broadcom Tomahawk
Hock Tan claims "performance comparable to Blackwell"

Take TPU v7 Ironwood's 4.6 PFLOPS as the baseline. NVIDIA Blackwell B200 is quoted at 20 PFLOPS in FP8, but that is a dual-die monster with 208 billion transistors, 1600mm² of silicon, and a 1200W TDP.

Jalapeño's compute estimate must be done in two steps.

(a) FP8 peak compute estimate (direct Blackwell comparison):

Reason 1: ASIC architecture is more efficient than GPU; same transistor budget yields more equivalent compute
Reason 2: Hock Tan says "comparable to Blackwell"; this refers to effective throughput (see b), not direct FP8 peak comparison
Reason 3: Lei Tech's "10 PFLOPS" speculation falls in a reasonable range
Reason 4: With FP4/FP6 quantization, could reach 20+ PFLOPS (above Blackwell Ultra's FP4 15 PFLOPS)
FP8 peak compute estimate: 8-13 PFLOPS

(b) Effective inference throughput comparison (what Hock Tan actually said):

ASIC achieves 70-90% utilization on LLM inference workloads; GPUs typically 30-40%
For the same "FP8 peak compute", ASIC's effective throughput can be 2x that of a GPU
Hock Tan's "comparable to Blackwell" refers to this effective throughput, not peak FP8 equivalence
Note: The specific context of Hock Tan's original quote is not cited in public reporting; this interpretation is my own inference
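
The effective-throughput argument is simply peak compute multiplied by achieved utilization. The sketch below applies the utilization ranges quoted above to the article's own peak estimates. All inputs are estimates from the text and the function name is illustrative.

```python
def effective_pflops(peak_pflops, utilization):
    """Sustained throughput = peak compute x achieved utilization."""
    return peak_pflops * utilization

# Ranges quoted in the article; the Jalapeño peak is itself an estimate.
asic_low, asic_high = effective_pflops(8, 0.70), effective_pflops(13, 0.90)
gpu_low, gpu_high = effective_pflops(20, 0.30), effective_pflops(20, 0.40)
print(f"Jalapeño (est.): {asic_low:.1f}-{asic_high:.1f} effective PFLOPS")
print(f"Blackwell B200:  {gpu_low:.1f}-{gpu_high:.1f} effective PFLOPS")
```

The two ranges overlap, which is what makes "comparable to Blackwell" plausible as a statement about sustained inference throughput even though the peak numbers differ by roughly 2x.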

Direct Compute Comparison (Estimated)

Performance vs efficiency per watt scatter: Jalapeño sits in inference efficiency sweet spot

Chip	Process	FP8 Compute	Memory	Single-Card Power	Interconnect
OpenAI Jalapeño (est.)	TSMC 3nm	~10-13 PFLOPS	80-144GB HBM3E/4	300-500W (est.)	Broadcom Tomahawk
Google TPU v7 Ironwood	TSMC 3nm	4.6 PFLOPS	192GB HBM3E	~980W	OCS Jupiter 32Tbps
Google TPU 8i (inference)	Est. 3nm	Est. 3-4 PFLOPS	1.15GB on-chip SRAM (main HBM undisclosed)	Est. 600W	Same
Google TPU 8t (training)	Est. 3nm	Est. 12 PFLOPS	Est. 192GB+ HBM3E	Est. 980W	Same
NVIDIA B200	TSMC 4NP	20 PFLOPS	192GB HBM3E	1200W	NVLink 5.0 1.8TB/s
NVIDIA GB300	TSMC 4NP	~7-8 PFLOPS (FP8 est.) / 15 PFLOPS FP4	288GB HBM3E	~1400W	NVLink 5.0

Key Observations

Jalapeño Single-Card Compute May Exceed TPU v7 Ironwood

If the estimate is correct (10-13 PFLOPS FP8), Jalapeño's single-card compute is 2-3x Ironwood's. But an ASIC-vs-ASIC comparison needs a discount: the TPU also has to accommodate a VPU (vector processing unit) and other general-purpose blocks, so its die area is not entirely systolic array.

HBM Is Jalapeño's Potential Weakness

TPU Ironwood has 192GB HBM3E, 7.4 TB/s bandwidth. If Jalapeño has 80-144GB, 3-4 TB/s bandwidth:

Large model inference (70B-200B parameters) requires HBM to hold weights + KV Cache
144GB just fits 70B FP16 weights (140GB), with little KV space
This means Jalapeño must combine with model sharding or quantization to serve large models
Speculation: Jalapeño serves GPT-5.3-style mixed-precision + MoE expert-sharded models, with single cards holding only partial experts
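
The capacity squeeze is easy to quantify: weight footprint is parameter count times bytes per parameter, and whatever remains of HBM is available for KV Cache. The sketch below uses the 144GB upper estimate from the text and a 70B-parameter model. The precision options other than FP16 are added for illustration.

```python
def weight_gb(params_billion, bytes_per_param):
    """Weight footprint in GB (1 GB = 1e9 bytes): parameters x bytes per parameter."""
    return params_billion * bytes_per_param

HBM_GB = 144  # upper end of the article's capacity estimate
for label, bytes_per_param in (("FP16", 2), ("FP8", 1), ("FP4", 0.5)):
    weights = weight_gb(70, bytes_per_param)
    print(f"70B @ {label}: {weights:.0f} GB weights, {HBM_GB - weights:.0f} GB left for KV cache")
```

At FP16 only a few gigabytes remain, which is why quantization or sharding across cards is effectively mandatory at this capacity.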

Single-Card Power Is Jalapeño's Clear Advantage

ASIC energy efficiency is 2-3x GPU; this is industry consensus
300-500W TDP allows simpler cooling (air cooling or light liquid cooling)
This is likely the main source of Jalapeño's 50% inference cost reduction: not just cheaper chips, but also electricity savings

Cluster Scalability Is Jalapeño's Open Question

TPU Ironwood clusters support 9,216 cards, 42.5 EFLOPS aggregate compute. How far can Jalapeño's Tomahawk network scale? OpenAI hasn't disclosed, but the 10GW data center target implies needing millions of cards.
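
A rough sizing of "millions of cards" can be sketched by dividing the power target by per-card power. The overhead factor below (facility power per watt of accelerator power, covering hosts, networking, and cooling) is purely an assumption for illustration. The 300-500W card power is the article's own estimate.

```python
def cards_for_power(total_watts, card_watts, overhead_factor):
    """Accelerators that fit in a power budget; overhead_factor covers hosts, network, cooling."""
    return total_watts / (card_watts * overhead_factor)

TARGET_WATTS = 10e9  # the 10GW target
OVERHEAD = 2.0       # assumption: each card watt needs one more watt of facility power
for card_watts in (300, 500):
    millions = cards_for_power(TARGET_WATTS, card_watts, OVERHEAD) / 1e6
    print(f"{card_watts} W cards: ~{millions:.1f} million")
```

Under these assumptions the target lands in the range of ten million cards or more, so the interconnect would need to scale well beyond a single 9,216-card pod.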

Performance Per Watt: The Real Battle

Dimension	OpenAI Jalapeño	Google TPU v7 Ironwood	NVIDIA B200
Single-card FP8 compute	~10-13 PFLOPS	4.6 PFLOPS	20 PFLOPS
Single-card power	300-500W	980W	1200W
FP8 compute per watt (TFLOPS/W)	~29	~4.7	~17
Inference cost (relative)	50%	44% TCO advantage	Baseline

If Jalapeño truly achieves ~29 TFLOPS/W (theoretical range 20-43 TFLOPS/W, depending on actual power and compute), it will be the most energy-efficient AI accelerator in history. But these are single-card paper numbers; real "watts per useful throughput" in actual inference workloads will be discounted.
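
The per-watt row in the table can be reproduced directly from the compute and power rows. The sketch below does so, using the midpoint of the Jalapeño estimates (11.5 PFLOPS, 400W) for the headline figure and the extremes for the range. All inputs come from the table above and the names are illustrative.

```python
def tflops_per_watt(pflops, watts):
    """Convert peak PFLOPS and board power to TFLOPS per watt."""
    return pflops * 1000 / watts

chips = {
    "Jalapeño (est. midpoint)": (11.5, 400),
    "Jalapeño (worst case)": (10, 500),
    "Jalapeño (best case)": (13, 300),
    "TPU v7 Ironwood": (4.6, 980),
    "NVIDIA B200": (20, 1200),
}
for name, (pflops, watts) in chips.items():
    print(f"{name}: {tflops_per_watt(pflops, watts):.1f} TFLOPS/W")
```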

6. Physical Chip Analysis: 3nm, Die Size, Transistor Count

Die Size and Transistor Count

Although OpenAI hasn't released die photos or detailed physical dimensions, combining known parameters we can infer:

Process: TSMC N3 (same generation as Apple M4)
Transistor count: Not disclosed. Considering 3nm density and small team design capability, estimated at 50-150 billion transistors (NVIDIA Blackwell B200 is ~208 billion, dual-die)
Die size estimate: ~600-900mm² (single die, much smaller than B200's 1600mm²)
Packaging: Speculated to use CoWoS-S or CoWoS-L advanced packaging, with HBM and compute die connected through silicon interposer

Thermal Design

Combined with 3nm process energy efficiency and ASIC architecture's low power characteristics:

Estimated TDP: 300-500W (far below Blackwell's 1200W)
Cooling solution: Possibly air cooling + partial liquid cooling, or full liquid cooling
This means OpenAI's data center cooling costs may be only one-third to one-half those of Blackwell clusters

HBM and Memory Hierarchy

HBM3E (estimated 6-8 stacks)
Total capacity estimate: 80-144GB
Bandwidth estimate: 3-4 TB/s

Critical Physical Bottleneck: HBM accounts for 30-40% of AI ASIC total cost. Hock Tan has publicly stated that HBM squeezes Broadcom's custom AI chip margins below those of other product lines like network switches.

"AI Designs AI Chips": The Real Substance

OpenAI's public narrative is "GPT models participated in chip design", but this is PR framing that needs to be split into two levels:

✅ Reality: AI participated in logic circuit routing optimization, thermal performance management, power prediction and other EDA steps. These tasks traditionally require weeks of engineering team iteration; AI can complete them in hours
❌ Misleading framing: This doesn't mean "AI designed the chip." The most critical front-end architecture definition (systolic array choice, memory hierarchy design, ISA definition) is still done by Richard Ho's team

Industry Progress on AI-Assisted Chip Design

AI-assisted chip design 9-month tape-out flow: human + AI division

This part is very clear by 2026: AI-assisted chip design is not OpenAI's invention but a paradigm shift the entire EDA industry has been undergoing for the past two years:

Synopsys DSO.ai (Design Space Optimization AI)

Released 2020, industry's earliest AI-assisted EDA tool
Uses reinforcement learning to search PPA optimum in chip synthesis and place-and-route
Has helped Samsung, Renesas, NVIDIA complete hundreds of tape-outs
Measured: Compared to manual work, compresses engineering time from weeks to hours, achieves 10-20% additional optimization in power, performance, area

Cadence Cerebrus

Released 2022, competing with DSO.ai
Automatically parallel-runs Innovus physical implementation flow, AI adjusts parameters
Key data: As of June 2026, adopted by over 2,000 chip tape-outs
- Compute efficiency +~4x
- Turnaround time -~2x
- PPA improvement +~15%

Cadence ChipStack (Released 2026, industry's first L5 autonomous chip design AI Agent)

Released June 2026, world's first "virtual IC design engineer"
EDA autonomy classification (analogous to autonomous driving L1-L5):
- L1-L3: AI as tool assistance (current state)
- L4: AI can understand goals, autonomously call tools
- L5: AI can independently complete RTL generation, verification planning, formal verification, debugging, convergence
Cadence's "Mental Model" addresses LLM hallucination problems in chip design

Siemens Celus + Cadence Allegro X AI

Focus on PCB-level design
AI automatically generates schematic drafts, place-and-route drafts

China EDA Progress

Shanghai Peifeng Tunan Semiconductor (headquarters moved to Zhangjiang in 2026)
Empyrean Technology, Primarius Technologies (listed companies) accelerating AI-native EDA

Conclusion: The "real substance" of OpenAI's 9-month tape-out is a stack of time savings: roughly 3-4 months from pre-architecture exploration already done by the OpenAI + Broadcom team, 2-3 months from AI-accelerated PPA optimization, 2-3 months from Broadcom IP reuse, and 1-2 months from AI-accelerated verification and thermal simulation. It is not "AI designed the chip from start to finish."

7. Software Adaptation: Vertical Integration Advantages and Ecosystem Lock-in Costs

Advantage: OpenAI Is Making a Chip for Itself

Jalapeño's biggest software advantage is that it has no ecosystem migration problem: it doesn't need to sell externally, only adapt to OpenAI's own models:

Model adaptation range: Verified with GPT-5.3, Codex, Spark
Forward compatibility with model iterations: Richard Ho states "it will adapt well to all future versions of LLMs"
Inference framework: OpenAI's own inference stack (likely Triton-based open-source compiler or proprietary engine)
MRC protocol groundwork: In 2025-2026, OpenAI introduced MRC (Multi-Path Reliable Connection) protocol at the model layer, working with AMD, Broadcom, Intel, NVIDIA, Microsoft on multi-chip high-speed communication optimization

The Gap with NVIDIA's Ecosystem

NVIDIA's CUDA moat is not just the programming language, but:

500+ optimized libraries (cuBLAS, cuDNN, TensorRT, TensorRT-LLM)
An ecosystem of millions of developers
Complete training → inference → deployment toolchain

In contrast, Jalapeño's software stack is a vertical stack customized for OpenAI: both an advantage (extreme optimization) and a disadvantage (not general-purpose, cannot capture external ecosystem dividends).

The Real Bet on Software Adaptation

Jalapeño doesn't need global developers writing CUDA code, but it must solve:

Toolchain for migrating models from H100 to Jalapeño
Inference optimization (quantization, KV Cache reuse, continuous batching)
Integration with OpenAI's existing inference infrastructure
Cluster scheduling (Tomahawk interconnect low-latency routing)

This is Jalapeño's real risk: the hardware taped out on time, but can the software be polished before large-scale deployment by end of 2026? Historically, hardware delivery is the easier part; software ecosystem maturity takes 2-3 years.

8. Horizontal Comparison: AI ASIC Battlefield

Flagship ASIC Specifications Comparison

Dimension	OpenAI Jalapeño	Google TPU 8i	Google TPU 8t	AWS Trainium3	AWS Inferentia2	NVIDIA B200	Microsoft Maia 200
Positioning	Inference ASIC	Inference ASIC	Training ASIC	Training ASIC	Inference ASIC	General GPU	General inference
Process	TSMC 3nm	Est. 3nm	Est. 3nm	TSMC 3nm	5nm	TSMC 4NP	Est. 5nm
Compute (est./conf.)	~10-13 PFLOPS	~3-4 PFLOPS	~12 PFLOPS	~2.5 PFLOPS/card (source "2.52 EFLOPS" likely PFLOPS, per 144-chip cluster arithmetic)	Lower	20 PFLOPS	Lower
Memory	HBM3E/4 (80-144GB)	1.15GB on-chip SRAM (main HBM undisclosed)	Est. 192GB+ HBM3E	144GB HBM3E	Lower	192GB HBM3E	Higher
Interconnect	Tomahawk	OCS Jupiter 32Tbps	Same	Custom	Custom	NVLink 5.0 (~1.8TB/s)	Est. Ethernet
Power	300-500W	Est. 600W	Est. 980W	Higher	Lower	~1200W	Higher
Cost advantage	~50% vs GPU	60-70% vs GPU	40-50% vs GPU	~50% vs H100	~80% vs H100	Baseline	Est. 30%
Availability	OpenAI internal only	External sales	External sales	AWS cloud	AWS cloud	Industry-wide	Azure internal
Training capability	❌	❌	✅ Primary	✅ Primary	❌	✅	❌
Developer ecosystem	None (proprietary)	Open (JAX/XLA)	Open	Neuron SDK	Neuron SDK	CUDA ecosystem	Est. closed

Key Judgments

Google TPU: Most Mature ASIC Paradigm

Google has accumulated 10 years of ASIC experience since TPU v1 (2016). TPU 8i (inference) / 8t (training) correspond to OpenAI's current path and future direction.

Key differences:

Ecosystem openness: In 2026, Google announced external sales of TPU, partnering with Blackstone to establish a $25 billion AI cloud computing company
Software maturity: JAX + XLA compiler + TensorFlow deep coupling
Cluster capability: TPU v7 single Pod 4,096 cards, v7 Ironwood cluster can reach 9,216 cards, 42.5 EFLOPS
Power cost: TPU v7 single-chip 980W
Inference cost advantage: TCO 44% lower than GB200

AWS Trainium: Largest-Scale ASIC Deployment on Cloud

Trainium3 FP8 compute est. ~2.5 PFLOPS/card (source "2.52 EFLOPS" likely PFLOPS, per 144-chip cluster arithmetic), 144GB HBM3e
Trainium3 vs Trainium2: performance +30-40%, power -40%
AWS's biggest advantage: bundled with cloud services, developers don't buy chips, they rent
Disadvantage: High Neuron SDK migration cost; Cohere/Stability AI report Trainium1/2 underperforming H100 in some scenarios

Microsoft Maia 200: Most Underrated Competitor

Maia 200 released early 2026, two-plus years after Maia 100. Microsoft's special position:

Simultaneously OpenAI's largest investor
Azure also massively deploying NVIDIA GPUs
Dual identity: both chip player and OpenAI's compute supplier

NVIDIA Blackwell: Defensive Posture, but Ample Ammunition

MLPerf 6.0 training: all 7 categories #1
GB300 NVL72 vs GB200: training speed +1.6x
Single-chip power 1200W, next-gen Rubin estimated 2300W
Biggest moat: CUDA ecosystem + full-stack software (TensorRT-LLM, NeMo, Megatron)
8,192-GPU cluster has verified DeepSeek-V3 671B training

OpenAI Jalapeño's Unique Positioning

Jalapeño is the only vertical ASIC in the table that from day one serves only one model family (GPT) and one workload (LLM inference). This is its biggest advantage (extreme vertical focus) and its biggest limitation (no external sales, so R&D costs cannot be amortized).

9. Chinese Vendor Self-Developed Chip Panorama

China's AI chip battlefield in 2026 shows three parallel paths: "national team + internet giants + model companies."

1. Alibaba Pingtouge: The Only "Full-Stack Self-Development" Aligned with OpenAI's Path

Latest release: Zhenwu M890 (May 20, 2026, at Apsara Conference)

Spec	Zhenwu 810E (2024)	Zhenwu M890 (May 2026)	Zhenwu V900 (2027Q3 roadmap)
Process	Undisclosed	Est. 5nm	Est. 3nm
HBM	96GB HBM2e	144GB HBM	216GB
Inter-chip interconnect	700 GB/s	800 GB/s	1200 GB/s
Relative performance	1x (baseline)	3x	9x
Use case	Train-inference integrated	Train-inference integrated + Agent optimized	Train-inference integrated

Key Facts:

Manufacturing: SMIC
Cumulative shipments: 560,000 chips (as of May 2026)
Financial industry deployment exceeded 100,000 cards, covering 150+ institutions
Self-developed ICN Switch 1.0 interconnect chip, can build 64-128 card super-nodes
Bundled with Alibaba Xuantie RISC-V CPU, Yitian Arm server, Zhenyue SSD controller, Panmai SmartNIC
Roadmap: "One generation per year" iteration (same as OpenAI's plan)

Comparison with OpenAI's Path:

Dimension	OpenAI Jalapeño	Alibaba Pingtouge Zhenwu
Positioning	Pure inference	Train-inference integrated
External sales	❌ OpenAI internal only	✅ Alibaba Cloud + enterprise customers
Foundry	TSMC 3nm	SMIC (est. 5nm)
Ecosystem	Proprietary (self-use only)	Pingtouge + Alibaba Cloud
Process gap	3nm (leading)	Est. 5nm (one generation behind)

2. ByteDance: Buy-Buy-Buy + Some Self-Development

ByteDance relies less on self-development than on large-scale procurement of domestic chips:

The 50,000-chip procurement in June 2026:

Supplier	Quantity	Use Case	Process
T-Head Intelligence Zhikai MR-V100/MR-V100x	~32,000	Inference main force	7nm
Baidu Kunlunxin P800	15,000	Video understanding, recommendation	Est. 7nm
Hygon DCU K100ai	3,000	Edge moderation	-

Why isn't ByteDance buying H100/H200?

H20 backdoor incident (end of 2025) increased compliance risk
H100/H200 fall outside the export-control exemptions that apply to China
Domestic inference chips are now "usable," and their unit price is about one-third of the H20's (Zhikai 20,000 yuan vs H20 ~60,000 yuan)

ByteDance's Compute Bill:

2026 AI infrastructure capex: 200-700 billion yuan (estimates vary by accounting method)
Doubao MAU: 345-368 million
Daily token usage: 120 trillion
Daily compute cost: tens of millions of yuan
ByteDance's 2025 net profit shrank by over 70% YoY

3. DeepSeek: Model + Domestic Chip Closed Loop (Not the OpenAI Path)

Latest milestone: DeepSeek V4 Pro (April 2026)

Total parameters: 1.6 trillion (MoE)
Actually activated: ~5-8%
World's first trillion-parameter large model with full-stack adaptation to Huawei Ascend chips
Inference speed reaches 35x the initial Ascend migration baseline (DeepSeek's claim; the baseline is its own early Ascend port, not H100 or any other model)
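
To make the MoE figures concrete, the per-token active parameter count follows directly from the two numbers above. This is a back-of-the-envelope sketch; the constant and function names are illustrative, and the 5-8% range is the article's approximation, not a published specification.

```python
TOTAL_PARAMS = 1.6e12            # 1.6 trillion total parameters (from the text)
ACTIVE_FRACTION = (0.05, 0.08)   # ~5-8% activated per token (from the text)

def active_params(total, fraction):
    """Parameters actually used for one token in a sparse MoE model."""
    return total * fraction

low, high = (active_params(TOTAL_PARAMS, f) for f in ACTIVE_FRACTION)
print(f"Active parameters per token: {low / 1e9:.0f}B to {high / 1e9:.0f}B")
```

In other words, each token touches roughly 80 to 128 billion parameters, which is why a trillion-parameter MoE model can be served with the compute of a much smaller dense model while still needing memory for the full 1.6 trillion.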

DeepSeek's Distinctiveness:

DeepSeek doesn't develop its own chips; instead, it adapts its models closely to domestic chips. This route is smarter than self-developing chips:

The model company directly controls the full training-inference stack
Retains the choice of which operators to hard-wire into an ASIC and which to leave on general-purpose compute
Collaborates with Huawei Ascend, Cambricon, Hygon, Moore Threads, MetaX, etc.

Key data for DeepSeek V4 on Huawei Ascend:

Achieved through a rewrite of the PTX-level operator layer plus communication-layer optimization
API pricing is only 1/10 of OpenAI's

Valuation:

June 2026: completed first external financing round of 50 billion yuan (~$7.4 billion)
Post-money valuation over $50 billion (soared from $10 billion to $50 billion in two months)
The National Big Fund led the round; Tencent, Alibaba, CATL, JD, and NetEase followed
Founder Liang Wenfeng personally invested 20 billion yuan

4. Zhipu AI: Deep Adaptation with Domestic Chip Clusters

Latest action: GLM-5 + Full Domestic Chip Adaptation (Feb 2026)

Domestic Chip	Adapted	Performance
Huawei Ascend	✅	First implementation of W4A8 mixed-precision quantization, single Atlas 800TA3 machine matches H100 dual-card
Moore Threads	✅	-
Cambricon	✅	-
Kunlunxin	✅	-
MetaX	✅	-
Enflame	✅	-
Hygon	✅	-

Zhipu's Special Contribution:

First implementation of W4A8 quantization (4-bit weights, 8-bit activations) on Ascend, an industry breakthrough
In long-sequence, low-latency scenarios, deployment cost is reduced by 50%
Zhipu itself doesn't self-develop chips, but as the "model layer," it promotes "model-chip" co-design
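
What W4A8 means in practice can be shown with a toy dot product. The sketch below is a generic symmetric per-tensor scheme in plain Python, not Zhipu's or Huawei's actual implementation (the source does not describe it); the function names quantize and w4a8_dot and the sample values are assumptions for illustration. Production kernels typically use finer-grained scales and packed integer storage.

```python
def quantize(values, bits):
    """Symmetric per-tensor quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def w4a8_dot(weights, activations):
    """Dot product with 4-bit weights and 8-bit activations, accumulated in integers."""
    qw, sw = quantize(weights, bits=4)
    qa, sa = quantize(activations, bits=8)
    acc = sum(w * a for w, a in zip(qw, qa))    # integer multiply-accumulate
    return acc * sw * sa                        # rescale back to floating point

# Hypothetical sample values, chosen only to exercise the code.
weights = [0.12, -0.50, 0.33, 0.07]
activations = [1.4, -2.0, 0.25, 3.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"exact={exact:.4f}  w4a8={approx:.4f}  abs_error={abs(exact - approx):.4f}")
```

The weights take half the memory of an 8-bit format, which is where the deployment savings come from; the price is a small approximation error that the quantization scheme has to keep under control.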

5. Overall Domestic Chip Landscape

Domestic AI chip shipments (IDC 2025 stats, includes NVIDIA China shipments + Huawei/Alibaba partial self-use estimates):

China AI accelerator annual shipments: 4 million cards
Domestic combined: 1.65 million cards, market share 41%
Morgan Stanley predicts 2030 domestic share 76%
Note: 1.65M domestic includes Huawei self-use Ascend + Alibaba self-use Zhenwu; excluding self-use, pure commercial shipments are ~1.2M
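
The share figures above are easy to reproduce. The sketch below only re-derives numbers from the list; variable names are illustrative, and the ~1.2M commercial figure is the article's approximation.

```python
TOTAL_SHIPMENTS_M = 4.0       # China AI accelerator annual shipments, millions of cards
DOMESTIC_M = 1.65             # domestic combined, including self-use
DOMESTIC_COMMERCIAL_M = 1.2   # approximate, excluding Huawei/Alibaba self-use

def share(part, total):
    """Percentage share, rounded to two decimals."""
    return round(100 * part / total, 2)

domestic_share = share(DOMESTIC_M, TOTAL_SHIPMENTS_M)
self_use_m = round(DOMESTIC_M - DOMESTIC_COMMERCIAL_M, 2)
commercial_share_of_domestic = share(DOMESTIC_COMMERCIAL_M, DOMESTIC_M)

print(f"Domestic share of total shipments: {domestic_share}%")
print(f"Implied self-use volume: {self_use_m}M cards")
print(f"Commercial portion of domestic shipments: {commercial_share_of_domestic}%")
```

The 41% headline matches 1.65M out of 4M, and the note implies that roughly 0.45M cards, a bit over a quarter of domestic volume, never reach the open market.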

9 Domestic AI Chips Receiving National Security Certification Level I (May 26, 2026):

Huawei HiSilicon Ascend
Hygon DCU
Biren Technology
Vimicro Technology (Starlight Smart 5)
Pingtouge (Zhenwu series)
Enflame Technology
Moore Threads
MetaX
Kunlunxin

6. ByteDance vs Alibaba vs DeepSeek vs Zhipu: Path Comparison

Four Chinese AI players, four paths: self-development / procurement / model self-adaptation / ecosystem adaptation

Dimension	ByteDance	Alibaba	DeepSeek	Zhipu
Self-developed chip	❌ (rumors only)	✅ Zhenwu series (2 generations)	❌	❌
Main strategy	Procurement + integration	Full-stack self-development	Model adapts to domestic	Model adapts to domestic
Process	7nm domestic	Est. 5nm	Domestic	Domestic
Train-inference integrated	✅ (buy GPU)	✅ Zhenwu	✅ Ascend	✅ Ascend etc.
Direct OpenAI equivalent	Not directly	Full-stack path similar	Model path similar	Model path similar
OpenAI Jalapeño equivalent degree	Low	Medium (path similar, process behind)	High (path complementary)	High (path complementary)
Current compute scale (different calibers)	50K+ this round	560K cumulative shipments	Est. 10K cards	Est. 1K-10K cards

7. Three Routes' Methodological Differences

Abstracting from the four domestic strategies, three completely different methodologies emerge:

Route A: Self-Developed Chips (Alibaba Pingtouge)

Closest to OpenAI's path
Advantages: Long-term cost control, technology autonomy
Disadvantages: Long R&D cycle, one generation behind at start, ecosystem building difficult

Route B: Models Adapt to Domestic Chips (DeepSeek, Zhipu)

Don't touch hardware, let models adapt actively
Advantages: Short path, high flexibility, not locked to single supplier
Disadvantages: Depends on domestic chip performance ceiling, training side still needs breakthrough

Route C: Large-Scale Procurement + Limited Self-Development (ByteDance)

Treat compute as commodity procurement
Advantages: Rapid scaling, doesn't disperse energy
Disadvantages: Weak bargaining power, long-term cost uncontrollable

Key Judgments:

Alibaba Pingtouge is the only domestic player comparable to OpenAI's full-stack self-development path, but it is still one process generation behind
DeepSeek + Zhipu represent the "model layer descent" path: more realistic than building chips outright, and rapidly forming a "domestic model × domestic chip" ecosystem closed loop
ByteDance follows the "NVIDIA path": large-scale procurement + limited self-developed special-purpose chips
Core contradiction of the domestic battlefield: the training side still depends on NVIDIA H100/H200, while the inference side is being taken over by domestic chips
OpenAI Jalapeño's "extreme vertical integration of software and hardware" play: only Alibaba Pingtouge is attempting it domestically, but its shipment scale, customer diversity, and technology maturity are still an order of magnitude behind

10. Conclusion: The Next Form of an AI Company Is an "AI Full-Stack Company"

Jalapeño's Coordinates in Industry History

Looking at Jalapeño in historical context, the question "do AI companies need self-developed chips?" has traced a clear curve over the past three years:

2023: OpenAI, like most AI labs, only cared about models. Compute was a procurement target.
2024: Compute prices and supply began to bother model companies. Anthropic and Meta internally began discussing the feasibility of self-developed chips.
2025: Google began selling TPU externally, turning "self-developed chips" into a commercial product.
June 2026: OpenAI closed this loop with Jalapeño. The first AI lab valued at over $100B entered the field itself.

This is not an endpoint. It is one coordinate point on the timeline.

Five Hard Metrics for Jalapeño's Success

To assess whether an inference ASIC truly "works," watch five observable hard metrics over the next 18 months:

First deployment timing: Does OpenAI actually deploy by end of 2026, and can the scale reach the GW level?
Energy efficiency delivery: Is Hock Tan's "50% inference cost reduction" verified in 2027 financial reports?
Software ecosystem maturity: Is native PyTorch / Triton support for Jalapeño demo-level only, or has it entered daily training pipelines?
HBM and capacity: Can HBM4 and TSMC N3 capacity stably supply OpenAI's 10GW roadmap in 2027-2028?
Second-generation iteration pace: Can OpenAI's promised "one generation per year" be delivered? If there is no second generation by 2028, the self-developed chip project becomes capital arbitrage rather than technological evolution.

Two Observations on Industry Evolution

Observation One: Self-developed chips have shifted from "differentiated advantage" to "infrastructure necessity." Google TPU's 10-year accumulation, AWS Trainium's 3 generations, Microsoft Maia, Meta MTIA, and now OpenAI Jalapeño: the major AI companies have all entered the field. Those still on the sidelines (like Anthropic) will be constrained by supply security and cost structure.

Observation Two: Model-chip coupling will go deeper. Jalapeño, as a vertical ASIC "designed only for the GPT series," essentially bets that OpenAI's model architecture will remain in the Transformer + MoE paradigm. If the Transformer is replaced by a new architecture (e.g., the Mamba / SSM class), all of Jalapeño's optimizations become obsolete instantly. This is the biggest hidden bet in OpenAI's self-developed path: a bet on the stability of its own model architecture.

Indirect Significance for the Chinese AI Industry

Jalapeño is a stress test case. It proves that "AI companies can build chips" is feasible as engineering, but at a cost that is steep for small companies: a 40-person core team, Broadcom's hundreds-strong team, and hundreds of millions in tape-out costs.

China's most realistic response is not to imitate this path (Alibaba Pingtouge excepted), but the opposite: adapt mainstream models across many domestic chips, rather than tie a model family to a single custom chip. DeepSeek V4 + Ascend and Zhipu GLM-5 + domestic chips are walking this path. This is China's realistic answer to export controls.

Summary

OpenAI took Jalapeño from architecture to tape-out in 9 months. Treating it as a defining moment when an AI company challenges NVIDIA is an overreach; treating it as the latest coordinate point in the "model + self-developed chip" full-stack paradigm is closer to the truth.

The real value of this chip named after a Mexican pepper is not in its compute numbers, but in confirming a trend: the competitive unit of the AI industry is upgrading from a single model to the entire stack of model + chip + data center + interconnect + scheduling. Once this upgrade completes, the reshaping of industry structure will not be subject to any single company's will.

Appendix A: Key Timeline

End of 2023: Richard Ho joins OpenAI, forms chip team
Mid-2024: OpenAI and Broadcom begin secret joint development
October 2025: Partnership publicly disclosed, 10GW deployment target
June 24, 2026: Jalapeño officially released
End of 2026: First deployment online
2027: Scaled deployment
First half of 2028: Full deployment, 1-3GW single cluster
2028: Next-generation Jalapeño released
2029: 10GW compute cluster completed

Appendix B: Glossary

ASIC (Application-Specific Integrated Circuit): A chip customized for a specific use case
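HBM (High Bandwidth Memory): 3D-stacked DRAM packaged next to the processor die to deliver very high memory bandwidth
MoE (Mixture of Experts): A model architecture with large total parameters but only a fraction activated per token
PFLOPS / EFLOPS: PetaFLOPS / ExaFLOPS (10^15 / 10^18 floating-point operations per second)
Systolic Array: A grid of processing elements through which data flows in a regular rhythm, like blood pulsing through the body; well suited to matrix multiplication
TDP (Thermal Design Power): The sustained heat output a chip's cooling system must be designed to handle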
Tape-out: Submitting chip layout data to the foundry for manufacturing
W4A8: Mixed precision with 4-bit weights and 8-bit activations
