Rack-scale agentic AI supercomputer

Vera Rubin NVL72 for next-generation AI factories.

A technical guide to the NVIDIA Vera Rubin NVL72 concept: Rubin GPUs, Vera CPUs, NVLink-scale communication, accelerated networking, liquid-cooled racks, and infrastructure designed for frontier AI training and high-throughput inference.

Compute: Rubin GPUs and Vera CPUs in a unified rack-scale design
Fabric: NVLink, accelerated networking, and liquid-cooled density
72 Rubin GPUs in an NVL72-class rack-scale system
36 Vera CPUs supporting accelerated AI workloads
6th-generation NVLink fabric for scale-up performance
AI training, inference, reasoning, and agentic workloads

Platform overview

Designed for the next phase of accelerated computing.

The Vera Rubin NVL72 reference is positioned around rack-scale AI infrastructure. It combines compute, memory, networking, cooling, and software into a platform for large models, reasoning systems, synthetic data, inference services, and agentic AI operations.

Rubin GPUs

High-density accelerated compute for frontier models

Built for massive training runs, high-throughput inference, multimodal models, and reasoning workloads.

Vera CPUs

Host processing aligned with GPU-scale performance

Coordinate data movement, system services, orchestration, and workload management inside the rack.

NVLink Fabric

Scale-up communication across the full rack

Fast GPU-to-GPU connectivity is central to keeping model parallel workloads efficient.
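
To make that concrete, consider the sketch below: in tensor-parallel training, each layer's partial results are combined with a collective such as all-reduce, so GPU-to-GPU bandwidth sits on the critical path of every step. The sketch uses PyTorch's torch.distributed with the NCCL backend; the process count, tensor shape, and launch command are illustrative assumptions, not Vera Rubin specifics.

```python
# Illustrative sketch: an all-reduce on the critical path of a tensor-parallel layer.
# Launch with torchrun, e.g.: torchrun --nproc_per_node=8 allreduce_sketch.py
# (process count, shape, and dtype are assumptions for illustration).
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")            # NCCL rides the NVLink/network fabric
    rank = dist.get_rank()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Stand-in for a tensor-parallel partial result (e.g. a sharded matmul output).
    partial = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)

    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)      # every rank blocks on this collective
    end.record()
    torch.cuda.synchronize()

    if rank == 0:
        print(f"all_reduce of {partial.numel()} elements took {start.elapsed_time(end):.2f} ms")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```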

AI Factory

Infrastructure for continuous model production

Support data pipelines, training, fine-tuning, evaluation, inference, and agentic application services.

Rack-scale architecture

Compute, fabric, and cooling designed as one system.

Vera Rubin NVL72-style systems are not standalone servers arranged in a cabinet. The rack is the computer: accelerated compute, CPU support, scale-up interconnect, scale-out networking, power, and thermal design are planned together to keep utilization high.

Scale-up: NVLink-class connectivity links GPU resources into a dense, low-latency compute domain.
Scale-out: Networking links racks into larger AI factory clusters for training and inference fleets.
Thermals: Liquid-cooled infrastructure enables the density required by next-generation accelerators.

Networking and system services

AI factories depend on the fabric around the rack.

The NVIDIA reference highlights technologies such as ConnectX-9, BlueField-4, and Spectrum-X as part of the broader Vera Rubin platform story. In practical terms, the network must carry model traffic, storage traffic, observability, security, and inference requests without becoming the bottleneck.

ConnectX-class adapters: High-bandwidth network interfaces support fast east-west traffic, storage paths, and cluster communication.
BlueField DPUs: Infrastructure acceleration can offload networking, security, storage, and telemetry work from host CPUs.
Spectrum-X Ethernet: Optimized Ethernet helps AI clusters manage congestion, control latency, and keep collective operations predictable.
Storage pipeline: Training systems need fast data ingest, checkpoint movement, metadata handling, and durable artifact paths (a minimal checkpoint sketch follows this list).
Observability: Operators need traces for GPU utilization, fabric health, thermals, failures, scheduling, and cost efficiency.
Security controls: Agentic AI infrastructure should separate tenants, protect data, and enforce trusted execution boundaries.
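
Checkpoint movement is one place where the storage path directly gates training throughput. The sketch below is a minimal illustration, assuming a PyTorch training loop and a POSIX filesystem: it writes each checkpoint to a temporary file and renames it into place so an interrupted write never leaves a truncated artifact. The directory layout and file names are hypothetical placeholders.

```python
# Illustrative sketch: durable checkpoint writes via write-then-rename.
# Paths, file names, and the training objects are hypothetical placeholders.
import os
import torch

CKPT_DIR = "/mnt/checkpoints/run-0001"   # assumed mount point, not a real layout

def save_checkpoint(step, model, optimizer):
    os.makedirs(CKPT_DIR, exist_ok=True)
    final_path = os.path.join(CKPT_DIR, f"step-{step:08d}.pt")
    tmp_path = final_path + ".tmp"

    state = {
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }

    # Write to a temporary file first, flush it to disk, then atomically rename.
    torch.save(state, tmp_path)
    with open(tmp_path, "rb") as f:
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)     # atomic rename on POSIX filesystems
    return final_path
```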

Performance themes

Built for training, inference, and reasoning at scale.

The Vera Rubin NVL72 positioning emphasizes efficiency across both model creation and model serving. The same AI factory may need to pretrain large models, fine-tune specialized systems, generate synthetic data, run evaluators, and serve agentic applications with strict latency targets.

  • Large-model training with high GPU utilization and scale-up bandwidth.
  • Inference fleets for language, vision, multimodal, retrieval, and reasoning services.
  • Agentic AI workloads that call tools, coordinate memory, evaluate plans, and execute workflows.
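
Operationally, that last workload class is a serving loop rather than a single model call: the system alternates between inference requests and tool executions, and each round trip adds latency the infrastructure must absorb. The sketch below is a minimal, vendor-neutral illustration; call_model, the tool registry, and the message format are hypothetical stand-ins, not part of any NVIDIA API.

```python
# Illustrative sketch of an agentic serving loop: plan, call a tool, observe, repeat.
# call_model(), the tool registry, and the message format are hypothetical placeholders.
import json

def call_model(messages):
    """Stand-in for an inference request to a served model; expected to return a dict
    like {"action": "search", "input": "..."} or {"action": "final", "input": "..."}."""
    raise NotImplementedError("wire this to your inference endpoint")

TOOLS = {
    "search": lambda query: f"placeholder results for {query!r}",   # placeholder tool
}

def run_agent(task, max_steps=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(messages)              # one inference round trip
        if decision["action"] == "final":
            return decision["input"]
        tool = TOOLS.get(decision["action"])
        observation = tool(decision["input"]) if tool else "unknown tool"
        # Feed the observation back so the next model call can continue the plan.
        messages.append({"role": "tool", "content": json.dumps(
            {"tool": decision["action"], "observation": observation})})
    return "step budget exhausted"
```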

Deployment path

How teams turn rack-scale hardware into an AI factory.

  1. Plan power and cooling

    Validate rack density, liquid cooling loops, redundancy, facility constraints, and service access.

  2. Design the fabric

    Map scale-up domains, scale-out networks, storage lanes, management paths, and security zones.

  3. Build the software layer

    Deploy schedulers, containers, drivers, telemetry, model pipelines, evaluation suites, and inference stacks.

  4. Operate continuously

    Monitor utilization, latency, faults, thermal margins, costs, model quality, and agentic workflow safety.
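
Most of the signals in the last step can be sampled on-node. As a minimal sketch, the loop below polls per-GPU utilization, memory use, and temperature through NVML using the nvidia-ml-py (pynvml) bindings; the polling interval and output format are assumptions, and a production deployment would export these samples to a telemetry stack rather than print them.

```python
# Illustrative sketch: poll per-GPU utilization, memory, and temperature via NVML.
# Requires the nvidia-ml-py package (imported as pynvml) and an NVIDIA driver.
import time
import pynvml

def sample_gpus():
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)           # .gpu / .memory in %
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)                  # .used / .total in bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        yield {
            "gpu": i,
            "util_pct": util.gpu,
            "mem_used_gib": mem.used / 2**30,
            "temp_c": temp,
        }

if __name__ == "__main__":
    pynvml.nvmlInit()
    try:
        while True:                   # hypothetical 10-second polling interval
            for sample in sample_gpus():
                print(sample)
            time.sleep(10)
    finally:
        pynvml.nvmlShutdown()
```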

AI factory guide

Vera Rubin as a platform story, not just a chip story.

The strongest interpretation of Vera Rubin NVL72 is a full-stack system: accelerators, CPUs, NVLink, networking, DPUs, switches, software, operations, and data-center design all supporting the same goal of efficient AI production.

Rubin GPUs, Vera CPUs, NVLink, networking, cooling, and software for rack-scale agentic AI.

FAQ

Vera Rubin NVL72 questions

What is Vera Rubin NVL72?

Vera Rubin NVL72 is presented here as a rack-scale NVIDIA AI supercomputer platform concept combining Rubin GPUs, Vera CPUs, NVLink-scale communication, accelerated networking, liquid cooling, and AI factory operations.

What keywords does this page target?

The page targets Vera Rubin, NVIDIA Vera Rubin NVL72, Rubin GPU, Vera CPU, NVL72, rack-scale AI supercomputer, agentic AI, AI factory, NVLink 6, ConnectX-9, BlueField-4, Spectrum-X, AI training, and AI inference.

Why does rack-scale design matter?

Rack-scale design matters because modern AI workloads are limited by communication, memory movement, thermal density, scheduling, networking, and operations as much as by individual accelerator performance.