NVIDIA Drops Open Blueprint for Physical AI Training Data at GTC

Darius Baruo   Mar 16, 2026 22:00


NVIDIA just handed robotics and autonomous vehicle developers a major shortcut. The company unveiled its Physical AI Data Factory Blueprint at GTC on March 16, an open reference architecture that automates the messy, expensive process of generating training data for physical AI systems.

The pitch is straightforward: take limited real-world data, run it through NVIDIA's Cosmos foundation models, and multiply it into massive datasets complete with rare edge cases that would take years to capture organically. Microsoft Azure and Nebius are already integrating the blueprint into their cloud infrastructure.

Who's Actually Using This

The adopter list reads like a physical AI who's who. Uber is applying the blueprint to accelerate autonomous vehicle development. Skild AI is using it for general-purpose robot foundation models. FieldAI, Hexagon Robotics, Linker Vision, Milestone Systems, RoboForce, and Teradyne Robotics have all signed on.

NVIDIA itself is using the architecture to train Alpamayo, which it calls "the world's first open reasoning-based vision language action models for long-tail autonomous driving."

The Technical Stack

Three components do the heavy lifting. Cosmos Curator handles data processing and annotation. Cosmos Transfer multiplies and diversifies that curated data across different environments and lighting conditions. Cosmos Evaluator—now available on GitHub—automatically scores and filters generated data for physical accuracy.
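The curate → transfer → evaluate flow can be sketched conceptually. Note the class and function names below are illustrative placeholders, not the actual Cosmos APIs, and the scoring logic is a stand-in for the real physics evaluation:

```python
from dataclasses import dataclass
import random

@dataclass
class Clip:
    scene: str
    condition: str
    physics_score: float  # 0..1, higher = more physically plausible

# Stage 1 (Cosmos Curator's role): keep only usable source clips.
def curate(raw_clips):
    return [c for c in raw_clips if c.physics_score > 0.0]

# Stage 2 (Cosmos Transfer's role): multiply each clip across
# environments and lighting conditions to diversify the dataset.
CONDITIONS = ["day", "night", "rain", "fog", "snow"]

def transfer(clips):
    out = []
    for c in clips:
        for cond in CONDITIONS:
            # Synthetic variants inherit the scene; plausibility drifts slightly.
            score = max(0.0, min(1.0, c.physics_score + random.uniform(-0.1, 0.05)))
            out.append(Clip(c.scene, cond, score))
    return out

# Stage 3 (Cosmos Evaluator's role): score and filter generated
# data for physical accuracy before it reaches training.
def evaluate(clips, threshold=0.7):
    return [c for c in clips if c.physics_score >= threshold]

raw = [Clip("intersection", "day", 0.95), Clip("highway_merge", "day", 0.9)]
dataset = evaluate(transfer(curate(raw)))
print(len(dataset), "clips pass the physical-accuracy filter")
```

The point of the sketch is the shape of the pipeline: a small set of real clips fans out into many synthetic variants, and an automated scorer gates what survives into the training set.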

The orchestration layer, OSMO, now integrates with coding agents including Claude Code, OpenAI Codex, and Cursor. This lets AI agents manage resources and resolve bottlenecks autonomously rather than requiring manual oversight.

"Physical AI is the next frontier of the AI revolution, where success depends on the ability to generate massive amounts of data," said Rev Lebaredian, NVIDIA's VP of Omniverse and simulation technologies. "In this new era, compute is data."

Cloud Integration Details

Microsoft Azure is building an open physical AI toolchain around the blueprint with integrations into Azure IoT Operations, Microsoft Fabric, and GitHub Copilot. FieldAI, Hexagon Robotics, Linker Vision, and Teradyne Robotics are testing this toolchain for their perception and reinforcement learning pipelines.

Nebius has integrated OSMO directly into its AI Cloud, pairing it with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Milestone Systems, Voxel51, and RoboForce are already running data pipelines on Nebius infrastructure.

What This Means

The data-collection problem—getting enough real-world training examples for robots and AVs—has been a persistent bottleneck. NVIDIA's approach creates a data flywheel through simulation, which could significantly compress development timelines for companies that lack the resources to collect millions of hours of real-world driving or manipulation data.

The full blueprint hits GitHub in April. With NVIDIA stock trading at $180.25 as of March 16, the company continues cementing its position as the infrastructure layer for AI development across every domain—now including the physical world.

