NVIDIA Unveils NemoClaw Stack and Nemotron 3 Models at GTC 2026

Caroline Bishop   Mar 17, 2026 14:00


NVIDIA dropped a stack of AI announcements at GTC 2026 this week, headlined by NemoClaw—an open-source framework that lets users run autonomous AI agents locally on RTX PCs and DGX systems without paying per-token cloud fees. The company also released Nemotron 3 Super, a 120-billion-parameter model that scored 85.6% on PinchBench, making it the top-performing open model for agentic AI tasks.

The timing matters. NVIDIA shares traded at $180.20 on March 17, down 2.8% on the day, as the company pushes deeper into the personal AI computing space. Orders for the DGX Station opened earlier this week, bringing data center-grade AI performance to desktop form factors.

What NemoClaw Actually Does

NemoClaw addresses two pain points that have slowed AI agent adoption: token costs and privacy concerns. The stack includes Nemotron open models for local inference and OpenShell, a runtime designed for safer agent execution. Running inference locally eliminates per-token API fees and keeps your data on your own machine.

The framework targets OpenClaw, an emerging category of autonomous AI assistants that can access personal files, apps, and workflows to automate tasks. Think of it as giving an AI agent the keys to your computer—which explains why security features matter here.

The Model Lineup

NVIDIA released Nemotron 3 models in multiple tiers to cover different hardware configurations:

Nemotron 3 Super (120B parameters, 12B active): Designed for DGX Spark and RTX PRO workstations. The DGX Spark's 128GB unified memory handles models this size without breaking a sweat.

Nemotron 3 Nano 4B: The compact option for GeForce RTX users. NVIDIA positioned this for gaming NPCs and conversational apps running on consumer hardware with limited VRAM.

Mistral also contributed Mistral Small 4, a 119B-parameter model with just 6B active parameters, optimized for chat and coding tasks. Both large models run locally on DGX Spark and RTX PRO GPUs.

For context, the DGX Spark uses a Grace Blackwell Superchip with shared CPU-GPU memory—a design that lets it run models up to 200 billion parameters that would choke standard RTX cards.
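The arithmetic behind that ceiling is easy to check: at 4-bit precision, the weights alone for a 200-billion-parameter model come to roughly 100 GB, which fits inside the Spark's 128 GB of unified memory; at FP16 the same model would need 400 GB. A back-of-the-envelope sketch (weights only; KV cache, activations, and runtime overhead all add more):

```python
# Rough weight-only memory estimate for a model at a given precision.
# A back-of-the-envelope sketch, not a sizing guide: it ignores KV cache,
# activations, and framework overhead.

def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 200B-parameter model at 4-bit (NVFP4-style) quantization:
print(weight_footprint_gb(200, 4))   # 100.0 GB -- fits in 128 GB unified memory
# The same model at FP16:
print(weight_footprint_gb(200, 16))  # 400.0 GB -- would not fit
```

The same arithmetic explains the Nano tier: a 4B model at 4-bit is about 2 GB of weights, comfortable on a consumer GeForce card.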

Creative Tools Get Faster

On the content creation side, Lightricks' LTX 2.3 audio-video model now supports NVFP4 and FP8 quantization, delivering 2.1x performance gains. Black Forest Labs' FLUX.2 Klein 9B received similar optimizations, cutting image editing time in half on RTX hardware.

NVIDIA also previewed DLSS 5, arriving this fall, which promises to inject "photoreal lighting and materials" into game rendering through AI upscaling.

The Bigger Picture

NVIDIA is betting that personal AI computing follows the same trajectory as personal computing itself. The company's framing of "agent computers" as a new device category signals where it sees the market heading—dedicated hardware for running AI assistants that know your files, your schedule, and your preferences.

Whether users actually want an always-on AI with that level of access remains the open question. But NVIDIA isn't waiting to find out. GTC attendees can visit the "build-a-claw" event through March 19 to customize their own agent and connect it to their preferred messaging app.

The models are available now through Ollama, LM Studio, and llama.cpp. Unsloth Studio, a new web-based fine-tuning interface supporting 500+ models, launched alongside the announcements for users who want to customize these models for specific workflows.
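Each of those tools can serve a downloaded model behind a local HTTP endpoint (Ollama and LM Studio both expose an OpenAI-compatible chat API on localhost), so moving an app off a cloud provider is largely a matter of changing the base URL. A minimal sketch of what a local client call looks like; the model tag `nemotron-3-nano` and the port are assumptions here, so check your tool's documentation:

```python
import json
from urllib import request

# Placeholder endpoint and model tag -- adjust for your local setup.
# Port 11434 is Ollama's default; LM Studio typically uses 1234.
LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request("nemotron-3-nano", "Summarize my meeting notes.")

# To actually run it against a local server (no API key, no per-token fee):
# req = request.Request(LOCAL_ENDPOINT, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# reply = json.loads(request.urlopen(req).read())
# print(reply["choices"][0]["message"]["content"])
```

Because the payload shape matches the cloud APIs most agent frameworks already speak, the privacy and cost benefits come without rewriting application code.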

