NVIDIA NeMo Curator Enhances Video Processing on DGX Cloud

Alvin Lang, Mar 18, 2025 20:51

The rise of physical AI has sharply increased the volume of video being generated: a single autonomous vehicle produces over 1 TB of video daily, according to NVIDIA. To manage and use this data efficiently, NVIDIA has launched NeMo Curator, a GPU-accelerated streaming pipeline available on NVIDIA DGX Cloud.

Challenges with Traditional Processing

Traditional batch processing systems have struggled to keep pace with exponential data growth, often leaving GPUs underutilized and driving up costs. These systems accumulate large volumes of data before processing begins, which introduces inefficiency and latency into AI model development.

GPU-Accelerated Streaming Solution

To address these challenges, NeMo Curator introduces a flexible streaming pipeline that uses GPU acceleration for large-scale video curation. The pipeline applies auto-scaling and load balancing to optimize throughput across its stages, maximizing hardware utilization and reducing total cost of ownership (TCO).

Optimized Throughput and Resource Utilization

The streaming approach pipes intermediate data directly from one stage to the next, reducing latency and improving efficiency. By separating CPU-intensive tasks from GPU-intensive ones, the system can match each stage to the actual capacity of the available infrastructure, avoiding idle resources and keeping throughput balanced.
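The idea of piping intermediate data between stages can be illustrated with a minimal sketch. It uses only Python's standard-library threads and queues rather than NeMo Curator's actual machinery, and the `decode` and `embed` functions are hypothetical stand-ins for a CPU-bound and a GPU-bound stage. Each item moves to the next stage as soon as it is ready instead of waiting for a whole batch to finish.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result immediately."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def decode(clip_id):
    # Hypothetical stand-in for a CPU-bound stage such as video decoding.
    return {"clip": clip_id, "frames": 8}


def embed(decoded):
    # Hypothetical stand-in for a GPU-bound stage such as embedding computation.
    return (decoded["clip"], decoded["frames"] * 2)


def stream(clip_ids):
    """Run clips through decode -> embed, with one worker thread per stage."""
    q_in = queue.Queue()
    q_mid = queue.Queue(maxsize=4)  # bounded: a slow consumer throttles the producer
    q_out = queue.Queue()
    workers = [
        threading.Thread(target=run_stage, args=(decode, q_in, q_mid)),
        threading.Thread(target=run_stage, args=(embed, q_mid, q_out)),
    ]
    for worker in workers:
        worker.start()
    for clip_id in clip_ids:
        q_in.put(clip_id)
    q_in.put(_DONE)

    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

The bounded middle queue is the design choice worth noting: it applies backpressure, so a fast decoding stage cannot pile up unbounded intermediate data in front of a slower embedding stage.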

Architecture and Implementation

The NeMo Curator pipeline, built on the Ray framework, is divided into several stages, from video decoding to embedding computation. Each stage uses a pool of Ray actors to process data in parallel, while an orchestration thread manages the input and output queues to maintain throughput. The system dynamically adjusts each actor pool's size to compensate for differences in stage speed, so that data flows through the pipeline at a consistent rate.
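The reasoning behind resizing actor pools can be shown with a little arithmetic. The sketch below is not NeMo Curator's scheduler, and the stage names and per-item costs are hypothetical; it only illustrates that a stage which takes longer per item needs proportionally more workers to sustain the same rate as its neighbours.

```python
import math


def balance_pool_sizes(seconds_per_item, target_items_per_second):
    """Return the workers each stage needs to sustain the target rate.

    One worker handles 1 / cost items per second, so a stage needs
    ceil(cost * target) workers, with a minimum of one.
    """
    return {
        stage: max(1, math.ceil(cost * target_items_per_second))
        for stage, cost in seconds_per_item.items()
    }


def pipeline_throughput(pool_sizes, seconds_per_item):
    """The slowest stage bounds the throughput of the whole pipeline."""
    return min(pool_sizes[stage] / seconds_per_item[stage] for stage in pool_sizes)


# Hypothetical per-item costs in seconds; not measured values.
costs = {"decode": 0.5, "filter": 2.0, "embed": 0.25}
sizes = balance_pool_sizes(costs, target_items_per_second=4)
```

With these assumed costs, the slowest stage receives eight workers and the fastest receives one, and every stage then sustains four items per second. A dynamic system would presumably repeat this kind of calculation as measured stage speeds change.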

Performance and Future Prospects

Compared to traditional batch processing, the streaming pipeline achieves a 1.8x speedup, processing one hour of video per GPU in approximately 195 seconds. The NeMo Curator pipeline has shown an 89x performance improvement over the baseline and can process around 1 million hours of 720p video on 2,000 H100 GPUs in a single day. NVIDIA continues to work with early access partners to refine the system and expand its capabilities.
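These figures can be cross-checked with a short calculation. The sketch below assumes the 195-second rate applies uniformly to every GPU for a full 24 hours, and that the 1.8x speedup refers to time per hour of video; neither assumption is stated in the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def video_hours_per_day(seconds_per_video_hour, num_gpus):
    """Hours of video a fleet can process in one day at a fixed per-GPU rate."""
    per_gpu = SECONDS_PER_DAY / seconds_per_video_hour
    return per_gpu * num_gpus


# 195 seconds per video hour on each of 2,000 GPUs (figures from the article).
fleet_hours = video_hours_per_day(195, 2000)

# Implied batch time per video hour if streaming is 1.8x faster (an inference).
implied_batch_seconds = 195 * 1.8
```

Under these assumptions the fleet processes roughly 886,000 hours of video per day, which is in line with the "around 1 million hours" figure, and the implied batch time is about 351 seconds per hour of video.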

For more detailed insights, visit the NVIDIA blog.