NVIDIA Expands Riva ASR Capabilities with Whisper and Canary Models

Rebeca Moen, Feb 21, 2025 10:54

NVIDIA has expanded its Automatic Speech Recognition (ASR) capabilities with the release of the Riva 2.18.0 container and SDK. The release is part of NVIDIA's ongoing work on its GPU-accelerated speech and translation AI microservices, as detailed by Sven Chilton on the NVIDIA Developer Blog.

Integration of New Models

The latest release of Riva adds support for the Parakeet architecture, which enables streaming multilingual ASR, and for the Whisper and Canary models, which handle offline ASR and Automatic Speech Translation (AST). Whisper, developed by OpenAI, and the Distil-Whisper models from Hugging Face are now part of Riva's offline ASR capabilities, allowing audio recordings in numerous languages to be transcribed, or translated directly into English.

Canary models further extend Riva's functionality by supporting offline ASR and AST in multiple language combinations, including Any-to-English, English-to-Any, and Any-to-Any translations. These models cater to diverse linguistic needs, offering robust support for language detection and translation tasks.

Selective NMT Deactivation

A notable feature in this update is the ability to selectively deactivate Neural Machine Translation (NMT) for parts of the input using the <dnt> SSML tag. Text segments wrapped in the tag are left untranslated, which gives users finer control over the translation output. In addition, a new DNT dictionary lets users specify how particular words or phrases should be translated, allowing further customization.
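As a rough illustration of the tagging step, the sketch below wraps chosen phrases in <dnt> tags before the text would be sent for translation. It assumes the tag is written as a paired <dnt>...</dnt> element around the protected text. The helper name wrap_dnt and its parameters are hypothetical and are not part of the Riva SDK, and the sketch does not call Riva at all.

```python
import re


def wrap_dnt(text: str, protected: list[str]) -> str:
    """Wrap each protected phrase in <dnt>...</dnt> tags.

    Hypothetical helper for illustration only; it prepares the input
    string and does not talk to a Riva server.
    """
    # Drop empty entries and try longer phrases first, so that
    # "Riva Skills" is matched before the shorter "Riva".
    ordered = sorted({p for p in protected if p}, key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: f"<dnt>{m.group(0)}</dnt>", text)


if __name__ == "__main__":
    print(wrap_dnt("Deploy NVIDIA Riva on your GPU server.", ["NVIDIA Riva"]))
    # prints: Deploy <dnt>NVIDIA Riva</dnt> on your GPU server.
```

Matching here is case-sensitive and literal. Whether tags may be nested, and how the DNT dictionary file is formatted, are not covered by this sketch and should be checked against the Riva documentation.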

Deployment and Usage

Deploying these new capabilities is streamlined through the Riva Skills Quick Start resource folder, which includes scripts and configuration files necessary for setting up a Riva server with Whisper and Canary functionalities. Users can choose between Whisper and Canary models based on their specific ASR needs, utilizing provided scripts to optimize model deployment according to their GPU architecture.

The integration of these models and features reflects NVIDIA's effort to broaden the linguistic and functional scope of its ASR systems. With support for a wider range of languages and finer control over translation, Riva extends what its speech recognition and translation services can handle.

For further information on NVIDIA's latest ASR advancements, visit the NVIDIA Developer Blog.
