NVIDIA Introduces NeMo Retriever for Enhanced RAG Pipelines

Jessie A Ellis · Jul 24, 2024 02:55 · 2 Min Read


Enterprises are increasingly looking to leverage their vast data reserves to improve operational efficiency, reduce costs, and boost productivity. NVIDIA's latest offering, NeMo Retriever, aims to facilitate this by enabling developers to build and deploy advanced retrieval-augmented generation (RAG) pipelines. According to the NVIDIA Technical Blog, the collection introduces four new community-based NeMo Retriever NIM microservices for text embedding and reranking.

New Models for Enhanced Text Retrieval

NVIDIA has announced the release of three NeMo Retriever Embedding NIMs and one NeMo Retriever Reranking NIM. These models are:

  • NV-EmbedQA-E5-v5: Optimized for text question-answering retrieval.
  • NV-EmbedQA-Mistral7B-v2: A multilingual model fine-tuned for text embedding and accurate question answering.
  • Snowflake-Arctic-Embed-L: An optimized model for text embedding.
  • NV-RerankQA-Mistral4B-v3: Fine-tuned for text reranking and accurate question answering.

Understanding the Retrieval Pipeline

The retrieval pipeline uses embedding models to encode text into vector representations that capture its semantics, and stores those vectors in a vector database. When a user submits a question, it is encoded into a vector and matched against the stored vectors to retrieve the relevant information. Reranking models then score the relevance of the retrieved text chunks, ensuring the most accurate information is presented.
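To make the first stage concrete, the sketch below encodes a few passages and a question with an embedding NIM and matches them by cosine similarity. It is a minimal illustration, not official sample code: the hosted endpoint URL, the model identifier, and the input_type field are assumptions modeled on the OpenAI-style embeddings API that NIMs expose, and the in-memory similarity search stands in for a real vector database.

```python
# Minimal sketch of the first retrieval stage: encode passages and a query
# with an embedding NIM, then match by cosine similarity. The endpoint URL,
# model identifier, and "input_type" field are assumptions; adjust them to
# your deployment.
import requests
import numpy as np

NIM_URL = "https://integrate.api.nvidia.com/v1/embeddings"  # assumed hosted endpoint
API_KEY = "nvapi-..."  # your NVIDIA API key

def embed(texts, input_type):
    """Return one embedding vector per input text."""
    resp = requests.post(
        NIM_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "nvidia/nv-embedqa-e5-v5",  # assumed model identifier
            "input": texts,
            "input_type": input_type,  # "passage" for documents, "query" for questions
        },
    )
    resp.raise_for_status()
    return np.array([d["embedding"] for d in resp.json()["data"]])

passages = ["NIMs are containerized inference microservices.",
            "Helm charts simplify Kubernetes deployment."]
doc_vecs = embed(passages, "passage")               # ingestion: encode and store
query_vec = embed(["What are NIMs?"], "query")[0]   # query time: encode the question

# Cosine similarity stands in for a vector database lookup here.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
print(passages[int(scores.argmax())])
```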

Embedding models offer speed and cost-efficiency, while reranking models provide higher accuracy. By combining the two, enterprises can balance performance and cost: the embedding model quickly identifies candidate chunks, and the reranking model refines the results.
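A second stage can then rescore the embedding model's shortlist. The sketch below is a hedged example of calling a reranking NIM: the route and request schema are assumptions patterned on NVIDIA's published reranking examples, so verify both against the API catalog entry for your model.

```python
# Hedged sketch of the second stage: rescoring the embedding model's top
# candidates with a reranking NIM. The URL and payload schema below are
# assumptions, not confirmed API details.
import requests

RERANK_URL = "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking"
API_KEY = "nvapi-..."

def rerank(query, candidates):
    """Return candidates sorted by the reranker's relevance score."""
    resp = requests.post(
        RERANK_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "nvidia/nv-rerankqa-mistral-4b-v3",  # assumed identifier
            "query": {"text": query},
            "passages": [{"text": c} for c in candidates],
        },
    )
    resp.raise_for_status()
    # Assumed response shape: each ranking entry carries the passage index.
    order = [r["index"] for r in resp.json()["rankings"]]
    return [candidates[i] for i in order]

# Cheap embedding retrieval narrows thousands of chunks to a short list;
# the reranker then reorders that short list for final accuracy.
top_chunks = ["chunk A ...", "chunk B ...", "chunk C ..."]
print(rerank("What are NIMs?", top_chunks)[0])
```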

NeMo Retriever NIMs: Cost and Stability

Cost

NeMo Retriever NIMs are designed to reduce time-to-market and operational costs. These containerized solutions, equipped with industry-standard APIs and Helm charts, facilitate easy and scalable model deployment. Utilizing the NVIDIA AI Enterprise software suite, NIMs maximize model inference efficiency, thereby lowering deployment costs.
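Because the APIs are industry-standard (OpenAI-compatible), an off-the-shelf client can talk to a self-hosted NIM container. In this assumed setup, a locally deployed embedding NIM listens on port 8000; the port and served model name are placeholders for your own deployment.

```python
# Sketch of querying a self-hosted NIM through the standard OpenAI client.
# The localhost port and model name are assumptions for a local deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",
)
result = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",          # assumed served model name
    input=["How do Helm charts deploy NIMs?"],
    extra_body={"input_type": "query"},       # NIM-specific retrieval hint
)
print(len(result.data[0].embedding))  # embedding dimensionality
```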

Stability

The NIMs are part of the NVIDIA AI Enterprise license, which ensures API stability, security patches, quality assurance, and support, providing a seamless transition from prototype to production for AI-driven enterprises.

Selecting NIMs for Your Pipeline

When designing a retrieval pipeline, developers need to balance accuracy, latency, data ingestion throughput, and production throughput. NVIDIA offers guidelines for selecting the appropriate NIMs based on these factors; a code sketch of these pairings follows the list:

  • Maximize throughput and minimize latency: Use NV-EmbedQA-E5-v5 for optimized lightweight embedding model inference.
  • Optimize for low-volume, low-velocity databases: Use NV-EmbedQA-Mistral7B-v2 for both ingestion and production to balance throughput and accuracy with low latency.
  • Optimize for high-volume, high-velocity data: Combine NV-EmbedQA-E5-v5 for document ingestion with NV-RerankQA-Mistral4B-v3 for reranking to enhance retrieval accuracy.
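As a rough illustration, these guidelines can be captured as pipeline presets. The dictionary below simply restates the pairings above in code; the scenario keys, structure, and lowercase model identifiers are hypothetical, not part of any NVIDIA SDK.

```python
# Hypothetical pipeline presets that restate the selection guidelines above.
PIPELINE_PRESETS = {
    # Maximize throughput, minimize latency: lightweight embedding only.
    "max_throughput": {"embedding": "nvidia/nv-embedqa-e5-v5", "reranking": None},
    # Low-volume, low-velocity databases: one accurate model end to end.
    "low_volume": {"embedding": "nvidia/nv-embedqa-mistral-7b-v2", "reranking": None},
    # High-volume, high-velocity data: fast ingestion plus a reranking pass.
    "high_volume": {
        "embedding": "nvidia/nv-embedqa-e5-v5",
        "reranking": "nvidia/nv-rerankqa-mistral-4b-v3",
    },
}

def models_for(scenario: str) -> dict:
    """Look up the embedding/reranking pairing for a deployment scenario."""
    return PIPELINE_PRESETS[scenario]

print(models_for("high_volume"))
```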

Benchmarks on question-answering datasets such as NQ, HotpotQA, FiQA, and TechQA show that the NeMo Retriever NIMs achieve significant improvements in embedding and reranking performance, making them suitable for a range of enterprise retrieval use cases.

Getting Started

Developers can explore the NVIDIA NeMo Retriever NIMs in the API catalog and access NVIDIA's generative AI examples on GitHub. NVIDIA also offers hands-on labs for trying the AI Chatbot with RAG workflow through NVIDIA LaunchPad, where developers can customize and deploy NIMs across various data environments.

