NVIDIA Unveils Mistral-NeMo-Minitron 8B Model with Superior Accuracy

Tony Kim · Aug 22, 2024 05:37 · 3 min read


NVIDIA, in collaboration with Mistral AI, has announced the release of the Mistral-NeMo-Minitron 8B model, a highly advanced open-access large language model (LLM). According to the NVIDIA Technical Blog, the model outperforms other models of similar size on nine popular accuracy benchmarks.

Advanced Model Pruning and Distillation

The Mistral-NeMo-Minitron 8B model was developed by width-pruning the larger Mistral NeMo 12B model, followed by a light retraining process using knowledge distillation. This methodology, originally proposed by NVIDIA in their paper on Compact Language Models via Pruning and Knowledge Distillation, has been validated through multiple successful implementations, including the NVIDIA Minitron 8B and 4B models, as well as the Llama-3.1-Minitron 4B model.

Model pruning involves reducing the size and complexity of a model by either dropping layers (depth pruning) or neurons and attention heads (width pruning). This process is often paired with retraining to recover any lost accuracy. Model distillation, on the other hand, transfers knowledge from a large, complex model (the teacher model) to a smaller, simpler model (the student model), aiming to retain much of the predictive power of the original model while being more efficient.
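To make width pruning concrete, here is a minimal PyTorch sketch that scores the intermediate neurons of a single feed-forward block by mean activation magnitude on a calibration batch and keeps only the top-scoring ones. The layer sizes and the importance metric are illustrative assumptions, not the criterion NVIDIA actually used.

```python
# Illustrative width pruning of one feed-forward (MLP) block in PyTorch.
# Dimensions and the importance score (mean absolute activation) are toy
# assumptions, not the exact recipe behind Mistral-NeMo-Minitron 8B.
import torch
import torch.nn as nn

hidden, ffn, pruned_ffn = 512, 2048, 1536  # toy dimensions

up_proj = nn.Linear(hidden, ffn)
down_proj = nn.Linear(ffn, hidden)

# 1. Score each intermediate neuron on a small calibration batch.
calib = torch.randn(64, hidden)
with torch.no_grad():
    acts = torch.relu(up_proj(calib))        # (64, ffn) activations
    importance = acts.abs().mean(dim=0)      # one score per intermediate neuron

# 2. Keep the top-k neurons and slice both projections accordingly.
keep = importance.topk(pruned_ffn).indices.sort().values
pruned_up = nn.Linear(hidden, pruned_ffn)
pruned_down = nn.Linear(pruned_ffn, hidden)
with torch.no_grad():
    pruned_up.weight.copy_(up_proj.weight[keep])
    pruned_up.bias.copy_(up_proj.bias[keep])
    pruned_down.weight.copy_(down_proj.weight[:, keep])
    pruned_down.bias.copy_(down_proj.bias)
```

The pruned block is then retrained (typically via distillation) to recover the accuracy lost by removing neurons.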

The combination of pruning and distillation allows for the creation of progressively smaller models from a large pretrained model. This approach significantly reduces the computational cost, as only 100-400 billion tokens are needed for retraining, compared to the much larger datasets required for training from scratch.
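During that light retraining phase, the pruned student is usually trained to match the teacher's output distribution rather than only the ground-truth tokens. The sketch below shows one common form of logit-level distillation; it assumes Hugging-Face-style causal language models that expose `.logits`, and the temperature is a placeholder rather than a published setting.

```python
# Minimal sketch of logit-based knowledge distillation (illustrative only;
# the temperature and training loop details are assumptions, not NVIDIA's recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between softened teacher and student token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

def train_step(student, teacher, batch, optimizer):
    """One distillation step: the frozen large teacher guides the pruned student."""
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits
    student_logits = student(batch["input_ids"]).logits
    loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Because the student is initialized from the pruned teacher weights rather than from scratch, a comparatively small token budget is enough to close most of the accuracy gap.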

Mistral-NeMo-Minitron 8B Performance

The Mistral-NeMo-Minitron 8B model demonstrates leading accuracy on several benchmarks, outperforming other models in its class, including the Llama 3.1 8B and Gemma 7B models. The table below highlights the performance metrics:

| Model | Training tokens | WinoGrande 5-shot | ARC Challenge 25-shot | MMLU 5-shot | HellaSwag 10-shot | GSM8K 5-shot | TruthfulQA 0-shot | XLSum en (20%) 3-shot | MBPP 0-shot | HumanEval 0-shot |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 8B | 15T | 77.27 | 57.94 | 65.28 | 81.80 | 48.60 | 45.06 | 30.05 | 42.27 | 24.76 |
| Gemma 7B | 6T | 78 | 61 | 64 | 82 | 50 | 45 | 17 | 39 | 32 |
| Mistral-NeMo-Minitron 8B | 380B | **80.35** | **64.42** | **69.51** | **83.03** | **58.45** | **47.56** | **31.94** | **43.77** | **36.22** |
| Mistral NeMo 12B | N/A | 82.24 | 65.10 | 68.99 | 85.16 | 56.41 | 49.79 | 33.43 | 42.63 | 23.78 |

Table 1. Accuracy of the Mistral-NeMo-Minitron 8B base model compared to the teacher Mistral NeMo 12B, Gemma 7B, and Llama 3.1 8B base models. Bold numbers represent the best among the 8B model class.

Implementation and Future Work

Following the best practices of structured weight pruning and knowledge distillation, the Mistral-NeMo 12B model was width-pruned to yield the 8B target model. The process involved fine-tuning the unpruned Mistral NeMo 12B model using 127 billion tokens to correct for distribution shifts, followed by width-only pruning and distillation using 380 billion tokens.
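Put schematically, the reported recipe amounts to three stages. The outline below simply restates them as data for readability; the stage labels are informal, while the token budgets are the ones quoted above.

```python
# Schematic outline of the reported compression recipe (a sketch, not an
# executable pipeline). Stage descriptions and token budgets come from the
# article; everything else is illustrative.
RECIPE = [
    ("teacher correction",
     "fine-tune the unpruned Mistral NeMo 12B to correct for distribution shift",
     "127B tokens"),
    ("width-only pruning",
     "prune the corrected 12B model down to the 8B target architecture",
     "no training"),
    ("knowledge distillation",
     "retrain the pruned 8B student against the 12B teacher",
     "380B tokens"),
]

for stage, action, budget in RECIPE:
    print(f"{stage:>22}: {action} [{budget}]")
```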

The Mistral-NeMo-Minitron 8B model showcases superior performance and efficiency, making it a significant advancement in the field of AI. NVIDIA plans to continue refining the distillation process to produce even smaller and more accurate models. The implementation of this technique will be gradually integrated into the NVIDIA NeMo framework for generative AI.

For further details, visit the NVIDIA Technical Blog.

