Together AI and NVIDIA Join Forces to Enhance Llama 3.1 Models on DGX Cloud

Together AI has announced a strategic collaboration with NVIDIA to enhance the capabilities of Llama 3.1 models for enterprises by leveraging NVIDIA's DGX Cloud. This partnership aims to empower businesses and developers to utilize openly available models, enabling optimized AI inference on NVIDIA's advanced infrastructure.

Optimized AI Inference for Enterprises

The collaboration introduces the Together Inference Engine to NVIDIA AI Foundry customers, offering a robust platform for running Llama 3.1 models on the NVIDIA DGX Cloud. According to Together AI, this integration allows enterprises to achieve superior performance, accuracy, and cost-efficiency at production scale.

“Enterprises want to leverage the power of openly available AI models like Llama 3.1, customized to their specific needs,” said Alexis Bjorlin, vice president of DGX Cloud at NVIDIA. “By collaborating with Together AI, we’re introducing the highly optimized Together Inference Engine to DGX Cloud, offering companies efficient and scalable AI inference capabilities.”

Innovative Technology and Benefits

The Together Inference Engine is built on several technical advances, including FlashAttention-3 kernels, custom-built speculators (small draft models used for speculative decoding) based on RedPajama, and advanced quantization techniques. According to Together AI, these optimizations tune enterprise workloads for NVIDIA Tensor Core GPUs, making generative AI applications faster and more cost-efficient to develop and deploy.
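The article does not describe how Together's speculators work internally. To illustrate the general idea behind speculative decoding, the toy sketch below shows the greedy variant: a cheap draft model proposes k tokens, and the larger target model keeps the longest prefix it agrees with, then adds one token of its own. It is not Together AI's implementation. The functions `speculative_step`, `generate`, `draft`, and `target` are hypothetical names, and the two "models" are trivial stand-ins for real networks.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One round of greedy speculative decoding (simplified sketch)."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model verifies each proposal. A real engine scores all
    #    k positions in one batched forward pass; here we loop for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal was accepted, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Decode max_new tokens using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    # A round can overshoot, so trim to the requested length.
    return out[: len(prompt) + max_new]


# Hypothetical toy models: the target counts upward mod 10; the draft
# agrees with it except after token 4, where it guesses wrong.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


print(generate([0], draft, target, k=3, max_new=8))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The output is identical to what the target model would produce on its own. The speedup in practice comes from accepting several draft tokens per expensive target pass. Production systems that sample rather than decode greedily use a probabilistic acceptance rule instead of the exact-match check shown here.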

With this collaboration, NVIDIA AI Foundry customers can run on NVIDIA's latest AI infrastructure, optimized for faster deployment. Enterprises can also fine-tune models on proprietary data to improve accuracy and performance while retaining ownership of that data.

Impact on Open-Source AI

The partnership coincides with a significant milestone for open-source AI: the launch of Meta's Llama 3.1 405B, described as the largest openly available foundation model. It offers broad capabilities in general knowledge, steerability, math, tool use, and multilingual translation, aims to rival top closed-source models, and ships with safety tools for responsible development.

Together AI says its focus remains on advancing open research and building trust among researchers, developers, and enterprises. The company has pioneered methods such as FlashAttention-3, Mixture of Agents, Medusa, Sequoia, Hyena, Mamba, and CocktailSGD, which it credits with speeding up innovation and time-to-market for AI solutions.

Real-World Applications

Enterprises such as Zomato, DuckDuckGo, and the Washington Post are already leveraging Together Inference for their generative AI applications. With the NVIDIA collaboration, businesses with sophisticated workloads can deploy open-source models on DGX Cloud with enhanced performance, scalability, and security.

This partnership is set to accelerate the adoption of open-source AI, providing developers and enterprises with the tools needed to build advanced AI solutions efficiently and effectively.
