Meta Partners with Together AI to Launch High-Performance Llama 3.1 Models
Meta has partnered with Together AI to unveil the Llama 3.1 models, marking a significant milestone in the open-source AI landscape. The release includes the Llama 3.1 405B, 70B, 8B, and LlamaGuard models, all of which are now available for inference and fine-tuning through Together AI's platform. This collaboration aims to deliver accelerated performance while maintaining full accuracy, according to Together AI.
Unmatched Performance and Scalability
The Together Inference Platform promises horizontal scalability with industry-leading performance. According to Together AI, the Llama 3.1 405B model runs at up to 80 tokens per second and the 8B model at up to 400 tokens per second, a 1.9x to 4.5x throughput improvement over vLLM with no loss of accuracy.
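The quoted throughput figures translate directly into response latency. A back-of-envelope sketch (using the numbers above as given, not independent measurements):

```python
# Back-of-envelope generation latency from the quoted decode throughput.
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream `num_tokens` at a steady decode rate."""
    return num_tokens / tokens_per_second

# A 400-token completion at the quoted rates:
latency_405b = generation_time_s(400, 80)    # 405B at 80 tok/s -> 5.0 s
latency_8b = generation_time_s(400, 400)     # 8B at 400 tok/s  -> 1.0 s
```

At these rates, the same 400-token reply that takes five seconds on the 405B model streams in about one second on the 8B model.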
These advancements are built on Together AI's proprietary inference optimization research, incorporating technologies like FlashAttention-3 kernels and custom-built speculators based on RedPajama. The platform supports both serverless and dedicated endpoints, offering flexibility for developers and enterprises to build generative AI applications at production scale.
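The speculators mentioned above rely on speculative decoding: a small draft model cheaply proposes several tokens, and the large target model verifies them, so output is identical to decoding with the target model alone. The following is a toy greedy sketch of that idea, not Together AI's implementation; `draft_next` and `target_next` are stand-in callables that map a token sequence to the next token.

```python
# Toy greedy speculative decoding. The draft model proposes k tokens; the
# target model verifies them and keeps the longest agreeing prefix. In a
# real engine the verification is a single batched forward pass; here it is
# written as a per-token loop purely for clarity.
def speculative_decode(prompt, draft_next, target_next, k=4, max_new=16):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft model cheaply proposes k candidate tokens.
        proposed, ctx = [], list(tokens)
        for _ in range(k):
            nxt = draft_next(ctx)
            proposed.append(nxt)
            ctx.append(nxt)
        # 2. Target model verifies; accept until the first disagreement,
        #    then substitute the target's own token for the rejected one.
        for nxt in proposed:
            verified = target_next(tokens)
            if verified == nxt:
                tokens.append(nxt)
            else:
                tokens.append(verified)
                break
    return tokens[len(prompt):len(prompt) + max_new]
```

Because every accepted token is one the target model would have produced anyway, the output matches plain greedy decoding; the draft model only changes how many target-model steps are needed, which is where the speedup comes from.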
Broad Adoption and Use Cases
Over 100,000 developers and companies, including Zomato, DuckDuckGo, and The Washington Post, are already leveraging the Together Platform for their generative AI needs. The Llama 3.1 models offer unmatched flexibility and control, making them suitable for a range of applications from general knowledge tasks to multilingual translation and tool use.
The Llama 3.1 405B model, in particular, stands out as the largest openly available foundation model, rivaling the best closed-source alternatives. It supports workflows such as synthetic data generation and model distillation, which are expected to accelerate the adoption of open-source AI.
Advanced Features and Tools
The Together Inference Engine also includes LlamaGuard, a moderation model that can be used as a standalone classifier or as a filter to safeguard responses. This feature allows developers to screen for potentially unsafe content, enhancing the safety and reliability of AI applications.
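Using LlamaGuard as a filter amounts to classifying each candidate reply before it is returned. A minimal sketch of that gating pattern is below; the "safe" / "unsafe" plus category output format follows LlamaGuard's published convention, and `generate` and `classify` are hypothetical injected callables (e.g. thin wrappers around the Together chat completions API), so treat the details as assumptions for your own deployment.

```python
# Gate model replies behind a LlamaGuard-style safety classifier.
def parse_guard_verdict(raw: str):
    """Parse a LlamaGuard-style completion ("safe" or "unsafe\n<category>")
    into (is_safe, category). An empty verdict fails closed as unsafe."""
    lines = raw.strip().splitlines()
    if not lines:
        return False, None
    if lines[0].strip().lower() == "safe":
        return True, None
    category = lines[1].strip() if len(lines) > 1 else None
    return False, category

def moderated_reply(user_msg, generate, classify):
    """Return the model reply only if the guard classifies it as safe.
    `generate` and `classify` are injected callables (assumed wrappers
    around the inference API), which keeps this sketch testable offline."""
    reply = generate(user_msg)
    is_safe, category = parse_guard_verdict(classify(user_msg, reply))
    if not is_safe:
        return f"[response withheld: flagged {category or 'unsafe'}]"
    return reply
```

Failing closed on an unparseable verdict is a deliberate choice here: a moderation layer that defaults to "safe" on malformed output silently stops moderating.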
The Llama 3.1 models also expand the context window to 128K tokens and add support for eight languages. These enhancements, along with new security and safety tools, make the models highly versatile and suitable for a wide range of applications.
Available Through API and Dedicated Endpoints
All Llama 3.1 models are accessible via the Together API, and the 405B model is available for QLoRA fine-tuning, allowing enterprises to tailor the models to their specific needs. The Together Turbo endpoints offer best-in-class throughput and accuracy, which Together AI positions as the most cost-effective way to build with Llama 3.1 at scale.
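Calling the models through the Together API follows the familiar OpenAI-compatible chat completions shape. A minimal request-building sketch is below; the exact model identifier is an assumption here, so check the Together model catalog for the string available to your account, and an API key is required to actually send the request.

```python
import json
import os

# OpenAI-compatible chat completions endpoint on Together's platform.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str,
                  model: str = "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo"):
    """Build headers and a JSON body for a chat completion request.
    The default model string is an assumed identifier, not verified here."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return headers, json.dumps(payload)

# Usage (requires a TOGETHER_API_KEY environment variable):
# headers, body = build_request("Summarize Llama 3.1 in one sentence.")
# resp = requests.post(API_URL, headers=headers, data=body)
```

Separating request construction from the network call keeps the payload easy to inspect and test, and swapping the `model` string is all it takes to move between the 8B, 70B, and 405B endpoints.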
Future Prospects
The partnership between Meta and Together AI aims to democratize access to high-performance AI models, fostering innovation and collaboration within the AI community. The open-source nature of the Llama 3.1 models aligns with Together AI's vision of open research and trust between researchers, developers, and enterprises.
As the launch partner for the Llama 3.1 models, Together AI is committed to providing the best performance, accuracy, and cost-efficiency for generative AI workloads, ensuring that developers and enterprises can keep their data and models secure.