AMD Unveils ROCm 6.2: Boosting AI and HPC Performance with New Enhancements

Terrill Dicki, Aug 06, 2024 03:21


AMD has announced the release of ROCm 6.2, a major update aimed at enhancing the performance, efficiency, and scalability of AI and high-performance computing (HPC) applications. According to AMD.com, this release includes several key improvements that solidify ROCm's position as a leading platform for AI and HPC development.

Extending vLLM Support

ROCm 6.2 expands vLLM support to improve the efficiency and scalability of AI models on AMD Instinct™ Accelerators. Designed for large language models (LLMs), vLLM addresses key inferencing challenges such as efficient multi-GPU computation, reduced memory usage, and minimized computational bottlenecks. This update enables various upstream vLLM features like multi-GPU execution and FP8 KV cache, making it easier for developers to tackle complex AI tasks.
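To give a rough sense of why an FP8 KV cache matters for memory usage, the following back-of-the-envelope sketch compares cache sizes at 2 bytes (FP16) and 1 byte (FP8) per element. The model shape is hypothetical and chosen only for illustration, and the formula is a simplified estimate that ignores paging and allocator overhead; it is not taken from vLLM or from AMD's release notes.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Approximate KV cache size in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape and workload (assumptions, not a real configuration).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=16)

fp16_bytes = kv_cache_bytes(**shape, bytes_per_elem=2)  # FP16: 2 bytes per element
fp8_bytes = kv_cache_bytes(**shape, bytes_per_elem=1)   # FP8: 1 byte per element

GIB = 1024 ** 3
print(f"FP16 KV cache: {fp16_bytes / GIB:.1f} GiB")  # 16.0 GiB
print(f"FP8 KV cache:  {fp8_bytes / GIB:.1f} GiB")   # 8.0 GiB
```

Under these assumptions the cache shrinks from 16 GiB to 8 GiB, which leaves room for longer sequences or larger batches on the same accelerator.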

Bitsandbytes Quantization

The inclusion of the bitsandbytes quantization library in ROCm 6.2 significantly boosts memory efficiency and performance on AMD Instinct™ GPU accelerators. Its 8-bit optimizers reduce memory usage during AI training, allowing developers to work with larger models on limited hardware. LLM.int8() quantization targets deployment, lowering the memory needed to serve models and making advanced AI capabilities more accessible and cost-effective.
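The basic idea behind 8-bit quantization can be sketched in a few lines of plain Python. This is a deliberately simplified absmax scheme with a single scale for the whole vector, using made-up weight values; it is not the bitsandbytes implementation, and the full LLM.int8() method additionally quantizes vector-wise and keeps outlier features in higher precision.

```python
def quantize_absmax_int8(values):
    """Map floats to integers in [-127, 127] using one absmax scale.

    Returns the quantized integers and the scale needed to undo the mapping.
    """
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        # All zeros: any scale works, so pick 1.0 to avoid dividing by zero.
        return [0] * len(values), 1.0
    scale = 127.0 / absmax
    return [round(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the quantized integers."""
    return [q / scale for q in quantized]


weights = [0.82, -1.27, 0.03, 0.45, -0.61]  # made-up values for illustration
q, scale = quantize_absmax_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(f"max round-trip error: {max_error:.4f}")
```

Each value is stored in one byte instead of two or four, at the cost of a bounded rounding error of at most half a quantization step.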

New Offline Installer Creator

The new ROCm Offline Installer Creator simplifies the installation process for systems without internet access. It creates a single installer file that includes all necessary dependencies, making deployment straightforward. The tool brings its functions together in a single interface, automates post-installation tasks, and helps ensure correct and consistent installations, improving overall system stability.

Omnitrace and Omniperf Profiler Tools

ROCm 6.2 introduces the Omnitrace and Omniperf profiler tools (Beta) for AI and HPC development. Omnitrace provides a holistic view of system performance across CPUs, GPUs, NICs, and network fabrics, while Omniperf offers detailed GPU kernel analysis for fine-tuning. These tools help developers identify and resolve performance bottlenecks, supporting efficient resource utilization and faster AI training and HPC simulations.

Broader FP8 Support

ROCm 6.2 extends FP8 support across its ecosystem, enhancing AI inferencing by addressing the memory bottlenecks and high latency associated with higher-precision formats. The update includes FP8 GEMM support in PyTorch and JAX, FP8-specific collective operations in RCCL, and FP8-based fused Flash Attention in MIOpen. These enhancements enable more efficient training and inference processes, maximizing throughput and reducing latency.

AMD continues to demonstrate its commitment to providing robust, competitive, and innovative solutions for the AI and HPC community with the ROCm 6.2 release. Developers now have the tools and support needed to push the boundaries of what’s possible, fostering confidence in ROCm as the open platform of choice for next-generation computational tasks.

Discover the range of new features introduced in ROCm 6.2 by reviewing the release notes.

