AMD Enhances AI Algorithm Efficiency with Innovative Depth Pruning Method
AMD, a leading semiconductor supplier, has published new research on making artificial intelligence (AI) models run more efficiently on hardware. According to AMD.com, the company's paper, 'A Unified Progressive Depth Pruner for CNN and Vision Transformer', has been accepted at the AAAI 2024 conference. The paper introduces a depth pruning method that applies to both convolutional neural networks (CNNs) and vision transformers.
Motivation for Model Optimization
Deep neural networks (DNNs) are now integral to many industrial applications, which makes ongoing model optimization necessary. Model pruning, quantization, and efficient model design are the main techniques used for this. Traditional channel-wise pruning struggles with depth-wise convolutional layers, which compute sparsely and have few parameters to begin with. In addition, hardware platforms favor highly parallel workloads, and channel-wise pruning tends to produce thinner, sparser models that use the hardware poorly.
To address these issues, earlier work proposed the DepthShrinker and Layer-Folding techniques, which optimize MobileNetV2 by reducing model depth through reparameterization. Despite their promise, these methods have limitations: they can lose accuracy, and they cannot handle certain normalization layers such as LayerNorm, which makes them unsuitable for vision transformer models.
Innovative Depth Pruning Approach
AMD's new depth pruning method introduces a progressive training strategy and a novel block pruning technique that can optimize both CNN and vision transformer models. This approach ensures high utilization of baseline model weights, resulting in higher accuracy. Moreover, the method can handle existing normalization layers, including LayerNorm, enabling effective pruning of vision transformer models.
The AMD depth pruning strategy converts complex and slow blocks into simpler, faster blocks through block merging. This involves replacing activation layers with identity layers and LayerNorm layers with BatchNorm layers, facilitating reparameterization. The reparameterization technique then merges BatchNorm layers, adjacent convolutional or fully connected layers, and skip connections.
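The merging step rests on a standard identity: once the activation is an identity, a fully connected layer followed by an inference-mode BatchNorm and a skip connection collapses into a single fully connected layer. The sketch below illustrates that identity in plain Python. It is not AMD's code; the function names (`fold_batchnorm`, `fold_skip`, `demo`) and the toy dimensions are assumptions made for illustration.

```python
import math
import random

def linear(W, b, x):
    """Compute y = W x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W_f, b_f) such that W_f x + b_f == BatchNorm(W x + b)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_f = [[s * w for w in row] for s, row in zip(scale, W)]
    b_f = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_f, b_f

def fold_skip(W):
    """Return W + I, so (W + I) x + b == W x + b + x. Requires a square W."""
    return [[w + (1.0 if i == j else 0.0) for j, w in enumerate(row)]
            for i, row in enumerate(W)]

def demo(n=4, seed=0):
    """Return the largest absolute difference between the original and merged block."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1, 1) for _ in range(n)]
    gamma = [rng.uniform(0.5, 1.5) for _ in range(n)]
    beta = [rng.uniform(-1, 1) for _ in range(n)]
    mean = [rng.uniform(-1, 1) for _ in range(n)]
    var = [rng.uniform(0.5, 2.0) for _ in range(n)]
    x = [rng.uniform(-1, 1) for _ in range(n)]

    # Original block: x + BatchNorm(Linear(x)), with the activation already an identity.
    normed = batchnorm(linear(W, b, x), gamma, beta, mean, var)
    reference = [xi + yi for xi, yi in zip(x, normed)]

    # Merged block: one linear layer with BatchNorm and the skip folded in.
    W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
    merged = linear(fold_skip(W_f), b_f, x)

    return max(abs(r - m) for r, m in zip(reference, merged))
```

Calling `demo()` compares the two forms on random inputs; the difference should be at the level of floating-point rounding. The same algebra extends to convolutions, where the scale is applied per output channel.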
Key Technologies
The depth pruning process involves four main steps: Supernet training, Subnet searching, Subnet training, and Subnet merging. Initially, a Supernet is constructed based on the baseline model, incorporating block modifications. After Supernet training, an optimal subnet is identified using a search algorithm. The progressive training strategy is then applied to optimize the subnet with minimal accuracy loss. Finally, the subnet is merged into a shallower model using the reparameterization technique.
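The article does not spell out how the progressive training strategy works. One plausible reading, shown here purely as an assumption, is that each pruned activation is blended toward an identity over the course of subnet training, so the network adapts gradually rather than losing its nonlinearity in one step. The linear schedule and the names `blend_factor` and `progressive_activation` are hypothetical.

```python
def relu(x):
    """Standard ReLU on a scalar."""
    return max(0.0, x)

def blend_factor(step, total_steps):
    """Assumed linear schedule: 0.0 at step 0, 1.0 once step >= total_steps."""
    if total_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / total_steps))

def progressive_activation(x, lam, act=relu):
    """Blend the activation with an identity.

    lam = 0.0 gives the original activation; lam = 1.0 gives a pure identity,
    at which point the surrounding layers become mergeable.
    """
    return (1.0 - lam) * act(x) + lam * x
```

For example, with an input of -2.0 the output moves from 0.0 (pure ReLU) through -1.0 at the halfway point to -2.0 (pure identity) as `lam` goes from 0 to 1.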
Benefits and Performance
AMD's depth pruning method offers several key contributions:
- A unified and efficient depth pruning method for CNN and vision transformer models.
- A progressive training strategy for subnet optimization coupled with a novel block pruning strategy using reparameterization.
- Comprehensive experiments demonstrating superior pruning performance across various AI models.
Experimental results show that AMD's method achieves up to 1.26X speedup on the AMD Instinct™ MI100 GPU accelerator, with only a 1.9% top-1 accuracy drop. The approach has been tested on multiple models, including ResNet34, MobileNetV2, ConvNeXtV1, and DeiT-Tiny, showcasing its versatility and effectiveness.
In conclusion, AMD's unified depth pruning method represents a significant advancement in optimizing AI model performance. Its applicability to both CNN and vision transformer models highlights its potential impact on future AI developments. AMD plans to explore further applications of this method on more transformer models and tasks.