Augmented Grad-CAM++: Super-Resolution Saliency Maps for Visual Interpretation of Deep Neural Network
Yongshun Gao, Jie Liu, Weihan Li, Ming Hou, Yang Li, Huimin Zhao- Electrical and Electronic Engineering
- Computer Networks and Communications
- Hardware and Architecture
- Signal Processing
- Control and Systems Engineering
In recent years, deep neural networks have shown superior performance in various fields, but interpretability has always been the Achilles’ heel of deep neural networks. The existing visual interpretation methods for deep neural networks still suffer from inaccurate and insufficient target localization and low-resolution saliency maps. To address the above issues, this paper presents a saliency map generation method based on image geometry augmentation and super-resolution called augmented high-order gradient weighting class activation mapping (augmented grad-CAM++). Unlike previous approaches that rely on a single input image to generate saliency maps, this method first introduces the image geometry augmentation technique to create a set of augmented images for the input image and generate activation mappings separately. Secondly, the augmented activation mappings are combined to form the final saliency map. Finally, a super-resolution technique is introduced to add pixel points to reconstruct the saliency map pixels to improve the resolution of the saliency map. The proposed method is applied to analyze standard image data and industrial surface defect images. The results indicate that, in experiments conducted on standard image data, the proposed method achieved a 3.1% improvement in the accuracy of capturing target objects compared to traditional methods. Furthermore, the resolution of saliency maps was three times higher than that of traditional methods. In the application of industrial surface defect detection, the proposed method demonstrated an 11.6% enhancement in the accuracy of capturing target objects, concurrently reducing the false positive rate. The presented approach enables more accurate and comprehensive capture of target objects with higher resolution, thereby enhancing the visual interpretability of deep neural networks. This improvement contributes to the greater interpretability of deep learning models in industrial applications, offering substantial performance gains for the practical deployment of deep learning networks in the industrial domain.