GIG-CAM+M: A Class Activation Mapping Method Incorporating Guided Integrated Gradients and Multi-scale Strategy
Yanfei Gao (Shanxi Finance & Taxation College, China); Xiongwei Miao (Shanxi Intelligent Big Data Industry Technology Innovation Research Institute, China); Guoye Zhang (Shanxi Provincial Digital Government Service Center, China)
Vol. 19, No. 4, pp. 1122-1139
Abstract
The interpretability of convolutional neural networks has garnered widespread attention, with class activation mapping (CAM)-based methods emerging as a prominent research direction. Integrated Grad-CAM is a widely used backpropagation-based CAM method, but its use of a linear path introduces noise during the integration process. To address this issue, we propose GIG-CAM, which replaces the linear path with an adaptive path. Unlike previous methods that require path specification, GIG-CAM dynamically determines the next input in the path based on saliency maps. Additionally, to enhance the resolution of saliency maps, we introduce a novel multi-scale fusion method, which recursively optimizes saliency maps at smaller scales using saliency maps at larger scales. This preserves the localization capability of the original-scale saliency maps while enhancing their resolution. Experimental results on the VOC2012 and ILSVRC2012 datasets demonstrate that GIG-CAM with fusion (GIG-CAM(F)) outperforms existing methods, achieving the highest scores in the Pointing Game (82.80% and 85.90% on ResNet50 for VOC2012 and ILSVRC2012, respectively) and Energy-Based Pointing Game (62.41% and 59.69%, respectively). Furthermore, GIG-CAM(F) achieves the lowest Drop% (22.59% and 17.04%) and highest Increase% (31.00% and 21.95%), validating its superior interpretability. Our results highlight the effectiveness of GIG-CAM in improving the quality and reliability of saliency maps, making it a robust solution for enhancing deep model transparency.
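The abstract reports four evaluation metrics without defining them. As a minimal sketch, assuming the definitions commonly used in the CAM literature (the paper's exact protocol, such as a pixel tolerance for the Pointing Game, may differ), they could be computed as follows. All function names here are hypothetical and chosen for illustration only.

```python
import numpy as np


def pointing_game_hit(saliency, mask):
    """Return True if the maximum-saliency pixel lies inside the ground-truth mask.

    Assumption: no pixel tolerance is applied around the maximum point.
    """
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return bool(np.asarray(mask, dtype=bool)[row, col])


def energy_pointing_game(saliency, mask):
    """Return the fraction of total saliency energy that falls inside the mask."""
    saliency = np.asarray(saliency, dtype=float)
    total = saliency.sum()
    if total <= 0:
        return 0.0
    return float(saliency[np.asarray(mask, dtype=bool)].sum() / total)


def average_drop(full_scores, masked_scores):
    """Drop%: mean relative loss in class confidence when the model sees only
    the saliency-masked image instead of the full image (lower is better)."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return float(100.0 * np.mean(np.maximum(0.0, full - masked) / full))


def increase_rate(full_scores, masked_scores):
    """Increase%: share of samples whose class confidence rises on the
    saliency-masked image (higher is better)."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return float(100.0 * np.mean(masked > full))


if __name__ == "__main__":
    # Toy 2x2 saliency map; the object occupies the bottom row.
    saliency = np.array([[0.1, 0.2], [0.3, 0.4]])
    mask = np.array([[False, False], [True, True]])
    print(pointing_game_hit(saliency, mask))
    print(energy_pointing_game(saliency, mask))

    # Toy class confidences on full images and on saliency-masked images.
    full = [0.8, 0.5]
    masked = [0.6, 0.7]
    print(average_drop(full, masked))
    print(increase_rate(full, masked))
```

Pointing Game accuracy over a dataset is then the fraction of images for which `pointing_game_hit` returns True, and the Energy-Based Pointing Game score is the mean of `energy_pointing_game` over images.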
- Publisher: Korean Society for Internet Information (한국인터넷정보학회)
- Classification: Computer Science