An Explainable Deep Learning Framework for Medical Image Diagnosis Using Attention Mechanisms
DOI: https://doi.org/10.66280/ijair.v1i1.6
Keywords: medical imaging; explainable AI; attention; weakly supervised localization; uncertainty
Abstract
Attention mechanisms are widely used to improve the performance of deep neural networks and to provide spatial cues that are often interpreted as explanations. In medical image diagnosis, however, reliable explanations require more than visually appealing heatmaps: they must be stable under perturbations, aligned with clinically meaningful regions, and accompanied by uncertainty-aware decision outputs.
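One way to make the stability requirement concrete is to compare the heatmap for an image with the heatmap for a lightly perturbed copy of that image. The sketch below is a hypothetical NumPy illustration of such a check using Pearson correlation between flattened maps; the paper's actual stability metric is not specified here, and the `heatmap_stability` name and toy maps are assumptions.

```python
import numpy as np

def heatmap_stability(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Pearson correlation between two flattened attention/saliency maps.

    Values near 1.0 suggest the explanation is stable under the
    perturbation that produced map_b's input from map_a's input.
    """
    a = map_a.ravel().astype(float)
    b = map_b.ravel().astype(float)
    # Standardize each map (epsilon guards against constant maps).
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

# Toy example: a 14x14 map compared against a slightly noised copy.
rng = np.random.default_rng(0)
m = rng.random((14, 14))
noisy = m + 0.01 * rng.standard_normal((14, 14))
score = heatmap_stability(m, noisy)
```

In practice the perturbation would be applied to the input image (noise, small translations) and both maps would come from the model; the same correlation score then quantifies how much the explanation moves.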
This paper presents an explainable deep learning framework for medical image diagnosis that integrates (i) an attention-based diagnostic backbone, (ii) multi-scale attention aggregation for lesion localization, (iii) calibration and uncertainty reporting for risk-aware triage, and (iv) a set of quantitative explainability checks that go beyond qualitative visualization.
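Component (ii), multi-scale attention aggregation, can be sketched as upsampling per-stage attention maps to a common resolution, normalizing each, and averaging. This is an illustrative NumPy sketch, not the paper's implementation; it assumes square maps whose side lengths divide the target resolution, and the function names are hypothetical.

```python
import numpy as np

def upsample_nn(att, size):
    """Nearest-neighbour upsampling of a square attention map to size x size."""
    factor = size // att.shape[0]
    return np.kron(att, np.ones((factor, factor)))

def aggregate_attention(maps, size=56):
    """Min-max normalise each per-scale map, then average at a common resolution."""
    resized = []
    for m in maps:
        u = upsample_nn(m, size)
        u = (u - u.min()) / (u.max() - u.min() + 1e-12)
        resized.append(u)
    return np.mean(resized, axis=0)

# Attention maps from three hypothetical backbone stages (7x7, 14x14, 28x28).
rng = np.random.default_rng(1)
fused = aggregate_attention([rng.random((7, 7)),
                             rng.random((14, 14)),
                             rng.random((28, 28))])
```

Thresholding the fused map then yields a coarse lesion-localization mask in the weakly supervised setting, with finer stages contributing boundary detail and coarser stages contributing context.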
The framework is designed as a practical template that can be instantiated for common diagnostic tasks (classification, weakly supervised localization, and segmentation-assisted classification). We describe the modeling choices, training objectives, evaluation protocol, and ablation studies, and we discuss failure modes and deployment considerations.
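As one concrete instance of the calibration reporting mentioned in the evaluation protocol, expected calibration error (ECE) measures the weighted gap between confidence and accuracy across confidence bins. The sketch below is a minimal NumPy version with an equal-width binning scheme; the bin count and toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: sum over bins of (bin weight) * |mean confidence - accuracy|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

# Toy example: two bins, each with a 0.05 confidence/accuracy gap.
conf = np.array([0.95, 0.95, 0.95, 0.95, 0.55, 0.55])
correct = np.array([1, 1, 1, 1, 1, 0])
ece = expected_calibration_error(conf, correct)
```

A low ECE on held-out data supports the risk-aware triage use case: predicted confidences can then be read as approximate probabilities when deciding which cases to escalate.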
Published
Versions
- 2026-03-02 (3)
- 2026-01-30 (2)
- 2026-01-30 (1)
License
Copyright (c) 2026 International Journal of Artificial Intelligence Research

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



