Advancing Model Transparency via Self-Explaining Deep Learning Architectures Integrating Symbolic Logic and Neural Representations

Authors

  • Timothy Langford, Department of Engineering and Public Policy, Carnegie Mellon University

Abstract

The rapid integration of deep learning architectures into critical societal infrastructures has highlighted a fundamental tension between predictive efficacy and structural interpretability. While contemporary neural networks exhibit unprecedented performance in high-dimensional pattern recognition, their inherent opacity poses substantial risks to accountability, safety, and institutional trust. This paper explores the advancement of model transparency through the development of self-explaining architectures that natively integrate symbolic logic with connectionist neural representations. Unlike post-hoc interpretability methods that provide approximate justifications for black-box decisions, self-explaining systems aim to produce human-legible reasoning as an intrinsic component of the computational process. By embedding logical constraints and symbolic abstractions directly into the neural fabric, these hybrid systems offer a robust pathway toward verifiable and auditable artificial intelligence. The research provides a comprehensive system-level analysis of the structural trade-offs involved in hybridizing logic and deep learning, with a specific focus on architecture, governance, and deployment sustainability. Furthermore, the discussion extends to the socio-technical implications of these architectures, examining how integrated transparency affects fairness, policy compliance, and the long-term robustness of critical decision systems. Through detailed conceptual analysis and cross-domain comparisons, this work argues that the convergence of symbolic reasoning and neural learning is not merely a technical improvement but a necessary evolution for the responsible deployment of large-scale intelligent infrastructures.
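The abstract describes the integration of symbolic logic and neural representations only at a conceptual level and does not prescribe a concrete implementation. As a purely illustrative aid, the sketch below shows one minimal way such a self-explaining architecture could be structured, assuming a neural encoder that maps raw features to named concept activations and a fixed set of soft-logic rules over those concepts; the class name, the concept labels, and the rules are hypothetical and chosen only to make the idea concrete.

```python
# Minimal sketch of a self-explaining neuro-symbolic classifier (illustrative only;
# the paper describes the approach conceptually and this design is an assumption).
import torch
import torch.nn as nn

CONCEPTS = ["income_stable", "debt_low", "history_clean"]  # hypothetical concept names

class SelfExplainingClassifier(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        # Neural part: raw features -> concept truth values in [0, 1].
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, len(CONCEPTS)), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        c = self.concept_net(x)                                  # concept activations
        income, debt, history = c.unbind(dim=-1)
        # Symbolic part: soft conjunction/disjunction via the product t-norm.
        rule_conservative = income * debt * history              # all three concepts hold
        rule_lenient = 1 - (1 - income * debt) * (1 - history)   # (income AND debt) OR history
        score = 0.5 * (rule_conservative + rule_lenient)
        return score, c, {"conservative": rule_conservative, "lenient": rule_lenient}

def explain(concepts: torch.Tensor, rules: dict, threshold: float = 0.5) -> str:
    """Render the intrinsic explanation: active concepts and fired rules."""
    active = [name for name, v in zip(CONCEPTS, concepts.tolist()) if v > threshold]
    fired = [name for name, v in rules.items() if v.item() > threshold]
    return f"active concepts: {active}; fired rules: {fired}"

if __name__ == "__main__":
    model = SelfExplainingClassifier(in_dim=8)
    x = torch.randn(1, 8)
    score, concepts, rules = model(x)
    # Example of a logical constraint embedded as a differentiable training penalty:
    # "(income_stable AND debt_low) should imply a favorable score".
    antecedent = concepts[:, 0] * concepts[:, 1]
    constraint_penalty = torch.relu(antecedent - score).mean()
    print(f"score={score.item():.3f}, penalty={constraint_penalty.item():.3f}")
    print(explain(concepts[0], {k: v[0] for k, v in rules.items()}))
```

In this sketch the explanation is read directly from the concept and rule activations that produce the prediction, rather than approximated after the fact, which mirrors the abstract's distinction between intrinsic self-explanation and post-hoc interpretability methods.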

Published

2026-05-12

How to Cite

Timothy Langford. (2026). Advancing Model Transparency via Self-Explaining Deep Learning Architectures Integrating Symbolic Logic and Neural Representations. International Journal of Artificial Intelligence Research, 1(2). Retrieved from https://isipress.org/index.php/IJAIR/article/view/138