Advancing Model Transparency via Self-Explaining Deep Learning Architectures Integrating Symbolic Logic and Neural Representations

Authors

  • Timothy Langford, Department of Engineering and Public Policy, Carnegie Mellon University

Abstract

The rapid integration of deep learning architectures into critical societal infrastructures has highlighted a fundamental tension between predictive efficacy and structural interpretability. While contemporary neural networks exhibit unprecedented performance in high-dimensional pattern recognition, their inherent opacity poses substantial risks to accountability, safety, and institutional trust. This paper explores the advancement of model transparency through the development of self-explaining architectures that natively integrate symbolic logic with connectionist neural representations. Unlike post-hoc interpretability methods that provide approximate justifications for black-box decisions, self-explaining systems aim to produce human-legible reasoning as an intrinsic component of the computational process. By embedding logical constraints and symbolic abstractions directly into the neural fabric, these hybrid systems offer a robust pathway toward verifiable and auditable artificial intelligence. The research provides a comprehensive system-level analysis of the structural trade-offs involved in hybridizing logic and deep learning, with a specific focus on architecture, governance, and deployment sustainability. Furthermore, the discussion extends to the socio-technical implications of these architectures, examining how integrated transparency affects fairness, policy compliance, and the long-term robustness of critical decision systems. Through detailed conceptual analysis and cross-domain comparisons, this work argues that the convergence of symbolic reasoning and neural learning is not merely a technical improvement but a necessary evolution for the responsible deployment of large-scale intelligent infrastructures.
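The abstract describes the integration of symbolic logic and neural representations only at a conceptual level and does not prescribe a concrete implementation. As a purely illustrative aid, the sketch below shows one minimal way such a self-explaining architecture could be structured, assuming a neural encoder that maps raw features to named concept activations and a fixed set of soft-logic rules over those concepts; the class name, the concept labels, and the rules are hypothetical and chosen only to make the idea concrete.

```python
# Minimal sketch of a self-explaining neuro-symbolic classifier (illustrative only;
# the paper describes the approach conceptually and this design is an assumption).
import torch
import torch.nn as nn

CONCEPTS = ["income_stable", "debt_low", "history_clean"]  # hypothetical concept names

class SelfExplainingClassifier(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        # Neural part: raw features -> concept truth values in [0, 1].
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, len(CONCEPTS)), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        c = self.concept_net(x)                                  # concept activations
        income, debt, history = c.unbind(dim=-1)
        # Symbolic part: soft conjunction/disjunction via the product t-norm.
        rule_conservative = income * debt * history              # all three concepts hold
        rule_lenient = 1 - (1 - income * debt) * (1 - history)   # (income AND debt) OR history
        score = 0.5 * (rule_conservative + rule_lenient)
        return score, c, {"conservative": rule_conservative, "lenient": rule_lenient}

def explain(concepts: torch.Tensor, rules: dict, threshold: float = 0.5) -> str:
    """Render the intrinsic explanation: active concepts and fired rules."""
    active = [name for name, v in zip(CONCEPTS, concepts.tolist()) if v > threshold]
    fired = [name for name, v in rules.items() if v.item() > threshold]
    return f"active concepts: {active}; fired rules: {fired}"

if __name__ == "__main__":
    model = SelfExplainingClassifier(in_dim=8)
    x = torch.randn(1, 8)
    score, concepts, rules = model(x)
    # Example of a logical constraint embedded as a differentiable training penalty:
    # "(income_stable AND debt_low) should imply a favorable score".
    antecedent = concepts[:, 0] * concepts[:, 1]
    constraint_penalty = torch.relu(antecedent - score).mean()
    print(f"score={score.item():.3f}, penalty={constraint_penalty.item():.3f}")
    print(explain(concepts[0], {k: v[0] for k, v in rules.items()}))
```

In this sketch the explanation is read directly from the concept and rule activations that produce the prediction, rather than approximated after the fact, which mirrors the abstract's distinction between intrinsic self-explanation and post-hoc interpretability methods.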

Published

2026-05-12

How to Cite

Timothy Langford. (2026). Advancing Model Transparency via Self-Explaining Deep Learning Architectures Integrating Symbolic Logic and Neural Representations. International Journal of Artificial Intelligence Research, 1(2). Retrieved from https://isipress.org/index.php/IJAIR/article/view/138