Explainable Financial Portfolio Optimization via Dual-System Large Language Model Reinforcement Learning

Authors

  • Manoj Gandhi, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA.
  • Jean Korhonen, School of Computing, Clemson University, Clemson, SC, USA.
  • Clifford Ortega, Department of Computer Science, University of New Hampshire, Durham, NH, USA.

Keywords

portfolio optimization, explainable artificial intelligence, large language models, reinforcement learning, dual-system theory, financial governance, system robustness

Abstract

Financial portfolio optimization traditionally relies on mean-variance frameworks and stochastic control methods that, while mathematically rigorous, offer limited interpretability for human stakeholders. Combining large language models (LLMs) with reinforcement learning (RL) offers a new paradigm for constructing adaptive, explainable investment strategies. This paper introduces a dual-system architecture inspired by cognitive science, in which an LLM-based reasoning module (System 2) generates contextually grounded explanations of market conditions and investment rationales, while a deep RL agent (System 1) executes rapid, data-driven trades. We examine the system-level implications of integrating these two components, focusing on structural trade-offs between speed and deliberation, the governance of shared memory and policy buffers, and the infrastructure required for real-time deployment. Robustness is assessed through adversarial market scenarios, and fairness is considered in the context of unequal access to explanatory outputs. Sustainability concerns such as computational energy consumption are addressed alongside policy recommendations for regulatory oversight. By embedding explainability directly into the optimization loop, the proposed framework aims to bridge the gap between automated portfolio management and human accountability. The paper further discusses cross-domain comparisons with autonomous vehicle decision systems and clinical diagnostic tools, highlighting the broader socio-technical challenges of deploying hybrid AI systems in high-stakes financial environments.

Published

2026-05-25

How to Cite

Manoj Gandhi, Jean Korhonen, & Clifford Ortega. (2026). Explainable Financial Portfolio Optimization via Dual-System Large Language Model Reinforcement Learning. International Journal of Artificial Intelligence Research, 1(2). Retrieved from https://isipress.org/index.php/IJAIR/article/view/173