Deep Reinforcement Learning for Dynamic Portfolio Optimization in Financial Markets

Authors

  • Helena J. Sterling, Department of Systems Engineering, Colorado School of Mines
  • Marcus V. Thorne, School of Computing and Information, University of Pittsburgh

DOI:

https://doi.org/10.66280/ijair.v1i1.81

Abstract

The integration of Deep Reinforcement Learning (DRL) into portfolio management represents a significant evolution from classical Mean-Variance Optimization and modern econometric frameworks. In a landscape defined by high-frequency data, non-linear dependencies, and stochastic market regimes, the ability of autonomous agents to learn optimal sequential decision-making policies offers a compelling alternative to static or rule-based allocation strategies. This paper provides an extensive system-level investigation into the deployment of DRL architectures for dynamic portfolio optimization. We explore the architectural tensions between actor-critic frameworks and value-based methods, emphasizing the importance of state-space representation and reward function engineering in complex financial environments. Beyond technical performance, the research scrutinizes the socio-technical infrastructure required for such deployments, addressing critical dimensions of algorithmic governance, systemic risk, and the environmental cost of large-scale computational finance. We analyze the implications of model convergence and crowded trades, arguing for a robust regulatory framework that balances innovation with market stability. Furthermore, the paper examines the ethical imperatives of fairness and transparency in automated wealth management, proposing a roadmap for the transition toward sustainable and interpretable financial AI. By synthesizing insights from computer science, engineering, and financial policy, this work situates DRL not merely as a mathematical tool, but as a transformative agent within the global socio-technical infrastructure of capital markets.

Published

2026-03-16

How to Cite

Helena J. Sterling, & Marcus V. Thorne. (2026). Deep Reinforcement Learning for Dynamic Portfolio Optimization in Financial Markets. International Journal of Artificial Intelligence Research, 1(1). https://doi.org/10.66280/ijair.v1i1.81