Deep Reinforcement Learning for Dynamic Portfolio Optimization in Financial Markets

Helena J. Sterling; Marcus V. Thorne

doi:10.66280/ijair.v1i1.81

Authors

Helena J. Sterling, Department of Systems Engineering, Colorado School of Mines
Marcus V. Thorne, School of Computing and Information, University of Pittsburgh


Abstract

The integration of Deep Reinforcement Learning (DRL) into portfolio management represents a significant evolution from classical Mean-Variance Optimization and modern econometric frameworks. In a landscape defined by high-frequency data, non-linear dependencies, and stochastic market regimes, the ability of autonomous agents to learn optimal sequential decision-making policies offers a compelling alternative to static or rule-based allocation strategies. This paper provides an extensive system-level investigation into the deployment of DRL architectures for dynamic portfolio optimization. We explore the architectural tensions between actor-critic frameworks and value-based methods, emphasizing the importance of state-space representation and reward function engineering in complex financial environments. Beyond technical performance, the research scrutinizes the socio-technical infrastructure required for such deployments, addressing critical dimensions of algorithmic governance, systemic risk, and the environmental cost of large-scale computational finance. We analyze the implications of model convergence and crowded trades, arguing for a robust regulatory framework that balances innovation with market stability. Furthermore, the paper examines the ethical imperatives of fairness and transparency in automated wealth management, proposing a roadmap for the transition toward sustainable and interpretable financial AI. By synthesizing insights from computer science, engineering, and financial policy, this work situates DRL not merely as a mathematical tool, but as a transformative agent within the global socio-technical infrastructure of capital markets.

