Adaptive Deep Reinforcement Learning for Robust Control of Uncertain Dynamic Systems

Authors

  • Zhe Liu, School of Automation Science and Electrical Engineering, Beihang University
  • Kevin M. Reynolds, Department of Aerospace Engineering, Texas A&M University
  • Bo Li, Department of Mechanical Engineering, The Hong Kong Polytechnic University

DOI:

https://doi.org/10.66280/ijair.v1i1.3

Keywords:

uncertain systems; adaptive control; robust control; deep reinforcement learning; actor–critic.

Abstract

Uncertainty is a defining feature of many control problems: parameters drift, actuators saturate or slow down, and disturbances do not follow a single tidy model. When the mismatch between a nominal model and the real plant grows, classical designs that work well in a narrow envelope can lose tracking quality or violate safety limits. This paper examines adaptive deep reinforcement learning for robust control of uncertain nonlinear systems. The control task is posed as a constrained continuous-control problem. We train an actor–critic policy over a family of randomized dynamics and augment the observation with lightweight identification cues extracted from short histories of state and input. At execution time, a small safety layer enforces hard command bounds. Across several uncertain benchmark systems, the resulting controller shows improved robustness to parameter drift and disturbance bursts, with lower violation rates than fixed-gain baselines. We also report sensitivity studies (randomization width, observation latency, and history length) and summarize engineering lessons that matter for deployment.
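The abstract names three ingredients: training over a family of randomized dynamics, augmenting the observation with identification cues from short state-and-input histories, and an execution-time safety layer that enforces hard command bounds. The sketch below illustrates how these pieces fit together on a toy second-order plant. It is not the paper's implementation; every name, dynamics model, and parameter range in it (sample_dynamics, the mass and damping intervals, the history length k, the bound u_max) is a hypothetical stand-in chosen only to make the structure concrete.

import numpy as np

rng = np.random.default_rng(0)

def sample_dynamics():
    """Domain randomization: draw one plant from a family of dynamics.

    The mass/damping ranges are made-up placeholders, not the paper's values.
    """
    return {"mass": rng.uniform(0.8, 1.2), "damping": rng.uniform(0.05, 0.3)}

def step(x, u, p, dt=0.02):
    """One Euler step of a toy second-order plant, x = [position, velocity]."""
    pos, vel = x
    acc = (u - p["damping"] * vel) / p["mass"]
    return np.array([pos + dt * vel, vel + dt * acc])

def augment_observation(x, history, k=5):
    """Append identification cues: the last k (state, input) pairs.

    A short history lets the policy infer the active plant parameters
    implicitly instead of receiving them directly.
    """
    recent = history[-k:]
    pad = [(np.zeros_like(x), 0.0)] * (k - len(recent))
    flat = [v for xi, ui in pad + recent for v in (*xi, ui)]
    return np.concatenate([x, flat])

def safety_layer(u, u_max=1.0):
    """Execution-time safety layer: clamp commands to hard bounds."""
    return float(np.clip(u, -u_max, u_max))

# Rollout skeleton: a new randomized plant is drawn each episode, and the
# policy only ever sees the augmented observation, never the parameters.
p = sample_dynamics()
x, history = np.zeros(2), []
for t in range(200):
    obs = augment_observation(x, history)
    u_raw = -0.5 * obs[0] - 0.1 * obs[1]   # placeholder policy, not learned
    u = safety_layer(u_raw)                 # hard bounds always applied
    history.append((x.copy(), u))
    x = step(x, u, p)

In a full training setup, a learned actor–critic policy (e.g. SAC or TD3) would replace the placeholder linear feedback above; the safety layer sits outside the learned component so that the hard bounds hold regardless of what the network outputs.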

Published

2026-01-30 — Updated on 2026-03-02

How to Cite

Liu, Z., Reynolds, K. M., & Li, B. (2026). Adaptive Deep Reinforcement Learning for Robust Control of Uncertain Dynamic Systems. International Journal of Artificial Intelligence Research, 1(1). https://doi.org/10.66280/ijair.v1i1.3 (Original work published January 30, 2026)