Adaptive Deep Reinforcement Learning for Robust Control of Uncertain Dynamic Systems
DOI: https://doi.org/10.66280/ijair.v1i1.3

Keywords: uncertain systems; adaptive control; robust control; deep reinforcement learning; actor–critic

Abstract
Uncertainty is a defining feature of many control problems: parameters drift, actuators saturate or slow down, and disturbances do not follow a single tidy model. When the mismatch between a nominal model and the real plant grows, classical designs that work well in a narrow envelope can lose tracking quality or violate safety limits. This paper examines adaptive deep reinforcement learning for robust control of uncertain nonlinear systems. The control task is posed as a constrained continuous-control problem. We train an actor–critic policy over a family of randomized dynamics and augment the observation with lightweight identification cues extracted from short histories of state and input. At execution time, a small safety layer enforces hard command bounds. Across several uncertain benchmark systems, the resulting controller shows improved robustness to parameter drift and disturbance bursts, with lower violation rates than fixed-gain baselines. We also report sensitivity studies (randomization width, observation latency, and history length) and summarize engineering lessons that matter for deployment.
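The abstract names three mechanisms: training over a family of randomized dynamics, augmenting the observation with identification cues computed from short state/input histories, and clamping commands through a small safety layer at execution time. The sketch below illustrates how such pieces could fit together; it is a minimal illustration under stated assumptions, not the paper's actual implementation. The parameter names (`mass`, `damping`), the randomization width, and the choice of cue statistics are all hypothetical.

```python
import numpy as np

def sample_plant(rng, width=0.2):
    """Draw one plant from a randomized dynamics family.

    `mass` and `damping` are placeholder parameters; a real setup would
    randomize whichever physical constants are actually uncertain, and
    `width` plays the role of the randomization width studied in the paper.
    """
    nominal = {"mass": 1.0, "damping": 0.5}
    return {k: v * (1.0 + rng.uniform(-width, width)) for k, v in nominal.items()}

def identification_cues(state_hist, input_hist):
    """Lightweight cues from short histories of state and input.

    Here: the history means plus a finite-difference drift estimate.
    The actual cue features in the paper may differ.
    """
    s = np.asarray(state_hist)   # shape (T, state_dim), T >= 2
    u = np.asarray(input_hist)   # shape (T, input_dim)
    drift = np.diff(s, axis=0).mean(axis=0)  # crude estimate of state velocity
    return np.concatenate([s.mean(axis=0), u.mean(axis=0), drift])

def augmented_observation(state, state_hist, input_hist):
    """Concatenate the raw state with the identification cues."""
    return np.concatenate([state, identification_cues(state_hist, input_hist)])

def safe_command(raw_action, u_min, u_max):
    """Execution-time safety layer: enforce hard command bounds."""
    return np.clip(raw_action, u_min, u_max)

# Hypothetical usage with a trained policy `policy(obs) -> action`:
#   plant = sample_plant(np.random.default_rng(0))          # one training plant
#   obs   = augmented_observation(state, state_hist, input_hist)
#   u     = safe_command(policy(obs), u_min=-1.0, u_max=1.0)
```

A projection-style clip like `np.clip` is the simplest way to guarantee hard command bounds at execution time; richer safety layers would also filter the action against state constraints, which this sketch does not attempt.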
License
Copyright (c) 2026 International Journal of Artificial Intelligence Research

This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.