Empowering Sequential Decision Intelligence via Long-Term Memory Augmentation and Recurrent Reinforcement Learning within Large Language Model Architectures

Authors

  • Jason Brooks Department of Computer Science and Engineering, University of North Texas
  • Stephen Hensley School of Electrical Engineering and Computer Science, Oregon State University

DOI:

https://doi.org/10.66280/ijair.v1i2.150

Keywords:

Sequential Decision Intelligence, Long-Term Memory Augmentation, Recurrent Reinforcement Learning, Large Language Model Architectures, Socio-Technical Infrastructure, Algorithmic Governance, Decision Support Systems.

Abstract

The integration of generative pre-trained transformers into complex decision-making pipelines has revealed significant limitations in temporal consistency and the retention of stateful information over extended operational horizons. While large language models demonstrate remarkable zero-shot reasoning capabilities, their reliance on static context windows often results in catastrophic forgetting or context drift when they are applied to sequential decision tasks. This paper explores the architectural convergence of long-term memory augmentation and recurrent reinforcement learning as a means to empower sequential decision intelligence. Moving beyond the limitations of the attention-only paradigm, we propose a systemic framework that incorporates externalized memory structures and recursive feedback loops to stabilize policy generation. We analyze the structural trade-offs between computational overhead and cognitive fidelity, emphasizing the necessity of robust socio-technical infrastructures to support these high-stakes deployments. The discussion further extends to the governance of autonomous systems, addressing critical concerns of fairness, algorithmic bias, and the long-term sustainability of large-scale intelligence infrastructures. Through a detailed conceptual analysis, we argue that the future of decision intelligence lies not in increasing parameter counts alone, but in the sophisticated management of state and experience across multi-modal and multi-temporal environments.
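The paper is conceptual and provides no implementation. As a rough, hypothetical sketch of the two mechanisms the abstract names — an externalized long-term memory store and a recurrent feedback loop that stabilizes policy updates — one might imagine something like the following toy agent. All class and method names here are illustrative assumptions, not the authors' framework:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

class EpisodicMemory:
    """Externalized long-term store: (observation embedding, outcome) pairs,
    persisted outside the model's fixed context window."""
    def __init__(self):
        self.entries = []  # list of (key_vector, value)

    def write(self, key, value):
        self.entries.append((key, value))

    def read(self, query, k=3):
        """Return the k stored outcomes whose keys best match the query."""
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], query),
                        reverse=True)
        return [v for _, v in ranked[:k]]

class RecurrentDecisionAgent:
    """A recurrent hidden state carries context across steps, and each
    decision is augmented by retrieval from long-term memory."""
    def __init__(self, dim=4, lr=0.1):
        self.state = [0.0] * dim    # recurrent hidden state
        self.weights = [0.0] * dim  # toy linear policy
        self.memory = EpisodicMemory()
        self.lr = lr

    def step(self, observation):
        recalled = self.memory.read(observation)
        bias = sum(recalled) / len(recalled) if recalled else 0.0
        # Recurrent update: blend the new observation into the carried state.
        self.state = [0.9 * s + 0.1 * o
                      for s, o in zip(self.state, observation)]
        score = sum(w * s for w, s in zip(self.weights, self.state)) + bias
        return 1 if score >= 0 else 0  # binary action

    def feedback(self, observation, action, reward):
        # Recursive feedback loop: the reward nudges the policy, and the
        # experience is written back to long-term memory for future recall.
        sign = 1.0 if action == 1 else -1.0
        self.weights = [w + self.lr * reward * sign * s
                        for w, s in zip(self.weights, self.state)]
        self.memory.write(observation, reward)
```

In this reading, the memory store addresses context drift (past experience survives beyond the context window), while the recurrent state and reward-driven update stand in for the recursive feedback loops said to stabilize policy generation.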

References

1. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–318.

2. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

3. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623.

4. Bengio, Y., Lecun, Y., & Hinton, G. (2021). Deep learning for AI. Communications of the ACM, 64(7), 58–65.

5. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

6. Chen, B. C., & Zhang, T. (2024). Scaling laws for memory-augmented neural networks. Journal of Artificial Intelligence Research, 79, 445–482.

7. Crawford, K. (2021). The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.

8. Deng, L., & Liu, Y. (2023). Deep learning in business analytics: A review of sequential decision models. Operations Research Perspectives, 10, 100256.

9. Diakopoulos, N. (2019). Automating the News: How Algorithms Are Rewriting the Media. Harvard University Press.

10. Dou, Z., Cui, D., Yan, J., Wang, W., Chen, B., Wang, H., ... & Zhang, S. (2025). DSADF: Thinking fast and slow for decision making. arXiv preprint arXiv:2505.08189.

11. Floridi, L., & Cowls, J. (2019). A unified framework of five ethical principles for AI in society. Harvard Data Science Review, 1(1).

12. Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.

13. Haenlein, M., & Kaplan, A. (2019). A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. California Management Review, 61(4), 5–14.

14. Hernandez, D., & Brown, T. B. (2020). Measuring the algorithmic efficiency of AI. arXiv preprint arXiv:2005.04305.

15. Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399.

16. Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., ... & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.

17. Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., ... & Hadsell, R. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.

18. Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40.

19. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.

20. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.

21. Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.

22. OpenAI. (2023). GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.

23. Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., & Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Networks, 113, 54–71.

24. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.

25. Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking.

26. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ... & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.

27. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.

28. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

29. Wallach, W., & Allen, C. (2008). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press.

30. Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

31. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837.

32. Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.

33. Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.

34. Zadeh, A., Liang, P. P., Mazumder, N., Poria, S., Cambria, E., & Morency, L. P. (2018). Multi-attention recurrent network for multi-modal sentiment analysis. AAAI Conference on Artificial Intelligence.

35. Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.

Published

2026-05-13

How to Cite

Jason Brooks, & Stephen Hensley. (2026). Empowering Sequential Decision Intelligence via Long-Term Memory Augmentation and Recurrent Reinforcement Learning within Large Language Model Architectures. International Journal of Artificial Intelligence Research, 1(2). https://doi.org/10.66280/ijair.v1i2.150