William Whitaker. “Refining Decision Boundaries via Stepwise Reinforcement Learning from Human Feedback Integrating Intermediate Logic Verification and Large Language Model Reasoning”. International Journal of Artificial Intelligence Research 1, no. 2 (May 13, 2026). Accessed May 14, 2026. https://isipress.org/index.php/IJAIR/article/view/152.