(1) William Whitaker. Refining Decision Boundaries via Stepwise Reinforcement Learning from Human Feedback Integrating Intermediate Logic Verification and Large Language Model Reasoning. IJAIR 2026, 1.