GCL Special Lecture in Information Science and Technology VII (Reinforcement Learning)

This course provides an overview of reinforcement learning (RL) algorithms. RL has achieved remarkable success in applications such as robotic manipulation and autonomous driving. It is also used to fine-tune large language models, and its potential is still expanding. We start with basic concepts such as Markov decision processes and value functions, then cover popular algorithms such as TD3 and soft actor-critic (SAC). We also look at recent topics such as offline RL.

Course Schedule

April 8: Introduction, overview of the fields of reinforcement learning and robot learning
April 15: Bandit problems, Bayesian optimization
April 22: Categories of RL / MDP / value function / Bellman equations
April 30: DQN / value & policy iteration / RL as density estimation / MaxEnt and Boltzmann distribution / Policy gradient
May 13: Distributed Q-learning / QT-Opt / Policy learning
May 20: Policy gradient, REINFORCE / variance reduction / policy gradient with function approximation
May 27: On-policy actor critic / NAC / A3C, TRPO / PPO / GAE
June 10: Off-policy actor critic, DDPG / TD3 / KL-regularized RL / SAC / MPO
June 17: Inverse RL
June 24: Gradient-free methods / Model-based RL
July 1: Offline RL / AWAC / implicit Q-learning / TD3+BC / large models / diffusion policies
July 8: Advanced topic (Hierarchical RL, goal-conditioned RL / HER, intrinsic reward)
July 15: Review & wrap up

Teaching Method

Online lecture via Zoom.

Grading

Attendance and reports.

Textbook

There is no specific textbook for this lecture.

Reference Books

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Second Edition. MIT Press, Cambridge, MA, 2018.

Notes for Enrollment

Students are expected to review the previous lecture's materials each week. Basic knowledge of machine learning, linear algebra, and probability theory is required.