[2022-Jun-15] Rethinking Policy Improvement in Reinforcement Learning

Institute of Information Systems and Applications
Speaker:	Prof. Ping-Chun Hsieh (謝秉均), Assistant Professor, National Yang Ming Chiao Tung University
Topic:	Rethinking Policy Improvement in Reinforcement Learning
Date:	13:20-15:00 Wednesday 15-Jun-2022
Link:	https://meet.google.com/zry-pxjc-htx
Hosted by:	Prof. Chun-Yi Lee

Abstract

Policy improvement is a central component of any reinforcement learning (RL) algorithm, and the most widely used approach is to leverage the policy gradient (PG) theorem to iteratively improve the learned policy. Despite its success, PG can suffer from inefficient training in various settings. In this talk, I will go beyond PG and introduce two new policy improvement frameworks:

(i) First, I will introduce the action-constrained RL problem and discuss the critical “zero-gradient issue” that arises under PG. Then, I will present Frank-Wolfe policy optimization, a decoupling framework that completely resolves this challenging zero-gradient issue.

(ii) Next, I will present Hinge policy optimization (HPO), which reformulates policy updates as a large-margin classification problem with hinge loss. The HPO framework opens up a whole new family of RL algorithms that includes PPO with a clipped surrogate objective (PPO-clip) as a special case. Moreover, we formally prove that HPO attains a globally optimal policy. To our knowledge, this is the first global convergence guarantee for the PPO-clip algorithm.
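For readers new to item (i), the toy sketch below illustrates one way the zero-gradient issue can arise and how a Frank-Wolfe step sidesteps it. Everything in it is an illustrative assumption rather than the algorithm presented in the talk: a deterministic scalar policy output `u`, a feasible action interval `[LO, HI]`, clipping as the projection onto that interval, and a known toy critic `Q(a) = -(a - TARGET)**2`. When `u` lies outside the feasible set, the derivative of the clipped action with respect to `u` is zero, so the chain-rule gradient vanishes even though a better feasible action exists. A Frank-Wolfe step instead works directly in the action space: it solves a linear maximization over the feasible set and moves toward the maximizer, so the iterate stays feasible and still improves `Q`.

```python
# Toy setting (illustrative assumptions only): deterministic scalar policy
# output u, feasible action set C = [LO, HI], projection = clipping, and a
# known critic Q(a) = -(a - TARGET)**2 that is maximized inside C.
LO, HI, TARGET = -1.0, 1.0, 0.0


def project(u: float) -> tuple[float, float]:
    """Clip u onto [LO, HI]; return the action a and the derivative da/du."""
    if u < LO:
        return LO, 0.0
    if u > HI:
        return HI, 0.0
    return u, 1.0


def dq_da(a: float) -> float:
    """Gradient of the toy critic Q(a) = -(a - TARGET)**2."""
    return -2.0 * (a - TARGET)


def gradient_through_projection(u: float) -> float:
    """Chain rule dQ/du = dQ/da * da/du, as a projection-based PG update would use."""
    a, da_du = project(u)
    return dq_da(a) * da_du


def frank_wolfe_step(a: float, step: float) -> float:
    """One Frank-Wolfe step in action space with step size in [0, 1].

    The linear maximization oracle over the interval picks the endpoint c that
    maximizes dQ/da * c. The update a + step * (c - a) is a convex combination
    of two feasible points, so it stays inside C.
    """
    c = HI if dq_da(a) > 0 else LO
    return a + step * (c - a)


if __name__ == "__main__":
    u = 3.0                                        # pre-projection output outside C
    print(gradient_through_projection(u) == 0.0)   # True: the gradient w.r.t. u vanishes
    a, _ = project(u)                              # a = 1.0 sits on the boundary of C
    print(frank_wolfe_step(a, step=0.25))          # 0.5: feasible, and Q(0.5) > Q(1.0)
```

In a decoupled scheme of the kind the abstract describes, such a feasible, improved action could then serve as a regression target for the policy network; that coupling step is not shown here, and its exact form in Frank-Wolfe policy optimization is an assumption on our part.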

Finally, I will present experimental results that corroborate the effectiveness of both frameworks.
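The claim in (ii) that PPO-clip is a special case of a hinge-loss formulation can be checked numerically on single samples. The sketch below is a minimal illustration, assuming the standard per-sample PPO-clip surrogate `min(r*A, clip(r, 1-eps, 1+eps)*A)` with probability ratio `r`, advantage `A`, and clipping range `eps`, and a hinge loss `max(0, eps - y*(r - 1))` whose label is `y = sign(A)` and whose weight is `|A|`. This weighting and all function names are our illustrative choices, not necessarily the exact HPO loss from the talk. A short case analysis on the sign of `A` gives PPO-clip(r, A) = A*(1 + sign(A)*eps) - |A|*max(0, eps - sign(A)*(r - 1)), so the surrogate equals a policy-independent constant minus the weighted hinge loss, and maximizing one is equivalent to minimizing the other.

```python
def ppo_clip_objective(ratio: float, adv: float, eps: float) -> float:
    """Per-sample PPO-clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def hinge(label: float, score: float, eps: float) -> float:
    """Hinge loss with margin eps: max(0, eps - label * score), label in {+1, -1}."""
    return max(0.0, eps - label * score)


def weighted_hinge_loss(ratio: float, adv: float, eps: float) -> float:
    """|A| * hinge(sign(A), r - 1, eps): sign(A) acts as the class label, r - 1 as the score."""
    label = 1.0 if adv >= 0 else -1.0
    return abs(adv) * hinge(label, ratio - 1.0, eps)


def policy_independent_constant(adv: float, eps: float) -> float:
    """A * (1 + sign(A) * eps); this term does not depend on the ratio r."""
    label = 1.0 if adv >= 0 else -1.0
    return adv * (1.0 + label * eps)


if __name__ == "__main__":
    eps = 0.2
    for ratio in (0.5, 0.9, 1.0, 1.1, 1.5):
        for adv in (2.0, -3.0, 0.0):
            lhs = ppo_clip_objective(ratio, adv, eps)
            rhs = policy_independent_constant(adv, eps) - weighted_hinge_loss(ratio, adv, eps)
            assert abs(lhs - rhs) < 1e-12, (ratio, adv)
    print("identity holds on all test points")
```

Read as classification, the sign of each sample's advantage says whether the action should become more or less likely, and the hinge penalizes only ratios that have not yet moved past the margin eps in that direction. The zero-loss region of the hinge (r >= 1 + eps for A > 0, r <= 1 - eps for A < 0) is exactly where clipping removes the gradient in PPO-clip.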

Bio.

Ping-Chun Hsieh is currently an assistant professor in the Department of Computer Science at National Yang Ming Chiao Tung University (NYCU). He received his B.S. and M.S. in Electrical Engineering from National Taiwan University in 2011 and 2013, respectively, and his Ph.D. degree in Electrical and Computer Engineering from Texas A&M University (TAMU) in 2018. His research interests include reinforcement learning, multi-armed bandits, and wireless networks.

His research received Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017. He is a recipient of the Junior Faculty Award (Huang Pei-Cheng Young Scholar Chair, 黃培城青年講座) from NYCU, the Young Scholar Fellowship (Einstein Program, 愛因斯坦計畫) from the Ministry of Science and Technology in 2019, the Outstanding PhD Student Award from the ECE Department at TAMU in 2016, and the Government Scholarship to Study Abroad from the Ministry of Education, Taiwan.

All faculty and students are welcome to join.
