[2023-Nov-29] Reward-Biased MLE: A Unified Framework for Online RL and Bandits via One Simple and Generic Objective Function

Institute of Information Systems and Applications


Dr. Ping-Chun Hsieh (謝秉均), Assistant Professor, Department of Computer Science, National Yang Ming Chiao Tung University


13:20-15:00 Wednesday 29-Nov-2023


Delta 103

Hosted by:

Prof. Te-Chuan Chiu


In online RL and bandits, one fundamental challenge is achieving single-lifetime learning: the learner must strike a good balance among exploration, environment estimation, and planning for high rewards, entirely on the fly. This truly online setting differs fundamentally from the typical training/testing paradigm of RL.

To tackle this, we present a new family of learning algorithms, formulated in a general way based on the Reward-Biased Maximum Likelihood Estimation (RBMLE) principle. Notably, the RBMLE objective jointly addresses the needs for exploration, planning, and model estimation. In this talk, I will introduce RBMLE in three steps: (i) Stochastic Bandits: I will start from the simple i.i.d. bandit setting to introduce the main idea behind RBMLE. (ii) Contextual Bandits: Building on (i), I will present how to extend RBMLE to the more realistic contextual bandit setting, including Neural Contextual Bandits, which leverage the representation power of neural networks in bandit problems. (iii) Linear Kernel MDPs: Finally, I will focus on extending RBMLE to one recent important online RL setting, namely linear kernel MDPs. Some promising research directions will also be discussed.
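To fix ideas before the talk, the core RBMLE principle can be sketched for the i.i.d. Bernoulli bandit of step (i): instead of the plain maximum-likelihood estimate, one maximizes the log-likelihood plus a bias term that favors parameters under which the achievable reward is high, then acts greedily under the biased estimate. The sketch below is a hypothetical numerical illustration (grid search over each arm's mean, with the bias applied to one candidate "best" arm at a time); it is not the closed-form index policy from the speaker's papers, and the choice of bias schedule `alpha` is an assumption.

```python
import numpy as np


def rbmle_bernoulli_step(successes, failures, alpha, grid=None):
    """Choose an arm by an RBMLE-style rule for i.i.d. Bernoulli bandits.

    Approximately maximizes  log-likelihood(theta) + alpha * max_i theta_i
    over the vector of arm means theta, then plays greedily under the
    maximizer. For each candidate best arm j, the bias +alpha*theta_j is
    absorbed into arm j's one-dimensional problem while the other arms keep
    their plain MLE; the candidate with the largest biased objective wins.
    (This drops the constraint theta_j = max_i theta_i, which is harmless
    for an illustrative sketch.)
    """
    if grid is None:
        grid = np.linspace(1e-3, 1 - 1e-3, 999)

    def loglik(s, f, theta):
        # Bernoulli log-likelihood with s successes and f failures.
        return s * np.log(theta) + f * np.log(1 - theta)

    K = len(successes)
    best_value, best_arm = -np.inf, 0
    for j in range(K):
        total = 0.0
        for i in range(K):
            vals = loglik(successes[i], failures[i], grid)
            if i == j:
                # Reward bias pushes arm j's estimate optimistically upward.
                vals = vals + alpha * grid
            total += vals.max()
        if total > best_value:
            best_value, best_arm = total, j
    return best_arm


# Toy run: an increasing bias alpha(t) keeps exploration alive early on.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])
s = np.ones(3)
f = np.ones(3)
for t in range(1, 201):
    a = rbmle_bernoulli_step(s, f, alpha=np.sqrt(t))
    r = float(rng.random() < true_means[a])
    s[a] += r
    f[a] += 1.0 - r
```

The growing bias `alpha(t)` is what separates RBMLE from greedy MLE: early on, the reward bias can override scant data and force exploration, while asymptotically the likelihood term dominates and the policy converges to exploiting the true best arm.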


Ping-Chun Hsieh is currently an assistant professor in the Department of Computer Science at National Yang Ming Chiao Tung University (NYCU). He received his B.S. and M.S. in Electrical Engineering from National Taiwan University in 2011 and 2013, respectively, and his Ph.D. in Electrical and Computer Engineering from Texas A&M University (TAMU) in 2018. His current research interests include reinforcement learning, multi-armed bandits, and Bayesian optimization. His research received Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017. He is a recipient of the Junior Faculty Award (黃培城青年講座) from NYCU, the NSTC 2030 Emerging Young Scholar Program (2030新秀學者), the MOST Young Scholar Fellowship (愛因斯坦計畫), the Outstanding PhD Student Award from the ECE Department at TAMU, and the Government Scholarship to Study Abroad from the Ministry of Education, Taiwan.

All faculty and students are welcome to join.
