Olivetti Club

Dylan Peifer

Policy Gradient

Tuesday, December 3, 2019 - 4:30 PM

Malott 406

Reinforcement learning is the study of methods for making choices and learning strategies. One standard method for solving reinforcement learning problems is policy gradient, which improves the parameters of a differentiable policy by moving them in the direction of the gradient of expected return. In this talk we will derive the most basic policy gradient algorithm, sometimes called REINFORCE, and apply it to a few simple problems. Then we will discuss modern improvements such as generalized advantage estimation and more sophisticated algorithms such as proximal policy optimization.
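As a rough illustration of the basic idea described above, and not the speaker's own material, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. The function name `reinforce_bandit`, the softmax parameterization, the Gaussian reward model, and all hyperparameters are assumptions made for this example. The parameters are moved along a sampled estimate of the gradient of expected reward, namely the reward times the gradient of the log-probability of the chosen action.

```python
import math
import random


def softmax(prefs):
    """Convert a list of real-valued preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, noise=0.5, seed=0):
    """Run a REINFORCE-style update (no baseline) on a hypothetical bandit.

    arm_means: assumed mean reward of each arm (illustrative values only).
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters, one per arm
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Observe a noisy reward for that action.
        reward = rng.gauss(arm_means[action], noise)
        # For a softmax policy, d/d theta_k of log pi(action)
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log
    return softmax(theta)


# Arm 1 has the higher assumed mean reward, so the policy should come to favor it.
final_probs = reinforce_bandit([0.0, 1.0])
print(final_probs)
```

This sketch omits a baseline, so its gradient estimates are noisy; variance-reduction techniques such as the advantage estimation mentioned in the abstract address that issue.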

Refreshments will be served in the lounge at 4:00 PM.