Notes from John Schulman's Deep Reinforcement Learning lectures at MLSS 2016 (Machine Learning Summer School) in Cadiz.

Lecture 1
Lecture 2
Lecture 3
Lecture 4


Broadly, two approaches to RL:

There are also actor-critic methods, which are policy gradient methods that use value functions.

Deep reinforcement learning is just reinforcement learning with nonlinear function approximators (typically neural networks), whose parameters are usually updated with stochastic gradient descent.


Policies:

Policies may be parameterized, written $\pi_{\theta}$, where $\theta$ is the vector of policy parameters.
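To make $\pi_{\theta}$ concrete, here is a minimal sketch of one possible parameterization, assuming discrete actions and a linear softmax over state features (in deep RL the logits would instead come from a neural network). The function names `action_probs` and `sample_action` are hypothetical, chosen only for illustration.

```python
import math
import random

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def action_probs(theta, state):
    # Hypothetical linear parameterization: theta holds one weight row per action,
    # and the logit for action a is the dot product theta[a] . state.
    logits = [sum(w * s for w, s in zip(row, state)) for row in theta]
    return softmax(logits)

def sample_action(theta, state, rng=random):
    # Draw a ~ pi_theta(. | state) from the categorical distribution over actions.
    probs = action_probs(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Changing `theta` changes the action distribution smoothly, which is what lets policy optimization methods search over $\theta$.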


Cross-entropy method (a derivative-free optimization (DFO), or evolutionary, algorithm for optimizing parameterized policies):
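The cross-entropy method can be sketched as follows, under stated assumptions: sample parameter vectors $\theta$ from a Gaussian with diagonal covariance, score each one, keep the top-scoring "elite" fraction, and refit the Gaussian to the elites. The `evaluate` callback is a hypothetical stand-in for running one or more episodes with $\pi_{\theta}$ and returning the total reward; the hyperparameter defaults are illustrative, not taken from the lectures.

```python
import random
import statistics

def cross_entropy_method(evaluate, dim, n_iters=50, batch_size=100,
                         elite_frac=0.2, init_std=1.0, seed=0):
    """Approximately maximize evaluate(theta) over theta in R^dim.

    `evaluate` is a hypothetical callback; in RL it would return the
    episode return obtained by running the policy pi_theta.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(batch_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidates theta_i ~ N(mean, diag(std^2)).
        thetas = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                  for _ in range(batch_size)]
        # Score every candidate once.
        scores = [evaluate(th) for th in thetas]
        # Select the elite set: the n_elite highest-scoring candidates.
        ranked = sorted(range(batch_size), key=lambda i: scores[i], reverse=True)
        elites = [thetas[i] for i in ranked[:n_elite]]
        # Refit the diagonal Gaussian to the elites, coordinate by coordinate.
        mean = [statistics.mean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) for col in zip(*elites)]
    return mean
```

For example, with the toy objective `lambda th: -sum((a - b) ** 2 for a, b in zip(th, [1.0, -2.0, 0.5]))` (a hypothetical substitute for an environment), `dim=3`, and `init_std=5.0`, the returned mean should land close to `[1.0, -2.0, 0.5]`. Because the standard deviation shrinks every iteration, CEM can converge prematurely if the initial distribution is too narrow; some variants add a small constant to the refit standard deviation to counter this.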