Notes from John Schulman's Deep Reinforcement Learning lectures for MLSS 2016 in Cadiz.

Lecture 1
Lecture 2
Lecture 3
Lecture 4

Broadly, two approaches to RL:

There are also actor-critic methods, which are policy gradient methods that use value functions.

Deep reinforcement learning is just reinforcement learning with nonlinear function approximators, usually updating parameters with stochastic gradient descent.


Policies may be parameterized, i.e. $\pi_{\theta}$, where $\theta$ is the vector of policy parameters.
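To make the notation concrete, here is a minimal sketch of one possible parameterized policy: a linear-softmax policy over discrete actions. This is an illustration, not something taken from the lectures. The layout of `theta` (one weight vector per action), the function names, and the numeric values are all assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, state):
    """Return pi_theta(. | state) for a linear-softmax policy.

    theta: list of weight vectors, one per action (assumed layout).
    state: feature vector for the current state (list of floats).
    """
    logits = [sum(w * s for w, s in zip(row, state)) for row in theta]
    return softmax(logits)


def sample_action(theta, state, rng=random):
    """Draw an action index from pi_theta(. | state)."""
    probs = action_probs(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: 2 actions, 2 state features, made-up parameter values.
theta = [[0.5, -0.2], [0.1, 0.3]]
state = [1.0, 2.0]
probs = action_probs(theta, state)
action = sample_action(theta, state)
```

Changing `theta` changes the action distribution, which is what policy optimization methods exploit: they search over `theta` rather than over policies directly.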

Cross-entropy method, a derivative-free optimization (DFO), or evolutionary, algorithm for optimizing parameterized policies: