
RL theory I [3 P]

Prove Corollary 1.3 (p. 9) from the lecture notes Theory of Reinforcement Learning:

Every policy $\pi$ for which $V^{\pi}$ satisfies the Bellman optimality equations

$V^{\pi}(s) = \max_{a \in A_s} Q^{\pi}(s, a) \quad \forall s \in S$

is optimal.
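
Before writing the proof, it can help to see the statement hold numerically. The following is a minimal sketch, not a proof, under assumptions that are not part of the exercise text: a small random finite MDP, a discount factor gamma = 0.9, the same action set in every state (instead of state-dependent sets $A_s$), and a comparison against deterministic stationary policies only. All function names are hypothetical. For every deterministic policy it computes $V^{\pi}$ by solving the linear Bellman equations, tests whether $V^{\pi}(s) = \max_a Q^{\pi}(s, a)$ holds in every state, and tests whether $V^{\pi}$ dominates the value function of every other enumerated policy.

```python
import itertools
import numpy as np


def policy_value(P, R, policy, gamma):
    """Solve V = R_pi + gamma * P_pi V for a deterministic policy."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]   # (n, n): transition matrix under the policy
    R_pi = R[idx, policy]   # (n,): expected reward under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)


def q_values(P, R, V, gamma):
    """Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s')."""
    return R + gamma * (P @ V)   # (n, m)


def check_corollary(n_states=3, n_actions=2, gamma=0.9, seed=0):
    """Return one (satisfies_bellman_optimality, is_optimal) pair per policy.

    Assumption: a randomly generated MDP; "optimal" means matching the
    pointwise best value among all deterministic stationary policies.
    """
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)          # normalise to probabilities
    R = rng.random((n_states, n_actions))

    policies = [np.array(p) for p in
                itertools.product(range(n_actions), repeat=n_states)]
    values = [policy_value(P, R, p, gamma) for p in policies]
    V_best = np.max(values, axis=0)            # pointwise maximum

    results = []
    for V in values:
        greedy = q_values(P, R, V, gamma).max(axis=1)
        satisfies = np.allclose(V, greedy, rtol=0.0, atol=1e-9)
        optimal = np.allclose(V, V_best, rtol=0.0, atol=1e-6)
        results.append((satisfies, optimal))
    return results


if __name__ == "__main__":
    for satisfies, optimal in check_corollary():
        print(f"Bellman optimality: {satisfies}, optimal: {optimal}")
```

In such a run, every policy flagged as satisfying the Bellman optimality equations should also be flagged as optimal. A numerical check on one random instance does not replace the proof, which has to cover all policies and all finite MDPs treated in the lecture notes.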

Haeusler Stefan 2013-01-16