RL theory I [3 P]

Prove Corollary 1.3 (p. 9) from the lecture notes Theory of Reinforcement Learning 3:

Every policy $\pi$ for which $V^{\pi}$ satisfies the Bellman optimality equations

$\displaystyle V^{\pi}(s) = \max_{a \in A_s} Q^{\pi}(s, a) \quad \forall s \in S$

is optimal.
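
Before attempting the proof, it can help to check the statement numerically on a small example. The sketch below is not part of the exercise and is not a proof. It assumes a discounted infinite-horizon setting with discount factor 0.9 and uses a made-up MDP with two states and two actions; the tables `P` and `R` and all function names are hypothetical. It evaluates every deterministic policy, keeps those whose value function satisfies the Bellman optimality equations, and confirms that each of them is at least as good as every other deterministic policy in every state.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor, not taken from the lecture notes

# Hypothetical MDP: P[s][a] is a list of (probability, next_state) pairs,
# R[s][a] is the expected immediate reward for taking action a in state s.
P = {
    0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
STATES = sorted(P)


def q_value(V, s, a):
    """Q(s, a) computed from a state-value table V."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation: repeatedly apply V(s) <- Q(s, pi(s))."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: q_value(V, s, policy[s]) for s in STATES}
    return V


def satisfies_bellman_optimality(V, tol=1e-8):
    """Check V(s) = max_a Q(s, a) in every state, up to a tolerance."""
    return all(
        abs(V[s] - max(q_value(V, s, a) for a in P[s])) < tol for s in STATES
    )


# Enumerate all deterministic policies as tuples (action in state 0, action in state 1).
values = {}
for actions in product(*(sorted(P[s]) for s in STATES)):
    values[actions] = evaluate(dict(zip(STATES, actions)))

optimal_candidates = [pi for pi, V in values.items() if satisfies_bellman_optimality(V)]

# Every policy passing the Bellman check should dominate all other policies.
for pi in optimal_candidates:
    for other in values.values():
        assert all(values[pi][s] >= other[s] - 1e-8 for s in STATES)

print(optimal_candidates)
```

The comparison only covers deterministic stationary policies, so the check is weaker than the corollary itself, which makes a claim about every policy.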

Haeusler Stefan 2013-01-16