next up previous
Next: RL application III: Self-play Up: MLB_Exercises_2010 Previous: RL application I: On-

RL application II: Function approximation [3 P]

Solve the cart-pole problem with the SARSA($ \lambda$ ) algorithm and linear function approximation. Download the Reinforcement Learning (RL) MATLAB Toolbox and the example files, and adapt the cart-pole demo example to solve the task. Use the following learning parameters: $ \lambda=0.95, \epsilon=0.01, \alpha=0.2$ . Normalize the parameter vector of the linear function approximation so that the sum of its elements is 1. Initialize the action values to zero (optimistic initialization). Measure the number of steps needed to reach the goal in order to evaluate the success of your learning algorithm. To verify that the cart-pole has reached the goal, use the learned optimal policy, i.e. set $ \epsilon=0$ .
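The core update of SARSA($ \lambda$ ) with linear function approximation can be sketched as follows. This is a language-agnostic illustration in Python, not the MATLAB toolbox code; the shapes of `theta` and the eligibility traces, and the convention Q(s,a) = theta[a] &middot; phi(s), are assumptions for this sketch.

```python
import numpy as np

def sarsa_lambda_update(theta, e, phi_s, a, r, phi_s_next, a_next, done,
                        alpha=0.2, gamma=1.0, lam=0.95):
    """One SARSA(lambda) step with linear function approximation.

    theta: (n_actions, n_features) weights, Q(s,a) = theta[a] @ phi(s)
    e:     eligibility traces, same shape as theta
    phi_s: feature vector of the current state (normalized to sum to 1,
           as required in the exercise)
    """
    q_sa = theta[a] @ phi_s
    q_next = 0.0 if done else theta[a_next] @ phi_s_next
    delta = r + gamma * q_next - q_sa      # TD error
    e *= gamma * lam                       # decay all traces
    e[a] += phi_s                          # accumulating trace for the taken action
    theta += alpha * delta * e             # gradient SARSA(lambda) weight update
    return theta, e, delta
```

With the action values initialized to zero (`theta = np.zeros(...)`), the epsilon-greedy behaviour policy with $ \epsilon=0.01$ is wrapped around this update during learning, and $ \epsilon=0$ is used when evaluating the learned policy.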

a)
Use grid-tilings of size $ 7 \times 7 \times 15 \times 15$ to discretize the state space. Show in a plot how the number of steps needed to reach the goal evolves during learning.
b)
Use radial basis function (RBF) approximation with evenly spaced RBF centers located at the tile centers used in a) (i.e. 11025 centers in total). Set the widths in every dimension such that one RBF roughly spans 1-2 tiles.
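The two feature constructions above can be sketched as follows. The state bounds for $ (x, \dot{x}, \theta, \dot{\theta})$ are placeholders here (the actual ranges come from the cart-pole demo in the toolbox); only the bin counts $ 7 \times 7 \times 15 \times 15$ and the RBF placement at the tile centers are taken from the exercise.

```python
import numpy as np

# Hypothetical state bounds for (x, x_dot, theta, theta_dot);
# replace with the ranges used by the cart-pole demo.
LOW  = np.array([-2.4, -3.0, -0.21, -3.0])
HIGH = np.array([ 2.4,  3.0,  0.21,  3.0])
BINS = np.array([7, 7, 15, 15])          # 7 x 7 x 15 x 15 = 11025 cells

def grid_features(s):
    """Part a): one-hot feature vector indexing the tile that contains s."""
    idx = np.floor((s - LOW) / (HIGH - LOW) * BINS).astype(int)
    idx = np.clip(idx, 0, BINS - 1)      # clamp states on the boundary
    flat = np.ravel_multi_index(idx, BINS)
    phi = np.zeros(BINS.prod())
    phi[flat] = 1.0                      # already sums to 1
    return phi

def rbf_features(s, width_in_tiles=1.5):
    """Part b): Gaussian RBF features centered on the tile centers of a)."""
    tile_w = (HIGH - LOW) / BINS
    axes = [LOW[d] + (np.arange(BINS[d]) + 0.5) * tile_w[d] for d in range(4)]
    centers = np.stack(np.meshgrid(*axes, indexing='ij'), -1).reshape(-1, 4)
    sigma = width_in_tiles * tile_w      # one RBF spans roughly 1-2 tiles
    phi = np.exp(-0.5 * np.sum(((s - centers) / sigma) ** 2, axis=1))
    return phi / phi.sum()               # normalize so the features sum to 1
```

Either function yields an 11025-dimensional feature vector that plugs directly into the linear SARSA($ \lambda$ ) learner; only the feature map changes between a) and b).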
Submit the code of your model and the learning algorithm. Present your results in a clear, structured, and legible form. Document them in such a way that anybody can reproduce them effortlessly.


Haeusler Stefan 2011-01-25