Next: Genetic Algorithm [3* P] Up: MLB_Exercises_2012 Previous: Comparison of Optimization Algorithms

Cart-Pole Controller Optimization [5 P]

Optimize the weights of a neural network that controls the horizontal force applied to a cart so that a pole mounted on the cart swings up into the upright position. Use an optimization algorithm of your choice (one of the four algorithms used in homework assignment 1). The state $\mathbf{x}$ of the dynamical system is a four-dimensional vector with the elements $(xc,\dot{xc},\varphi,\dot{\varphi})^T$, where $xc$ is the position of the cart and $\varphi$ is the pole angle.

Figure: The cart-pole model.

Download the MATLAB code for the cart-pole. The dynamics of the cart-pole for one time step are computed in the function cp_dyn.m. The state $\mathbf{x}$ of the cart-pole can be visualized with the function cp_vis.m (you do not have to change or call these two functions).

Modify the file learn_cp.m, which also contains the parameters of the cart-pole in the structure model (masses, length of the pole, etc.), to implement an optimization algorithm that minimizes the error computed by the function cost_function.m.
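The optimization loop in learn_cp.m could be organized roughly as in the following sketch. It uses plain stochastic hill climbing as a stand-in, which may or may not be one of the four algorithms from homework assignment 1. The quadratic `cost` handle is a hypothetical placeholder for cost_function.m, whose actual signature is not given here, and the names `n_params`, `sigma` and `n_iter` are assumptions made for illustration.

```matlab
% Stochastic hill climbing on a parameter vector (stand-in sketch).
% ASSUMPTION: 'cost' is a hypothetical placeholder for cost_function.m.
cost = @(w) sum((w - 0.5).^2);   % toy cost with its minimum at w = 0.5

n_params = 36;                   % assumed size: 5-5-1 network with biases
sigma    = 0.1;                  % standard deviation of the perturbation
n_iter   = 2000;                 % number of candidate evaluations

w         = zeros(n_params, 1);  % initial weights and biases
best_cost = cost(w);
init_cost = best_cost;

for it = 1:n_iter
    w_new    = w + sigma * randn(n_params, 1);  % Gaussian perturbation
    new_cost = cost(w_new);
    if new_cost < best_cost      % keep the candidate only if it improves
        w         = w_new;
        best_cost = new_cost;
    end
end

fprintf('cost: %.4f -> %.4f\n', init_cost, best_cost);
```

Because a candidate is accepted only when it lowers the cost, the best cost never increases. The step size `sigma` and the number of iterations would have to be tuned for the real cost function.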

Modify the file cost_function.m, which simulates the cart for a duration of 4 seconds and assigns an error value to the dynamics of the cart. You have to i) complete the function cost_function.m to implement a neural network controller that generates the horizontal force $u$ applied to the cart at each time step, and ii) define an appropriate error function $e$, which scores the cart dynamics at each time step and has to be minimized.

Neural network controller: Write the code to initialize and simulate a neural network without using the Neural Network Toolbox. The input to the network at each time step should consist of the following 5 values: $(xc,\dot{xc},\sin{\varphi},\cos{\varphi},\dot{\varphi})^T$. The network has a single hidden layer with 5 hidden units. The scalar output of the network, i.e. the force, should be bounded between -10 and 10.

Hints: Don't forget to optimize the bias values of the neurons. Use a tansig output neuron to obtain bounded output values.
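A minimal sketch of such a controller, written without the toolbox, might look as follows. The layout of the parameter vector `w`, the tanh hidden units and the example state are assumptions. The toolbox function tansig is mathematically identical to tanh, so tanh is used here, scaled by 10 to bound the force.

```matlab
% Forward pass of a 5-5-1 network without the Neural Network Toolbox.
% ASSUMPTIONS: parameter layout in w, tanh hidden units, example state.
w = 0.1 * ones(36, 1);           % hypothetical parameter vector

W1 = reshape(w(1:25), 5, 5);     % hidden weights (5 hidden x 5 inputs)
b1 = w(26:30);                   % hidden biases
W2 = reshape(w(31:35), 1, 5);    % output weights
b2 = w(36);                      % output bias

x  = [0; 0; pi; 0];              % example state (xc, xc_dot, phi, phi_dot)
in = [x(1); x(2); sin(x(3)); cos(x(3)); x(4)];   % the 5 network inputs

h = tanh(W1 * in + b1);          % hidden layer activations
u = 10 * tanh(W2 * h + b2);      % force, bounded to (-10, 10)
```

With 25 + 5 hidden-layer parameters and 5 + 1 output parameters, the optimizer has to search a 36-dimensional space in this layout.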

Error function: Use the state variables to construct an error function $e$ that outputs a value at each time step and ensures that the pole remains in the upright position after the upswing. Set the error to $10^6$ if the cart leaves the interval $[-1,1]$. The total error (or cost) assigned to the cart movement is the sum of $e$ over all time steps.

Hint: The key to success is an appropriate error function.
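One possible shape of such an error function is sketched below. It is an illustration, not the intended solution. It assumes that $\varphi = 0$ corresponds to the upright position, which has to be checked against cp_dyn.m. The penalty weights 0.1 and 0.01 are arbitrary, and the trajectory `X` is a hypothetical stand-in for the states produced by the simulation.

```matlab
% Per-step error and accumulated cost (one possible choice).
% ASSUMPTIONS: phi = 0 is upright; the weights 0.1 and 0.01 are arbitrary;
% X is a hypothetical recorded trajectory with one state per column.
step_error = @(x) (1 - cos(x(3))) + 0.1 * x(1)^2 + 0.01 * x(4)^2;

X = [0   0.2  0.5;               % xc
     0   0.1  0.0;               % xc_dot
     pi  1.0  0.0;               % phi
     0   0.5  0.0];              % phi_dot

total = 0;
for t = 1:size(X, 2)
    if abs(X(1, t)) > 1          % cart left the interval [-1, 1]
        e = 1e6;
    else
        e = step_error(X(:, t));
    end
    total = total + e;           % cost = sum of e over all time steps
end
```

The term $1 - \cos{\varphi}$ is smooth and periodic, so under the stated assumption it is zero exactly when the pole is upright, no matter how many rotations the pole has made. The small penalties on cart position and angular velocity are meant to favor solutions that keep the pole at rest near the center.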

Analyze the best solution found and state the reasons for all choices you made in the MATLAB code.

Present your results in a clear, structured and legible way. Document them so that anybody can reproduce them effortlessly.

Haeusler Stefan 2013-01-16