Learn how to shoot a cannon at different target locations and under changing wind conditions with reward-weighted regression.
Use the MATLAB function shootcannon.m provided in cannon.zip to simulate a cannon shot. The function takes the initial angle and velocity of the cannonball as parameters. In addition, you have to provide the current wind strength. The function returns the impact position (1D) and the duration of the cannonball's flight. We now want to use reward-weighted regression to learn to shoot at targets at different distances under different wind conditions. Hence, we want to learn a policy that chooses the optimal initial angle and velocity of the cannonball given the target position and the wind strength.
For valid target positions choose the range of , and the wind strength can be located in the interval . The initial shooting angle has to be located in the interval , the initial shooting velocity in the range of .
Use a normalized RBF network as the linear feature representation of your policy. Use as reward function, where is the impact position, is the target position and is the duration of the flight (we punish longer flights because we want to destroy the target as fast as possible).
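To make the feature representation concrete, here is a minimal Python sketch of normalized RBF features over the two-dimensional context (target position, wind strength). The function names, the grid placement of the centers, the bandwidth choice, and the unit context ranges are all illustrative assumptions and are not taken from the exercise; substitute the ranges given in the assignment.

```python
import numpy as np

def make_grid_centers(lows, highs, n_per_dim):
    """Place RBF centers on a regular grid spanning [lows, highs] per dimension."""
    axes = [np.linspace(lo, hi, n_per_dim) for lo, hi in zip(lows, highs)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)

def normalized_rbf_features(x, centers, bandwidths):
    """Map a context x (shape (d,)) to normalized RBF activations (shape (k,))."""
    x = np.asarray(x, dtype=float)
    # Squared distance to every center, scaled per dimension by the bandwidth.
    sq_dist = np.sum(((x - centers) / bandwidths) ** 2, axis=1)
    # Shifting by the minimum does not change the normalized result,
    # but it keeps the sum away from zero far from all centers.
    activations = np.exp(-0.5 * (sq_dist - sq_dist.min()))
    # Normalization: the activations sum to one for every input.
    return activations / activations.sum()

# Placeholder context ranges (assumption): (target position, wind strength).
lows, highs = np.array([0.0, 0.0]), np.array([1.0, 1.0])
centers = make_grid_centers(lows, highs, n_per_dim=5)
bandwidths = (highs - lows) / 4.0  # equal to the grid spacing; a tunable choice
phi = normalized_rbf_features([0.3, 0.7], centers, bandwidths)
```

Because the features sum to one, a policy mean that is linear in `phi` interpolates smoothly between the weights attached to neighbouring centers. The number of centers and the bandwidth are the main quantities to tune.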
Use a Gaussian policy with constant exploration . In order to learn how to shoot the cannon, use the following procedure:
Report your training performance. Also investigate the resulting policy and its error as a function of the target position and the wind strength. Are you able to improve the performance by using a different feature representation?
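As a rough illustration of how reward-weighted regression with a Gaussian policy and linear features can be set up, the Python sketch below runs the update on a toy problem. It is a generic sketch under stated assumptions, not the procedure prescribed by the exercise: `toy_features` and `toy_reward` are hypothetical stand-ins for the RBF features and the cannon simulator, and the values of `beta`, `sigma`, the sample count, and the reward-range scaling of the weights are heuristic choices.

```python
import numpy as np

def rwr_update(Phi, A, rewards, beta=10.0, ridge=1e-6):
    """One reward-weighted regression step (weighted least squares).

    Phi: (N, k) features, A: (N, m) sampled actions, rewards: (N,).
    """
    # Turn rewards into positive weights. Scaling by the reward range
    # is a heuristic assumption, not part of the exercise text.
    span = max(rewards.max() - rewards.min(), 1e-12)
    w = np.exp(beta * (rewards - rewards.max()) / span)
    # Solve (Phi^T D Phi + ridge * I) W = Phi^T D A with D = diag(w).
    weighted_Phi = Phi * w[:, None]
    gram = Phi.T @ weighted_Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(gram, weighted_Phi.T @ A)

def toy_features(s):
    """Hypothetical features [1, s]; replace with the RBF features."""
    return np.column_stack([np.ones_like(s), s])

def toy_reward(s, a):
    """Hypothetical reward: the best action for context s is 1 + 2 s."""
    return -(a[:, 0] - (1.0 + 2.0 * s)) ** 2

rng = np.random.default_rng(0)
sigma = 0.5             # constant exploration (assumed value)
n_samples = 2000
W = np.zeros((2, 1))    # policy mean is Phi @ W
for _ in range(30):
    s = rng.uniform(0.0, 1.0, size=n_samples)          # sample contexts
    Phi = toy_features(s)
    A = Phi @ W + sigma * rng.standard_normal((n_samples, 1))  # explore
    W = rwr_update(Phi, A, toy_reward(s, A))            # refit the mean
```

On this toy problem the weights in `W` should drift toward the intercept 1 and slope 2 of the best action. For the cannon task, the context would be the pair (target position, wind strength), the action the pair (angle, velocity), and the reward would come from the simulator output.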