Next: Policy Gradient Methods
Up: Policy Gradient Methods: Swimmer
Previous: Policy Gradient Methods: Swimmer
To work with the swimmer model, add the folder model to your MATLAB path. The model environment is stored in a structure, which is created with the command
. For implementing various policy gradient methods, the most important function is the function
to perform a single rollout
8. The function takes the model structure
, the policy parameters
(i.e. the linear parameters
in our case) and the variance of the stochastic policy
as arguments. The function simulates the swimmer using
as parameters of the policy for
time steps (
resulting in a simulation time of
) and returns the summed reward (perf) for this episode (
). In addition it returns the used noise vector
for each time step (so
is a
matrix) and the features
for each time step (
matrix). The individual rewards for each time step can also be obtained (rewards). The E.J function has additional output values, which return the visited trajectory, the applied torques and the state variables of the DMP (
and
), see evaluate.m for further details.
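The rollout interface described above (summed reward, noise per time step, features per time step, individual rewards) can be illustrated with a small self-contained sketch. It is written in Python and is not the MATLAB code of the swimmer toolbox: toy_rollout, its toy reward, its Gaussian-bump features and the scalar action are all hypothetical stand-ins chosen only to mirror the shape of the outputs. The second function shows one common way such outputs can be used, a likelihood-ratio gradient estimate for a linear policy with additive Gaussian noise, and is a sketch under these assumptions rather than the method prescribed by this exercise.

```python
import math
import random


def toy_rollout(theta, sigma, n_steps=50, seed=None):
    """Hypothetical stand-in for a single rollout (NOT the swimmer simulator).

    Assumes a scalar action a_t = theta . phi_t + eps_t with eps_t ~ N(0, sigma^2).
    Requires len(theta) >= 2 and n_steps >= 2.
    Returns (perf, noise, features, rewards), mirroring the outputs in the text.
    """
    rng = random.Random(seed)
    n_feat = len(theta)
    noise, features, rewards = [], [], []
    for t in range(n_steps):
        phase = t / (n_steps - 1)
        # Toy features: Gaussian bumps over normalized time (an assumption).
        phi = [math.exp(-0.5 * ((phase - k / (n_feat - 1)) / 0.2) ** 2)
               for k in range(n_feat)]
        eps = rng.gauss(0.0, sigma)
        action = sum(w * f for w, f in zip(theta, phi)) + eps
        rewards.append(-(action - 1.0) ** 2)  # toy reward, not the swimmer's
        noise.append(eps)
        features.append(phi)
    perf = sum(rewards)  # summed reward of the episode
    return perf, noise, features, rewards


def likelihood_ratio_gradient(theta, sigma, n_rollouts=200, seed=0):
    """Likelihood-ratio gradient estimate with a mean-return baseline.

    For the Gaussian policy assumed above, the gradient of the log-probability
    of one action with respect to theta is eps_t * phi_t / sigma^2.
    """
    n_feat = len(theta)
    results = [toy_rollout(theta, sigma, seed=seed + i) for i in range(n_rollouts)]
    baseline = sum(r[0] for r in results) / n_rollouts
    grad = [0.0] * n_feat
    for perf, noise, features, _ in results:
        for eps, phi in zip(noise, features):
            for k in range(n_feat):
                grad[k] += (perf - baseline) * eps * phi[k] / sigma ** 2
    return [g / n_rollouts for g in grad]


if __name__ == "__main__":
    theta0 = [0.0, 0.0, 0.0]
    print(likelihood_ratio_gradient(theta0, sigma=0.1))
```

The point of the sketch is that the noise and the features returned for each time step are the quantities needed to form the gradient of the log-policy, which is presumably why the rollout function exposes them alongside the summed reward.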
To visualize a policy use
.
Finally some general remarks: the policy
has
parameters, where
denotes the number of links of the swimmer. The number of Gaussian kernel functions is given by the model and set to
. Make sure that the parameters are within the interval
.
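One simple way to keep the parameters inside a prescribed interval after each update is to clip them elementwise. The following Python sketch is only an illustration: clip_parameters is a hypothetical helper, and the bounds in the usage line are placeholder values, not the interval required by the swimmer model.

```python
def clip_parameters(theta, lower, upper):
    """Clip every policy parameter to the closed interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(w, lower), upper) for w in theta]


# Placeholder bounds for illustration only.
clipped = clip_parameters([-3.0, 0.5, 7.0], lower=-1.0, upper=1.0)
```

Clipping is not the only option; rejecting an update that leaves the interval, or shrinking the step size, would serve the same purpose.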
Haeusler Stefan
2011-01-25