Assume that for a given continuing MDP with discount factor $\gamma$, we modify the reward signal by either

- a) adding a constant to all rewards
- b) multiplying every reward by a constant
- c) linearly transforming the reward signal to $a \cdot r + b$.

Can this change the optimal policy of the MDP? For all three cases, express the new state values in terms of $V^*$ and the constants (where $V^*(s)$ is the optimal value of state $s$ under the original reward function).
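A candidate answer to a) to c) can be sanity-checked numerically. The following is a minimal sketch, not part of the exercise: the helper `value_iteration`, the random MDP, and the constants `c` (added) and `a` (multiplied, assumed positive here) are all hypothetical choices made for illustration. It compares the values computed under the modified rewards with the closed forms one would derive for a continuing task with discount factor below 1.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10):
    """Return optimal values and a greedy policy for a finite MDP.

    P has shape (S, A, S), with P[s, a, s2] the transition probability.
    R has shape (S, A), holding the expected immediate reward.
    """
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)            # action values, shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical random MDP, used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalise into distributions
R = rng.normal(size=(S, A))

c, a = 2.0, 3.0                            # assumed constants, with a > 0

V, pi = value_iteration(P, R, gamma)
V_add, pi_add = value_iteration(P, R + c, gamma)       # case a)
V_mul, pi_mul = value_iteration(P, a * R, gamma)       # case b)
V_lin, pi_lin = value_iteration(P, a * R + c, gamma)   # case c)

# Compare against the candidate closed forms.
print(np.allclose(V_add, V + c / (1 - gamma), atol=1e-6))
print(np.allclose(V_mul, a * V, atol=1e-6))
print(np.allclose(V_lin, a * V + c / (1 - gamma), atol=1e-6))
print((pi == pi_add).all(), (pi == pi_mul).all(), (pi == pi_lin).all())
```

Rerunning the sketch with a negative value of `a` is a quick way to see that the greedy policy can change in that case, since maximising the scaled reward then amounts to minimising the original one.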

Now consider the following modifications for deterministic MDPs:

- d) Let $(s, a)$ be the state-action pair that leads to the highest possible immediate reward in the MDP. Set
- e) Let $(s, a)$ be the state-action pair that leads to the lowest possible immediate reward in the MDP. Set