Reward-modulated Hebbian Learning of Decision Making
M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass
We introduce a framework for decision making in which the learning of decision
making is reduced to its simplest and biologically most plausible form:
Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule
as reinforcement learning in which certain decisions are rewarded and prove
that each synaptic weight will on average converge exponentially fast to the
log-odds of receiving a reward when its pre- and postsynaptic neurons are
active. In our simple architecture, a particular action is selected from the
set of candidate actions by a winner-take-all operation. The global reward
assigned to this action then modulates the update of each synapse. Apart from
this global reward signal, our reward-modulated Bayesian Hebb rule is a pure
Hebb update that depends only on the coactivation of the pre- and
postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the
postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner
rule. This simple approach to action-selection learning requires that
information about sensory inputs be presented to the Bayesian decision stage
in a suitably preprocessed form resulting from other adaptive processes
(acting on a longer timescale) that detect salient dependencies among input
features. Hence our proposed framework for fast learning of decisions also
provides interesting new hypotheses regarding neural nodes and computational
goals of cortical areas that provide input to the final decision stage.
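
The convergence claim can be illustrated with a minimal single-synapse sketch. The abstract does not state the update equation, so the form used here is an assumption: whenever the pre- and postsynaptic neurons are coactive, the weight w changes by eta * (1 + exp(-w)) if the decision was rewarded and by -eta * (1 + exp(w)) otherwise. Under that assumption the expected update vanishes when p * (1 + exp(-w)) = (1 - p) * (1 + exp(w)), that is, when w = log(p / (1 - p)), where p is the probability of reward. The function names, learning rate, and reward probability are all hypothetical choices made for illustration.

```python
import math
import random


def bayesian_hebb_step(w, rewarded, eta):
    """One update of a single weight, applied only when pre- and
    postsynaptic neurons are coactive.

    Assumed form (not given in the abstract): the step depends only on the
    current weight and the global reward, not on other presynaptic inputs.
    """
    if rewarded:
        return w + eta * (1.0 + math.exp(-w))
    return w - eta * (1.0 + math.exp(w))


def simulate_weight(p_reward, eta=0.005, steps=40000, seed=0):
    """Return the weight averaged over the second half of a simulated run.

    Rewards are drawn independently with probability p_reward on every step,
    so every step counts as a coactivation event for this synapse.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    w = 0.0
    tail_start = steps // 2
    tail_sum = 0.0
    for t in range(steps):
        rewarded = rng.random() < p_reward
        w = bayesian_hebb_step(w, rewarded, eta)
        if t >= tail_start:
            tail_sum += w
    return tail_sum / (steps - tail_start)


if __name__ == "__main__":
    p = 0.8
    print("time-averaged weight:", simulate_weight(p))
    print("log-odds of reward:  ", math.log(p / (1.0 - p)))
```

The weight is averaged over the second half of the run because a single trajectory keeps fluctuating around the fixed point; the convergence described above holds on average, not for each individual sample.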
Reference: M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass.
Reward-modulated Hebbian Learning of Decision Making.
Neural Computation, 22:1399-1444, 2010.