A learning rule for very simple universal approximators consisting of a
single layer of perceptrons
P. Auer, H. Burgsteiner, and W. Maass
Abstract:
A learning algorithm is presented for circuits consisting of a single layer of
perceptrons (= threshold gates, or equivalently gates with a Heaviside
activation function). We refer to such circuits as parallel perceptrons. In
spite of their simplicity, these circuits can compute any Boolean function if
one views the majority of the binary perceptron outputs as the binary outputs
of the parallel perceptron, and they are universal approximators for
arbitrary continuous functions with values in [0,1] if one views the fraction
of perceptrons that output 1 as the analog output of the parallel perceptron.
It has long been thought that no competitive learning algorithm exists for these
extremely simple circuits consisting of gates with binary outputs, which also
became known as committee machines. It is commonly believed that one has to
replace the hard threshold gates with sigmoidal gates, and that one has to tune
the weights on at least two successive layers, in order to obtain satisfactory
learning results. We show that this is not true by exhibiting a simple learning
algorithm for parallel perceptrons, the parallel delta rule (p-delta rule), whose
performance is comparable to that of backprop for multilayer networks consisting
of sigmoidal gates. In contrast to backprop, the p-delta rule does not require
the computation and communication of analog values with high precision, even
though it does in fact implement gradient descent with respect to a suitable
error measure.
Therefore it provides an interesting new hypothesis for the organization of
learning in biological neural systems.
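
To make the two output conventions in the abstract concrete, here is a minimal
Python sketch of a parallel perceptron, together with a deliberately simplified
per-gate update. The class name and all parameters (eta, eps, the random
initialization) are illustrative assumptions, and the update shown is not the
authors' p-delta rule; it is only a rough illustration of a rule that drives each
perceptron using the coarse aggregate output rather than high-precision analog
error signals.

import numpy as np

class ParallelPerceptron:
    """A single layer of threshold gates whose binary outputs are pooled."""

    def __init__(self, n_perceptrons, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        # One weight vector per perceptron; there is only this single layer.
        self.W = rng.normal(size=(n_perceptrons, n_inputs))

    def gate_outputs(self, x):
        # Heaviside activation: each perceptron outputs +1 or -1.
        return np.where(self.W @ x >= 0, 1, -1)

    def boolean_output(self, x):
        # Boolean mode: the circuit output is the majority vote of the gates.
        return 1 if self.gate_outputs(x).sum() > 0 else 0

    def analog_output(self, x):
        # Analog mode: the circuit output is the fraction of gates that fire,
        # a value in [0, 1].
        return float((self.gate_outputs(x) == 1).mean())

    def simplified_update(self, x, target, eta=0.01, eps=0.05):
        # Simplified illustration only (not the paper's p-delta rule): each
        # gate needs just the coarse aggregate output and its own sign.
        out = self.analog_output(x)
        gates = self.gate_outputs(x)
        if out > target + eps:
            # Output too high: nudge the gates that fired towards not firing.
            self.W[gates == 1] -= eta * x
        elif out < target - eps:
            # Output too low: nudge the silent gates towards firing.
            self.W[gates == -1] += eta * x

For example, ParallelPerceptron(7, 4).analog_output(x) returns the fraction of
the seven gates that fire on a 4-dimensional input x, which is the analog
circuit output in [0, 1] referred to above.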
Reference: P. Auer, H. Burgsteiner, and W. Maass.
A learning rule for very simple universal approximators consisting of a single
layer of perceptrons.
Neural Networks, 21(5):786-795, 2008.