A learning rule for very simple universal approximators consisting of a
single layer of perceptrons
P. Auer, H. Burgsteiner, and W. Maass
A learning algorithm is presented for circuits consisting of a single layer of
perceptrons (= threshold gates, or equivalently gates with a Heaviside
activation function). We refer to such circuits as parallel perceptrons. In
spite of their simplicity, these circuits can compute any boolean function if
one views the majority of the binary perceptron outputs as the binary outputs
of the parallel perceptron, and they are universal approximators for
arbitrary continuous functions with values in [0,1] if one views the fraction
of perceptrons that output 1 as the analog output of the parallel perceptron.
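The two output conventions described above can be made concrete with a short sketch. The following is a minimal illustration, not code from the paper; the function name and the 0/1 gate encoding are assumptions.

```python
import numpy as np

def parallel_perceptron_outputs(W, z):
    """Outputs of a parallel perceptron: a single layer of threshold gates.

    W: (n, d) weight matrix, one row per perceptron; z: input vector of shape (d,).
    Each gate outputs 1 if w . z >= 0 (Heaviside activation), else 0.
    """
    gate_outputs = (W @ z >= 0).astype(int)
    # Binary output: majority vote over the n binary gate outputs.
    binary_out = int(2 * gate_outputs.sum() > len(gate_outputs))
    # Analog output: fraction of gates that output 1, a value in [0, 1].
    analog_out = gate_outputs.mean()
    return binary_out, analog_out
```

With three gates of which two fire on a given input, the binary output is 1 (majority) and the analog output is 2/3 (fraction of firing gates).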
For a long time it was thought that no competitive learning algorithm exists
for these extremely simple circuits consisting of gates with binary outputs,
which also became known as committee machines. It is commonly
believed that one has to replace the hard threshold gates by sigmoidal gates
and that one has to tune the weights on at least two successive layers in
order to get satisfactory learning results. We show that this is not true by
exhibiting a simple learning algorithm for parallel perceptrons, the
parallel delta rule (p-delta rule), whose performance is comparable to that
of backprop for multilayer networks consisting of sigmoidal gates. In
contrast to backprop, the p-delta rule does not require the computation and
communication of analog values with high precision, although it does in fact
implement gradient descent with respect to a suitable error measure.
Therefore it provides an interesting new hypothesis for the organization of
learning in biological neural systems.
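The abstract does not spell out the update rule itself; the sketch below illustrates its core idea under the assumption that each perceptron applies a local delta-rule update driven only by a coarse, low-precision error signal (output too high, too low, or acceptable). The parameter names (eta, eps, gamma, mu) and the weight normalization step are illustrative choices, not taken from the abstract.

```python
import numpy as np

def p_delta_step(W, z, target, eta=0.01, eps=0.05, gamma=0.1, mu=1.0):
    """One sketched update of a parallel delta rule (illustrative, hypothetical
    parameterization).

    Each perceptron sees only the input z and the coarse error direction --
    no high-precision analog values need to be communicated between gates.
    """
    n = W.shape[0]
    acts = W @ z                       # pre-threshold activations
    analog_out = (acts >= 0).mean()    # fraction of gates that output 1

    for i in range(n):
        if analog_out > target + eps and acts[i] >= 0:
            W[i] -= eta * z            # output too high: push a firing gate down
        elif analog_out < target - eps and acts[i] < 0:
            W[i] += eta * z            # output too low: push a silent gate up
        elif 0 <= acts[i] < gamma:
            W[i] += eta * mu * z       # output ok but gate near threshold:
        elif -gamma < acts[i] < 0:     # widen its margin for stability
            W[i] -= eta * mu * z
        W[i] /= np.linalg.norm(W[i])   # keep each weight vector at unit norm
    return W
```

The margin-widening branches and the normalization are stabilization heuristics in the spirit of the rule; the full paper should be consulted for the exact formulation.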
Reference: P. Auer, H. Burgsteiner, and W. Maass.
A learning rule for very simple universal approximators consisting of a single
layer of perceptrons.
Neural Networks, 21(5):786-795, 2008.