Reducing Communication for Distributed Learning in Neural Networks

A learning algorithm is presented for circuits consisting of a single layer of perceptrons. We refer to such circuits as parallel perceptrons. In spite of their simplicity, these circuits are universal approximators for arbitrary Boolean and continuous functions. In contrast to backprop for multi-layer perceptrons, our new learning algorithm, the parallel delta rule (p-delta rule), only has to tune a single layer of weights, and it does not require the computation and communication of high-precision analog values. This also distinguishes our learning rule from other learning rules for such circuits, such as MADALINE, which require far more communication. Our algorithm also provides an interesting new hypothesis for the organization of learning in biological neural systems. A theoretical analysis shows that the p-delta rule does in fact implement gradient descent with regard to a suitable error measure, although it does not require the computation of derivatives. Furthermore, experiments on common real-world benchmark datasets show that its performance is competitive with that of other learning approaches from neural networks and machine learning.
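
To make the idea of a parallel perceptron concrete, the following is a minimal sketch, not the authors' exact p-delta rule (which this abstract does not specify). It assumes a binary classification task, a majority vote over the perceptron outputs, and a simplified update in which only the perceptrons that voted against the target are adjusted. All function names, the learning rate, and the toy AND dataset are hypothetical choices made for illustration. The point it tries to show is the low communication cost: each perceptron sends a single bit (its vote) and receives only the sign of the desired correction.

```python
import random

def perceptron_vote(weights, x):
    """One perceptron: threshold the weighted sum at zero, return +1 or -1."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1

def parallel_output(all_weights, x):
    """Circuit output: majority vote over the single layer of perceptrons."""
    total = sum(perceptron_vote(w, x) for w in all_weights)
    return 1 if total >= 0 else -1

def simplified_update(all_weights, x, target, lr=0.1):
    """Simplified step (an assumption, not the full p-delta rule).

    If the majority vote is wrong, every perceptron that voted against
    the target is moved toward it. Only binary votes and the sign of the
    target are exchanged; no analog values or derivatives are computed.
    Returns True if any weight was changed.
    """
    if parallel_output(all_weights, x) == target:
        return False
    for w in all_weights:
        if perceptron_vote(w, x) != target:
            for j in range(len(w)):
                w[j] += lr * target * x[j]
    return True

def train(data, n_units=3, epochs=1000, lr=0.1, seed=0):
    """Train until one full pass over the data causes no update."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    all_weights = [[rng.uniform(-1, 1) for _ in range(dim)]
                   for _ in range(n_units)]
    for _ in range(epochs):
        changed = False
        for x, target in data:
            if simplified_update(all_weights, x, target, lr):
                changed = True
        if not changed:
            break
    return all_weights

# Toy data (hypothetical): logical AND, with a constant 1 as bias input.
AND_DATA = [
    ((0, 0, 1), -1),
    ((0, 1, 1), -1),
    ((1, 0, 1), -1),
    ((1, 1, 1), 1),
]

if __name__ == "__main__":
    weights = train(AND_DATA)
    for x, target in AND_DATA:
        print(x, "target", target, "output", parallel_output(weights, x))
```

An odd number of perceptrons is used so that the vote cannot tie. The sketch covers only the binary case; approximating continuous functions, as mentioned in the abstract, would need a graded readout of the vote count rather than a plain majority.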
