Perceptron and Adaline
Networks with threshold activation functions
A single layer feed-forward network consists of one or more output neurons o, each of which is connected with a weighting factor wio to all of the inputs i. In the simplest case the network has only two inputs and a single output, as sketched in the figure (we leave the output index o out). The input of the neuron is the weighted sum of the inputs plus the bias term. The output of the network is formed by the activation of the output neuron, which is some function of the input:

y = F(Σi wi xi + θ)
The activation function F can be linear, so that we have a linear network, or nonlinear. In this section we consider the threshold (or Heaviside or sgn) function:

F(s) = +1 if s > 0, -1 otherwise.
The output of the network thus is either +1 or -1, depending on the input. The network can now be used for a classification task: it can decide whether an input pattern belongs to one of two classes. If the total input is positive, the pattern will be assigned to class +1; if the total input is negative, the sample will be assigned to class -1. The separation between the two classes in this case is a straight line, given by the equation:

w1 x1 + w2 x2 + θ = 0.
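As a concrete illustration, here is a minimal Python sketch of such a two-input threshold unit; the function name and the example weights are our own, not from the original text:

```python
def threshold_unit(x1, x2, w1, w2, theta):
    """Classify a two-dimensional input with a threshold (sgn) activation."""
    s = w1 * x1 + w2 * x2 + theta   # weighted input sum plus bias
    return 1 if s > 0 else -1       # threshold activation F

# Points on either side of the line w1*x1 + w2*x2 + theta = 0
# are assigned to class +1 or class -1:
print(threshold_unit(1.0, 2.0, w1=0.5, w2=-1.0, theta=0.3))   # -> -1
```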
We will describe two learning methods for these types of networks: the 'perceptron'
learning rule and the 'delta' or 'LMS' rule. Both methods are iterative procedures that adjust
the weights. A learning sample is presented to the network. For each weight the new value is
computed by adding a correction to the old value. The threshold is updated in the same way:

wi(t + 1) = wi(t) + Δwi(t),    θ(t + 1) = θ(t) + Δθ(t).
Perceptron learning rule and convergence theorem
Suppose we have a set of learning samples consisting of an input vector x and a desired output d(x). For a classification task the d(x) is usually +1 or -1. The perceptron learning rule is very simple and can be stated as follows:
1. Start with random weights for the connections;
2. Select an input vector x from the set of training samples;
3. If y ≠ d(x) (the perceptron gives an incorrect response), modify all connections wi according to: Δwi = d(x)xi;
4. Go back to 2.
When the network responds correctly, no connection weights are modified. Besides modifying the weights, we must also modify the threshold θ. This θ is considered as a connection w0 between the output neuron and a 'dummy' predicate unit which is always on: x0 = 1. Given the perceptron learning rule as stated above, this threshold is modified according to:

Δθ = 0 if the perceptron responds correctly, Δθ = d(x) otherwise.
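A short Python sketch of the rule as stated above may help; the training loop, sample format, and epoch count are illustrative assumptions, with the threshold folded in as the weight w0 on the dummy input x0 = 1:

```python
import random

def train_perceptron(samples, n_inputs, epochs=100):
    # w[0] plays the role of the threshold theta (dummy input x0 = 1).
    w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for x, d in samples:                      # x: input vector, d(x): +1 or -1
            xs = [1.0] + list(x)                  # prepend the dummy input x0 = 1
            y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if y != d:                            # only incorrect responses cause learning
                w = [wi + d * xi for wi, xi in zip(w, xs)]   # delta w_i = d(x) x_i
    return w

# Example: learning the AND function on inputs from {-1, +1}:
and_samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
print(train_perceptron(and_samples, n_inputs=2))
```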
The adaptive linear element (Adaline)
An important generalisation of the perceptron training algorithm was presented by Widrow and Hoff as the 'least mean square' (LMS) learning procedure, also known as the delta rule. The main functional difference with the perceptron training rule is the way the output of the system is used in the learning rule. The perceptron learning rule uses the output of the threshold function (either -1 or +1) for learning. The delta rule uses the net output without further mapping into output values -1 or +1.
The learning rule was applied to the 'adaptive linear element', also named Adaline, developed
by Widrow and Hoff (Widrow & Hoff, 1960). In a simple physical implementation
this device consists of a set of controllable resistors connected to a circuit which can sum up
currents caused by the input voltage signals. Usually the central block, the summer, is also
followed by a quantiser which outputs either +1 or -1, depending on the polarity of the sum.
Although the adaptive process is here exemplified in a case when there is only one output,
it may be clear that a system with many parallel outputs is directly implementable by multiple
units of the above kind.
If the input conductances are denoted by wi, i = 0, 1, ..., n, and the input and output signals by xi and y, respectively, then the output of the central block is defined to be:

y = Σi wi xi + θ, where θ = w0.
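In code, the summer and the optional quantiser described above might look like the following sketch (function and parameter names are ours):

```python
def adaline_output(x, w, theta, quantise=False):
    s = sum(wi * xi for wi, xi in zip(w, x)) + theta   # the summer
    if quantise:
        return 1 if s >= 0 else -1                     # quantiser: polarity of the sum
    return s                                           # raw linear (net) output
```

The delta rule discussed next trains on the raw linear output (quantise=False); the quantiser is applied only when the device is used as a classifier.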
Networks with linear activation functions: the delta rule
For a single layer network with an output unit with a linear activation function the output is simply given by:

y = Σj wj xj + θ.

Such a network is able to represent a linear relationship between the values of the inputs and the output. For each input sample, the actual output of the network differs from the desired output, and the delta rule uses an error function based on these differences to adjust the weights.
The error function, as indicated by the name least mean square, is the summed squared error. That is, the total error E is defined to be

E = Σp E^p = ½ Σp (d^p − y^p)²,

where the index p ranges over the set of input patterns and E^p represents the error on pattern p. Gradient descent on this error function yields the delta rule: Δwj = γ(d^p − y^p)xj, where γ is a small positive learning-rate constant.
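A minimal Python sketch of one such gradient-descent step, assuming a linear output unit and an illustrative learning rate gamma:

```python
def delta_rule_step(w, theta, x, d, gamma=0.01):
    """One LMS (delta rule) update for a single training pattern."""
    y = sum(wj * xj for wj, xj in zip(w, x)) + theta        # linear (net) output y^p
    err = d - y                                             # (d^p - y^p)
    w = [wj + gamma * err * xj for wj, xj in zip(w, x)]     # delta w_j = gamma (d-y) x_j
    theta += gamma * err                                    # threshold as w0 with x0 = 1
    return w, theta
```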
The delta rule modifies the weights appropriately for target and actual outputs of either polarity
and for both continuous and binary input and output units. These characteristics have opened
up a wealth of new applications.