Simple perceptrons are feed-forward (neural) networks with no hidden layers. They have only an input layer and an output layer.
The input layer is the set of input nodes, each of which provides the value for one coordinate of the input vector. The output layer is the set of output nodes, each of which gives out the value for one coordinate of the output vector. There are weighted connections between the input layer nodes and the output layer nodes.
Say the input vector is $n$-dimensional and the output vector is $m$-dimensional. Then there are $n$ input layer nodes and $m$ output layer nodes, and up to $nm$ weighted connections between the two layers (for a fully-connected network, there are exactly $nm$ connections).
For a simple perceptron, the output from the $j$th output node is given by:
$y_j = f\left(\sum_{i=1}^{n} w_{ij} x_i\right)$,
where $w_{ij}$ is the weight value on the connection from the $i$th input layer node to the $j$th output layer node, $x_i$ is the value given by the $i$th input layer node, and $f$ is the activation function used by the network.
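This equation can be sketched directly in code. In the sketch below, the dimensions ($n = 3$, $m = 2$), the weight values, and the choice of a sigmoid activation are all illustrative, not fixed by the text:

```python
import math

def perceptron_output(weights, x, f):
    """Outputs of a simple perceptron.

    weights[i][j] is the weight on the connection from the
    i-th input node to the j-th output node; f is the activation.
    """
    n, m = len(weights), len(weights[0])
    return [f(sum(weights[i][j] * x[i] for i in range(n)))
            for j in range(m)]

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Illustrative 3-input, 2-output network.
w = [[0.5, -1.0],
     [0.2,  0.3],
     [-0.4, 0.8]]
x = [1.0, 2.0, 3.0]
y = perceptron_output(w, x, sigmoid)  # one value per output node
```

Each output node $j$ thus computes the weighted sum of all inputs along its incoming connections and passes it through $f$.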
A simple perceptron is used to compute two kinds of functions: classification functions and regression functions.
A classification function is one that takes an input vector and categorizes it into one of several classes. Thus, with an $n$-dimensional input vector and an $m$-dimensional output vector, its input is an element of $\mathbb{R}^n$, while its output is an element of $C^m$, where $C$ is a finite set. As a simple example, each output node may produce only either $-1$ or $+1$ as output, in which case $C = \{-1, +1\}$.
A regression function is one that takes an input vector and produces a real-valued vector. Again, with an $n$-dimensional input vector and an $m$-dimensional output vector, its input is an element of $\mathbb{R}^n$, while its output is an element of $\mathbb{R}^m$.
Let us first consider the use of perceptrons for computing classification functions. For simplicity, we consider that each output node gives out only $-1$ or $+1$. Further, we assume the output layer has only one output node; equivalently, we analyze each output node in turn. For this case, the input-output equation becomes:
$y = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$,
where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input vector, $\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{n} w_i x_i$, the dot product of $\mathbf{w}$ and $\mathbf{x}$, and $\mathrm{sgn}$ is the signum function defined as: $\mathrm{sgn}(u) = +1$ if $u \geq 0$, and $-1$ otherwise. Note that we dropped the suffix $j$ in the above equation because we are considering only one output node. Also, other variations of the equation are possible. In particular, instead of just looking at the sign of $\mathbf{w} \cdot \mathbf{x}$, we might require that $|\mathbf{w} \cdot \mathbf{x}|$ be greater than a threshold, to prevent ambiguity in the classification of an input when $\mathbf{w} \cdot \mathbf{x}$ is close to $0$.
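A minimal single-output version of this equation in code, using the convention that the signum function maps $u \geq 0$ to $+1$ (the weight vector and inputs below are illustrative):

```python
def sgn(u):
    """Signum activation: +1 for non-negative input, -1 otherwise."""
    return 1 if u >= 0 else -1

def classify(w, x):
    """Single-output simple perceptron: sign of the dot product w . x."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

w = [2.0, -1.0]                # illustrative weight vector
a = classify(w, [3.0, 1.0])    # 2*3 - 1*1 =  5 -> +1
b = classify(w, [1.0, 4.0])    # 2*1 - 1*4 = -2 -> -1
```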
In a classification problem, the setting is that input vectors are given, and for each input vector, its correct class ($+1$ or $-1$) is also given. For example, we may have $\mathbf{x}_1, \ldots, \mathbf{x}_k$ in class $+1$ and $\mathbf{x}_{k+1}, \ldots, \mathbf{x}_p$ in class $-1$. The question at hand is whether a simple perceptron can correctly classify the given input vectors (or, in a machine learning scenario, new vectors that may come as input in the future) into the $+1$ and $-1$ classes. In other words, can a weight vector $\mathbf{w}$ be found such that $\mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}) = +1$ for each $\mathbf{x}$ in class $+1$, and $\mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}) = -1$ for each $\mathbf{x}$ in class $-1$?
To answer this, observe that $\mathbf{w} \cdot \mathbf{x} = 0$ represents a hyperplane in the $\mathbf{x}$-space that is orthogonal to $\mathbf{w}$ and that passes through the origin. (The hyperplane would be a straight line if $\mathbf{x}$ is $2$-dimensional, and would be a plane if $\mathbf{x}$ is $3$-dimensional.) Further, observe that all points $\mathbf{x}$ such that $\mathbf{w} \cdot \mathbf{x} > 0$ lie on one side of the hyperplane, while all points $\mathbf{x}$ such that $\mathbf{w} \cdot \mathbf{x} < 0$ lie on the other side. (Lying on one or the other side of the hyperplane makes visual sense only if $\mathbf{x}$ is $2$-dimensional or $3$-dimensional. In higher dimensions, the two sides of the hyperplane are defined just algebraically, as $\mathbf{w} \cdot \mathbf{x} > 0$ and $\mathbf{w} \cdot \mathbf{x} < 0$.) Thus, the question of classification boils down to whether a weight vector $\mathbf{w}$ can be found such that the hyperplane orthogonal to it and passing through the origin separates the class-$+1$ points to one side of it and the class-$-1$ points to the other side. Hence, a simple perceptron is capable of classifying a given set of input points if and only if the points are separable by a hyperplane – that is (since a hyperplane is represented by a linear equation), if and only if the points are linearly separable.
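One way to see this criterion in action is the classical perceptron learning rule (not derived in the text): it is guaranteed to converge on linearly separable data and never converges otherwise. The sketch below adds a constant $1.0$ input coordinate — a standard trick, not used above, that lets the separating hyperplane be offset from the origin — and contrasts AND (separable) with XOR (not separable):

```python
def sgn(u):
    return 1 if u >= 0 else -1

def train_perceptron(samples, epochs=100, lr=1.0):
    """Classical perceptron learning rule.

    samples: list of (x, label) with label in {+1, -1}; each x includes
    a leading constant 1.0 so the hyperplane need not pass through the
    origin.  Returns (w, converged).
    """
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        errors = 0
        for x, label in samples:
            if sgn(sum(wi * xi for wi, xi in zip(w, x))) != label:
                # Nudge the hyperplane toward the misclassified point.
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:          # a full pass with no mistakes
            return w, True
    return w, False

# AND is linearly separable; XOR is not.
AND = [([1.0, a, b], 1 if a and b else -1)
       for a in (0.0, 1.0) for b in (0.0, 1.0)]
XOR = [([1.0, a, b], 1 if a != b else -1)
       for a in (0.0, 1.0) for b in (0.0, 1.0)]
_, ok_and = train_perceptron(AND)
_, ok_xor = train_perceptron(XOR)
```

For AND the rule settles on a separating hyperplane within a few epochs; for XOR no hyperplane exists, so the epoch cap is what stops the loop.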
Next, let us consider the use of simple perceptrons for computing regression functions. Again, to simplify, we assume that the output layer has only one node. For our first case, we assume that the activation function is a linear function. For this case, the input-output equation becomes:
$y = \mathbf{w} \cdot \mathbf{x}$.
Observe that since our activation is linear (of the form $f(u) = cu$, where the constant $c$ can be absorbed into the weights), we could assume it is actually just the identity function.
In the regression problem scenario, we are again given a collection of input vectors $\mathbf{x}_1, \ldots, \mathbf{x}_p$. For each input vector $\mathbf{x}_k$, the corresponding output real value $y_k$ is given. The question is whether a weight vector $\mathbf{w}$ exists such that $\mathbf{w} \cdot \mathbf{x}_k = y_k$ for each $k$ – that is, the question is just whether there is a solution to the resulting system of $p$ linear equations. Thus, a simple perceptron with a linear activation function can solve a regression problem (for given input vectors) if and only if weights $\mathbf{w}$ exist that solve the system of linear equations capturing the input vector–output value relationships.
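For instance, with two $2$-dimensional input vectors the system has exactly two equations in two unknowns and can be solved directly. A hand-rolled Cramer's-rule sketch (the input vectors and target values below are made up for illustration):

```python
def solve_2x2(x1, x2, y1, y2):
    """Solve w . x1 = y1 and w . x2 = y2 for a 2-D weight vector w.

    Uses Cramer's rule; assumes the two input vectors are linearly
    independent (non-singular system).
    """
    det = x1[0] * x2[1] - x1[1] * x2[0]
    if det == 0:
        raise ValueError("input vectors are linearly dependent")
    w0 = (y1 * x2[1] - y2 * x1[1]) / det
    w1 = (x1[0] * y2 - x2[0] * y1) / det
    return [w0, w1]

# Two illustrative input vectors and their target outputs.
w = solve_2x2([1.0, 0.0], [1.0, 1.0], 2.0, 5.0)
# w satisfies w . [1, 0] = 2 and w . [1, 1] = 5.
```

When there are more input vectors than weight coordinates ($p > n$), the system is overdetermined and an exact solution generally does not exist, which is exactly the "if and only if" condition above.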
For the second case, we consider non-linear activation functions. In this case, the input-output equation becomes:
$y = f(\mathbf{w} \cdot \mathbf{x})$.
Here, $f$ is a non-linear function such as the sigmoid function. It turns out that if $f$ is an invertible function, then the question here is equivalent to the previous case. This is because we can form an equivalent input-output equation:
$\mathbf{w} \cdot \mathbf{x} = f^{-1}(y)$,
and then analyze this just like the case of a linear activation function. Thus, simple perceptrons with non-linear (invertible) activation functions have the same capability for solving regression problems as simple perceptrons with linear activation functions.
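With the sigmoid as $f$, its inverse is the logit function, so each target value is transformed before solving the linear system. A minimal sketch (the target value, input vector, and chosen weights are illustrative):

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def logit(y):
    """Inverse of the sigmoid; defined for 0 < y < 1."""
    return math.log(y / (1.0 - y))

# A target output in (0, 1), the range of the sigmoid.
y_target = 0.8
# Transforming it reduces the problem to the linear case:
# find w with w . x = logit(y_target).
linear_target = logit(y_target)

x = [1.0, 2.0]
w = [linear_target, 0.0]   # one illustrative solution for this single x
y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # recovers y_target
```

Note that the targets must lie in the range of $f$ for $f^{-1}$ to apply at all; for the sigmoid that means every $y_k$ must be strictly between $0$ and $1$.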
In the above discussion, we took a somewhat roundabout route to analyze when a simple perceptron can map a given collection of input vectors to their corresponding outputs, and when it cannot. We found that whether a simple perceptron can solve the problem for a given set of input vectors and their corresponding output values comes down to a linear criterion. This is not at all surprising, because simple perceptrons are able to compute only (activations of) linear functions. Given an input vector $\mathbf{x}$, a simple perceptron constructed using weights $\mathbf{w}$ by definition produces output $f(\mathbf{w} \cdot \mathbf{x})$, where $\mathbf{w} \cdot \mathbf{x}$ is linear in $\mathbf{x}$.