@Frankchen
2016-02-25T07:49:19.000000Z
Deep-learning
The goal of unsupervised learning is to create an internal representation of the input that is useful for later supervised or reinforcement learning.
Feed-forward neural networks are the commonest type of neural network in practical applications. Their first layer is the input and the last layer is the output; if there is more than one hidden layer, we call them "deep" neural networks.
Feed-forward neural networks compute a series of transformations that change the similarities between cases (in speech recognition, for example, we'd like the same thing said by different speakers to become more similar, and different things said by the same speaker to become less similar, as we go through the layers). The activities of the neurons in each layer are a non-linear function of the activities in the layer below.
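A minimal sketch of this forward computation; the layer sizes, the random weights, and the choice of the logistic sigmoid as the non-linearity are my own illustrative assumptions, not specifics from the course:

```python
import numpy as np

def forward(x, layers):
    """Run one input vector through a stack of (W, b) layers.

    Each layer's activities are a non-linear function (here the
    logistic sigmoid) of a weighted sum of the activities in the
    layer below.
    """
    a = x
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))  # logistic non-linearity
    return a

# A tiny 3-input -> 4-hidden -> 2-output network with random weights.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
y = forward(np.array([1.0, 0.5, -0.3]), layers)
print(y)  # two output activities, each between 0 and 1
```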
Recurrent neural networks, by contrast, have directed cycles in their connection graph. They have complicated dynamics, and this can make them very difficult to train.
Perceptrons were popularized by Frank Rosenblatt in the early 1960s. Then, in 1969, Minsky and Papert published a book called "Perceptrons" that analysed what they could do and showed their limitations. This led many people to conclude, wrongly, that those limitations applied to all neural network models. For example, one layer of perceptrons cannot compute the XOR operation, but two layers can, because XOR can be built out of AND, OR, and NOT operations.
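A sketch of the two-layer construction, assuming binary threshold units; the particular weights and thresholds below are my own illustrative choices:

```python
def step(x):
    """Binary threshold unit: output 1 if the total input is >= 0."""
    return 1 if x >= 0 else 0

def xor(a, b):
    # Hidden layer: OR and AND, each computable by a single perceptron.
    h_or = step(a + b - 0.5)    # a OR b
    h_and = step(a + b - 1.5)   # a AND b
    # Output layer: (a OR b) AND NOT (a AND b) == a XOR b.
    return step(h_or - 2 * h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```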
The learning algorithm, the perceptron convergence procedure, trains binary threshold output neurons as classifiers. It works as follows:
Pick training cases using any policy that ensures that every training case will keep getting picked.
1. If the output unit is correct, leave its weights alone.
2. If the output unit incorrectly outputs a zero, add the input vector to the weight vector.
3. If the output unit incorrectly outputs a 1, subtract the input vector from the weight vector.
This is guaranteed to find a set of weights that gets the right answer for all the training cases if any such set exists.
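The procedure above can be sketched as follows. The AND task, the fixed epoch count, and the convention that a total input of exactly 0 outputs 1 are my own assumptions for illustration:

```python
import numpy as np

def train_perceptron(X, t, epochs=100):
    """Perceptron convergence procedure for a binary threshold unit.

    X: inputs with a constant 1 appended, so the bias is learned as
       an ordinary weight; t: target outputs in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):      # every case keeps getting picked
            y = 1 if w @ x >= 0 else 0
            if y == target:
                continue                 # correct: leave weights alone
            elif target == 1:
                w = w + x                # incorrectly output 0: add the input
            else:
                w = w - x                # incorrectly output 1: subtract the input
    return w

# Learn the (linearly separable) AND function; last column is the bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
print([(1 if w @ x >= 0 else 0) for x in X])  # [0, 0, 0, 1]
```

Because AND is linearly separable, a correct weight vector exists, so the procedure is guaranteed to find one.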
In the weight-space view, the weights represent a point while each input represents a plane. Another term for what the inputs represent is constraints (each input constrains the set of weights that give the correct classification result).
The picture above (not reproduced here) gives a simple geometric explanation of one limitation of perceptrons: a single layer of perceptrons cannot compute the XOR operation.
Here is a question worth answering: why can't a binary decision unit discriminate between patterns that have the same number of on pixels (assuming translation with wraparound)? My explanation is as follows: since pattern A and pattern B have the same number of on pixels, and both patterns are allowed to wrap around, the number of possible positions of A and of B is the same. As a result, the two patterns cast the same total vote for the weights, so the program cannot distinguish which pattern an input belongs to.
Here is an example from the discussion forum of the course:
"Simplified case: only 5 pixels, want to recognize between two different patterns where two pixels are on (first pixel in first example of each pattern is bolded so that translations can be seen easily):
Pattern A --> [1 1 0 0 0], [0 1 1 0 0], [0 0 1 1 0], [0 0 0 1 1], and [1 0 0 0 1]
Pattern B --> [1 0 1 0 0], [0 1 0 1 0], [0 0 1 0 1], [1 0 0 1 0], and [0 1 0 0 1]
Now, during training, you will input each of the possible positions of pattern A and each of the possible positions of pattern B as training examples. Every time one of the pixels appears positive for a pattern, it will be like adding one vote for that pattern every time that pixel is on (equal to 1). So for pattern A, looking at the first pixel in all 5 examples of the pattern, we find that it is on in 2 of the 5 cases (namely the first and last), meaning two votes for pattern A when the first pixel is on. Similarly, for pattern B, looking at the first pixel in all 5 examples of the pattern, we find that it is on in 2 of the 5 cases (namely first and fourth), meaning two votes for pattern B when the first pixel is on. In fact, regardless of the pattern and the pixel, you will find that there are 2 votes for each pattern should that pixel be on, and nothing to break the tie. On a crude level, having a tie is exactly what it means to be incapable of distinguishing between cases.
For the explanation about the weights, the votes are essentially like the weights Professor Hinton is referring to. In our example, when 2 pixels are on, you get a total of 4 "votes" for each pattern or a total of 2 times the "votes per pixel" which is analogous to the sum of the weights (though vastly simplified). Since neither pattern's votes outweighs the other's, the program cannot distinguish which pattern the input belongs to." (Jason Michael Runkle)
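The tie in this example can be checked directly: summing the on-pixels over all translations of each pattern gives each pixel's total "votes" for that pattern, and the totals come out identical.

```python
import numpy as np

# All translations (with wraparound) of the two patterns from the example.
A = np.array([[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1], [1, 0, 0, 0, 1]])
B = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0], [0, 1, 0, 0, 1]])

# Summing over all positions gives the votes each pixel casts per pattern.
votes_A = A.sum(axis=0)
votes_B = B.sum(axis=0)
print(votes_A)  # [2 2 2 2 2]
print(votes_B)  # [2 2 2 2 2]
```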
A neural network without hidden layers is limited in what it can represent, but learning the weights of hidden layers is difficult; so, for a long time, many people believed that perceptrons and neural networks were not promising.