@MitoY 2016-07-17T14:33:37.000000Z · 2554 words · 832 reads

Notes on CS231n (1)

These are notes on the first three lectures of CS231n, by Jing lei.

Notations

  • $x$: a single data point, written as a row vector of dimension D
  • $X$: the matrix of all training data, one row per data point
  • $W$, $b$: the weights and bias of a linear classifier
  • $s$: the score vector, one entry per class
  • $L$: the loss

K-Nearest Neighbor

How does it work?

Given a test data point (e.g. a picture), kNN searches through all the pictures it has stored and finds the k most similar ones. Among them, it picks the most common class and uses its label as the predicted label.

  • Inputs: (1) training data (and the labels of its classes), and (2) test data (classes unknown)
  • Outputs: predictions (labels)
  • Procedure:
    1. Train: store the training data.
    2. Compute the distance between the test data and all training data. A distance measures how alike two data points are; usually we choose the L2 (Euclidean) distance.
    3. Predict: find the k training points nearest to the given test point, choose the most common class among them, and use its label as the prediction.
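The procedure above can be sketched in NumPy. This is a minimal, unoptimized version; the name `knn_predict` and the array shapes are my own, not from the lecture:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict a label for each row of X_test by majority vote among the
    k nearest training points under L2 distance.
    Shapes: X_train (N, D), y_train (N,), X_test (M, D)."""
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))  # L2 to every training point
        nearest = np.argsort(dists)[:k]                       # indices of the k closest
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]                  # majority vote
    return preds
```

Note that "training" is just storing `X_train` and `y_train`; all the work happens at prediction time.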

performance

Rather bad: it reaches an accuracy of only about 28% on CIFAR-10, with 10 classes and 5000 pictures as training data.


Linear Classifier

score function

A linear classifier performs a linear transformation on a given data point and outputs a vector called the score. The dimension of the score is the number of classes, and the greatest element indicates the most likely class:

$$f(x, W, b) = xW + b$$

Usually we extend $W$ with one extra row holding $b$, and extend $x$ with one more constant 1. Then the new score function simplifies to a single matrix multiply:

$$f(x, W) = xW$$

Dimensions:
$s$: 1 x num_classes, $x$: 1 x D, $W$: D x num_classes
$X$: num_training x D
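A quick NumPy check of the bias trick described above (the sizes D = 4 and 3 classes are arbitrary, chosen just for illustration):

```python
import numpy as np

# Hypothetical sizes: D = 4 input features, 3 classes.
D, num_classes = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((D, num_classes))   # weights, D x num_classes
b = rng.standard_normal((1, num_classes))   # bias row
x = rng.standard_normal((1, D))             # one data point as a row vector

s = x @ W + b                               # score, 1 x num_classes

# Bias trick: append b as an extra row of W, append a constant 1 to x.
W_ext = np.vstack([W, b])                   # (D+1) x num_classes
x_ext = np.hstack([x, np.ones((1, 1))])     # 1 x (D+1)
s2 = x_ext @ W_ext                          # same scores, one matrix multiply
```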

SVM loss function

A loss function (or cost function) measures how inaccurate a classifier is. Given a data point, the loss measures how inconsistent the classifier's output is with the correct class: the more inconsistent, the greater the loss.

The Multiclass Support Vector Machine (SVM) loss for the $i$-th data point is:

$$L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + \Delta)$$

The function $\max(0, -)$ is often called the hinge loss.
After adding a regularization penalty $R(W)$, the full loss function looks like this:

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$

The most common regularization penalty is the (squared) L2 norm: $R(W) = \sum_k \sum_l W_{k,l}^2$.
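A vectorized NumPy sketch of this loss. The name `svm_loss` is my own; the row-vector convention `scores = X @ W` follows the dimensions listed earlier, and `delta` / `lam` are the margin $\Delta$ and regularization strength $\lambda$:

```python
import numpy as np

def svm_loss(W, X, y, delta=1.0, lam=0.0):
    """Multiclass SVM (hinge) loss with L2 regularization.
    X: (N, D), W: (D, C), y: (N,) integer class labels."""
    scores = X @ W                                   # (N, C)
    correct = scores[np.arange(len(y)), y][:, None]  # score of the true class
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(len(y)), y] = 0                # skip the j == y_i term
    data_loss = margins.sum() / len(y)
    reg_loss = lam * np.sum(W * W)                   # L2 regularization penalty
    return data_loss + reg_loss
```

When every wrong class scores at least `delta` below the correct class, the data loss is exactly zero.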


Softmax Classifier

The score function stays $f(x, W) = xW$, and the loss becomes the cross-entropy loss:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

Information theory view: the cross-entropy between a "true" distribution $p$ and an estimated distribution $q$ is defined as $H(p, q) = -\sum_x p(x) \log q(x)$.
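A minimal sketch of the cross-entropy loss (the function name is my own). Subtracting the per-row maximum score before exponentiating is the usual trick to avoid numerical overflow and does not change the result:

```python
import numpy as np

def softmax_loss(W, X, y):
    """Average cross-entropy loss of a linear softmax classifier.
    X: (N, D), W: (D, C), y: (N,) integer labels."""
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)     # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()  # -log p(correct class)
```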


Optimization

Gradient and gradient descent, and backpropagation.

If you try to minimize $L$, you repeat the update

$$W \leftarrow W - \eta \, \nabla_W L$$

in a loop, where $\eta$ is the step size (learning rate).
Next time I'll come at it in more detail.
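A toy sketch of that update loop, assuming we already have a function that returns the gradient. Here the gradient of $L(W) = \lVert W \rVert^2$ is hand-coded as $2W$, so the minimum is at $W = 0$; the names are my own:

```python
import numpy as np

def gradient_descent(grad_fn, W0, step_size=0.1, num_steps=100):
    """Plain gradient descent: repeatedly step against the gradient.
    grad_fn(W) must return dL/dW evaluated at W."""
    W = W0.copy()
    for _ in range(num_steps):
        dW = grad_fn(W)
        W -= step_size * dW   # the update W <- W - eta * grad
    return W

# Toy loss L(W) = ||W||^2 has gradient 2W; the iterates shrink toward 0.
W = gradient_descent(lambda W: 2 * W, np.array([3.0, -4.0]),
                     step_size=0.1, num_steps=50)
```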


Cross-validation

Split your training data into several folds; in turn, hold one fold out as a validation set and train on the others, then average the results over the folds. This is how hyperparameters (such as $k$ in kNN) are chosen.
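A minimal k-fold sketch, assuming an `evaluate` callback that trains on the kept folds and returns validation accuracy (the names here are my own, not from the lecture):

```python
import numpy as np

def cross_validate(X, y, num_folds, evaluate):
    """k-fold cross-validation: each fold serves once as the held-out
    validation set; evaluate(X_tr, y_tr, X_val, y_val) returns an accuracy."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        X_val, y_val = X_folds[i], y_folds[i]
        X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        accs.append(evaluate(X_tr, y_tr, X_val, y_val))
    return float(np.mean(accs))
```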


Vectorization

This is where I often got stuck while coding.
Let's talk about it later.
pass
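As one concrete example, the pairwise L2 distances from the kNN section can be computed with no Python loop at all, using the expansion $\lVert a-b\rVert^2 = \lVert a\rVert^2 - 2\,a b^\top + \lVert b\rVert^2$ (a sketch of mine, not from the lecture):

```python
import numpy as np

def l2_distances(X_test, X_train):
    """All pairwise L2 distances between rows of X_test (M, D) and
    X_train (N, D), returned as an (M, N) matrix, fully vectorized."""
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)  # (M, 1)
    train_sq = np.sum(X_train ** 2, axis=1)               # (N,)
    cross = X_test @ X_train.T                            # (M, N)
    sq = np.maximum(test_sq - 2 * cross + train_sq, 0)    # clip tiny negatives
    return np.sqrt(sq)
```

The `np.maximum(..., 0)` guards against small negative values that floating-point rounding can produce before the square root.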
