@MitoY 2016-07-17T14:33:37.000000Z · 2554 words · 832 reads

Notes on CS231n (1)

These are notes on the first three lectures of CS231n, by Jing lei.

Notations

  • $x$: a single data point, written as a row vector of dimension D
  • $X$: the matrix of all training data, one row per data point
  • $W$, $b$: the weights and bias of a linear classifier
  • $s$: the score vector, one entry per class
  • $L$: the loss

K-Nearest Neighbor

How does it work?

Given a test data point (e.g. a picture), kNN searches through all the pictures it has stored and finds the k most similar ones. Among them, it picks the most common class and uses its label as the predicted label.

  • Inputs: (1) training data (and the labels of its classes), and (2) test data (classes unknown)
  • Outputs: predictions (labels)
  • Procedure:
    1. Train: store the training data.
    2. Compute the distance between the test data and all training data. A distance measures how alike two data points are; usually we choose the L2 (Euclidean) distance.
    3. Predict: find the k training points nearest to the given test point, choose the most common class among them, and use its label as the prediction.
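The procedure above can be sketched in NumPy. This is a minimal, unoptimized version; the name `knn_predict` and the array shapes are my own, not from the lecture:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict a label for each row of X_test by majority vote among the
    k nearest training points under L2 distance.
    Shapes: X_train (N, D), y_train (N,), X_test (M, D)."""
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))  # L2 to every training point
        nearest = np.argsort(dists)[:k]                       # indices of the k closest
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]                  # majority vote
    return preds
```

Note that "training" is just storing `X_train` and `y_train`; all the work happens at prediction time.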

performance

Rather bad: it reaches an accuracy of only about 28% on CIFAR-10, with 10 classes and 5000 pictures as training data.


Linear Classifier

score function

A linear classifier performs a linear transformation on a given data point and outputs a vector called the score. The dimension of the score is the number of classes, and the greatest element indicates the most likely class:

$$f(x, W, b) = xW + b$$

Usually we extend $W$ with one extra row holding $b$, and extend $x$ with one more constant 1. Then the new score function simplifies to a single matrix multiply:

$$f(x, W) = xW$$

Dimensions:
$s$: 1 x num_classes, $x$: 1 x D, $W$: D x num_classes
$X$: num_training x D
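A quick NumPy check of the bias trick described above (the sizes D = 4 and 3 classes are arbitrary, chosen just for illustration):

```python
import numpy as np

# Hypothetical sizes: D = 4 input features, 3 classes.
D, num_classes = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((D, num_classes))   # weights, D x num_classes
b = rng.standard_normal((1, num_classes))   # bias row
x = rng.standard_normal((1, D))             # one data point as a row vector

s = x @ W + b                               # score, 1 x num_classes

# Bias trick: append b as an extra row of W, append a constant 1 to x.
W_ext = np.vstack([W, b])                   # (D+1) x num_classes
x_ext = np.hstack([x, np.ones((1, 1))])     # 1 x (D+1)
s2 = x_ext @ W_ext                          # same scores, one matrix multiply
```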

SVM loss function

A loss function (or cost function) measures how inaccurate a classifier is. Given a data point, the loss measures how inconsistent the classifier's output is with the correct class: the more inconsistent, the greater the loss.

The Multiclass Support Vector Machine (SVM) loss for the $i$-th data point is:

$$L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + \Delta)$$

The function $\max(0, -)$ is often called the hinge loss.
After adding a regularization penalty $R(W)$, the full loss function looks like this:

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$

The most common regularization penalty is the (squared) L2 norm: $R(W) = \sum_k \sum_l W_{k,l}^2$.
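A vectorized NumPy sketch of this loss. The name `svm_loss` is my own; the row-vector convention `scores = X @ W` follows the dimensions listed earlier, and `delta` / `lam` are the margin $\Delta$ and regularization strength $\lambda$:

```python
import numpy as np

def svm_loss(W, X, y, delta=1.0, lam=0.0):
    """Multiclass SVM (hinge) loss with L2 regularization.
    X: (N, D), W: (D, C), y: (N,) integer class labels."""
    scores = X @ W                                   # (N, C)
    correct = scores[np.arange(len(y)), y][:, None]  # score of the true class
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(len(y)), y] = 0                # skip the j == y_i term
    data_loss = margins.sum() / len(y)
    reg_loss = lam * np.sum(W * W)                   # L2 regularization penalty
    return data_loss + reg_loss
```

When every wrong class scores at least `delta` below the correct class, the data loss is exactly zero.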


Softmax Classifier

The score function stays $f(x, W) = xW$, and the loss becomes the cross-entropy loss:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

Information theory view: the cross-entropy between a "true" distribution $p$ and an estimated distribution $q$ is defined as $H(p, q) = -\sum_x p(x) \log q(x)$.
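A minimal sketch of the cross-entropy loss (the function name is my own). Subtracting the per-row maximum score before exponentiating is the usual trick to avoid numerical overflow and does not change the result:

```python
import numpy as np

def softmax_loss(W, X, y):
    """Average cross-entropy loss of a linear softmax classifier.
    X: (N, D), W: (D, C), y: (N,) integer labels."""
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)     # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()  # -log p(correct class)
```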


Optimization

Gradient and gradient descent, and backpropagation.

If you try to minimize $L$, you repeat the update

$$W \leftarrow W - \eta \, \nabla_W L$$

in a loop, where $\eta$ is the step size (learning rate).
Next time I'll come at it in more detail.
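A toy sketch of that update loop, assuming we already have a function that returns the gradient. Here the gradient of $L(W) = \lVert W \rVert^2$ is hand-coded as $2W$, so the minimum is at $W = 0$; the names are my own:

```python
import numpy as np

def gradient_descent(grad_fn, W0, step_size=0.1, num_steps=100):
    """Plain gradient descent: repeatedly step against the gradient.
    grad_fn(W) must return dL/dW evaluated at W."""
    W = W0.copy()
    for _ in range(num_steps):
        dW = grad_fn(W)
        W -= step_size * dW   # the update W <- W - eta * grad
    return W

# Toy loss L(W) = ||W||^2 has gradient 2W; the iterates shrink toward 0.
W = gradient_descent(lambda W: 2 * W, np.array([3.0, -4.0]),
                     step_size=0.1, num_steps=50)
```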


Cross-validation

Split your training data into several folds; in turn, hold one fold out as a validation set and train on the others, then average the results over the folds. This is how hyperparameters (such as $k$ in kNN) are chosen.
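A minimal k-fold sketch, assuming an `evaluate` callback that trains on the kept folds and returns validation accuracy (the names here are my own, not from the lecture):

```python
import numpy as np

def cross_validate(X, y, num_folds, evaluate):
    """k-fold cross-validation: each fold serves once as the held-out
    validation set; evaluate(X_tr, y_tr, X_val, y_val) returns an accuracy."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        X_val, y_val = X_folds[i], y_folds[i]
        X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        accs.append(evaluate(X_tr, y_tr, X_val, y_val))
    return float(np.mean(accs))
```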


Vectorization

This is where I often got stuck while coding.
Let's talk about it later.
pass
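As one concrete example, the pairwise L2 distances from the kNN section can be computed with no Python loop at all, using the expansion $\lVert a-b\rVert^2 = \lVert a\rVert^2 - 2\,a b^\top + \lVert b\rVert^2$ (a sketch of mine, not from the lecture):

```python
import numpy as np

def l2_distances(X_test, X_train):
    """All pairwise L2 distances between rows of X_test (M, D) and
    X_train (N, D), returned as an (M, N) matrix, fully vectorized."""
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)  # (M, 1)
    train_sq = np.sum(X_train ** 2, axis=1)               # (N,)
    cross = X_test @ X_train.T                            # (M, N)
    sq = np.maximum(test_sq - 2 * cross + train_sq, 0)    # clip tiny negatives
    return np.sqrt(sq)
```

The `np.maximum(..., 0)` guards against small negative values that floating-point rounding can produce before the square root.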
