@devilloser
2018-01-05T07:58:59.000000Z
CS231N
* KNN
* Softmax
* SVM
* Neural Net
* Features
Matrix differentiation (A' denotes the transpose of A):
Y = A * X --> DY/DX = A'
Y = X * A --> DY/DX = A
Y = A' * X * B --> DY/DX = A * B'
Y = A' * X' * B --> DY/DX = B * A'
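These identities follow the usual denominator-layout convention. As a quick sanity check, the sketch below verifies the third rule numerically in the scalar case Y = a'Xb, where a and b are column vectors (the test data here is made up for illustration):

import numpy as np

# Minimal numerical check of the rule Y = A' * X * B --> dY/dX = A * B',
# taking A and B to be column vectors a, b so that Y = a^T X b is a scalar.
np.random.seed(0)
a = np.random.randn(3, 1)
b = np.random.randn(4, 1)
X = np.random.randn(3, 4)

analytic = a.dot(b.T)                     # a * b', same shape as X

numeric = np.zeros_like(X)
h = 1e-6
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp = X.copy(); Xp[i, j] += h
        Xm = X.copy(); Xm[i, j] -= h
        # centered finite difference of Y = a^T X b w.r.t. X[i, j]
        numeric[i, j] = ((a.T.dot(Xp).dot(b) - a.T.dot(Xm).dot(b)) / (2 * h)).item()

print(np.allclose(analytic, numeric))     # True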
KNN
Basic idea:
In an N-dimensional Euclidean space, if most of the K samples nearest to a given sample belong to one class, the sample is assigned to that class (supervised learning).
The figure shows how the target sample's class is chosen when K = 3.
Algorithm steps
1) Find the neighbors (the distance between samples is usually the Euclidean distance)
2) Take the class that occurs most often among those K neighbors (a sketch of this voting step follows the list)
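The distance computations are shown next; the voting in step 2 is what the assignment's predict_labels does. A minimal sketch, assuming dists is the (num_test, num_train) distance matrix computed below and y_train holds integer class labels:

import numpy as np

def predict_labels_sketch(dists, y_train, k=1):
    # dists: (num_test, num_train) distance matrix; y_train: integer labels
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # labels of the k training samples closest to test sample i
        closest_y = y_train[np.argsort(dists[i, :])[:k]]
        # majority vote; np.argmax breaks ties in favour of the smaller label
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred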
Computing the distances
Two loops

num_test = X.shape[0]
num_train = self.X_train.shape[0]
dists = np.zeros((num_test, num_train))
for i in xrange(num_test):
    for j in xrange(num_train):
        dists[i, j] = np.sqrt(np.sum(np.square(self.X_train[j, :] - X[i, :])))
return dists
One loop

num_test = X.shape[0]
num_train = self.X_train.shape[0]
dists = np.zeros((num_test, num_train))
for i in xrange(num_test):
    dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i, :]), axis=1))  # axis=1: sum along each row
return dists
Fully vectorized
Using the expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, the whole distance matrix is computed at once:

dists = np.multiply(np.dot(X, self.X_train.T), -2)   # -2 x.y term, shape (num_test, num_train)
x2 = np.sum(np.square(X), axis=1, keepdims=True)     # ||x||^2, shape (num_test, 1)
y2 = np.sum(np.square(self.X_train), axis=1)         # ||y||^2, shape (num_train,)
dists = np.add(dists, x2)
dists = np.add(dists, y2)
dists = np.sqrt(dists)
Timing results
Two loop version took 66.936663 seconds
One loop version took 56.345410 seconds
No loop version took 11.018102 seconds
Choosing K
Cross-validation
Step 1: split the training set into folds
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
Step 2: use each fold in turn as the validation set, train on the remaining folds, and evaluate
for k in k_choices:
    k_to_accuracies[k] = []
    for i in range(num_folds):
        # all folds except the i-th form the training set
        Xtr = np.vstack(X_train_folds[:i] + X_train_folds[i+1:])
        ytr = np.hstack(y_train_folds[:i] + y_train_folds[i+1:])
        Xte = X_train_folds[i]     # the i-th fold is the validation set
        yte = y_train_folds[i]
        classifier.train(Xtr, ytr)
        dists_cv = classifier.compute_distances_no_loops(Xte)
        yte_pred = classifier.predict_labels(dists_cv, k)
        num_correct = np.sum(yte_pred == yte)
        accuracy = float(num_correct) / yte.shape[0]   # accuracy on this fold
        k_to_accuracies[k].append(accuracy)
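Once k_to_accuracies is filled, a natural last step (sketched here, not part of the original snippet) is to average the per-fold accuracies and keep the best k:

# Average the fold accuracies for every candidate k and pick the best one.
mean_accuracies = {k: np.mean(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_accuracies, key=mean_accuracies.get)
print("best k = %d, mean cross-validation accuracy = %.4f"
      % (best_k, mean_accuracies[best_k]))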

Softmax cross-entropy loss
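For reference, the per-sample loss that both versions below compute is the cross-entropy of the softmax probabilities, plus an L2 penalty on W (reg plays the role of the regularization strength):

L_i = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \log\sum_j e^{f_j} - f_{y_i},
\qquad
L = \frac{1}{N}\sum_i L_i + \frac{\mathrm{reg}}{2}\sum_{k,l} W_{k,l}^2

where f = x_i W are the class scores of sample i.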
Loop version
for i in xrange(num_train):
    score = X[i].dot(W)
    score -= np.max(score)              # shift the scores for numerical stability
    correct_score = score[y[i]]
    exp_sum = np.sum(np.exp(score))
    loss += np.log(exp_sum) - correct_score
    # the j loop adds p_j * x_i to every column; subtracting x_i here gives
    # (p_{y_i} - 1) * x_i for the correct class
    dW[:, y[i]] -= X[i]
    for j in xrange(num_class):
        dW[:, j] += (np.exp(score[j]) / exp_sum) * X[i]
loss /= num_train
loss += 0.5 * reg * np.sum(W * W)
dW /= num_train
dW += reg * W
Vectorized

num_train = X.shape[0]
scores = X.dot(W)
exp_scores = np.exp(scores)
row_sum = exp_scores.sum(axis=1)
row_sum = row_sum.reshape((num_train, 1))
# loss
norm_exp_scores = exp_scores / row_sum
row_index = np.arange(num_train)
data_loss = -np.log(norm_exp_scores[row_index, y]).sum()   # cross-entropy of the correct classes
loss = data_loss / num_train + 0.5 * reg * np.sum(W * W)
# gradient
norm_exp_scores[row_index, y] -= 1
dW = X.T.dot(norm_exp_scores)
dW = dW / num_train + reg * W
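A common sanity check, sketched here, is to confirm the two versions agree; softmax_loss_naive and softmax_loss_vectorized are assumed wrappers around the loop and vectorized code above, and X_dev/y_dev stand for a small development subset:

# Compare the loop and vectorized implementations on a small dev set.
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 1e-5)
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 1e-5)
print("loss difference: %e" % abs(loss_naive - loss_vec))                       # ~0
print("gradient difference: %e" % np.linalg.norm(grad_naive - grad_vec, 'fro'))  # ~0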
Hyperparameter selection
hinge loss
Example:
The multiclass SVM (hinge) loss for one sample is L_i = sum over j ≠ y_i of max(0, s_j - s_{y_i} + Δ). To illustrate the computation: suppose there are 3 classes and the score vector s has been obtained, the first class is the correct label (y_i = 0), and Δ = 10. The formula sums over all the incorrect classes (j ≠ y_i), so we get two terms:
L_i = max(0, s_1 - s_0 + Δ) + max(0, s_2 - s_0 + Δ)
For the gradient, first take the derivative of one sample's loss terms with respect to the columns of W. Only the terms with a positive margin matter, and each such term contributes to two columns of the gradient: for an incorrect class j with margin > 0 it adds x_i to column j, and it adds -x_i to column y_i (so column y_i accumulates -x_i once per class whose margin is positive).
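A tiny numeric illustration of the per-sample loss (the scores here are made up purely to show the arithmetic; the assignment code below uses Δ = 1):

import numpy as np

s = np.array([13.0, -7.0, 11.0])   # hypothetical scores for one sample, 3 classes
y_i = 0                            # class 0 is the correct class
delta = 10.0

margins = np.maximum(0, s - s[y_i] + delta)
margins[y_i] = 0                   # the correct class does not contribute
L_i = margins.sum()                # max(0, -7-13+10) + max(0, 11-13+10) = 0 + 8
print(L_i)                         # 8.0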
Computing loss and gradient
Loop version
for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1
        if margin > 0:
            loss += margin
            dW[:, j] += X[i, :].T
            dW[:, y[i]] -= X[i, :].T
loss /= num_train
dW /= num_train
# regularization
loss += 0.5 * reg * np.sum(W * W)
dW += reg * W
Vectorized

scores_correct = scores[np.arange(num_train), y]              # shape (N,)
scores_correct = np.reshape(scores_correct, (num_train, -1))  # shape (N, 1)
margins = scores - scores_correct + 1
margins = np.maximum(0, margins)
margins[np.arange(num_train), y] = 0
loss += np.sum(margins) / num_train
loss += 0.5 * reg * np.sum(W * W)
# gradient
margins[margins > 0] = 1
row_sum = np.sum(margins, axis=1)                             # shape (N,)
margins[np.arange(num_train), y] = -row_sum
dW += np.dot(X.T, margins) / num_train + reg * W
Hyperparameter selection
Same procedure as for softmax.

Neural Net (two-layer network)
Loss
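The loss code below uses h1, scores and N from the forward pass; a sketch of that pass (consistent with the predict code further down, with W1, b1, W2, b2 taken to be the unpacked parameters of the two-layer net):

# Forward pass assumed by the loss and gradient code below.
N = X.shape[0]
h1 = np.maximum(0, np.dot(X, W1) + b1)   # hidden layer with ReLU
scores = np.dot(h1, W2) + b2             # class scores, shape (N, num_classes)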
scores_max = np.max(scores, axis=1, keepdims=True)   # shape (N, 1)
exp_scores = np.exp(scores - scores_max)             # shift for numerical stability
scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
correct_scores = -np.log(scores[range(scores.shape[0]), y])
loss = np.sum(correct_scores) / (scores.shape[0])
loss += 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)
gradient
dscore = scores                 # softmax probabilities from the loss computation
dscore[range(N), y] -= 1
dscore /= N
dw2 = np.dot(h1.T, dscore) + reg * W2
db2 = np.sum(dscore, axis=0)
dh1 = np.dot(dscore, W2.T)
dh1[h1 <= 0] = 0                # backprop through the ReLU
dw1 = np.dot(X.T, dh1) + reg * W1
db1 = np.sum(dh1, axis=0)
grads['W1'] = dw1
grads['b1'] = db1
grads['W2'] = dw2
grads['b2'] = db2
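To make sure the analytic gradients are right, a centered-difference check is the usual tool. A self-contained sketch, assuming net is a TwoLayerNet instance, net.loss(X, y=..., reg=...) returns (loss, grads) as in the code above, and X_small, y_small are a handful of samples chosen for the check:

import numpy as np

def numeric_grad(f, x, h=1e-5):
    # Centered finite differences of the scalar function f() w.r.t. the array x,
    # perturbing x in place one entry at a time.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h; fp = f()
        x[idx] = old - h; fm = f()
        x[idx] = old
        grad[idx] = (fp - fm) / (2.0 * h)
        it.iternext()
    return grad

loss, grads = net.loss(X_small, y=y_small, reg=0.05)
num_W1 = numeric_grad(lambda: net.loss(X_small, y=y_small, reg=0.05)[0],
                      net.params['W1'])
print(np.max(np.abs(num_W1 - grads['W1'])))   # should be very small (~1e-8)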
train
batch
index = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[index, :]
y_batch = y[index]
loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
loss_history.append(loss)
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
predict
h1 = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])
scores = np.dot(h1, self.params['W2']) + self.params['b2']
y_pred = np.argmax(scores, axis=1)
Feature-based classifiers: hyperparameter search for the linear SVM on the extracted image features

for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        loss = svm.train(X_train_feats, y_train, lr, rs, num_iters=1500, verbose=False)
        y_train_pred = svm.predict(X_train_feats)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val_feats)
        accuracy_val = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (accuracy_train, accuracy_val)
        if accuracy_val > best_val:
            print "lr:", lr
            print "reg:", rs
            best_val = accuracy_val
            best_svm = svm
Hyperparameter search for the two-layer network on the same features

for learning_rate_curr in learning_rates:
    for reg_cur in regularization_strengths:
        print "current training learning_rate:", learning_rate_curr
        print "current training reg:", reg_cur
        net = TwoLayerNet(input_dim, hidden_dim, num_classes)
        stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
                          num_iters=1000, batch_size=1500,
                          learning_rate=learning_rate_curr, learning_rate_decay=0.95,
                          reg=reg_cur, verbose=True)
        val_acc = (net.predict(X_val_feats) == y_val).mean()
        print "current val_acc:", val_acc
        if val_acc > best_acc:
            best_acc = val_acc
            best_net = net
            best_stats = stats
print "best_acc:", best_acc
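The usual final step, sketched here, is to report test accuracy for the selected models; X_test_feats and y_test are assumed to be the held-out test features and labels prepared the same way as the training features:

# Evaluate the best models found by the two searches on the test set.
test_acc_svm = np.mean(best_svm.predict(X_test_feats) == y_test)
test_acc_net = np.mean(best_net.predict(X_test_feats) == y_test)
print("SVM test accuracy: %f" % test_acc_svm)
print("two-layer net test accuracy: %f" % test_acc_net)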