@devilloser 2018-01-05

CS231N Assignment 1 and Some Thoughts

CS231N


* KNN
* Softmax
* SVM
* Neural Net
* feature

Matrix derivatives
(A' denotes the transpose of A; these identities use the denominator-layout convention.)

Y = A * X       -->  DY/DX = A'
Y = X * A       -->  DY/DX = A
Y = A' * X * B  -->  DY/DX = A * B'
Y = A' * X' * B -->  DY/DX = B * A'
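These rules are easy to sanity-check with finite differences. A quick sketch for the first rule (all names below are made up for the check): the numerical Jacobian of y = Ax equals A in numerator layout, so the table's A' is its transpose, i.e. the denominator-layout form.

    import numpy as np

    np.random.seed(0)
    A = np.random.randn(3, 4)
    x = np.random.randn(4)
    h = 1e-6

    # numerical Jacobian of y = A.dot(x): J[i, j] = dy_i / dx_j
    J = np.zeros((3, 4))
    for j in range(4):
        dx = np.zeros(4)
        dx[j] = h
        J[:, j] = (A.dot(x + dx) - A.dot(x)) / h

    print np.allclose(J, A, atol=1e-4)   # True: DY/DX = A' in denominator layout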

KNN

Basic idea
In an N-dimensional Euclidean space, if the majority of the K samples nearest to a given sample belong to one class, the sample is assigned to that class. (This is supervised learning.)
(Figure: KNN example; how the target sample's class is chosen when K = 3.)
Algorithm steps
1) Define the neighborhood (the Euclidean distance is the usual measure of distance between samples):
$$d(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$
2) Take the class with the highest count among the K nearest samples (a sketch follows below).
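The vote in step 2 is what the predict_labels method (used in the cross-validation code further down) implements; a minimal sketch, assuming the assignment's classifier stores the training labels in self.y_train:

    def predict_labels(self, dists, k=1):
        # dists[i, j] = distance from test sample i to training sample j
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in xrange(num_test):
            # labels of the k nearest training samples
            closest_y = self.y_train[np.argsort(dists[i, :])[:k]]
            # majority vote; np.argmax breaks ties toward the smaller label
            y_pred[i] = np.argmax(np.bincount(closest_y))
        return y_pred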
Computing the distance matrix

Two loops

    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in xrange(num_test):
        for j in xrange(num_train):
            # Euclidean distance between test sample i and training sample j
            dists[i, j] = np.sqrt(np.sum(np.square(self.X_train[j, :] - X[i, :])))
    return dists

One loop

    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in xrange(num_test):
        # axis=1: sum within each row, i.e. over the feature dimension
        dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i, :]), axis=1))
    return dists

Fully vectorized

Similarly, expanding the squared distance,
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\, x \cdot y,$$
the whole distance matrix can be computed with a single matrix product plus two squared-norm terms:

    # cross term: -2 * X . X_train^T, shape (num_test, num_train)
    dists = np.multiply(np.dot(X, self.X_train.T), -2)
    # squared norms of the test samples, shape (num_test, 1)
    x2 = np.sum(np.square(X), axis=1, keepdims=True)
    # squared norms of the training samples, shape (num_train,)
    y2 = np.sum(np.square(self.X_train), axis=1)
    # broadcasting: x2 adds per test sample (rows), y2 per training sample (columns)
    dists = np.add(dists, x2)
    dists = np.add(dists, y2)
    dists = np.sqrt(dists)
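The notebook then checks the vectorized result against the loop versions; a one-line check, where dists_two is assumed to be the output of the two-loop version:

    # Frobenius norm of the difference; ~0 means the two versions agree
    difference = np.linalg.norm(dists - dists_two, ord='fro')
    print 'Difference was: %f' % difference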

Timing results

    Two loop version took 66.936663 seconds
    One loop version took 56.345410 seconds
    No loop version took 11.018102 seconds
Choosing K
cross-validation

Step 1: split the training set into folds

    X_train_folds = np.array_split(X_train, num_folds)
    y_train_folds = np.array_split(y_train, num_folds)

Step 2: use each fold in turn as the validation set and train on the remaining folds

    for k in k_choices:
        k_to_accuracies[k] = []
        for i in range(num_folds):
            # stack the remaining folds vertically as the training set
            Xtr = np.vstack(X_train_folds[:i] + X_train_folds[i+1:])
            ytr = np.hstack(y_train_folds[:i] + y_train_folds[i+1:])
            Xte = X_train_folds[i]
            yte = y_train_folds[i]
            classifier.train(Xtr, ytr)
            dists_cv = classifier.compute_distances_no_loops(Xte)
            yte_pred = classifier.predict_labels(dists_cv, k)
            num_correct = np.sum(yte_pred == yte)
            # divide by the size of the held-out fold, not by num_test
            accuracy = float(num_correct) / yte.shape[0]
            k_to_accuracies[k].append(accuracy)

(Figure: cross-validation accuracy for each value of K.)
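Picking the best K from k_to_accuracies is then a couple of lines (a sketch, not from the original post):

    # average the per-fold accuracies and take the K with the highest mean
    mean_accs = {k: np.mean(v) for k, v in k_to_accuracies.items()}
    best_k = max(mean_accs, key=mean_accs.get)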

softmax

Logistic cross-entropy loss

For binary logistic regression the prediction is $\hat{y} = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}$, and the cross-entropy loss of one sample is
$$L = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big]$$

Taking the derivative, the gradient has the simple form
$$\frac{\partial L}{\partial w} = (\hat{y} - y)\, x$$

Softmax regression
Softmax Regression is a K-class probabilistic discriminative model; it is Logistic Regression generalized to K classes.

Softmax parameter estimation
Writing $p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}$ for the class probabilities given the scores $f = xW$, the cross-entropy loss of a single sample is
$$L_i = -\log p_{y_i} = \log \sum_k e^{f_k} - f_{y_i}$$

Derivative for the correct class ($j = y_i$):
$$\frac{\partial L_i}{\partial f_{y_i}} = p_{y_i} - 1$$

Derivative for the other classes ($j \neq y_i$):
$$\frac{\partial L_i}{\partial f_j} = p_j$$

Computation

Loop version

    num_train = X.shape[0]
    num_class = W.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in xrange(num_train):
        score = X[i].dot(W)
        # shift by the max score for numerical stability
        score -= np.max(score)
        correct_score = score[y[i]]
        exp_sum = np.sum(np.exp(score))
        # L_i = log(sum_j e^{f_j}) - f_{y_i}
        loss += np.log(exp_sum) - correct_score
        dW[:, y[i]] -= X[i]
        for j in xrange(num_class):
            dW[:, j] += (np.exp(score[j]) / exp_sum) * X[i]
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W

Vectorized version

    num_train = X.shape[0]
    scores = X.dot(W)
    # subtract the row max so exp() cannot overflow
    scores -= np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(scores)
    row_sum = exp_scores.sum(axis=1).reshape((num_train, 1))
    # softmax probabilities
    norm_exp_scores = exp_scores / row_sum
    row_index = np.arange(num_train)
    # loss is the mean of -log p_{y_i}, plus L2 regularization
    data_loss = -np.log(norm_exp_scores[row_index, y]).sum()
    loss = data_loss / num_train + 0.5 * reg * np.sum(W * W)
    # gradient: dL/df = p, minus 1 at the correct class
    norm_exp_scores[row_index, y] -= 1
    dW = X.T.dot(norm_exp_scores)
    dW = dW / num_train + reg * W
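As a sanity check, the analytic gradient can be compared against finite differences at a few random coordinates, in the spirit of the assignment's grad_check_sparse; a sketch, where softmax_loss_vectorized and the dev-set names are assumptions:

    def grad_check(f, W, analytic_grad, num_checks=5, h=1e-5):
        # compare analytic_grad to centered finite differences of f
        # at a few randomly chosen entries of W
        for _ in xrange(num_checks):
            ix = tuple([np.random.randint(m) for m in W.shape])
            old = W[ix]
            W[ix] = old + h
            fxph = f(W)                       # f(W + h)
            W[ix] = old - h
            fxmh = f(W)                       # f(W - h)
            W[ix] = old                       # restore
            grad_num = (fxph - fxmh) / (2 * h)
            rel_err = abs(grad_num - analytic_grad[ix]) / \
                      (abs(grad_num) + abs(analytic_grad[ix]) + 1e-12)
            print 'numerical: %f analytic: %f, relative error: %e' % \
                  (grad_num, analytic_grad[ix], rel_err)

    loss, dW = softmax_loss_vectorized(W, X_dev, y_dev, 0.0)
    grad_check(lambda w: softmax_loss_vectorized(w, X_dev, y_dev, 0.0)[0], W, dW)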

Hyperparameter selection
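Learning rate and regularization strength are chosen by validation accuracy, with the same grid search used later in the feature section. A minimal sketch, assuming the assignment's Softmax classifier class and the usual X_train / y_train / X_val / y_val splits:

    best_val = -1
    best_softmax = None
    for lr in learning_rates:
        for rs in regularization_strengths:
            softmax = Softmax()
            softmax.train(X_train, y_train, learning_rate=lr, reg=rs,
                          num_iters=1500, verbose=False)
            # keep the model with the best validation accuracy
            acc_val = np.mean(softmax.predict(X_val) == y_val)
            if acc_val > best_val:
                best_val = acc_val
                best_softmax = softmax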

SVM

hinge loss
$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + \Delta)$$
Example:
A worked example of the formula (the one from the CS231N notes): suppose there are 3 classes with scores $s = [13, -7, 11]$, the first class is the correct label ($y_i = 0$), and $\Delta = 10$. The sum runs over the incorrect classes ($j \neq y_i$), so there are two terms:
$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) = 0 + 8 = 8$$
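A quick numeric check of this arithmetic (a standalone snippet, not part of the assignment code):

    s = np.array([13.0, -7.0, 11.0])     # scores; class 0 is the correct one
    delta = 10
    margins = np.maximum(0, s - s[0] + delta)
    margins[0] = 0                        # the j == y_i term is skipped
    print margins.sum()                   # 8.0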
Gradient: take the partial derivative of one sample's loss with respect to the columns of W. Only the terms with a positive margin matter, and each such term contributes to two columns of the gradient: column j receives $x_i$, and column $y_i$ receives $-x_i$:
$$\nabla_{w_j} L_i = \mathbb{1}\big(s_j - s_{y_i} + \Delta > 0\big)\, x_i, \qquad \nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big(s_j - s_{y_i} + \Delta > 0\big)\Big)\, x_i$$
Computation

Loop version

    dW = np.zeros(W.shape)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
                # each positive margin contributes to two columns of dW
                dW[:, j] += X[i, :].T
                dW[:, y[i]] -= X[i, :].T
    loss /= num_train
    dW /= num_train
    # regularization
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

Vectorized version

    num_train = X.shape[0]
    scores = X.dot(W)                                             # N x C
    scores_correct = scores[np.arange(num_train), y]              # (N,)
    scores_correct = np.reshape(scores_correct, (num_train, -1))  # N x 1
    margins = scores - scores_correct + 1                         # delta = 1
    margins = np.maximum(0, margins)
    margins[np.arange(num_train), y] = 0
    loss += np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    # compute the gradient
    margins[margins > 0] = 1
    row_sum = np.sum(margins, axis=1)                             # (N,)
    margins[np.arange(num_train), y] = -row_sum
    dW += np.dot(X.T, margins) / num_train + reg * W
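As in the notebook, the two implementations can be checked against each other; a sketch assuming svm_loss_naive and svm_loss_vectorized share the (W, X, y, reg) signature used above:

    loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
    loss_vec, grad_vec = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
    # both differences should be essentially zero
    print 'loss difference: %f' % abs(loss_naive - loss_vec)
    print 'gradient difference: %f' % np.linalg.norm(grad_naive - grad_vec, ord='fro')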

Hyperparameter selection
Same procedure as for softmax.

Neural Net

(Figure: the two-layer network architecture, input, fully connected layer, ReLU, fully connected layer, softmax.)

Loss

    # forward pass: hidden ReLU layer, then class scores (cf. predict below)
    h1 = np.maximum(0, X.dot(W1) + b1)
    scores = h1.dot(W2) + b2
    # softmax, shifted by the row max for numerical stability
    scores_max = np.max(scores, axis=1, keepdims=True)               # N x 1
    exp_scores = np.exp(scores - scores_max)
    scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # now holds probabilities
    correct_scores = -np.log(scores[range(scores.shape[0]), y])
    loss = np.sum(correct_scores) / (scores.shape[0])
    loss += 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)

gradient

    # dL/dscore = p - 1 at the correct class, p elsewhere (see softmax above)
    dscore = scores
    dscore[range(N), y] -= 1
    dscore /= N
    dw2 = np.dot(h1.T, dscore) + reg * W2
    db2 = np.sum(dscore, axis=0)
    # backprop into the hidden layer
    dh1 = np.dot(dscore, W2.T)
    # ReLU gate: no gradient where the activation was zero
    dh1[h1 <= 0] = 0
    dw1 = np.dot(X.T, dh1) + reg * W1
    db1 = np.sum(dh1, axis=0)
    grads['W1'] = dw1
    grads['b1'] = db1
    grads['W2'] = dw2
    grads['b2'] = db2
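The notebook verifies these gradients numerically with eval_numerical_gradient from the assignment's cs231n.gradient_check module; roughly (the reg value here is only illustrative):

    from cs231n.gradient_check import eval_numerical_gradient

    def rel_error(x, y):
        # maximum relative error between two arrays
        return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

    loss, grads = net.loss(X, y, reg=0.05)
    for param_name in grads:
        f = lambda W: net.loss(X, y, reg=0.05)[0]
        param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
        print '%s max relative error: %e' % (param_name,
            rel_error(param_grad_num, grads[param_name]))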

train

batch

    # sample a minibatch with replacement
    index = np.random.choice(num_train, batch_size, replace=True)
    X_batch = X[index, :]
    y_batch = y[index]

    # vanilla SGD step on every parameter
    loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
    loss_history.append(loss)
    self.params['W2'] -= learning_rate * grads['W2']
    self.params['b2'] -= learning_rate * grads['b2']
    self.params['W1'] -= learning_rate * grads['W1']
    self.params['b1'] -= learning_rate * grads['b1']
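Once per epoch the assignment's train loop also decays the learning rate by the learning_rate_decay factor (0.95 is the value passed in the feature section below); roughly:

    if it % iterations_per_epoch == 0:
        # shrink the step size once per epoch
        learning_rate *= learning_rate_decay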

predict

    h1 = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])
    scores = np.dot(h1, self.params['W2']) + self.params['b2']
    y_pred = np.argmax(scores, axis=1)

feature
The notebook extracts HOG and HSV color-histogram features from the images and reruns the same hyperparameter search on them, first for the linear SVM and then for the two-layer net.

    for lr in learning_rates:
        for rs in regularization_strengths:
            svm = LinearSVM()
            # train returns the loss history
            loss = svm.train(X_train_feats, y_train, lr, rs, num_iters=1500, verbose=False)
            y_train_pred = svm.predict(X_train_feats)
            accuracy_train = np.mean(y_train == y_train_pred)
            y_val_pred = svm.predict(X_val_feats)
            accuracy_val = np.mean(y_val == y_val_pred)
            results[(lr, rs)] = (accuracy_train, accuracy_val)
            if accuracy_val > best_val:
                print "lr:", lr
                print "reg:", rs
                best_val = accuracy_val
                best_svm = svm
    for learning_rate_curr in learning_rates:
        for reg_cur in regularization_strengths:
            print
            print "current training learning_rate:", learning_rate_curr
            print "current training reg:", reg_cur
            net = TwoLayerNet(input_dim, hidden_dim, num_classes)
            stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
                              num_iters=1000, batch_size=1500,
                              learning_rate=learning_rate_curr, learning_rate_decay=0.95,
                              reg=reg_cur, verbose=True)
            val_acc = (net.predict(X_val_feats) == y_val).mean()
            print "current val_acc:", val_acc
            if val_acc > best_acc:
                best_acc = val_acc
                best_net = net
                best_stats = stats
    print
    print "best_acc:", best_acc
    print