@EtoDemerzel 2017-11-12T09:04:59Z

Machine Learning Week 6: ex5 Review

Andrew Ng, Machine Learning


This week is mainly about how to diagnose and improve a learning algorithm's performance.

1 Regularized linear regression

1.1 Plotting the data

Plot a scatter plot of the data in ex5data1.mat: the change in water level of the reservoir (x) against the water flowing out of the dam (y).

```matlab
%% =========== Part 1: Loading and Visualizing Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  The following code will load the dataset into your environment and plot
%  the data.
%

% Load Training Data
fprintf('Loading and Visualizing Data ...\n')

% Load from ex5data1:
% You will have X, y, Xval, yval, Xtest, ytest in your environment
load('ex5data1.mat');

% m = Number of examples
m = size(X, 1);

% Plot training data
plot(X, y, 'rx', 'MarkerSize', 10, 'LineWidth', 1.5);
xlabel('Change in water level (x)');
ylabel('Water flowing out of the dam (y)');

fprintf('Program paused. Press enter to continue.\n');
pause;
```

The resulting plot:

[Figure: scatter plot of the training data, change in water level (x) vs. water flowing out of the dam (y)]

A straight line is clearly a poor fit for this data. We first try plain linear regression anyway, and later move on to higher-degree polynomials.

1.2 Regularized linear regression cost function

1.3 Regularized linear regression gradient

The regularized cost function is

$$J(\theta) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\right) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

and its gradient is

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$

Implement these formulas in linearRegCostFunction.m:

```matlab
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
%   cost of using theta as the parameter for linear regression to fit the
%   data points in X and y. Returns the cost in J and the gradient in grad

% Initialize some useful values
m = length(y); % number of training examples

% Hypothesis for all examples at once
htheta = X * theta;

% Regularized cost; theta(1) (the intercept term) is not regularized
J = 1 / (2 * m) * sum((htheta - y) .^ 2) + lambda / (2 * m) * sum(theta(2:end) .^ 2);

% Gradient, with the regularization term added for j >= 1 only
grad = 1 / m * X' * (htheta - y);
grad(2:end) = grad(2:end) + lambda / m * theta(2:end);

grad = grad(:);

end
```

The outputs match the expected values, so nothing more needs to be said here.
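Beyond the grader, one quick way to validate the gradient is a finite-difference check. This is a sketch of my own, not part of the assignment; the test point theta_t, lambda_t, and the step size are arbitrary choices:

```matlab
% Compare the analytic gradient against a central-difference approximation
theta_t = [1; 1];
lambda_t = 1;
X_t = [ones(m, 1) X];
[~, grad] = linearRegCostFunction(X_t, y, theta_t, lambda_t);

step = 1e-4;
numgrad = zeros(size(theta_t));
for j = 1:numel(theta_t)
    e = zeros(size(theta_t));
    e(j) = step;
    % Central difference: (J(theta + e) - J(theta - e)) / (2 * step)
    numgrad(j) = (linearRegCostFunction(X_t, y, theta_t + e, lambda_t) ...
                - linearRegCostFunction(X_t, y, theta_t - e, lambda_t)) / (2 * step);
end
disp([grad numgrad]);  % the two columns should agree to several decimals
```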

1.4 Fitting linear regression

trainLinearReg.m trains the linear regression with the fmincg function:

```matlab
function [theta] = trainLinearReg(X, y, lambda)
%TRAINLINEARREG Trains linear regression given a dataset (X, y) and a
%regularization parameter lambda
%   [theta] = TRAINLINEARREG (X, y, lambda) trains linear regression using
%   the dataset (X, y) and regularization parameter lambda. Returns the
%   trained parameters theta.
%

% Initialize Theta
initial_theta = zeros(size(X, 2), 1);

% Create "short hand" for the cost function to be minimized
costFunction = @(t) linearRegCostFunction(X, y, t, lambda);

% Now, costFunction is a function that takes in only one argument
options = optimset('MaxIter', 200, 'GradObj', 'on');

% Minimize using fmincg
theta = fmincg(costFunction, initial_theta, options);

end
```

ex5.m then plots the fit using the trained θ:

```matlab
%  Train linear regression with lambda = 0
lambda = 0;
[theta] = trainLinearReg([ones(m, 1) X], y, lambda);

%  Plot fit over the data
plot(X, y, 'rx', 'MarkerSize', 10, 'LineWidth', 1.5);
xlabel('Change in water level (x)');
ylabel('Water flowing out of the dam (y)');
hold on;
plot(X, [ones(m, 1) X] * theta, '--', 'LineWidth', 2)
hold off;

fprintf('Program paused. Press enter to continue.\n');
pause;
```

[Figure: best-fit line (dashed) over the training data]


2 Bias-variance

high bias: underfitting
high variance: overfitting

2.1 Learning curves

The examples are split into a training set and a cross validation set. The training error is defined as

$$J_{\text{train}}(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\right]$$

which is just the cost function without the regularization term, so it can be computed directly with the existing cost function by setting lambda to 0. The code:

```matlab
function [error_train, error_val] = ...
    learningCurve(X, y, Xval, yval, lambda)
%LEARNINGCURVE Generates the train and cross validation set errors needed
%to plot a learning curve
%   [error_train, error_val] = ...
%       LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
%       cross validation set errors for a learning curve. In particular,
%       it returns two vectors of the same length - error_train and
%       error_val. Then, error_train(i) contains the training error for
%       i examples (and similarly for error_val(i)).

% Number of training examples
m = size(X, 1);

error_train = zeros(m, 1);
error_val = zeros(m, 1);

for i = 1:m
    % Train on the first i examples, using the given lambda
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
    % Evaluate both errors with lambda = 0: the training error on those
    % same i examples, the cross validation error on the _entire_
    % validation set (Xval, yval)
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end

end
```

The corresponding part of the script:

```matlab
%% =========== Part 5: Learning Curve for Linear Regression =============
%  Next, you should implement the learningCurve function.
%
%  Write Up Note: Since the model is underfitting the data, we expect to
%                 see a graph with "high bias" -- Figure 3 in ex5.pdf
%

lambda = 0;
[error_train, error_val] = ...
    learningCurve([ones(m, 1) X], y, ...
                  [ones(size(Xval, 1), 1) Xval], yval, ...
                  lambda);

plot(1:m, error_train, 1:m, error_val);
title('Learning curve for linear regression')
legend('Train', 'Cross Validation')
xlabel('Number of training examples')
ylabel('Error')
axis([0 13 0 150])

fprintf('# Training Examples\tTrain Error\tCross Validation Error\n');
for i = 1:m
    fprintf('  \t%d\t\t%f\t%f\n', i, error_train(i), error_val(i));
end

fprintf('Program paused. Press enter to continue.\n');
pause;
```

[Figure: table of train / cross validation errors printed by the script]

The resulting learning curve:

[Figure: learning curve for linear regression; both errors stay high as m grows]

This is clearly underfitting, i.e. high bias: in this situation, adding more training examples will barely help.


3 Polynomial regression

The experiment above shows that simple linear regression cannot fit this data well, so we use polynomial regression instead:

$$h_\theta(x) = \theta_0 + \theta_1 \cdot (\text{waterLevel}) + \theta_2 \cdot (\text{waterLevel})^2 + \dots + \theta_p \cdot (\text{waterLevel})^p = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p$$

By treating each power as a separate feature, this becomes multivariate linear regression, so we map X into X_poly:

```matlab
function [X_poly] = polyFeatures(X, p)
%POLYFEATURES Maps X (1D vector) into the p-th power
%   [X_poly] = POLYFEATURES(X, p) takes a data matrix X (size m x 1) and
%   maps each example into its polynomial features where
%   X_poly(i, :) = [X(i) X(i).^2 X(i).^3 ... X(i).^p];

X_poly = zeros(numel(X), p);

% Column i holds the i-th power of X
for i = 1:p
    X_poly(:, i) = X .^ i;
end

end
```
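As a quick check of the mapping (a toy input of my own, not from the exercise):

```matlab
% Each value is raised to powers 1..3, one power per column
X_poly = polyFeatures([2; 3], 3)
% X_poly =
%      2     4     8
%      3     9    27
```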

3.1 Learning polynomial regression

After this step, the features need to be normalized. This was already implemented in ex1, so only a brief sketch is given below.
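For reference, a minimal sketch of that normalization step, assuming the ex1-style featureNormalize interface (subtract each column's mean, divide by its standard deviation; mu and sigma are kept so the validation and test sets can be normalized with the training set's statistics):

```matlab
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X to zero mean and unit std
mu = mean(X);
X_norm = bsxfun(@minus, X, mu);

sigma = std(X_norm);
X_norm = bsxfun(@rdivide, X_norm, sigma);

end
```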

The part of the script that learns the polynomial regression (normalization already applied):

```matlab
%% =========== Part 7: Learning Curve for Polynomial Regression =============
%  Now, you will get to experiment with polynomial regression with multiple
%  values of lambda. The code below runs polynomial regression with
%  lambda = 0. You should try running the code with different values of
%  lambda to see how the fit and learning curve change.
%

lambda = 0;
[theta] = trainLinearReg(X_poly, y, lambda);

% Plot training data and fit
figure(1);
plot(X, y, 'rx', 'MarkerSize', 10, 'LineWidth', 1.5);
plotFit(min(X), max(X), mu, sigma, theta, p);
xlabel('Change in water level (x)');
ylabel('Water flowing out of the dam (y)');
title(sprintf('Polynomial Regression Fit (lambda = %f)', lambda));

figure(2);
[error_train, error_val] = ...
    learningCurve(X_poly, y, X_poly_val, yval, lambda);
plot(1:m, error_train, 1:m, error_val);

title(sprintf('Polynomial Regression Learning Curve (lambda = %f)', lambda));
xlabel('Number of training examples')
ylabel('Error')
axis([0 13 0 100])
legend('Train', 'Cross Validation')

fprintf('Polynomial Regression (lambda = %f)\n\n', lambda);
fprintf('# Training Examples\tTrain Error\tCross Validation Error\n');
for i = 1:m
    fprintf('  \t%d\t\t%f\t%f\n', i, error_train(i), error_val(i));
end

fprintf('Program paused. Press enter to continue.\n');
pause;
```

This produces the following two figures:

[Figure: polynomial regression fit (lambda = 0)]
[Figure: polynomial regression learning curve (lambda = 0)]

The first figure shows that the fitted curve passes through the training points very well, but its shape is overly complex, with steep slopes at both ends; this strongly suggests overfitting.
The second figure confirms it: no matter how many training examples are added, the training error stays at 0, while the cross validation error starts out very large and only gradually decreases as the training set grows. This is the signature of overfitting (high variance).
As we learned earlier, introducing regularization can address overfitting.

3.2 Adjust the regularization parameter

Change the value of lambda in the script and observe how the fit and learning curve change.
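One hypothetical way to compare a few settings side by side, reusing plotFit, mu, sigma, and p from ex5.m (the lambda values here are arbitrary picks of mine):

```matlab
% Try a spread of regularization strengths and plot each fit
for lambda = [0 1 100]
    theta = trainLinearReg(X_poly, y, lambda);
    figure;
    plot(X, y, 'rx', 'MarkerSize', 10, 'LineWidth', 1.5);
    plotFit(min(X), max(X), mu, sigma, theta, p);
    title(sprintf('Polynomial Regression Fit (lambda = %g)', lambda));
end
```

With lambda = 1 the curve follows the trend of the data without the wild oscillations; with lambda = 100 it is over-regularized and underfits.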

3.3 Selecting lambda using a validation set

For a range of lambda values, compute the train error and cross validation error, then plot both to pick the most suitable lambda. The code below stores the results in error_train and error_val:

```matlab
function [lambda_vec, error_train, error_val] = ...
    validationCurve(X, y, Xval, yval)
%VALIDATIONCURVE Generate the train and validation errors needed to
%plot a validation curve that we can use to select lambda
%   [lambda_vec, error_train, error_val] = ...
%       VALIDATIONCURVE(X, y, Xval, yval) returns the train
%       and validation errors (in error_train, error_val)
%       for different values of lambda. You are given the training set (X,
%       y) and validation set (Xval, yval).

% Selected values of lambda (you should not change this)
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';

error_train = zeros(length(lambda_vec), 1);
error_val = zeros(length(lambda_vec), 1);

for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);
    % Train with this lambda, then measure both errors without the
    % regularization term (lambda = 0)
    theta = trainLinearReg(X, y, lambda);
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end

end
```

The script then plots the validation curve:

```matlab
%% =========== Part 8: Validation for Selecting Lambda =============
%  You will now implement validationCurve to test various values of
%  lambda on a validation set. You will then use this to select the
%  "best" lambda value.
%

[lambda_vec, error_train, error_val] = ...
    validationCurve(X_poly, y, X_poly_val, yval);

close all;
plot(lambda_vec, error_train, lambda_vec, error_val);
legend('Train', 'Cross Validation');
xlabel('lambda');
ylabel('Error');

fprintf('lambda\t\tTrain Error\tValidation Error\n');
for i = 1:length(lambda_vec)
    fprintf(' %f\t%f\t%f\n', ...
            lambda_vec(i), error_train(i), error_val(i));
end

fprintf('Program paused. Press enter to continue.\n');
pause;
```

The resulting plot:

[Figure: train and cross validation error as a function of lambda]

Judging from the cross validation error curve, the best value of lambda is roughly around 3.
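Instead of reading the value off the plot, one could also take the lambda with the lowest cross validation error directly; a trivial sketch (the course itself just inspects the figure):

```matlab
% Pick the lambda whose cross validation error is smallest
[~, idx] = min(error_val);
best_lambda = lambda_vec(idx);
fprintf('Best lambda by CV error: %f\n', best_lambda);
```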

3.4 Computing test error

After the training and cross validation stages we have the most suitable lambda. In general, though, we still need to evaluate the model on data it has never seen before and compute the test error.
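A minimal sketch of this step, assuming X_poly_test and ytest were prepared the same way as the validation set (mapped with polyFeatures and normalized with the training set's mu and sigma, as ex5.m does):

```matlab
% Evaluate the test error once, with the lambda chosen on the validation set
lambda = 3;
theta = trainLinearReg(X_poly, y, lambda);

% As with the other errors, the test error is the unregularized cost
error_test = linearRegCostFunction(X_poly_test, ytest, theta, 0);
fprintf('Test error (lambda = %g): %f\n', lambda, error_test);
```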

3.5 Plotting learning curves with randomly selected examples

When plotting the learning curves above, we grew the training set by simply taking the first i examples in order. A better practice is, on the i-th iteration, to randomly select i training examples, train theta on them, and use that theta to compute the corresponding train error and cross validation error; averaging over many random draws gives a smoother, more reliable curve.
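A rough sketch of that randomized learning curve, following the optional part of ex5.pdf, which also draws i random cross validation examples per trial; the lambda value and the 50 repetitions are my reading of the handout's suggestion, and the remaining variables come from the earlier script:

```matlab
num_trials = 50;       % random repetitions to average over
lambda = 0.01;         % small regularization, as in the optional exercise
error_train = zeros(m, 1);
error_val = zeros(m, 1);

for i = 1:m
    for t = 1:num_trials
        % Draw i random examples from the training and validation sets
        perm = randperm(m);
        idx_train = perm(1:i);
        perm = randperm(size(X_poly_val, 1));
        idx_val = perm(1:i);

        theta = trainLinearReg(X_poly(idx_train, :), y(idx_train), lambda);

        % Accumulate the unregularized errors for this draw
        error_train(i) = error_train(i) + ...
            linearRegCostFunction(X_poly(idx_train, :), y(idx_train), theta, 0);
        error_val(i) = error_val(i) + ...
            linearRegCostFunction(X_poly_val(idx_val, :), yval(idx_val), theta, 0);
    end
end

% Average over the trials and plot as before
error_train = error_train / num_trials;
error_val = error_val / num_trials;
plot(1:m, error_train, 1:m, error_val);
legend('Train', 'Cross Validation');
xlabel('Number of training examples');
ylabel('Error');
```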
