@EtoDemerzel 2017-10-31T16:42:13.000000Z  Words: 10097  Views: 627

Machine Learning week 2: ex1 review


Machine Learning  Programming Assignment  Andrew Ng


1. Linear regression with one variable

1.1 Plotting the Data

    data = load('ex1data1.txt'); % read comma separated data

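Octave/MATLAB splits the values at the commas in the data file and stores data as a matrix. Since the file holds two quantities, the matrix has two columns.
Population and profit are then stored in $X$ and $y$ respectively: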

    X = data(:, 1); y = data(:, 2);
    m = length(y); % number of training examples

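$X$ takes the first column of data and $y$ takes the second. $m$ is the number of training examples; since $X$ and $y$ have the same number of rows, it does not matter which one is passed to length.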

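Next, the script calls plotData to draw a scatter plot, so we first have to complete this function.
This is the original version included in the assignment files: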

    function plotData(x, y)
    %PLOTDATA Plots the data points x and y into a new figure
    %   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
    %   population and profit.
    figure; % open a new figure window
    % ====================== YOUR CODE HERE ======================
    % Instructions: Plot the training data into a figure using the
    %               "figure" and "plot" commands. Set the axes labels using
    %               the "xlabel" and "ylabel" commands. Assume the
    %               population and revenue data have been passed in
    %               as the x and y arguments of this function.
    %
    % Hint: You can use the 'rx' option with plot to have the markers
    %       appear as red crosses. Furthermore, you can make the
    %       markers larger by using plot(..., 'rx', 'MarkerSize', 10);
    % ============================================================
    end

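The function asks us to draw a scatter plot and label the horizontal and vertical axes with population and profit.
The hint suggests the following:

  • Use figure to open a figure window.
  • Use plot to draw the points, and xlabel and ylabel to set the horizontal and vertical axis labels. The 'rx' option draws the points as red crosses, and 'MarkerSize', 10 sets the marker size.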

The code is as follows:

    plot(x, y, 'rx', 'MarkerSize', 10);        % plot the data
    ylabel('Profit in $10,000s');              % set the y-axis label
    xlabel('Population of City in 10,000s');   % set the x-axis label

Running ex1.m now produces the following figure:
Figure 1: Scatter plot of training data

1.2 Gradient Descent

1.2.1 Update Equations


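The goal of linear regression is to minimize the cost function

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where the hypothesis is the linear model

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

Batch gradient descent is used to minimize $J(\theta)$. Each iteration performs the update

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

(simultaneously update $\theta_j$ for all $j$).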
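As a worked step that the exercise text does not spell out: differentiating $J(\theta)$ produces exactly the sum in the update rule (the factor 2 from the square cancels the $\frac{1}{2}$ in $\frac{1}{2m}$). Writing $X$ for the $m \times (n+1)$ matrix with one training example $(x^{(i)})^T$ per row, the update for all $j$ can also be written as a single matrix expression:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

$$\theta := \theta - \frac{\alpha}{m}X^T\left(X\theta - y\right)$$

Because the right-hand side is evaluated entirely with the old $\theta$ before the assignment, the matrix form performs the simultaneous update automatically.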

1.2.2 Implementation

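At this point $X$ is a column vector in which each row stores one training example $x^{(i)}$.
To accommodate the intercept term $\theta_0$, ex1.m therefore adds a column of ones to $X$, so that every example gets $x_0^{(i)} = 1$: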

    X = [ones(m,1) data(:,1)]; % Add a column of ones to x

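Since only one variable $x$ (population) affects profit, $\theta$ has two components, $\theta_0$ and $\theta_1$.

Initialize $\theta$ to zeros: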

    theta = zeros(2, 1); % initialize fitting parameters

Set the number of iterations and the learning rate $\alpha$:

    iterations = 1500;
    alpha = 0.01;

1.2.3 Computing the cost

Using the formula above, complete computeCost to compute the cost function $J(\theta)$:

    function J = computeCost(X, y, theta)
    %COMPUTECOST Compute cost for linear regression
    %   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
    %   parameter for linear regression to fit the data points in X and y
    % Initialize some useful values
    m = length(y); % number of training examples
    prediction = X * theta;          % h_theta(x) for every training example
    sqError = (prediction - y).^2;   % squared error of each example
    % You need to return the following variables correctly
    J = 0;
    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the cost of a particular choice of theta
    %               You should set J to the cost.
    J = 1/(2 * m) * sum(sqError);
    % =========================================================================
    end

Here is the part of ex1.m that tests it:

    fprintf('\nTesting the cost function ...\n')
    % compute and display initial cost
    J = computeCost(X, y, theta);
    fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
    fprintf('Expected cost value (approx) 32.07\n');
    % further testing of the cost function
    J = computeCost(X, y, [-1 ; 2]);
    fprintf('\nWith theta = [-1 ; 2]\nCost computed = %f\n', J);
    fprintf('Expected cost value (approx) 54.24\n');
    fprintf('Program paused. Press enter to continue.\n');
    pause;

It tests two parameter vectors: the initial $\theta = [0; 0]$ and $\theta = [-1; 2]$.
If computeCost.m is correct, the two printed values should be 32.072734 and 54.242455.
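Before running on ex1data1.txt, the cost formula can be sanity-checked on a tiny made-up dataset whose cost is easy to work out by hand. The three points below are hypothetical (they lie exactly on $y = x$), and computeCostSketch is an illustrative anonymous function that mirrors the formula, not the assignment's computeCost.m:

```octave
% Hypothetical toy data: three points lying exactly on y = x
X = [1 1; 1 2; 1 3];   % first column of ones for the intercept theta_0
y = [1; 2; 3];

% Same formula as computeCost.m: J = 1/(2m) * sum((X*theta - y).^2)
computeCostSketch = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

J_zero = computeCostSketch(X, y, [0; 0]);  % errors -1, -2, -3 -> (1 + 4 + 9) / 6
J_fit  = computeCostSketch(X, y, [0; 1]);  % theta_0 = 0, theta_1 = 1 fits every point
fprintf('J([0;0]) = %f, J([0;1]) = %f\n', J_zero, J_fit);
```

A perfect fit gives zero cost, and the all-zero $\theta$ gives $14/6 \approx 2.333$; if a hand-written cost function disagrees on data this small, the bug is easy to locate.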

1.2.4 Gradient descent

Complete the code in gradientDescent.m as follows:

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha
    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    temp = theta;        % buffer so that every theta(j) is updated from the same old theta
    n = length(theta);   % number of parameters
    for iter = 1:num_iters
        % ====================== YOUR CODE HERE ======================
        % Instructions: Perform a single gradient step on the parameter vector
        %               theta.
        %
        % Hint: While debugging, it can be useful to print out the values
        %       of the cost function (computeCost) and gradient here.
        %
        for j = 1:n
            temp(j) = theta(j) - 1/m * alpha * sum((X*theta - y) .* X(:, j));
        end
        theta = temp;   % simultaneous update of all components
        % ============================================================
        % Save the cost J in every iteration
        J_history(iter) = computeCost(X, y, theta);
    end
    end

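Because the update loops over every component of $\theta$, this code in fact already handles linear regression with multiple variables, even though here $\theta$ has only two components.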
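The inner loop over j, together with the temp buffer that enforces the simultaneous update, can be replaced by the vectorized rule $\theta := \theta - \frac{\alpha}{m}X^T(X\theta - y)$. The sketch below runs both versions side by side on hypothetical toy data (generated from $y = 1 + 2x$); the data, $\alpha = 0.1$ and the iteration count are assumptions chosen so that both converge:

```octave
% Hypothetical toy data: y = 1 + 2*x exactly
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
m = length(y);
alpha = 0.1;
num_iters = 1000;

theta_loop = zeros(2, 1);
theta_vec  = zeros(2, 1);
n = length(theta_loop);

for iter = 1:num_iters
  % Loop version, as in gradientDescent.m: every theta(j) uses the OLD theta
  temp = theta_loop;
  for j = 1:n
    temp(j) = theta_loop(j) - alpha / m * sum((X * theta_loop - y) .* X(:, j));
  end
  theta_loop = temp;

  % Vectorized version: one matrix expression, simultaneous by construction
  theta_vec = theta_vec - alpha / m * X' * (X * theta_vec - y);
end

fprintf('loop: [%f %f], vectorized: [%f %f]\n', theta_loop, theta_vec);
```

Both versions should end up at $\theta \approx [1; 2]$; the vectorized form is shorter and lets Octave do the work in one matrix product instead of an interpreted loop.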

Here is the corresponding part of the script:

    fprintf('\nRunning Gradient Descent ...\n')
    % run gradient descent
    theta = gradientDescent(X, y, theta, alpha, iterations);
    % print theta to screen
    fprintf('Theta found by gradient descent:\n');
    fprintf('%f\n', theta);
    fprintf('Expected theta values (approx)\n');
    fprintf(' -3.6303\n 1.1664\n\n');
    % Plot the linear fit
    hold on; % keep previous plot visible
    plot(X(:,2), X*theta, '-')
    legend('Training data', 'Linear regression')
    hold off % do not overlay any more plots on this figure
    % Predict values for population sizes of 35,000 and 70,000
    predict1 = [1, 3.5] * theta;
    fprintf('For population = 35,000, we predict a profit of %f\n',...
        predict1*10000);
    predict2 = [1, 7] * theta;
    fprintf('For population = 70,000, we predict a profit of %f\n',...
        predict2*10000);
    fprintf('Program paused. Press enter to continue.\n');
    pause;

After gradient descent finishes, the script prints the computed $\theta$:
[Screenshot: θ printed by gradient descent]
The values match the expected ones.

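Next, the script plots the fitted line. hold on keeps the earlier scatter plot visible, and the line given by the $\theta$ found by gradient descent is drawn on top of it.
This produces the following figure:
[Figure: training data with the linear regression fit]
Finally, the script predicts the profit for populations of 35,000 and 70,000 and prints both estimates:
[Screenshot: predicted profits for populations of 35,000 and 70,000]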

1.3 Visualizing

ex1.m includes code that visualizes $J(\theta_0, \theta_1)$:

    fprintf('Visualizing J(theta_0, theta_1) ...\n')
    % Grid over which we will calculate J
    theta0_vals = linspace(-10, 10, 100);
    theta1_vals = linspace(-1, 4, 100);

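linspace(BASE, LIMIT, N) returns a row vector of N evenly spaced points from BASE to LIMIT, both endpoints included; if BASE and LIMIT are column vectors, it returns a matrix. If N is omitted, it defaults to 100.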
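A quick illustration of linspace, first on a small range and then on the $\theta_0$ grid used in ex1.m:

```octave
% Five evenly spaced points from 0 to 1, both endpoints included
v = linspace(0, 1, 5)          % -> 0  0.2500  0.5000  0.7500  1.0000

% The grid used in ex1.m: 100 values of theta_0 in [-10, 10]
theta0_vals = linspace(-10, 10, 100);
fprintf('%d points, step %.4f\n', numel(theta0_vals), theta0_vals(2) - theta0_vals(1));
```

With 100 points on $[-10, 10]$ the spacing is $20/99$, not $0.2$, because both endpoints count as points.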

    % initialize J_vals to a matrix of 0's
    J_vals = zeros(length(theta0_vals), length(theta1_vals));
    % Fill out J_vals
    for i = 1:length(theta0_vals)
        for j = 1:length(theta1_vals)
            t = [theta0_vals(i); theta1_vals(j)];
            J_vals(i,j) = computeCost(X, y, t);
        end
    end

This evaluates the cost function at every point $(\theta_0, \theta_1)$ of the grid.
Draw the surface plot:

    % Because of the way meshgrids work in the surf command, we need to
    % transpose J_vals before calling surf, or else the axes will be flipped
    J_vals = J_vals';
    % Surface plot
    figure;
    surf(theta0_vals, theta1_vals, J_vals)
    xlabel('\theta_0'); ylabel('\theta_1');
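The transpose is needed because surf(x, y, Z), like meshgrid, expects Z(r, c) to belong to x(c) and y(r), so x varies along the columns; the loop above stores J_vals(i, j) with $\theta_0$ along the rows instead. The sketch below checks this on a tiny hypothetical grid, with Jfun a made-up stand-in for computeCost (an assumption for illustration only):

```octave
% Hypothetical small grids: 2 values of theta_0, 3 values of theta_1
theta0_vals = [-1 1];
theta1_vals = [0 1 2];
Jfun = @(t0, t1) t0.^2 + 10 * t1;   % made-up stand-in for the cost function

% Fill J_vals the way ex1.m does: theta_0 along rows, theta_1 along columns
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    J_vals(i, j) = Jfun(theta0_vals(i), theta1_vals(j));
  end
end

% meshgrid puts theta_0 along columns and theta_1 along rows
[T0, T1] = meshgrid(theta0_vals, theta1_vals);
J_grid = Jfun(T0, T1);   % the layout surf expects

disp(size(J_vals));      % 2 3
disp(size(J_grid));      % 3 2
```

J_grid equals J_vals', which is why the script transposes before calling surf and contour.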

[Figure: surface plot of J(θ_0, θ_1)]
Draw the contour plot:

    % Contour plot
    figure;
    % Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
    contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
    xlabel('\theta_0'); ylabel('\theta_1');
    hold on;
    plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);

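[Figure: contour plot of J(θ_0, θ_1), with the θ found by gradient descent marked by a red cross]
The plot shows where the minimum lies, and the red cross marking the $\theta$ found by gradient descent falls on it.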
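The contour levels come from logspace(a, b, n), which returns n points evenly spaced on a log scale between $10^a$ and $10^b$. Log spacing puts more contour lines at small values of $J$, i.e. around the minimum. A quick check of the levels passed to contour in ex1.m:

```octave
levels = logspace(-2, 3, 20);   % 20 levels from 10^-2 to 10^3
fprintf('%d levels, first %g, last %g\n', numel(levels), levels(1), levels(end));

% Consecutive levels differ by the constant ratio 10^(5/19)
ratios = levels(2:end) ./ levels(1:end-1);
fprintf('ratio between consecutive levels: %.4f\n', ratios(1));
```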


2. Linear regression with multiple variables

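This time we handle linear regression with multiple variables. To predict house prices, for example, the relevant factors might include the size of the house and the number of bedrooms.
The file ex1data2.txt contains the multivariable training examples, as shown below:
[Screenshot: first rows of ex1data2.txt]
There are three columns: the first is the house size (in square feet), the second is the number of bedrooms, and the third is the price.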

2.1 Feature normalization

The script ex1_multi.m first prints part of the data:

    %% Load Data
    data = load('ex1data2.txt');
    X = data(:, 1:2);
    y = data(:, 3);
    m = length(y);
    % Print out some data points
    fprintf('First 10 examples from the dataset: \n');
    fprintf(' x = [%.0f %.0f], y = %.0f \n', [X(1:10,:) y(1:10,:)]');
    fprintf('Program paused. Press enter to continue.\n');
    pause;

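[Screenshot: first 10 examples from the dataset]
The values in the first column are about three orders of magnitude larger than those in the second, so the features need to be normalized.
Normalization consists of the following steps:
* Subtract the mean of each feature.
* Divide by the standard deviation of each feature (most data points lie within ±2 standard deviations of the mean); the range max - min can be used instead.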

The code is as follows:

    function [X_norm, mu, sigma] = featureNormalize(X)
    %FEATURENORMALIZE Normalizes the features in X
    %   FEATURENORMALIZE(X) returns a normalized version of X where
    %   the mean value of each feature is 0 and the standard deviation
    %   is 1. This is often a good preprocessing step to do when
    %   working with learning algorithms.
    % You need to set these values correctly
    X_norm = X;
    mu = zeros(1, size(X, 2));
    sigma = zeros(1, size(X, 2));
    % ====================== YOUR CODE HERE ======================
    % Instructions: First, for each feature dimension, compute the mean
    %               of the feature and subtract it from the dataset,
    %               storing the mean value in mu. Next, compute the
    %               standard deviation of each feature and divide
    %               each feature by its standard deviation, storing
    %               the standard deviation in sigma.
    %
    %               Note that X is a matrix where each column is a
    %               feature and each row is an example. You need
    %               to perform the normalization separately for
    %               each feature.
    %
    % Hint: You might find the 'mean' and 'std' functions useful.
    %
    mu = mean(X);     % row vector: mean of each column (feature)
    sigma = std(X);   % row vector: standard deviation of each column
    for i = 1:size(X, 1)
        X_norm(i, :) = (X(i, :) - mu) ./ sigma;   % normalize one example (row)
    end
    % ============================================================
    end

mean and std compute the mean and the standard deviation of a vector. Applied to a matrix, they operate on each column by default and return a row vector.
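The row loop in featureNormalize can be replaced by a single broadcast expression. The sketch below uses bsxfun (available in both Octave and MATLAB) on a hypothetical 3×2 matrix; note that std divides by $N-1$ by default:

```octave
% Hypothetical features: column 2 is ten times column 1
X = [1 10; 2 20; 3 30];

mu = mean(X);      % [2 20], one mean per column
sigma = std(X);    % [1 10], sample standard deviation (divides by N-1)

% Subtract mu from every row, then divide every row by sigma
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
disp(X_norm);      % [-1 -1; 0 0; 1 1]
```

After normalization both columns have mean 0 and standard deviation 1, so neither feature dominates the gradient.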

2.2 Gradient descent

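The code for this part (gradient descent and the cost function) written for the single-variable case already works with multiple variables, so it is omitted here.
One point worth mentioning: the cost function can be computed efficiently in vectorized form,

$$J(\theta) = \frac{1}{2m}\left(X\theta - y\right)^T\left(X\theta - y\right)$$

where $X$ holds one training example per row and $y$ is the column vector of targets.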
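A minimal check, on hypothetical data, that the inner-product form $(X\theta - y)^T(X\theta - y)/(2m)$ gives the same value as the sum-of-squares form used in computeCost; the inner product replaces the element-wise square and sum with one matrix product:

```octave
% Hypothetical data: 5 examples, intercept column plus two features
X = [ones(5, 1), (1:5)', ((1:5).^2)'];
y = [2; 1; 4; 3; 5];
theta = [0.5; -1; 0.2];
m = length(y);

err = X * theta - y;                 % m x 1 vector of residuals
J_sum   = sum(err .^ 2) / (2 * m);   % element-wise square, then sum
J_inner = (err' * err) / (2 * m);    % the same quantity as one inner product

fprintf('sum form: %f, inner-product form: %f\n', J_sum, J_inner);
```

For this data both forms give $J = 61.41 / 10 = 6.141$.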

2.2.1 Selecting learning rates

You can get a feel for the learning rate by changing it in ex1_multi.m.
The relevant code is:

    % Init Theta and Run Gradient Descent
    theta = zeros(3, 1);
    [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
    % Plot the convergence graph
    figure;
    plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('Cost J');
    % Display gradient descent's result
    fprintf('Theta computed from gradient descent: \n');
    fprintf(' %f \n', theta);
    fprintf('\n');

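This code plots how $J(\theta)$ changes with the number of iterations.
numel returns the number of elements of its argument.
J_history stores the value of the cost function after each iteration. In gradientDescentMulti.m, every pass of the loop computes it as follows: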

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

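With the learning rate set to 0.01, 0.03, 0.1, 1 and 1.5, the resulting plots are, in order:

[Figure: J(θ) vs. number of iterations, α = 0.01]
[Figure: J(θ) vs. number of iterations, α = 0.03]
[Figure: J(θ) vs. number of iterations, α = 0.1]
[Figure: J(θ) vs. number of iterations, α = 1]
[Figure: J(θ) vs. number of iterations, α = 1.5]
When $\alpha$ is very small, $J$ decreases very slowly; increasing $\alpha$ moderately makes it decrease faster; when $\alpha$ is too large, $J$ rises instead of falling.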
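The same behaviour can be reproduced on a tiny hypothetical dataset whose feature is already normalized. For this particular made-up X, the largest eigenvalue of $X^TX/m$ is 1, so gradient descent converges only for $\alpha < 2$; that threshold is a property of this toy data, not of ex1data2.txt. With $\alpha = 0.1$ the cost should fall at every step, and with $\alpha = 2.5$ it should grow:

```octave
% Hypothetical data: one already-normalized feature, y = 2 + x
X = [1 -1; 1 0; 1 1];
y = [1; 2; 3];
m = length(y);
num_iters = 20;

costOf = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

alphas = [0.1, 2.5];
J_hist = zeros(num_iters, numel(alphas));
for k = 1:numel(alphas)
  theta = zeros(2, 1);
  for iter = 1:num_iters
    theta = theta - alphas(k) / m * X' * (X * theta - y);   % vectorized update
    J_hist(iter, k) = costOf(theta);
  end
end

fprintf('alpha = %.1f: J goes from %g to %g\n', [alphas; J_hist(1, :); J_hist(end, :)]);
```

Plotting each column of J_hist against 1:num_iters gives the same kind of curves as the figures above.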

Using gradient descent in ex1_multi.m to estimate the price of a 1650 sq-ft house with 3 bedrooms:

    testify = [1, 1650, 3];                            % [intercept, size in sq ft, bedrooms]
    price = (testify - [0 mu]) ./ [1 sigma] * theta;   % normalize features, keep the intercept 1

Remember that the new example must be normalized (with the same mu and sigma) before it is used.
The output is:
[Screenshot: predicted price from gradient descent]

2.3 Normal equations

The code is quite simple:

    function [theta] = normalEqn(X, y)
    %NORMALEQN Computes the closed-form solution to linear regression
    %   NORMALEQN(X,y) computes the closed-form solution to linear
    %   regression using the normal equations.
    theta = zeros(size(X, 2), 1);
    % ====================== YOUR CODE HERE ======================
    % Instructions: Complete the code to compute the closed form solution
    %               to linear regression and put the result in theta.
    %
    % ---------------------- Sample Solution ----------------------
    theta = pinv(X' * X) * X' * y;
    % -------------------------------------------------------------
    % ============================================================
    end
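The sample solution uses pinv (the Moore-Penrose pseudoinverse) rather than inv. One practical consequence: if $X^TX$ is singular, for example because one feature is an exact multiple of another, pinv still returns a least-squares $\theta$, whereas inv would warn that the matrix is singular. A sketch with a hypothetical, deliberately redundant feature:

```octave
% Hypothetical data: column 3 is exactly twice column 2,
% so X'*X is singular and inv(X'*X) would not be reliable
X = [1 1 2; 1 2 4; 1 3 6];
y = [3; 5; 7];                   % y = 1 + 2*x1

theta = pinv(X' * X) * X' * y;   % same expression as normalEqn.m
residual = max(abs(X * theta - y));
fprintf('theta = [%f %f %f], max residual = %g\n', theta, residual);
```

The individual entries of $\theta$ are not unique here (weight can move between the two redundant columns), but the predictions $X\theta$ still fit the data.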

Using the normal equation in ex1_multi.m to estimate the price of the same 1650 sq-ft, 3-bedroom house:

    price = testify * theta;

The normal equation does not require feature normalization, so testify is used as is.
The output is:
[Screenshot: predicted price from the normal equation]

It agrees closely with the result obtained earlier by gradient descent.
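The agreement can be reproduced end to end on a small hypothetical dataset generated from $y = 1 + 2x_1 + 3x_2$ (the data, $\alpha$, the iteration count and the query point are assumptions chosen for illustration). Gradient descent runs on normalized features, so the query point is normalized with the same mu and sigma; the normal equation uses the raw features directly:

```octave
% Hypothetical data generated from y = 1 + 2*x1 + 3*x2
X_raw = [1 1; 2 0; 3 1; 4 0];
y = [6; 5; 10; 9];
m = length(y);
x_query = [5 1];                    % exact value: 1 + 2*5 + 3*1 = 14

% Gradient descent on normalized features
mu = mean(X_raw);
sigma = std(X_raw);
X_gd = [ones(m, 1), bsxfun(@rdivide, bsxfun(@minus, X_raw, mu), sigma)];
theta_gd = zeros(3, 1);
alpha = 0.1;
for iter = 1:1000
  theta_gd = theta_gd - alpha / m * X_gd' * (X_gd * theta_gd - y);
end
price_gd = [1, (x_query - mu) ./ sigma] * theta_gd;   % normalize the query first

% Normal equation on the raw features: no normalization needed
X_ne = [ones(m, 1), X_raw];
theta_ne = pinv(X_ne' * X_ne) * X_ne' * y;
price_ne = [1, x_query] * theta_ne;

fprintf('gradient descent: %.6f, normal equation: %.6f\n', price_gd, price_ne);
```

The two parameter vectors differ (one lives in normalized units, the other in raw units), yet both predict the same price once the query is expressed in matching units.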
