@xmruibi
2014-10-05
Coursera
Definitions:
Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Spam example: for a spam-filtering problem, classifying emails as spam or non-spam is the task T; observing which emails are labeled spam and which are labeled non-spam is the experience E; and the number or ratio of emails correctly identified as spam or non-spam is the performance measure P.
1. Supervised learning: learn a function that maps inputs to appropriate outputs (the outputs are usually called labels, and in most cases the training set is annotated by human experts). In a classification problem, for example, the classifier models a function based on input vectors and their class labels, and for a new input vector it produces a classification result.
2. Unsupervised learning: unlike supervised learning, the training set carries no human-provided labels. Clustering is a common unsupervised learning algorithm.
3. Semi-supervised learning: falls between supervised and unsupervised learning.
4. Reinforcement learning: learns how to take actions through observation; each action affects the environment, and the environment's feedback in turn guides the learning algorithm.
The cocktail party problem
Imagine you're at a cocktail party. For you it is no problem to follow the discussion of your neighbours, even if there are lots of other sound sources in the room: other discussions in English and in other languages, different kinds of music, etc. You might even hear a siren from a passing police car. It is not known exactly how humans are able to separate the different sound sources. Independent component analysis (ICA) is able to do it, provided there are at least as many microphones or 'ears' in the room as there are simultaneous sound sources. In this demo, you can select which sounds are present in your cocktail party. ICA will separate them without knowing anything about the different sound sources or the positions of the microphones.
Online Source: http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi
(One line of code):
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
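A hedged sketch of how this line might be invoked (mix1, mix2, the layout of x, and the final recovery step are assumptions, not from the course): x stacks the mixed recordings with one microphone per row and one sample per column, and the orthogonal matrix W returned by svd supplies the un-mixing directions.

x = [mix1'; mix2'];    % hypothetical mixed recordings, one microphone per row
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');   % W: un-mixing directions
estimated = W' * x;    % assumed recovery step: rows approximate the separated sources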
1) Model representation
2) Cost function
squared error function:
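From the lecture, with hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and $m$ training examples $(x^{(i)}, y^{(i)})$, the squared error cost function is

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$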
3) Cost function intuition I
4) Cost function intuition II
Reviewing the four parts of linear regression above, this time without simplifying the cost function: the graph of J(θ0, θ1) is now a three-dimensional surface or a contour plot. One can observe that as the line hθ(x) fits the sample points more and more closely, the corresponding point of J(θ0, θ1) in the contour plot moves closer and closer to the minimum.
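A minimal Octave sketch of how such a contour plot could be produced (the grid ranges and variable names are assumptions; X is assumed to be a design matrix [ones(m,1), x] and y the targets):

% Evaluate J(theta0, theta1) over a grid of parameter values.
computeCost = @(X, y, t) (1 / (2 * length(y))) * sum((X * t - y) .^ 2);
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    J_vals(i, j) = computeCost(X, y, [theta0_vals(i); theta1_vals(j)]);
  end
end
contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 3, 20));  % contour plot of the cost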
5) Gradient descent
One application scenario: minimization problems.
For a function such as J(θ0, θ1),
Goal: minimize J(θ0, θ1) over θ0 and θ1.
Outline of the method:
1. Start with initial values for θ0 and θ1, for example setting both to 0.
2. Keep changing θ0 and θ1 so that J(θ0, θ1) decreases, until we reach a minimum we are satisfied with.
For any given J(θ0, θ1), different initial positions can lead to different local minima, as the two examples shown in the lecture illustrate.
Repeat the following update until convergence, for j = 0 and j = 1, updating both parameters simultaneously:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

where α is the learning rate and ∂J/∂θj is the derivative term.
6) Gradient descent intuition
7) Gradient descent for linear regression
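Substituting the squared error cost into the update rule yields the concrete updates from the lecture:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

A minimal Octave sketch of batch gradient descent for linear regression (the function name, α = 0.01, and the 1500 iterations are assumptions, not from the notes):

% Batch gradient descent for univariate linear regression.
% X: m x 2 design matrix [ones(m,1), x]; y: m x 1 targets; theta: 2 x 1.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = X * theta;                                 % predictions h_theta(x)
    theta = theta - (alpha / m) * (X' * (h - y));  % simultaneous update of theta0 and theta1
  end
end

% Hypothetical usage:
% theta = gradientDescent([ones(m,1), x], y, zeros(2,1), 0.01, 1500);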