@lyc102 2017-04-08T05:37:05.000000Z

Machine Learning (Zhou Zhihua), Chapter 1: Introduction

machine_learning


Basic Terminology

data set
instance, sample, feature vector
attribute, feature, attribute value
attribute space or sample space

Mathematically, $D = \{ \boldsymbol x_1, \boldsymbol x_2, \ldots, \boldsymbol x_m \}$ is a data set containing $m$ samples. Each instance $\boldsymbol x_i = (x_i^1, x_i^2, \ldots, x_i^d)^T\in \mathbb R^d$ is a feature vector in the $d$-dimensional sample space $\mathcal X$, where $x_i^j$ is the $j$-th attribute value of the $i$-th sample. $d$ is called the dimensionality of the sample.
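
As a small illustration of this notation, a data set can be stored as a list of $m$ feature vectors, each of length $d$. The numbers and the helper name `attribute_value` below are made up for illustration only.

```python
# A toy data set D with m = 3 samples in a d = 2 dimensional sample space.
# The values are invented purely to illustrate the notation.
D = [
    [0.5, 1.2],   # x_1
    [1.0, 0.7],   # x_2
    [1.5, 2.1],   # x_3
]

m = len(D)        # number of samples
d = len(D[0])     # dimensionality of each sample


def attribute_value(D, i, j):
    """Return x_i^j, the j-th attribute of the i-th sample (1-based, as in the text)."""
    return D[i - 1][j - 1]


print(m, d, attribute_value(D, 2, 1))   # prints: 3 2 1.0
```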

training data
training sample
training set

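A model learned from data corresponds to some underlying regularity in the data. The learned model is therefore called a hypothesis, while the true underlying regularity itself is called the ground truth.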

$(\boldsymbol x_i, y_i), \boldsymbol x_i \in \mathcal X, y_i \in \mathcal Y$.

label: $y_i$
example: $(\boldsymbol x_i, y_i)$
label space: $\mathcal Y$

Depending on the kind of value we want to predict, learning tasks can be grouped into

classification: discrete value
- binary classification: $\mathcal Y=\{-1,+1\}$ or $\{0,1\}$
- multi-class classification: $|\mathcal Y|>2$
regression: continuous value, $\mathcal Y = \mathbb R$

Mathematically, we want to learn a map $f:\mathcal X \to \mathcal Y$ from the examples $\{(\boldsymbol x_i, y_i)\}$.
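
One minimal way to obtain such a map from examples is the nearest-neighbor rule. It is used here only as an illustrative sketch: the notes do not single out any particular learner, and the function name `fit_nearest_neighbor` and the toy examples are assumptions.

```python
def fit_nearest_neighbor(examples):
    """Return a map f: X -> Y that predicts the label of the closest training example.

    `examples` is a list of (x, y) pairs, where x is a tuple of numbers.
    """
    def f(x):
        def sq_dist(example):
            xi, _ = example
            return sum((a - b) ** 2 for a, b in zip(xi, x))
        # Pick the training example with the smallest squared Euclidean distance.
        _, label = min(examples, key=sq_dist)
        return label
    return f


# Toy binary classification data with label space Y = {-1, +1} (made up).
examples = [
    ((0.0, 0.0), -1),
    ((0.2, 0.1), -1),
    ((1.0, 1.0), +1),
    ((0.9, 1.2), +1),
]
f = fit_nearest_neighbor(examples)
print(f((0.1, 0.0)), f((1.1, 0.9)))   # prints: -1 1
```

The returned `f` is defined on all of $\mathcal X$, not only on the training points, which is what makes it a candidate for predicting labels of new samples.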

Depending on whether the training data carry label information, learning tasks can be divided into
- Supervised learning: classification and regression
- Unsupervised learning: clustering

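The goal of machine learning is for the learned model to work well on new samples; this ability is called generalization. It is usually assumed that all samples in the sample space follow an unknown distribution and that each sample is drawn i.i.d. from it. The more samples we have, the more likely we are to learn about that distribution, and thus to obtain a model that generalizes well.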
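
The difference between fitting the training set and generalizing can be made concrete with a small sketch. Everything here is hypothetical: the ground truth `truth`, the two toy learners, and the train/test split are invented to show that zero training error says little about error on new samples.

```python
def error_rate(f, examples):
    """Fraction of examples (x, y) on which f(x) != y."""
    return sum(f(x) != y for x, y in examples) / len(examples)


# Hypothetical ground truth, used only to generate toy data: y = +1 iff x >= 5.
def truth(x):
    return 1 if x >= 5 else -1


train = [(x, truth(x)) for x in (0, 2, 4, 6, 8)]
test = [(x, truth(x)) for x in (1, 3, 5, 7, 9)]   # "new samples", unseen in training

# Learner A memorizes the training set and answers -1 on anything unseen.
table = dict(train)


def memorizer(x):
    return table.get(x, -1)


# Learner B places a threshold halfway between the largest negative
# and the smallest positive training point.
threshold = (max(x for x, y in train if y == -1)
             + min(x for x, y in train if y == 1)) / 2


def threshold_rule(x):
    return 1 if x >= threshold else -1


for name, f in (("memorizer", memorizer), ("threshold", threshold_rule)):
    print(name, error_rate(f, train), error_rate(f, test))
```

Both learners are perfect on the training set, but only the threshold rule carries over to the unseen points in this toy setup.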

Hypothesis Space

Induction
Deduction

Inductive learning
- Broad sense: learning from examples (black box)
- Narrow sense: learning a concept (too hard in practice)

Inductive Bias

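From the viewpoint of function fitting, many different curves can pass through the same sample points. How do we judge which curve is better?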
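
A tiny sketch makes the ambiguity concrete. The sample points and the two curves `line` and `cubic` are invented for illustration: both fit the samples exactly, yet they disagree as soon as we leave them.

```python
# Three sample points that happen to lie on the line y = x (toy data).
points = [(0, 0), (1, 1), (2, 2)]


def line(x):
    return x


def cubic(x):
    # The extra term vanishes at x = 0, 1, 2, so the fit on the samples is unchanged.
    return x + x * (x - 1) * (x - 2)


# Both curves are consistent with every observation.
for f in (line, cubic):
    assert all(f(x) == y for x, y in points)

# Away from the samples the two curves make different predictions.
print(line(3), cubic(3))   # prints: 3 9
```

The data alone cannot decide between the two; some additional preference is needed.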

Inductive bias can be seen as the "values" of a learning algorithm. A general guiding principle is Occam's razor.

If several hypotheses are consistent with the observations, choose the simplest one.

The question then becomes how to define "simple".
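
One possible, by no means canonical, definition is to call a polynomial simpler when its degree is lower. Under that assumption, the simplest polynomial consistent with the observations is the interpolating polynomial, and its degree can be read off from Newton's divided differences. The function names below are hypothetical, and the sketch assumes the $x$ values are distinct.

```python
from fractions import Fraction


def newton_coefficients(points):
    """Divided-difference coefficients of the interpolating polynomial (Newton form).

    Exact rational arithmetic is used so that a zero coefficient is exactly zero.
    """
    xs = [Fraction(x) for x, _ in points]
    coef = [Fraction(y) for _, y in points]
    n = len(points)
    for k in range(1, n):
        # Update in place from the back so coef[i - 1] is still the old value.
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef


def minimal_degree(points):
    """Degree of the lowest-degree polynomial passing through all points."""
    coef = newton_coefficients(points)
    nonzero = [k for k, c in enumerate(coef) if c != 0]
    return max(nonzero) if nonzero else 0


print(minimal_degree([(0, 0), (1, 1), (2, 2)]))   # prints: 1
print(minimal_degree([(0, 0), (1, 1), (2, 4)]))   # prints: 2
```

Other notions of simplicity (number of parameters, smoothness, description length) would lead to different choices, which is exactly why the definition matters.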

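Whether an algorithm's inductive bias matches the problem at hand is, most of the time, what directly determines whether the algorithm can perform well.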

No Free Lunch Theorem.

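The point of the NFL theorem is that, detached from a concrete problem, it is meaningless to discuss in the abstract which learning algorithm is better. To compare the relative merits of algorithms, one must refer to a specific learning problem.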
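
The statement can be checked by brute force in a tiny setting. The sketch below assumes binary labels, a four-point sample space, a fixed training set, and a uniform distribution over all possible target functions, which is the simplest setting in which the theorem is usually stated. The two learners and all names are hypothetical.

```python
from itertools import product

X = [0, 1, 2, 3]      # the whole (tiny) sample space
train_x = [0, 1]      # points seen during training
test_x = [2, 3]       # off-training-set points


def learner_majority(train):
    """Predict the most common training label on every unseen point (ties -> 0)."""
    labels = [y for _, y in train]
    guess = 1 if sum(labels) * 2 > len(labels) else 0
    return lambda x: guess


def learner_minority(train):
    """Predict the opposite of learner_majority: a deliberately perverse bias."""
    majority = learner_majority(train)
    return lambda x: 1 - majority(x)


def average_ote(learner):
    """Off-training-set error averaged uniformly over all 2^|X| target functions."""
    targets = list(product([0, 1], repeat=len(X)))
    total = 0.0
    for values in targets:
        f = dict(zip(X, values))                      # one candidate ground truth
        h = learner([(x, f[x]) for x in train_x])     # hypothesis learned from f's labels
        total += sum(h(x) != f[x] for x in test_x) / len(test_x)
    return total / len(targets)


print(average_ote(learner_majority), average_ote(learner_minority))   # prints: 0.5 0.5
```

Averaged over all targets, the sensible-looking learner and the perverse one do equally well. A difference only appears once some targets are considered more likely than others, that is, once a specific problem is fixed.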

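Mathematically, this amounts to choosing a good metric on some subset of the function space $\{f\}$, where the metric depends on the problem. Once the metric is given, the best $f$, if it exists, can be found by variational methods.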
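
As one standard worked instance (an illustration not taken from the notes), suppose the metric is the expected squared error under a joint density $p(\boldsymbol x, y)$, which is an assumption made here for concreteness:

$$
J[f] = \iint \bigl(f(\boldsymbol x) - y\bigr)^2 \, p(\boldsymbol x, y)\, \mathrm d y\, \mathrm d \boldsymbol x .
$$

Setting the functional derivative with respect to $f(\boldsymbol x)$ to zero gives

$$
\frac{\delta J}{\delta f(\boldsymbol x)} = 2\int \bigl(f(\boldsymbol x) - y\bigr)\, p(\boldsymbol x, y)\, \mathrm d y = 0
\quad\Longrightarrow\quad
f^*(\boldsymbol x) = \frac{\int y\, p(\boldsymbol x, y)\, \mathrm d y}{p(\boldsymbol x)} = \mathbb E[\, y \mid \boldsymbol x \,].
$$

Under this particular choice the best map is the conditional mean of $y$ given $\boldsymbol x$. A different metric would, in general, give a different optimal $f$.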
