@betasy
2016-09-25T10:06:25.000000Z
Machine Learning, Andrew Ng
(quote) Machine learning is the science of getting computers to learn without being explicitly programmed.
Machine Learning at Stanford: visit ml-class.org to enroll
lecture slides:
Machine Learning
- Grew out of work in AI
- New capability for computers
Examples:
- Database mining
Large datasets from growth of automation/web.
E.g., web click data, medical records, biology, engineering
- Applications that can't be programmed by hand
E.g., autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision
- Self-customizing programs
E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI)
Unsupervised Learning
- the data comes with no labels
- given a dataset, can the algorithm find structure in it?
- clustering algorithms, e.g. Google News groups stories on similar topics
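The clustering idea can be sketched with a minimal k-means implementation (my own toy example in Python/NumPy, not code from the course; the data points are made up):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids (keep the old one if a cluster is empty)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

# two obvious groups of 2-D points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, k=2)
print(labels)  # the first three points share one label, the last three the other
```

This is the same "find groups in unlabeled data" idea that powers the Google News example, just on six points instead of thousands of articles.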
applications:
- example: the cocktail party problem (separating overlapping audio sources)
- the course uses Octave for the programming exercises
Regression Problem
Predict real-valued output
training set <--> test set
(x, y): one training sample
(x^(i), y^(i)): the i-th training sample
Process of learning algorithms:
Training Set ------> Learning Algorithm ------> h (hypothesis): input x, output estimated y
h is a function that maps from x to y
How to represent h?
in this linear regression problem, h_θ(x) = θ_0 + θ_1 x (linear regression with one variable)
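The hypothesis for univariate linear regression, h_θ(x) = θ_0 + θ_1 x, is just a straight line, which translates directly to code (a minimal Python sketch; the parameter values are made up for illustration):

```python
def h(theta0, theta1, x):
    """Hypothesis for univariate linear regression: a straight line in x."""
    return theta0 + theta1 * x

# with theta0 = 1.0 and theta1 = 2.0 the hypothesis is the line y = 1 + 2x
print(h(1.0, 2.0, 3.0))  # 7.0
```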
Cost Function
the problem is to choose (calculate) the parameters θ_0, θ_1
idea: choose θ_0, θ_1 so that h_θ(x) is close to y for our training samples (x, y)
it's a minimization problem: minimize over θ_0, θ_1 the cost J(θ_0, θ_1) = 1/(2m) * Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2
Cost Function -- intuition I
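For intuition, the squared-error cost J(θ_0, θ_1) can be computed directly and compared for different parameter choices (a minimal sketch; the toy dataset is my own):

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# on data that lies exactly on the line y = x, the choice theta0=0, theta1=1
# fits perfectly, so the cost is zero; any other line costs more
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
print(cost(0.0, 1.0, xs, ys))  # 0.0
print(cost(0.0, 0.5, xs, ys))  # larger than 0: a worse fit
```

Plotting `cost` over a grid of (θ_0, θ_1) values gives the bowl-shaped surface and contour plots shown in the lecture.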
Gradient Descent
gradient descent is going to minimize the cost function
Have some function J(θ_0, θ_1)
want min over θ_0, θ_1 of J(θ_0, θ_1)
Gradient Descent Algorithm
repeat until convergence: θ_j := θ_j - α * ∂J(θ_0, θ_1)/∂θ_j (for j = 0 and j = 1, updating both simultaneously)
Gradient Descent -- intuition
gradient descent can converge to a local minimum even with the learning rate α fixed
as we approach a local minimum, the gradient shrinks, so gradient descent automatically takes smaller steps; there is no need to decrease α over time
Gradient Descent for linear regression
for the linear regression model, the partial derivatives of J(θ_0, θ_1) are:
∂J/∂θ_0 = 1/m * Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
∂J/∂θ_1 = 1/m * Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) * x^(i)
so put these derivatives back into the gradient descent algorithm:
repeat until convergence:
θ_0 := θ_0 - α * 1/m * Σ (h_θ(x^(i)) - y^(i))
θ_1 := θ_1 - α * 1/m * Σ (h_θ(x^(i)) - y^(i)) * x^(i)
(updating θ_0 and θ_1 simultaneously)
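Putting the derivatives into the update rule gives batch gradient descent for univariate linear regression. A minimal Python sketch (the toy data, the learning rate 0.1, and the iteration count are my own choices for illustration):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1*x.

    Each step uses all m training examples (hence "batch"),
    and theta0, theta1 are updated simultaneously."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        # partial derivatives of J(theta0, theta1), averaged over all examples
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# data generated from y = 2x + 1; the fit should recover roughly those parameters
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # close to 1.0 and 2.0
```

Because the cost is convex, this converges to the single global optimum regardless of the starting point (here θ_0 = θ_1 = 0).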
it turns out that the cost function of the linear regression model is a convex function, which has no local optima other than the single global optimum
"Batch" Gradient Descent
"Batch": each step of gradient descent uses all training examples
Matrices and vectors
Matrix multiplication properties
Inverse and Transpose
singular and degenerate matrices
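These matrix facts are easy to check numerically. The course uses Octave; here is an equivalent sketch in Python/NumPy (the example matrices are my own):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# matrix multiplication is associative but generally NOT commutative
print(np.array_equal(A @ B, B @ A))  # False

# inverse: A @ A^{-1} = identity (exists only for non-singular square matrices)
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))  # True

# transpose swaps rows and columns
print(np.array_equal(A.T, np.array([[1.0, 3.0], [2.0, 4.0]])))  # True

# a singular (degenerate) matrix has determinant 0 and no inverse
S = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is twice the first
print(np.linalg.det(S))  # 0 (up to floating-point error)
```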