I've recently started Stanford's Machine Learning course by Andrew Ng, and I'm taking notes to review and consolidate what I learn.
My knowledge is limited, so if you find errors or have suggestions, please bear with me and point them out.
Week 01
Introduction
- Applications of machine learning
- Database mining
- Applications that can't be programmed by hand
- Self-customizing programs
- Definition
- Arthur Samuel. Field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T as measured by P improves with experience E.
- Common Types
- Supervised Learning
- Given the “right answer” for each example in the data.
- Regression Problem : Predict real-valued output.
- Classification : Predict discrete output.
- Unsupervised Learning
- Unsupervised learning allows us to approach problems with little or no idea what the effect of the variables is.
- Clustering, Non-clustering
Model and Cost Function
- Basic Model Representation
- Number of training examples - m
- Input - x
- Output - y
- Input space - X
- Output space - Y
- Hypothesis - h:X->Y
- Cost Function
- The cost function measures the accuracy of our hypothesis function, for example:

$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^2=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^2$$
- Contour plot
- A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at every point on the same line.
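The cost function above can be sketched in a few lines of Python. This is only an illustration, assuming the linear hypothesis $h_\theta(x)=\theta_0+\theta_1 x$ used in the course; `compute_cost` is a name I made up:

```python
# Sketch of the squared-error cost J(theta0, theta1) for the linear
# hypothesis h_theta(x) = theta0 + theta1 * x (assumed from the course).

def compute_cost(theta0, theta1, xs, ys):
    """Average of squared prediction errors, halved (the 1/2m factor)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Data generated by y = 2x: the hypothesis (theta0=0, theta1=2) fits perfectly,
# so the cost is exactly zero; any other parameters give a positive cost.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(compute_cost(0.0, 2.0, xs, ys))  # → 0.0
print(compute_cost(0.0, 0.0, xs, ys) > 0)  # → True
```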
Parameter Learning
Outline
- Start with some $\theta_0, \theta_1$
- Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.
Algorithm
repeat until convergence {
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\ldots,\theta_n)$ (for j = 0, 1, ..., n)
}
simultaneous update {
temp0 := $\theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1,\ldots,\theta_n)$
…
tempn := $\theta_n - \alpha\frac{\partial}{\partial\theta_n}J(\theta_0,\theta_1,\ldots,\theta_n)$
$\theta_0$ := temp0
…
$\theta_n$ := tempn
}
- Comprehension
- It's like going down a hill along the steepest path. The derivative gives us a direction to move in, and $\alpha$ (called the learning rate) determines the size of each step.
- As we approach the bottom of our convex function, the derivative tends to 0, and at the bottom we have $\theta_1 := \theta_1 - \alpha \times 0$, so the parameters stop changing.
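The simultaneous-update pattern above can be sketched in Python: compute every temp value from the old parameters, and only then assign. The toy cost function $J(\theta_0,\theta_1)=\theta_0^2+\theta_1^2$ and the function name `gradient_step` are my own illustration, not from the course:

```python
# One simultaneous gradient-descent step: all partial derivatives are
# evaluated at the OLD theta before any parameter is overwritten.

def gradient_step(theta, grad, alpha):
    """Return updated parameters given a gradient function and step size."""
    temps = [t_j - alpha * g_j for t_j, g_j in zip(theta, grad(theta))]
    return temps  # assign only after every temp is computed

# Toy convex cost J(t0, t1) = t0^2 + t1^2 with gradient (2*t0, 2*t1);
# the minimum is at (0, 0), so repeated steps should approach it.
grad = lambda t: (2 * t[0], 2 * t[1])
theta = [4.0, -2.0]
for _ in range(100):
    theta = gradient_step(theta, grad, alpha=0.1)
# theta is now very close to [0.0, 0.0]
```

Computing into temporaries first matters: if we overwrote $\theta_0$ before computing the partial for $\theta_1$, the second update would see a mixed old/new parameter vector.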
Gradient descent for linear regression - Algorithm
repeat until convergence (simultaneously update) {
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)$
$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}$
}
- This method looks at every example in the entire training set on every step, and is called batch gradient descent.
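The update rules above can be turned into a short batch gradient descent sketch. The data, the learning rate, and the function name `batch_gradient_descent` are illustrative choices of mine, not part of the course material:

```python
# Batch gradient descent for one-feature linear regression:
# each iteration sums the error over ALL m training examples,
# which is what makes it "batch" gradient descent.

def batch_gradient_descent(xs, ys, alpha=0.05, iters=2000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m                              # dJ/d(theta0)
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m   # dJ/d(theta1)
        # simultaneous update: both gradients used the old parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data generated by y = 1 + 2x, so the parameters should converge
# close to theta0 = 1 and theta1 = 2.
t0, t1 = batch_gradient_descent([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Note the extra $x^{(i)}$ factor in `grad1` compared with `grad0`: it comes from differentiating $h_\theta(x^{(i)})=\theta_0+\theta_1 x^{(i)}$ with respect to $\theta_1$ instead of $\theta_0$.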