Machine Learning Notes

# Machine Learning

Tags (space-separated): ML


Introduction

Supervised Learning

Supervised Learning: “right answers” given
Classification: Discrete valued output (0 or 1)
Regression: Predict continuous valued output

Unsupervised Learning

Model and Cost Function

Flowchart: Training Set → Learning Algorithm → hypothesis h

Linear regression with one variable
(also called univariate linear regression)
Hypothesis:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Parameters:
$\theta_0, \theta_1$

Idea: choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
Cost function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \qquad \text{goal: } \min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$$

This is called the squared error function.
$h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of $x$), whereas $J(\theta_0, \theta_1)$ is a function of the parameters $\theta_0, \theta_1$.
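
As a quick illustration (not from the original notes), here is a minimal Octave sketch of this cost function; names like `computeCost` are illustrative, and the design matrix X is assumed to already contain a column of ones for the intercept term:

    % Squared error cost J(theta0, theta1) for univariate linear regression.
    % X is m x 2 (first column all ones), y is m x 1, theta is 2 x 1.
    function J = computeCost(X, y, theta)
      m = length(y);
      errors = X * theta - y;               % h_theta(x^(i)) - y^(i) for every example
      J = (1 / (2 * m)) * sum(errors .^ 2); % average of squared errors, halved
    end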



Matrix

Dimension of a matrix: number of rows × number of columns, written $\mathbb{R}^{m \times n}$.

Gradient descent algorithm

Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \quad \text{(for } j = 0 \text{ and } j = 1\text{)}$$

:= is assignment, and α is the learning rate.
A subtlety in how you implement gradient descent: the parameters must be updated simultaneously.

Correct: simultaneous update:

$$\text{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$$
$$\text{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$$
$$\theta_0 := \text{temp0}$$
$$\theta_1 := \text{temp1}$$
Gradient descent can converge to a local minimum even with the learning rate α held fixed.
As we approach a local minimum, gradient descent automatically takes smaller steps, so there is no need to decrease α over time.
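
A minimal Octave sketch of these simultaneous updates (names such as `gradientDescent`, `alpha`, and `num_iters` are illustrative, not from the notes):

    % Batch gradient descent for univariate linear regression.
    % X is m x 2 (first column all ones), y is m x 1, theta is 2 x 1.
    function theta = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      for iter = 1:num_iters
        h = X * theta;                       % predictions for all m examples
        grad = (1 / m) * (X' * (h - y));     % dJ/dtheta_0 and dJ/dtheta_1 together
        theta = theta - alpha * grad;        % both parameters updated simultaneously
      end
    end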

Week 2

Linear Regression with multiple variables: Multiple Features

$x^{(i)}$ = input (features) of the $i$th training example.
$x_j^{(i)}$ = value of feature $j$ in the $i$th training example.

Hypothesis

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

For convenience of notation, define $x_0 = 1$:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}$$

$$h_\theta(x) = \theta^T x$$

Gradient Descent for Multiple Variables

Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
Parameters: $\theta_0, \theta_1, \dots, \theta_n$
Cost function:

$$J(\theta_0, \theta_1, \dots, \theta_n) = J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Gradient descent:
Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \text{(for } j = 0, 1, \dots, n\text{)}$$
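
The same update is easy to vectorize for any number of features; a sketch of one iteration in Octave (assuming X is the m × (n+1) design matrix with $x_0 = 1$ in the first column):

    % One gradient descent step with n features (theta is (n+1) x 1).
    h     = X * theta;                  % hypothesis for every training example
    grad  = (1 / m) * (X' * (h - y));   % vector of partial derivatives dJ/dtheta_j
    theta = theta - alpha * grad;       % simultaneous update of all theta_j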


Feature Scaling

Mean Normalization
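
Mean normalization replaces each feature value $x_j$ with $\frac{x_j - \mu_j}{s_j}$, where $\mu_j$ is the mean and $s_j$ the standard deviation (or range) of feature $j$, so that all features end up on comparable scales and gradient descent converges faster. A sketch in Octave (here X holds only the raw features, without the $x_0 = 1$ column):

    mu     = mean(X);              % 1 x n row vector of feature means
    sigma  = std(X);               % 1 x n row vector of feature standard deviations
    X_norm = (X - mu) ./ sigma;    % broadcasting subtracts/divides column by column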

Learning rate

  • “Debugging”: How to make sure gradient descent is working correctly.
  • How to choose learning rate α
Housing prices prediction

$$h_\theta(x) = \theta_0 + \theta_1 \times \text{frontage} + \theta_2 \times \text{depth}$$

Instead, define a new feature, Area = frontage × depth:
$$h_\theta(x) = \theta_0 + \theta_1 \times \text{Area}$$

Polynomial regression


Choice of features

$$h_\theta(x) = \theta_0 + \theta_1 \times (\text{size}) + \theta_2 \times (\text{size})^2$$

$$h_\theta(x) = \theta_0 + \theta_1 \times (\text{size}) + \theta_2 \times \sqrt{\text{size}}$$
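
A sketch of how such polynomial features might be built before running gradient descent (the column name is illustrative; feature scaling matters here because size² has a much larger range than size):

    house_size = data(:, 1);                  % hypothetical column holding house sizes
    X = [ones(length(house_size), 1), ...     % x0 = 1
         house_size, ...                      % size
         house_size .^ 2];                    % size^2 as an extra feature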

Computing Parameters Analytically

Normal Equation

Method to solve for θ analytically.

$$\theta = (X^T X)^{-1} X^T y$$

+ No need to choose α.
+ Don't need to iterate.
+ Need to compute $(X^T X)^{-1}$.
+ Slow if n is very large.
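
In Octave the normal equation is a single line; pinv is used rather than inv so the expression still behaves sensibly when $X^T X$ is close to singular:

    theta = pinv(X' * X) * X' * y;   % closed-form solution, no alpha, no iterations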

Octave

Moving Data Around

    load('featuresX.dat')
    load featuresX.dat
    who    % variables in the current scope
    whos   % like who, but with sizes and types
    clear
    save hello.mat v
    save hello.txt v -ascii   % save as text (ASCII)
    A(2,:)       % ':' means every element along that row
    A([1 3],:)
    A = [A, [100; 200; 300]];
    A(:)         % put all elements of A into a single column vector

Computing on Data

    A .* B                    % element-wise product
    A * B                     % matrix product
    A .^ 2                    % element-wise square
    v = [1; 2; 3]
    1 ./ v                    % element-wise reciprocal
    log(v)
    exp(v)
    abs(v)
    -v
    v + ones(length(v), 1)    % same as v + 1
    a = [1 2 3 4]
    [val, ind] = max(a)       % maximum value and its index
    a < 1                     % element-wise comparison, gives a 0/1 vector
    find(a < 3)               % indices of elements satisfying the condition
    A = magic(3)
    [r, c] = find(A >= 7)     % row and column indices of matching elements
    sum(a)
    prod(a)
    floor(a)
    ceil(a)
    max(rand(3), rand(3))     % element-wise max of two random matrices
    max(A, [], 1)             % column-wise maxima
    max(A, [], 2)             % row-wise maxima
    max(max(A))               % maximum over the whole matrix
    A .* eye(3)               % keep only the diagonal of A
    pinv(A)                   % pseudo-inverse of A

Plotting Data

    plot
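
A short sketch of typical plotting commands (the data here is made up just to have something to draw):

    t  = 0:0.01:0.98;
    y1 = sin(2 * pi * 4 * t);
    y2 = cos(2 * pi * 4 * t);
    plot(t, y1);               % first curve
    hold on;                   % keep the figure so the next plot overlays it
    plot(t, y2, 'r');          % second curve in red
    xlabel('time'); ylabel('value');
    legend('sin', 'cos');
    title('my plot');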

Vectorization

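
Vectorization replaces explicit loops over features with matrix/vector operations. For example, computing the hypothesis $h_\theta(x) = \theta^T x$ both ways (a sketch):

    % Unvectorized: loop over the n+1 components.
    prediction = 0.0;
    for j = 1:(n + 1)
      prediction = prediction + theta(j) * x(j);
    end

    % Vectorized: one inner product, usually much faster.
    prediction = theta' * x;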

Week 3

Classification and Representation

Classification

y=0 or 1
$h_\theta(x)$ can be $> 1$ or $< 0$
Logistic Regression: $0 \le h_\theta(x) \le 1$

Hypothesis Representation

Logistic Regression Model
Want $0 \le h_\theta(x) \le 1$.

$$h_\theta(x) = g(\theta^T x)$$

$$g(z) = \frac{1}{1 + e^{-z}}$$

g is called the sigmoid function or logistic function.
Interpretation of hypothesis output:
$h_\theta(x)$ = estimated probability that $y = 1$ on input $x$
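
A minimal Octave sketch of the sigmoid and of the logistic hypothesis (the element-wise ./ and exp make it work on vectors and matrices as well as scalars):

    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));      % logistic function, applied element-wise
    end

    % Estimated probability that y = 1 for every training example:
    % h = sigmoid(X * theta);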

Cost Function

We want the cost function to be convex, so that gradient descent is guaranteed to reach the global minimum; the squared-error cost is non-convex when used with the sigmoid hypothesis.
Logistic regression cost function

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \text{Cost}(h_\theta(x^{(i)}), y^{(i)})$$

$$\text{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Note: $y = 0$ or $1$ always.
Simplified cost function:
$$\text{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -y^{(i)} \log(h_\theta(x^{(i)})) - (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))$$

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$

Want $\min_\theta J(\theta)$:
Repeat {
$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(simultaneously update all $\theta_j$)
}

Algorithm looks identical to linear regression!
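
A sketch of this cost function and its gradient in Octave, reusing the sigmoid helper sketched earlier (the function name is illustrative; the 1/m on the gradient can equivalently be absorbed into the learning rate α):

    % Logistic regression cost and gradient (unregularized).
    % X is m x (n+1), y is m x 1 with 0/1 labels, theta is (n+1) x 1.
    function [J, grad] = logisticCost(theta, X, y)
      m = length(y);
      h = sigmoid(X * theta);
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
      grad = (1 / m) * (X' * (h - y));     % same shape as the linear regression gradient
    end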

Regularization

Overfitting


Cost Function

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

Gradient descent

Repeat {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]$$

}
Equivalently:
$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
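
A sketch of one regularized gradient descent step in Octave (theta(1) holds $\theta_0$ and is deliberately left out of the regularization term):

    h = X * theta;                                            % linear regression hypothesis
    grad = (1 / m) * (X' * (h - y));                          % unregularized gradient
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % add (lambda/m)*theta_j for j >= 1
    theta = theta - alpha * grad;                             % simultaneous update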

Non-invertibility (optional/advanced)

Suppose $m \le n$ (the number of examples is no larger than the number of features). Then in

$$\theta = (X^T X)^{-1} X^T y$$

the matrix $X^T X$ may be non-invertible. If $\lambda > 0$,

$$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \right)^{-1} X^T y$$

and the regularized matrix is invertible.
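
A sketch of the regularized normal equation in Octave; L is the (n+1) × (n+1) identity with its top-left entry zeroed so that $\theta_0$ is not regularized:

    n = size(X, 2) - 1;           % number of features (X includes the x0 = 1 column)
    L = eye(n + 1);
    L(1, 1) = 0;                  % do not penalize theta_0
    theta = pinv(X' * X + lambda * L) * X' * y;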

Week 4

$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$

$$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$$

$$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$$

$$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$$

$$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$$

If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.

$$z_1^{(2)} = \Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3$$

Forward propagation: Vectorized implementation

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}$$

$$z^{(2)} = \Theta^{(1)} x$$

$$a^{(2)} = g(z^{(2)})$$

Add $a_0^{(2)} = 1$.

$$z^{(3)} = \Theta^{(2)} a^{(2)}$$
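
A sketch of this forward propagation for a single example in Octave, using the sigmoid helper sketched earlier (Theta1 and Theta2 stand for $\Theta^{(1)}$ and $\Theta^{(2)}$; x here holds the raw input features without the bias term, and the final step applies g to $z^{(3)}$ to obtain $h_\Theta(x) = a^{(3)}$):

    a1 = [1; x];                % input layer activations with bias unit x0 = 1
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];      % hidden layer activations, with bias unit a0^(2) = 1
    z3 = Theta2 * a2;
    h  = sigmoid(z3);           % h_Theta(x) = a^(3)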
