Machine Learning Week Two

Notation

$x_j^{(i)}$ denotes the value of feature $j$ in the $i$-th training example

n denotes the number of features

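The multivariable form of the hypothesis function is

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

or

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$$

where we assume $x_0^{(i)} = 1$ for every training example.
We also write

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

and

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$

so we can rewrite $h_\theta(x)$ as

$$h_\theta(x) = \theta^T x$$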
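
To make the vector form concrete, here is a minimal sketch in plain Python. The function name `hypothesis` and the sample numbers are my own, made up for illustration. It prepends $x_0 = 1$ and then takes the dot product of $\theta$ and $x$.

```python
def hypothesis(theta, x):
    """Return h_theta(x) = theta^T x for a single example.

    theta: list of n+1 parameters [theta_0, ..., theta_n]
    x:     list of n feature values [x_1, ..., x_n], without x_0
    """
    x_with_bias = [1.0] + list(x)  # prepend x_0 = 1
    if len(theta) != len(x_with_bias):
        raise ValueError("theta must have exactly one more entry than x")
    # Dot product theta^T x
    return sum(t * xj for t, xj in zip(theta, x_with_bias))


# Illustrative values only
theta = [1.0, 2.0, 3.0]
x = [4.0, 5.0]
print(hypothesis(theta, x))  # 1 + 2*4 + 3*5 = 24.0
```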

Algorithm

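Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(update simultaneously for j = 0 to n)
where $m$ is the number of training examples, $\alpha$ is the learning rate and $x_0^{(i)} = 1$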
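
The update rule can be sketched in plain Python as below. This is only an illustration under my own assumptions: the names `predict` and `gradient_descent`, the starting point $\theta = 0$, the fixed iteration count and the toy data are not from the course. The point to notice is that all gradients are computed from the old $\theta$ before any component is changed, which is what "simultaneously" means.

```python
def predict(theta, row):
    # row already contains x_0 = 1 as its first entry
    return sum(t * xj for t, xj in zip(theta, row))


def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression.

    X: list of m rows, each [x_0, x_1, ..., x_n] with x_0 = 1
    y: list of m target values
    """
    m = len(X)
    k = len(X[0])          # k = n + 1 parameters
    theta = [0.0] * k      # assumed starting point
    for _ in range(iterations):
        errors = [predict(theta, X[i]) - y[i] for i in range(m)]
        # All gradients use the old theta, so the update is simultaneous.
        gradients = [sum(errors[i] * X[i][j] for i in range(m)) / m
                     for j in range(k)]
        theta = [theta[j] - alpha * gradients[j] for j in range(k)]
    return theta


# Toy data generated from y = 1 + 2*x (illustrative only)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
print(theta)  # close to [1.0, 2.0]
```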

Algorithm in practice

feature scaling

We can speed up gradient descent by having each of our input values in roughly the same range.

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the mean of feature $x_i$ over the training set
and $s_i$ is either the range of $x_i$ (max minus min) or its standard deviation
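
A small sketch of this scaling for a single feature column, using only the standard library. The function name `mean_normalize` and the `use_std` flag are hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample one is my own choice. Either works for scaling purposes.

```python
import statistics


def mean_normalize(values, use_std=False):
    """Scale one feature column: (x - mean) / s.

    s is the range (max - min) by default, or the population
    standard deviation when use_std=True.
    """
    mu = statistics.mean(values)
    s = statistics.pstdev(values) if use_std else max(values) - min(values)
    if s == 0:
        raise ValueError("feature is constant, cannot be scaled")
    return [(v - mu) / s for v in values]


# Illustrative values only
sizes = [1000.0, 1500.0, 2000.0]
scaled = mean_normalize(sizes)
print(scaled)  # [-0.5, 0.0, 0.5]
```

The same $\mu_i$ and $s_i$ computed on the training set should be reused when scaling new inputs at prediction time.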

learning rate

Debugging gradient descent.

Make a plot with the number of iterations on the x-axis and the cost function J(θ) on the y-axis. J(θ) should decrease on every iteration of gradient descent. If J(θ) ever increases, α is probably too large and should be reduced.

Automatic convergence test.

Declare convergence if J(θ) decreases by less than ε in one iteration, where ε is some small value such as $10^{-3}$. In practice, however, it is difficult to choose this threshold value.
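
Both checks can be sketched together in plain Python. Assumptions on my side: the cost is the usual squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i}(h_\theta(x^{(i)}) - y^{(i)})^2$, which these notes do not write out, and the names `cost`, `descend_with_checks` and the status strings are made up for illustration. The returned `history` list is what you would plot against the iteration number.

```python
def predict(theta, row):
    return sum(t * xj for t, xj in zip(theta, row))


def cost(theta, X, y):
    # Assumed squared-error cost: J = 1/(2m) * sum((h - y)^2)
    m = len(X)
    return sum((predict(theta, row) - yi) ** 2
               for row, yi in zip(X, y)) / (2 * m)


def descend_with_checks(X, y, alpha, epsilon=1e-3, max_iterations=10000):
    """Gradient descent that records J(theta) and stops early.

    Returns (theta, history, status) where status is one of
    "diverging", "converged" or "max_iterations".
    """
    m, k = len(X), len(X[0])
    theta = [0.0] * k
    history = [cost(theta, X, y)]
    for _ in range(max_iterations):
        errors = [predict(theta, row) - yi for row, yi in zip(X, y)]
        theta = [theta[j] - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
                 for j in range(k)]
        history.append(cost(theta, X, y))
        if history[-1] > history[-2]:
            # J went up: alpha is probably too large
            return theta, history, "diverging"
        if history[-2] - history[-1] < epsilon:
            # J decreased by less than epsilon in one iteration
            return theta, history, "converged"
    return theta, history, "max_iterations"


# Toy data from y = 1 + 2*x (illustrative only)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
_, history_ok, status_ok = descend_with_checks(X, y, alpha=0.1)
_, history_bad, status_bad = descend_with_checks(X, y, alpha=1.0)
print(status_ok, status_bad)
```

On this toy data the small learning rate stops through the ε test, while α = 1.0 is flagged on the very first step because J(θ) jumps from 10.5 to about 107.7.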

Features and Polynomial Regression


We can combine multiple features into one, for example $x_3 := x_1 \cdot x_2$.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form), for example:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$$

or

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$$
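
One way to read these formulas: the model stays linear in θ, and only the feature vector changes. A minimal sketch, with hypothetical helper names `cubic_features` and `sqrt_features`, that builds the expanded feature rows from a single input $x_1$:

```python
import math


def cubic_features(x1):
    # [x_0, x_1, x_1^2, x_1^3] with x_0 = 1
    return [1.0, x1, x1 ** 2, x1 ** 3]


def sqrt_features(x1):
    # [x_0, x_1, sqrt(x_1)] with x_0 = 1; requires x1 >= 0
    return [1.0, x1, math.sqrt(x1)]


print(cubic_features(2.0))  # [1.0, 2.0, 4.0, 8.0]
print(sqrt_features(4.0))   # [1.0, 4.0, 2.0]
```

Feature scaling matters even more here: if $x_1$ ranges over 1 to 1000, then $x_1^2$ ranges over 1 to $10^6$ and $x_1^3$ over 1 to $10^9$.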

Normal Equation


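$$\theta = (X^T X)^{-1} X^T y$$

where

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}$$

and

$$y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$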
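
A rough standard-library sketch of the normal equation follows. Note the swap: instead of forming the inverse explicitly, it solves the equivalent linear system $(X^T X)\theta = X^T y$ with Gauss-Jordan elimination, which gives the same θ whenever $X^T X$ is invertible. All function names and the toy data are my own. In real work you would normally reach for a linear algebra library rather than hand-written elimination.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data from y = 1 + 2*x, with x_0 = 1 in the first column
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)
print(theta)  # approximately [1.0, 2.0]
```

No learning rate and no iteration loop appear here, and no feature scaling is needed.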

differences

Gradient Descent               | Normal Equation
-------------------------------+----------------------------------------------
Need to choose α               | No need to choose α
Needs many iterations          | No need to iterate
O(kn^2)                        | O(n^3), need to compute X^T X and invert it
Works well when n is large     | Slow if n is very large