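Machine Learning - Weeks 1, 2, 3: Linear Regression, Logistic Regression, Gradient Descent and Advanced Optimization Algorithms, the Normal Equation, Getting Started with Octave

Weeks 1, 2, 3

These are personal notes that record only the key points; they are not a suitable starting point for beginners.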

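Linear Regression

  • Samples: $(x^{(i)}, y^{(i)})$, $i \in \{1, 2, \dots, m\}$
  • $x^{(i)} = (x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)})$, assuming each $x^{(i)}$ has $n$ features
  • Hypothesis function (target function):
    $$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \dots + \theta_n x_n^{(i)}$$
  • The exact form of $h_\theta(x^{(i)})$ depends on the problem at hand
  • In "linear regression", "linear" means that the parameters $\theta_j$ appear only to the first power, not that $h_\theta$ is linear in its inputs; "regression" is best read as regression over the arguments of the cost function $J$
  • $\theta = (\theta_0, \theta_1, \theta_2, \dots, \theta_n)$ are called the parameters. The goal of machine learning is to find parameter values for which $h_\theta(x^{(i)})$ best captures the mapping $x^{(i)} \to y^{(i)}$
  • To that end, we introduce a function that measures how good a choice of parameters is, the cost function:
    $$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
  • The problem now becomes finding the parameters $\theta$ that minimize the cost function $J(\theta)$. Two methods are given below: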
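To make the cost function concrete, here is a minimal Octave sketch that evaluates $J(\theta)$ for two parameter choices. The toy data and the anonymous function `cost` are assumptions made up for illustration; they are not part of the course material.

```octave
% Hypothetical toy data: three samples, one feature, with the intercept
% column x_0 = 1 prepended (so X is m x (n + 1)).
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
m = length(y);

% J(theta) = 1/(2m) * sum of squared residuals.
cost = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

J_zero = cost([0; 0]);   % h(x) = 0 for every sample
J_fit  = cost([0; 1]);   % h(x) = x, which reproduces y exactly on this data
printf("J([0;0]) = %.4f, J([0;1]) = %.4f\n", J_zero, J_fit);
```

The parameters that reproduce the data give a cost of zero, while the all-zero parameters give $(1 + 4 + 9) / 6$.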

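Gradient Descent

Time complexity $O(kn^2)$; suitable when $n > 10000$ ($k$ is the number of iterations)

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

  • $\alpha$ is the learning rate (step size). Its value depends on the problem, and a value that is too large prevents convergence. With a suitable value, $\frac{\partial J(\theta)}{\partial \theta_j}$ shrinks as $\theta$ approaches the extremum, so the steps shrink on their own and gradient descent converges. There is no need to worry that a fixed $\alpha$ will prevent convergence near the extremum.
  • Note: first compute every $temp_j = \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$, and only then assign $\theta_j = temp_j$ for all $j$ (simultaneous update)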

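Setting $x_0^{(i)} = 1$ and taking the partial derivative, the update can be written as:

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left[ \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)} \right]$$

In matrix form:

$$\theta := \theta - \frac{\alpha}{m} \left[ X^T \cdot (X \cdot \theta - y) \right]$$

  • $\theta \in \mathbb{R}^{(n+1) \times 1}$, $X \in \mathbb{R}^{m \times (n+1)}$, $y \in \mathbb{R}^{m \times 1}$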
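As a quick sanity check, the sketch below runs one gradient descent step twice: once component by component with the temporary values described earlier, and once with the matrix form. The data, `theta`, and `alpha` are made-up values for illustration only.

```octave
% Hypothetical data: 4 samples, 2 features, intercept column included.
X = [1 2 3; 1 4 5; 1 6 8; 1 7 9];
y = [1; 2; 3; 5];
theta = [0.1; 0.2; 0.3];
alpha = 0.01;
m = length(y);

% Component-wise step: every temp(j) is computed from the OLD theta,
% and theta is only overwritten after the loop (simultaneous update).
temp = zeros(size(theta));
for j = 1:numel(theta)
    grad_j = sum((X * theta - y) .* X(:, j)) / m;
    temp(j) = theta(j) - alpha * grad_j;
end
theta_loop = temp;

% Matrix-form step.
theta_vec = theta - alpha / m * (X' * (X * theta - y));

max_diff = max(abs(theta_loop - theta_vec));
printf("largest difference between the two updates: %g\n", max_diff);
```

The two results should agree up to floating-point rounding, which is why the vectorized form is normally preferred in Octave.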

Feature scaling:

Try to keep $-1 < x_i < 1$:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

  • $\mu_i$ is the mean value of the $i$-th feature, and $s_i$ is the range (max − min) of the $i$-th feature
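The scaling rule above can be sketched in a few lines of Octave. The feature matrix is a made-up example (no intercept column yet), and the code assumes no feature is constant, since a zero range would cause a division by zero.

```octave
% Hypothetical raw features: one row per sample, one column per feature.
X = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X);             % 1 x n row vector of column means
s  = max(X) - min(X);     % 1 x n row vector of column ranges (max - min)

% Broadcasting subtracts mu from every row and divides each column by its range.
X_scaled = (X - mu) ./ s;

disp(X_scaled);
```

After scaling, each column has mean zero and every entry lies within $[-1, 1]$, because no value is further from the mean than the full range.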

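Normal Equation (Closed-Form Method)

Time complexity $O(n^3)$; suitable when $n < 10000$

Setting $\frac{\partial J(\theta)}{\partial \theta_i} = 0$ directly and solving gives:

$$\theta = (X^T X)^{-1} X^T y$$

  • $\theta \in \mathbb{R}^{(n+1) \times 1}$, $X \in \mathbb{R}^{m \times (n+1)}$, $y \in \mathbb{R}^{m \times 1}$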
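The step from "set the partial derivatives to zero" to the closed form can be filled in as follows. This is a short derivation sketch that assumes $X^T X$ is invertible:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m} (X\theta - y)^T (X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m} X^T (X\theta - y) = 0 \\
X^T X \theta &= X^T y \\
\theta &= (X^T X)^{-1} X^T y
\end{aligned}
```

The gradient in the second line is the same quantity that appears in the matrix form of the gradient descent update.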

code

featureNormalize

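function [X_norm, mu, sigma] = featureNormalize(X)
% Scale every feature (column of X) to zero mean and unit standard deviation.
X_norm = X;                       % preallocate the output with the size of X
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
for iter = 1:size(X, 2)
    mu(iter) = mean(X(:, iter));
    sigma(iter) = std(X(:, iter)); % the standard deviation is used as the scale s_i here, not the range
    X_norm(:, iter) = (X(:, iter) - mu(iter)) / sigma(iter);
end
end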

computeCostMulti

function J = computeCostMulti(X, y, theta)
m = length(y); % number of training examples
J = 1 / (2 * m) * sum((X * theta - y) .^ 2);
end

gradientDescentMulti

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha / m * X' * (X * theta - y); % simultaneous update of all theta_j
    J_history(iter) = computeCostMulti(X, y, theta);
end
end

normalEqn

function [theta] = normalEqn(X, y)
% Closed-form least-squares solution; pinv still works when X' * X is singular.
theta = pinv(X' * X) * X' * y;
end

Octave

https://www.gnu.org/software/octave/

Ubuntu Install:

sudo apt-add-repository ppa:octave/stable
sudo apt-get update
sudo apt-get install octave

Getting started with Octave:

There is a lot to cover, so here are two recommended articles:

http://blog.csdn.net/weixin_36106941/article/details/64443944
https://www.cnblogs.com/leezx/p/5635056.html

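Logistic Regression

Linear regression predicts the value at a point; logistic regression predicts the probability that a point has a given property.

To do this, we build a new model:

$$\begin{aligned} h_\theta(x) &= g(\theta^T x) \\ g(z) &= \frac{1}{1 + e^{-z}} \\ J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) \end{aligned}$$

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & y = 1 \\ -\log(1 - h_\theta(x)) & y = 0 \end{cases}$$

Note: $y$ is always 0 or 1, and $\log$ is the natural logarithm ($\ln$).

This can be written as:

$$\mathrm{Cost}(h_\theta(x), y) = -\left[ y \log(h_\theta(x)) + (1 - y) \log(1 - h_\theta(x)) \right]$$

Taking the partial derivatives of the cost function shows that the result has exactly the same form as for the linear regression cost function (only $h_\theta$ differs):

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left[ \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)} \right]$$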
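The reason the gradient keeps the same form is the identity $g'(z) = g(z)(1 - g(z))$ for the sigmoid. A short derivation sketch for a single sample, writing $h = h_\theta(x) = g(\theta^T x)$:

```latex
\begin{aligned}
\frac{\partial h}{\partial \theta_j} &= h (1 - h)\, x_j \\
\frac{\partial}{\partial \theta_j} \mathrm{Cost}(h, y)
  &= -\left[ \frac{y}{h} - \frac{1 - y}{1 - h} \right] h (1 - h)\, x_j \\
  &= -\left[ y (1 - h) - (1 - y) h \right] x_j \\
  &= (h - y)\, x_j
\end{aligned}
```

Averaging this over the $m$ samples gives the partial derivative of $J(\theta)$ with respect to $\theta_j$.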

cost function code

function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
tmp = sigmoid(X*theta); % h_theta(x) for every training example
J = -1 / m * (y'*log(tmp)+(1-y)'*log(1-tmp));
grad = 1 / m * X' * (tmp - y); % reuse tmp instead of recomputing the sigmoid
end
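The cost function code calls `sigmoid`, which these notes do not define. A minimal sketch, assuming it is saved in its own file `sigmoid.m` on the Octave path and follows the definition $g(z) = \frac{1}{1 + e^{-z}}$:

```octave
function g = sigmoid(z)
    % Element-wise logistic function; z may be a scalar, vector, or matrix.
    g = 1 ./ (1 + exp(-z));
end
```

Because the operations are element-wise, `sigmoid(X * theta)` returns one probability per training example.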

Advanced Optimization Algorithms

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages:
- No need to manually pick $\alpha$
- Often faster than gradient descent.

Disadvantages:
- More complex

Example of calling an Octave optimization routine:

[first] Define the cost function:

costFunction.m

function [jval, gradient] = costFunction(theta, X, y)
    % jval := J(theta)
    % gradient := grad J(theta)

[then] Enter the commands:

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n + 1, 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t)costFunction(t, X, y), initialTheta, options);
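For reference, here is a self-contained sketch of the whole workflow as a single script. The toy data set is hypothetical, the cost function body is the logistic regression cost from earlier in these notes with the sigmoid inlined, and the leading `1;` is only there so that Octave treats the file as a script with inline function definitions.

```octave
1;  % script marker: allows defining functions inside this script file

function [jval, gradient] = costFunction(theta, X, y)
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));                        % sigmoid hypothesis
    jval = -1 / m * (y' * log(h) + (1 - y)' * log(1 - h)); % J(theta)
    gradient = 1 / m * X' * (h - y);                       % grad J(theta)
end

% Hypothetical toy data: intercept column plus one feature. The classes
% overlap, so the minimum of J is finite.
X = [1 0.5; 1 1.0; 1 1.5; 1 2.0; 1 2.5; 1 3.0];
y = [0; 0; 1; 0; 1; 1];
n = size(X, 2) - 1;

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n + 1, 1);
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunction(t, X, y), initialTheta, options);

disp(optTheta);
disp(functionVal);
```

With `initialTheta` at zero the starting cost is $\ln 2$, so a successful run should report a `functionVal` below that.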

Regularization

Linear Regression

To prevent overfitting, a regularization term $\frac{\lambda}{2m} \sum_{i=1}^{n} \theta_i^2$ is added to the cost function:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{i=1}^{n} \theta_i^2 \right]$$

Taking partial derivatives:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \begin{cases} \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} & j = 0 \\ \frac{1}{m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \lambda \cdot \theta_j \right] & j > 0 \end{cases}$$

Note: $\theta_0$ is not penalized.

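Gradient descent

Repeat {

$$\begin{aligned} \theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\ \theta_j &:= \theta_j - \alpha \frac{1}{m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \lambda \cdot \theta_j \right] \end{aligned}$$

}

Rearranging the update for $j > 0$:

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Since $1 - \alpha \frac{\lambda}{m} < 1$, the term $\theta_j \left( 1 - \alpha \frac{\lambda}{m} \right)$ can be thought of as something like $\theta_j \cdot 0.99$: every iteration first shrinks $\theta_j$ slightly and then applies the usual gradient step.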
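The rearrangement can be checked numerically. The sketch below computes one regularized update for a single parameter in both forms; the data, `alpha`, and `lambda` are arbitrary illustrative values, so the shrink factor here is not 0.99.

```octave
% Hypothetical data with an intercept column.
X = [1 2; 1 3; 1 5];
y = [1; 2; 4];
theta = [0.5; 1.5];
alpha = 0.1;
lambda = 1;
m = length(y);

grad_data = X' * (X * theta - y) / m;   % gradient of the unregularized cost

j = 2;                                  % Octave index 2 holds theta_1
% Original form: theta_j - alpha/m * [sum(...) + lambda * theta_j]
update_a = theta(j) - alpha * (grad_data(j) + lambda / m * theta(j));
% Rearranged form: shrink theta_j first, then take the plain gradient step.
shrink = 1 - alpha * lambda / m;
update_b = theta(j) * shrink - alpha * grad_data(j);

printf("shrink factor = %.4f, difference = %g\n", shrink, abs(update_a - update_b));
```

Both forms give the same value up to rounding; the rearranged one simply makes the weight-shrinking effect of $\lambda$ visible.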

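Normal equation (closed-form method)

Setting $\frac{\partial J(\theta)}{\partial \theta_i} = 0$ directly and solving gives:

$$\theta = \left( X^T X + \lambda \cdot \underbrace{\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}}_{(n+1) \times (n+1)} \right)^{-1} X^T y$$

  • Suppose $m \le n$ ($m$: number of examples, $n$: number of features), so that $X^T X$ is singular
  • When $\lambda > 0$, the matrix inside the parentheses is always invertible
  • $\theta \in \mathbb{R}^{(n+1) \times 1}$, $X \in \mathbb{R}^{m \times (n+1)}$, $y \in \mathbb{R}^{m \times 1}$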
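A minimal Octave sketch of the regularized closed form, using a made-up data set with fewer examples than features to illustrate the invertibility point. It solves the linear system with the backslash operator rather than `pinv`; that is a choice made for this sketch, not something the notes prescribe.

```octave
% Hypothetical data: m = 2 examples, n = 3 features, intercept column included.
X = [1 2 3 4; 1 5 6 7];
y = [1; 2];
lambda = 1;

% Identity matrix with the top-left entry zeroed so theta_0 is not penalized.
L = eye(size(X, 2));
L(1, 1) = 0;

rank_plain = rank(X' * X);               % at most m, so X' * X is singular here
rank_reg   = rank(X' * X + lambda * L);  % full rank once lambda > 0

theta = (X' * X + lambda * L) \ (X' * y);
disp(theta);
```

Here `rank_plain` is smaller than $n + 1$ while `rank_reg` equals $n + 1$, which is the practical meaning of the invertibility remark.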

Logistic Regression

The regularization term $\frac{\lambda}{2m} \sum_{i=1}^{n} \theta_i^2$ is added to the cost function:

$$\begin{aligned} J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) + \frac{\lambda}{2m} \sum_{i=1}^{n} \theta_i^2 \\ &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \cdot \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \cdot \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{i=1}^{n} \theta_i^2 \end{aligned}$$

The partial derivatives have the same form as for linear regression:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \begin{cases} \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} & j = 0 \\ \frac{1}{m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \lambda \cdot \theta_j \right] & j > 0 \end{cases}$$

Note: $\theta_0$ is not penalized.

cost function code

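function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
tmp = sigmoid(X*theta); % h_theta(x) for every training example
J = -1 / m * (y'*log(tmp)+(1-y)'*log(1-tmp)) + lambda / (2 * m) * sum(theta(2:end) .^ 2);
theta(1) = 0; % theta_0 is not penalized, so drop it from the regularization term
grad = 1 / m * (X' * (tmp - y) + lambda * theta);
end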

https://www.coursera.org/learn/machine-learning
Instructor: Andrew Ng, Co-founder, Coursera; Adjunct Professor, Stanford University; formerly head of Baidu AI Group/Google Brain
