Machine Learning Notes: Linear Regression

Linear Regression Overview

Principle

Suppose $x_1$ is the age feature, $x_2$ is the performance feature, and $y$ is the income:

Age | Performance | Income
----|-------------|-------
28  | 700         | 9000
45  | 800         | 8000
33  | 400         | 6000
40  | 350         | 4000
20  | 403         | 6000
39  | 250         | 8000

Let $\theta_0$ be the bias parameter, $\theta_1$ the age parameter, and $\theta_2$ the performance parameter; the goal is to find a high-dimensional line/plane that fits the data:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
Letting $x_0 = 1$ to absorb the bias, this can be written in matrix form (with $n$ features):
$$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$$
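As a minimal NumPy sketch of this matrix form (all numbers here are made up for illustration):

import numpy as np

# Hypothetical parameters [theta_0, theta_1, theta_2] and one sample (age 28, performance 700)
theta = np.array([1.0, 0.5, 0.01])
x = np.array([1.0, 28.0, 700.0])   # x_0 = 1 absorbs the bias term

# h_theta(x) = theta^T x
h = theta.dot(x)
print(h)   # 1.0 + 0.5*28 + 0.01*700 = 22.0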
The error between the predicted value and the true value is denoted $\epsilon$:
$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$$
The errors $\epsilon^{(i)}$ are independent and identically distributed, following a Gaussian distribution with mean 0 and variance $\sigma^2$. Independent: employee A and employee B are unrelated and do not influence each other. Identically distributed: A and B work at the same company, so their samples come from the same distribution. Since $\epsilon^{(i)}$ follows a Gaussian distribution:
$$p\left(\epsilon^{(i)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(\epsilon^{(i)}\right)^2}{2\sigma^2}\right)$$
Substituting $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ into the density above gives:
$$p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$$
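A minimal sketch of this Gaussian density in NumPy (the residual and sigma values are arbitrary, just to show the formula in code):

import numpy as np

def gaussian_density(residual, sigma):
    # p = 1 / (sqrt(2*pi) * sigma) * exp(-residual^2 / (2 * sigma^2))
    return np.exp(-residual ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

print(gaussian_density(0.0, 1.0))   # density at zero residual, ~0.3989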
We ask which parameters, combined with our data, are most likely to have produced the true values; this is answered with the likelihood function:
$$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$$
For ease of solving, take the log to get the log-likelihood:
$$\begin{aligned} \log L(\theta) &= \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right) \\ &= \sum_{i=1}^{m} \log\left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)\right) \\ &= m \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2 \end{aligned}$$
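A quick numerical check of this identity (the residuals and sigma are made-up values):

import numpy as np

sigma = 2.0
r = np.array([1.0, -0.5, 2.0])   # made-up residuals y - theta^T x
m = len(r)

# Left side: sum of log densities
logL = np.sum(np.log(np.exp(-r ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)))
# Right side: m*log(1/(sqrt(2*pi)*sigma)) - (1/(2*sigma^2)) * sum(r^2)
check = m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - np.sum(r ** 2) / (2 * sigma ** 2)
print(np.isclose(logL, check))   # True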
The first term is constant in $\theta$, so maximizing the log-likelihood amounts to minimizing the sum of squared errors. This yields the least-squares objective:
$$\begin{aligned} J(\theta) &= \frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2 \\ &= \frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - h_\theta(x^{(i)})\right)^2 \end{aligned}$$
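A short sketch of evaluating $J(\theta)$ on a design matrix, assuming X already carries a leading column of ones (the data is made up):

import numpy as np

def J(theta, X, y):
    # J(theta) = 1/2 * sum((y - X @ theta)^2)
    residual = y - X.dot(theta)
    return 0.5 * np.sum(residual ** 2)

X = np.array([[1.0, 28.0, 700.0],
              [1.0, 45.0, 800.0]])
y = np.array([9000.0, 8000.0])
print(J(np.zeros(3), X, y))   # loss with all-zero parameters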

Batch gradient descent:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta_j} &= -\sum_{i=1}^{m}\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)} \\ \theta_j' &= \theta_j + \alpha \sum_{i=1}^{m}\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)} \end{aligned}$$
Batch gradient descent finds the optimum reliably, but every update considers all samples, so it is slow.
Stochastic gradient descent:
$$\theta_j' = \theta_j + \alpha \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$$
Stochastic gradient descent uses a single sample per update, so iteration is fast, but an individual step does not necessarily move toward convergence.
Mini-batch gradient descent:
$$\theta_j' = \theta_j + \alpha \cdot \frac{1}{10} \sum_{k=i}^{i+9}\left(y^{(k)} - h_\theta(x^{(k)})\right) x_j^{(k)}$$
Each update uses a small subset of the data to compute the gradient, which makes it the practical choice; the sketch below contrasts the three update rules.
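A minimal sketch of the three update rules (the learning rate alpha, the batch size, and the assumption that X includes the bias column are all illustrative choices, not the post's fixed API):

import numpy as np

def batch_update(theta, X, y, alpha):
    # One step using all m samples
    return theta + alpha * X.T.dot(y - X.dot(theta))

def sgd_update(theta, X, y, alpha):
    # One step using a single random sample
    i = np.random.randint(X.shape[0])
    return theta + alpha * (y[i] - X[i].dot(theta)) * X[i]

def minibatch_update(theta, X, y, alpha, batch=10):
    # One step using a small random subset (10 samples, matching the formula above)
    idx = np.random.choice(X.shape[0], batch, replace=False)
    Xb, yb = X[idx], y[idx]
    return theta + alpha / batch * Xb.T.dot(yb - Xb.dot(theta))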

  • Learning rate: prefer a small learning rate
  • Batch size: 32, 64, or 128 are all fine
  • Evaluation metric: $R^2$, where values closer to 1 indicate a better fit (a computation sketch follows this list):
    $$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$
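A minimal $R^2$ computation sketch (y_true and y_pred are made-up values):

import numpy as np

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([9000.0, 8000.0, 6000.0])
y_pred = np.array([8800.0, 8100.0, 6200.0])
print(r2_score(y_true, y_pred))   # close to 1 indicates a good fit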

Implementation Code

import numpy as np

class linear(object):
    def __init__(self):
        self.W = None
        self.b = None

    def loss(self, X, y):
        num_train = X.shape[0]
        # 2.1 Predictions under the current weights and bias
        h = X.dot(self.W) + self.b
        # 2.2 Loss value (the 1/2 factor matches J(theta) above, averaged over samples)
        loss = 0.5 * np.sum(np.square(h - y)) / num_train
        # 2.3 Gradients of the loss w.r.t. W and b
        dW = X.T.dot(h - y) / num_train
        db = np.sum(h - y) / num_train

        return loss, dW, db

    def train(self, X, y, learn_rate=0.001, iters=10000):
        num_feature = X.shape[1]
        # 1. Initialize the weight parameters
        self.W = np.zeros((num_feature, 1))
        # 1. Initialize the bias parameter
        self.b = 0
        loss_list = []

        for i in range(iters):
            # 2. Compute the loss and gradients
            loss, dW, db = self.loss(X, y)
            loss_list.append(loss)
            # 3. Update the weights and bias (batch gradient descent)
            self.W += -learn_rate * dW
            self.b += -learn_rate * db

            if i % 500 == 0:
                print('iters = %d, loss = %f' % (i, loss))
        return loss_list

    def predict(self, X_test):
        # Predictions for the test inputs
        y_pred = X_test.dot(self.W) + self.b
        return y_pred
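A usage sketch for the class above on synthetic data (the true parameters, noise level, and hyperparameters are arbitrary choices for demonstration):

import numpy as np

np.random.seed(0)
# Synthetic data: y = 3*x1 + 2*x2 + 5 + small noise
X = np.random.rand(200, 2)
y = X.dot(np.array([[3.0], [2.0]])) + 5.0 + 0.01 * np.random.randn(200, 1)

model = linear()
loss_list = model.train(X, y, learn_rate=0.1, iters=5000)
y_pred = model.predict(X)
print(model.W.ravel(), model.b)   # should approach [3, 2] and 5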