Basic Machine Learning Concepts Explained

2 Choosing a Model

Criteria for judging a good model: omitted here.
The goal is to reduce the error as much as possible; the sources of that error are analyzed below.

2.1 Bias & Variance

  • bias: the gap between the expected output of models fitted on sampled training sets and the true value
  • variance: how much the models fitted on different training samples vary, i.e. how spread out their predictions are on the test set
  • Definition of variance:
    $\operatorname{var}[X]=E\left[(X-\mu)^{2}\right]=E\left[X^{2}-2X\mu+\mu^{2}\right]=E\left[X^{2}\right]-2\mu^{2}+\mu^{2}=E\left[X^{2}\right]-\mu^{2}$
    $E\left[X^{2}\right]=\operatorname{var}[X]+(E[X])^{2}$
  • Expectation of a test sample y:
    $E[f]=f,\qquad y=f+\varepsilon,\qquad E[\varepsilon]=0,\qquad \operatorname{var}[\varepsilon]=\sigma^{2},\qquad E[y]=E[f+\varepsilon]=f$
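
Combining the two identities above gives the standard bias-variance decomposition of the expected squared test error. The sketch below is the usual derivation; $\hat{f}$ (the model fitted on one random training set) is notation introduced here for illustration, and the cross term vanishes because $E[\varepsilon]=0$ and $\varepsilon$ is independent of $\hat{f}$:

$$E\left[(y-\hat{f})^{2}\right]=E\left[(f+\varepsilon-\hat{f})^{2}\right]=E\left[(f-\hat{f})^{2}\right]+\operatorname{var}[\varepsilon]=\underbrace{\left(f-E[\hat{f}]\right)^{2}}_{\text{bias}^{2}}+\underbrace{\operatorname{var}[\hat{f}]}_{\text{variance}}+\sigma^{2}$$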

The error discussed in Part 02 can thus be split into bias and variance. For a simple model the error is dominated by bias; this situation is called underfitting. For a complex model the error is dominated by variance; this situation is called overfitting.

2.2 Model Selection


  • Should NOT do: select or tune the model directly on the Testing Set. The Testing Set has its own bias, so a model chosen this way will generalize worse than its test score suggests.
Holdout Method
  • Split the dataset D into two mutually exclusive subsets: a training set S and a test set T; train the model on S and evaluate it on T (see the sketch below).
  • Try to keep the data distributions of S and T consistent, so that the split itself does not introduce extra bias into the final result.
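
A minimal holdout sketch with scikit-learn, assuming a feature matrix X and targets y are already loaded; the 80/20 split ratio, random_state, and the choice of LinearRegression are arbitrary illustrations:

# Holdout method with scikit-learn (illustrative sketch)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# X, y are assumed to be a NumPy feature matrix and target vector
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)      # 80% train (S) / 20% test (T)

model = LinearRegression().fit(X_train, y_train)  # train on S
print(model.score(X_test, y_test))                # evaluate on T (R^2 score)
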
N-fold Cross Validation

N-fold cross validation addresses the bias that comes from relying on a single Validation Set split; a minimal sketch follows.
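
A minimal N-fold (k-fold) cross-validation sketch with scikit-learn, assuming X, y and a candidate model are already defined; the fold count cv=3 and the LinearRegression model are arbitrary choices, and the average fold score is what gets compared across models before the Testing Set is touched:

# N-fold cross validation with scikit-learn (illustrative sketch)
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

model = LinearRegression()
# cv=3 splits the training data into 3 folds; each fold serves once as the validation set
scores = cross_val_score(model, X, y, cv=3)
print(scores.mean())   # average validation score used for model selection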

3 Optimization Methods

  • Vanilla Gradient Descent
    $w^{t+1} \leftarrow w^{t}-\eta^{t} g^{t},\qquad \eta^{t}=\frac{\eta}{\sqrt{t+1}},\qquad g^{t}=\frac{\partial C\left(\theta^{t}\right)}{\partial w}$

  • Adagrad
    $w^{t+1} \leftarrow w^{t}-\frac{\eta^{t}}{\sigma^{t}} g^{t},\qquad \sigma^{t}=\sqrt{\frac{1}{t+1} \sum_{i=0}^{t}\left(g^{i}\right)^{2}} \quad\Longrightarrow\quad w^{t+1} \leftarrow w^{t}-\frac{\eta}{\sqrt{\sum_{i=0}^{t}\left(g^{i}\right)^{2}}}\, g^{t}$
    $\sigma^{t}$: root mean square of the previous derivatives of parameter $w$
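
A minimal NumPy sketch of the Adagrad update above for a single parameter; the toy loss C(w) = (w - 3)^2, the base learning rate, and the iteration count are assumptions made purely for illustration:

# Adagrad update (illustrative sketch, single parameter)
import numpy as np

eta = 1.0            # base learning rate
w = 0.0              # parameter, initialized arbitrarily
grad_sq_sum = 0.0    # running sum of squared gradients

for t in range(100):
    g = 2 * (w - 3)                                   # gradient of the toy loss C(w) = (w - 3)^2
    grad_sq_sum += g ** 2
    w -= eta / np.sqrt(grad_sq_sum + 1e-8) * g        # adaptive step; 1e-8 avoids division by zero
print(w)   # moves toward the minimizer w = 3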

  • Gradient Descent
    $\theta^{i}=\theta^{i-1}-\eta \nabla L\left(\theta^{i-1}\right)$
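
For comparison with the SGD and MBGD code later in this section, here is a minimal full-batch gradient descent sketch for linear regression; the synthetic data (y = 4 + 3x + noise) and the hyperparameters are assumptions for illustration, and the later SGD snippet reuses the variables X_b, y, and m defined here:

# Full-batch gradient descent for linear regression (illustrative sketch)
import numpy as np

np.random.seed(0)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)                # y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]                      # prepend a bias column of ones

eta = 0.1                                            # learning rate
theta = np.random.randn(2, 1)                        # random initialization
for iteration in range(1000):
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)  # gradient of the MSE over the whole batch
    theta = theta - eta * gradients                  # theta^i = theta^{i-1} - eta * grad L
print(theta)   # should be close to [[4], [3]]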

  • Stochastic Gradient Descent
    $L^{n}=\left(\hat{y}^{n}-\left(b+\sum_{i} w_{i} x_{i}^{n}\right)\right)^{2},\qquad \theta^{i}=\theta^{i-1}-\eta \nabla L^{n}\left(\theta^{i-1}\right)$
    Each update uses the loss of a single randomly picked example $n$ instead of the sum over the whole training set.

  • Gradient Descent: Feature Scaling
    Rescale the features so that they have comparable ranges; otherwise the loss surface is elongated and gradient descent converges slowly (a minimal sketch follows).
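
A minimal standardization sketch in NumPy (zero mean, unit variance per feature); the small matrix below is made-up data for illustration, and StandardScaler from scikit-learn performs the same transformation:

# Feature scaling by standardization (illustrative sketch)
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])            # two features on very different scales

mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std             # each column now has zero mean and unit variance
print(X_scaled)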

  • Steepest Gradient Descent
    At each step, move along the steepest-descent direction $\mathbf{d}^{(k)}=-\nabla f(\mathbf{x}^{(k)})$ with the step size chosen by an exact line search (a sketch follows):
    $t_{k}=\arg\min_{t \geq 0}\, g(t),\quad g(t):=f\left(\mathbf{x}^{(k)}+t\,\mathbf{d}^{(k)}\right),\qquad \text{set } \mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}+t_{k}\,\mathbf{d}^{(k)}$
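
A minimal NumPy sketch of steepest descent with an exact line search; the quadratic objective and the coarse grid search over t are assumptions for illustration (a real implementation would use a proper 1-D minimizer for the line search):

# Steepest descent with a (coarse) exact line search (illustrative sketch)
import numpy as np

def f(x):
    return x[0] ** 2 + 10 * x[1] ** 2              # toy quadratic objective

def grad_f(x):
    return np.array([2 * x[0], 20 * x[1]])

x = np.array([5.0, 2.0])
ts = np.linspace(0.0, 1.0, 1001)                   # candidate step sizes t >= 0
for k in range(20):
    d = -grad_f(x)                                  # steepest-descent direction d^(k)
    t_k = min(ts, key=lambda t: f(x + t * d))       # t_k = argmin_{t >= 0} f(x^(k) + t d^(k))
    x = x + t_k * d                                 # x^(k+1) = x^(k) + t_k d^(k)
print(x, f(x))                                      # approaches the minimizer (0, 0)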

SGD and MBGD code

# Stochastic Gradient Descent
import numpy as np

# X_b (features with a bias column), y, and m (number of samples) are assumed
# to be defined already, e.g. as in the full-batch sketch above.

n_epochs = 50
t0, t1 = 5, 50  # learning-schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)  # decaying learning rate

theta = np.random.randn(2, 1)  # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)           # pick one example at random
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)  # gradient of the single-example loss
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
# Stochastic Gradient Descent with Scikit-Learn
# X (raw features, without the bias column) and y are assumed to be defined already.
from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)  # no regularization
sgd_reg.fit(X, y.ravel())  # ravel() flattens y to the 1-D shape SGDRegressor expects
# MBGD: mini-batch gradient descent
import numpy as np
import random

def gen_line_data(sample_num=100):
    """Generate noise-free samples from the line y = 3*x1 + 4*x2."""
    x1 = np.linspace(0, 9, sample_num)
    x2 = np.linspace(4, 13, sample_num)
    x = np.concatenate(([x1], [x2]), axis=0).T
    y = np.dot(x, np.array([3, 4]).T)
    return x, y

def mbgd(samples, y, step_size=0.01, max_iter_count=10000, batch_size=0.2):
    """Fit weights w by mini-batch gradient descent; batch_size is a fraction of the data."""
    sample_num, dim = samples.shape
    y = y.flatten()
    w = np.ones((dim,), dtype=np.float32)
    loss = 10
    iter_count = 0
    while loss > 0.001 and iter_count < max_iter_count:
        loss = 0
        error = np.zeros((dim,), dtype=np.float32)

        # draw a random mini-batch of about batch_size * sample_num examples
        index = random.sample(range(sample_num), int(np.ceil(sample_num * batch_size)))
        batch_samples = samples[index]
        batch_y = y[index]

        # accumulate the (negative) gradient of the squared error over the mini-batch
        for i in range(len(batch_samples)):
            predict_y = np.dot(w.T, batch_samples[i])
            for j in range(dim):
                error[j] += (batch_y[i] - predict_y) * batch_samples[i][j]

        # gradient-descent step, scaled by the full sample count
        for j in range(dim):
            w[j] += step_size * error[j] / sample_num

        # recompute the loss on the full dataset to check convergence
        for i in range(sample_num):
            predict_y = np.dot(w.T, samples[i])
            squared_error = (1 / (sample_num * dim)) * np.power((predict_y - y[i]), 2)
            loss += squared_error

        iter_count += 1
    return w

if __name__ == '__main__':
    samples, y = gen_line_data()
    w = mbgd(samples, y)
    print(w)  # should be close to [3, 4]

Next up: evaluation metrics for regression models.
