Reading Notes - Machine Learning in Action - 8 Predicting Numeric Values: Regression

8 Predicting Numeric Values: Regression

8.1 Finding Best-Fit Lines with Linear Regression

Linear regression:

Pros: results are easy to interpret; computationally cheap

Cons: models nonlinear data poorly

Works with: numeric and nominal values

Regression: the process of computing the regression weights (coefficients).

General approach to regression:

  1. Collect the data
  2. Prepare: numeric values are required; nominal values must be mapped to binary numeric values (a small sketch of such a mapping follows this list)
  3. Analyze:
  4. Train: compute the regression weights
  5. Test: evaluate the model with the $L_2$ norm of the error or with the correlation between the predictions and the actual values
  6. Use:
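
A minimal sketch of the nominal-to-binary mapping mentioned in step 2 (the "color" feature and its categories are made up for illustration; the data sets used in this chapter are already numeric):

# Sketch: map a nominal feature to binary (one-hot) columns, hypothetical example
import numpy as np

colors = ["red", "green", "blue", "green"]        # nominal feature values
categories = sorted(set(colors))                  # fix a category order
onehot = np.array([[1.0 if c == cat else 0.0      # one binary column per category
                    for cat in categories]
                   for c in colors])
print(onehot)                                     # shape (4, 3)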

Sample data (row $i$ of $\mathbf{X}$ is $\mathbf{x}_i^{\mathrm{T}}$):
$\mathbf{X} = \left[ \mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m \right]^{\mathrm{T}}$, where $\mathbf{x}_i^{\mathrm{T}} = \left[ x_{i, 0}, x_{i, 1}, x_{i, 2}, \cdots, x_{i, n} \right]$

Weight vector:
$\mathbf{w}^{\mathrm{T}} = \left[ w_0, w_1, w_2, \cdots, w_n \right]$

Label vector:
$\mathbf{y}^{\mathrm{T}} = \left[ y_1, y_2, \cdots, y_m \right]$

Squared-error loss function:
$L = \sum_{i = 1}^{m} \left( y_i - \mathbf{x}_i^{\mathrm{T}} \mathbf{w} \right)^2 = \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)^{\mathrm{T}} \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)$

Partial derivative of the squared-error loss with respect to $\mathbf{w}$:

$\frac{\partial L}{\partial \mathbf{w}} = -2 \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)^{\mathrm{T}} \mathbf{X}$

Setting $\frac{\partial L}{\partial \mathbf{w}} = 0$ gives $\left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)^{\mathrm{T}} \mathbf{X} = 0$, i.e. the normal equations $\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w} = \mathbf{X}^{\mathrm{T}} \mathbf{y}$.

In practice the number of samples is usually much larger than the feature dimension, $m \gg n$. If $\mathbf{X}$ has full column rank, its left pseudo-inverse $\left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathrm{T}}$ exists, and therefore:

$\mathbf{w}^{*} = \left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{y}$

The asterisk ($*$) marks the optimal solution. In practice, first check whether $\left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1}$ exists.
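
A quick numerical check of the closed-form solution on synthetic data (a sketch, not one of the book's listings; np.linalg.lstsq solves the same least-squares problem and is numerically safer than forming the inverse explicitly):

# Sketch: closed-form OLS versus np.linalg.lstsq on synthetic data
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(0, 1, 100)])  # x0 = 1 (intercept), x1 random
y = X @ np.array([3.0, 1.7]) + rng.normal(0, 0.1, 100)       # true weights roughly [3.0, 1.7]

w_normal = np.linalg.inv(X.T @ X) @ X.T @ y       # (X^T X)^{-1} X^T y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # same solution, more stable
print(w_normal, w_lstsq)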

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# load ex0.txt (tab-separated columns: constant 1.0, x, y) and scatter-plot x against y
X = []
with open("ex0.txt", 'r') as fr:
    
    for line in fr.readlines():
        
        X.append([float(item)
                  for item in line.strip().split("\t")])
        
X = np.array(X)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(X[:, 1], X[:, 2], c="red", marker="o", edgecolors="black")
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")
plt.show()

[Figure: scatter plot of the ex0.txt data]

  • Solving the regression problem with OLS (ordinary least squares):
# Listing 8.1 Standard regression function and data-importing functions

import numpy as np

def loadDataSet(fileName):
    
    dataMat = []
    labelMat = []
    
    with open(fileName, 'r') as fr:
        
        for line in fr.readlines():
            
            curLine = line.strip().split('\t')
            lineArr = [float(item) for item in curLine]
            dataMat.append(lineArr[: -1])
            labelMat.append(lineArr[-1])
            
    return dataMat, labelMat

def standRegres(xArr, yArr):
    
    xMat = np.matrix(xArr)
    yMat = np.matrix(yArr).T
    xTx = np.dot(xMat.T, xMat)
    
    # the closed-form solution requires X^T X to be invertible
    if np.linalg.det(xTx) == 0:
        print("This matrix is singular, cannot do inverse")
        return None
    
    # w = (X^T X)^{-1} X^T y
    ws = xTx.I * np.dot(xMat.T, yMat)
    # ws = np.linalg.solve(xTx, np.dot(xMat.T, yMat))
    
    return ws

import numpy as np

xArr, yArr = loadDataSet("ex0.txt")

print("X: {}".format(xArr[0: 2]))

ws = standRegres(xArr, yArr)
print("weights: {}".format(ws))
X: [[1.0, 0.067732], [1.0, 0.42781]]
weights: [[3.00774324]
 [1.69532264]]
xMat = np.matrix(xArr)
yMat = np.matrix(yArr)

yHat = xMat * ws
print("y hat: {}".format(yHat.T))
y hat: [[3.12257084 3.73301922 4.69582855 4.25946098 4.67099547 3.89977285
  3.65007895 3.06514517 3.23286683 3.24221654 3.42785945 4.10677005
  4.24737842 3.40925159 4.6518468  4.03763819 3.61572324 3.25838991
  4.08905393 3.39829811 3.12681762 3.12159094 4.57689489 4.22453225
  3.03380205 3.575795   3.07638008 3.36812645 4.05412689 3.92524508
  3.08463121 3.22264573 3.95116656 4.53547307 4.63266931 3.47321272
  3.67637171 4.48687834 3.70271024 4.54873897 3.29055527 4.13515992
  3.4548456  3.39972557 3.9451346  3.08868473 3.62780072 3.84693134
  3.84432733 4.35061    4.31256527 3.43466109 3.31620889 4.37786401
  3.57417766 3.58183035 3.77501915 4.18560255 4.00951278 3.52978055
  3.25926809 4.1981649  4.23099312 4.13833018 3.2195416  4.0559375
  3.65380697 3.66153255 4.32530562 3.20455155 3.20353097 4.13260677
  3.43985386 4.69270407 3.50859919 4.62257705 3.52828697 3.47779009
  3.48090949 3.30565212 3.27286288 4.25069447 4.44561758 4.31514894
  3.73587075 3.28268389 4.56246261 3.54184447 3.34930498 3.51131849
  3.57846852 4.13701291 3.99762179 3.15189822 4.17244007 4.61771487
  3.32970536 4.43864133 3.57277394 3.73170535 3.79838917 3.94128791
  3.47940404 4.56535313 4.51074001 3.97320914 3.12027706 3.57694952
  3.73141375 3.6786604  4.18467182 3.47015945 3.86132972 3.1215316
  3.07271818 4.02344833 4.61330533 4.14577773 3.74488818 3.87130839
  4.27828923 3.10609569 3.82654678 3.79907916 3.61427374 4.61681295
  3.71523529 4.56822162 3.27294765 4.1209972  4.05277064 4.45457905
  3.23372127 4.1826137  3.31027187 3.34600419 3.13188155 3.49347861
  3.31633604 4.06142526 3.5919243  3.68342425 3.82212368 3.26772097
  3.29907766 4.47772699 3.38724969 3.90338389 4.5169059  3.1880442
  4.56385616 3.03292726 3.97014739 3.12447299 4.03092979 3.90973965
  4.11201199 4.53561039 3.35416039 4.18782512 4.64190543 4.67151593
  4.68257113 4.68771813 4.25553631 3.43763299 4.15161473 4.60027508
  3.4068544  3.195112   3.37736257 4.48229758 3.3414336  4.6321641
  4.6567378  4.27210978 3.76330447 3.80548918 3.93887547 3.5768478
  4.5076613  4.56560234 3.75676222 3.20541447 3.61682859 4.38911078
  3.66404502 4.67296204 4.33158679 4.11405146 4.4827248  4.03299809
  3.44040992 4.24625442 3.84494273 3.44346998 4.42119307 3.12657688
  3.90129695 3.20467701]]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xMat[:, 1].A, yMat.A)
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")

xCopy = xMat.copy()
xCopy.sort(0)
yHat = np.dot(xCopy, ws)
ax.plot(xCopy[:, 1], yHat)

plt.show()

[Figure: ex0.txt data points with the fitted regression line]

yHat = np.dot(xMat, ws)
print(yHat.shape)
print(yMat.shape)
print(np.corrcoef(yHat.T, yMat))
(200, 1)
(1, 200)
[[1.         0.98647356]
 [0.98647356 1.        ]]

8.2 Locally Weighted Linear Regression

Among unbiased estimators, linear regression achieves the lowest mean-squared error, but it tends to underfit the data. Allowing some bias into the estimator can reduce the mean-squared error further.

*Locally weighted linear regression (LWLR)* reduces the mean-squared error by giving weights to the data points near the point of interest:

$\mathbf{w}^{*} = \left( \mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{y}$

where $\mathbf{W}$ is the weight matrix over the data points. LWLR uses a kernel-like trick to give nearby points larger weights than the other points. A commonly used kernel is the Gaussian kernel:

$w(i, i) = \exp \left( - \frac{\left\| \mathbf{x}^{(i)} - \mathbf{x} \right\|_2^2}{2 k^2} \right)$

This defines a diagonal weight matrix $\mathbf{W}$: the closer a data point $\mathbf{x}^{(i)}$ is to $\mathbf{x}$, the larger its weight $w(i, i)$. The user-defined parameter $k$ controls how quickly the weights decay with distance, i.e. how far the influence of $\mathbf{x}$ reaches.
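
For completeness, the LWLR solution above is the minimizer of a weighted squared error (a standard weighted least-squares derivation, using the notation defined earlier):

$L_{\mathbf{W}} = \sum_{i = 1}^{m} w(i, i) \left( y_i - \mathbf{x}_i^{\mathrm{T}} \mathbf{w} \right)^2 = \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)^{\mathrm{T}} \mathbf{W} \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)$

Setting its derivative with respect to $\mathbf{w}$ to zero gives $\mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{X} \mathbf{w} = \mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{y}$, and hence $\mathbf{w}^{*} = \left( \mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{W} \mathbf{y}$.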

# plot the data (top panel) and the Gaussian kernel weight around x = 0.5 for k = 0.5, 0.1, 0.01
fig = plt.figure()
ax = fig.add_subplot(411)
ax.scatter(xMat[:, 1].A, yMat.A)

x = np.linspace(0, 1)

y = np.exp(- (x - 0.5) ** 2 / 2 / 0.5 **2)
ax = fig.add_subplot(412)
ax.plot(x, y, label="$k = 0.5$")
ax.legend()

y = np.exp(- (x - 0.5) ** 2 / 2 / 0.1 **2)
ax = fig.add_subplot(413)
ax.plot(x, y, label="$k = 0.1$")
ax.legend()

y = np.exp(- (x - 0.5) ** 2 / 2 / 0.01 **2)
ax = fig.add_subplot(414)
ax.plot(x, y, label="$k = 0.01$")
ax.legend()

plt.show()

[Figure: Gaussian kernel weights for k = 0.5, 0.1, 0.01]

# Listing 8.2 Locally weighted linear regression function

def lwlr(testPoint, xArr, yArr, k=1.0):
    
    xMat = np.matrix(xArr)
    yMat = np.matrix(yArr).T
    m = xMat.shape[0]
    # create diagonal matrix
    weights = np.matrix(np.eye(m))
    
    for j in range(m):
        # populate weights with exponentially decaying values
        diffMat = testPoint - xMat[j, :]
        weights[j, j] = np.exp(
            - np.dot(diffMat, diffMat.T) / 2 / k**2)
        
    xTx = xMat.T * weights * xMat
    if np.linalg.det(xTx) == 0:
        print("this matrix is singular, cannot do inverse")
        return None
    
    ws = xTx.I * xMat.T * weights * yMat
    return ws

def lwlrTest(testArr, xArr, yArr, k=1.0):
    
    m = np.shape(testArr)[0]
    yHat = np.zeros(m)
    for i in range(m):
        yHat[i] = np.dot(testArr[i], lwlr(testArr[i], xArr, yArr, k))
    return yHat

xArr, yArr = loadDataSet("ex0.txt")

print(lwlr(xArr[0], xArr, yArr, 1.0))
# single-point estimate for the first sample
print("y_0: {}".format(yArr[0]))
print("y_hat_0: {}".format(
    np.dot(xArr[0], lwlr(xArr[0], xArr, yArr, 1.0))))

yHat = lwlrTest(xArr, xArr, yArr, 1)
xMat = np.matrix(xArr)
srtInd = xMat[:, 1].argsort(0)
xSort = xMat[srtInd][:, 0, :]

fig = plt.figure(figsize=(5, 8))
ax = fig.add_subplot(311)
ax.plot(xSort[:, 1], yHat[srtInd])
ax.scatter(xMat[:, 1].A, yArr, s=2, c="red")

yHat = lwlrTest(xArr, xArr, yArr, 0.01)
ax = fig.add_subplot(312)
ax.plot(xSort[:, 1], yHat[srtInd])
ax.scatter(xMat[:, 1].A, yArr, s=2, c="red")

yHat = lwlrTest(xArr, xArr, yArr, 0.003)
ax = fig.add_subplot(313)
ax.plot(xSort[:, 1], yHat[srtInd])
ax.scatter(xMat[:, 1].A, yArr, s=2, c="red")
plt.show()
[[3.00714496]
 [1.69638809]]
y_0: 3.176513
y_hat_0: [[3.12204471]]

[Figure: LWLR fits with k = 1, 0.01, and 0.003]

8.3 Example: Predicting the Age of an Abalone

def rssError(yArr, yHatArr):
    return ((yArr - yHatArr)**2).sum()

abX, abY = loadDataSet("abalone.txt")
yHat01 = lwlrTest(abX[0 : 99], abX[0 : 99], abY[0 : 99], 0.1)
yHat1 = lwlrTest(abX[0 : 99], abX[0 : 99], abY[0 : 99], 1)
yHat10 = lwlrTest(abX[0 : 99], abX[0 : 99], abY[0 : 99], 10)

print(rssError(abY[0 : 99], yHat01.T))
print(rssError(abY[0 : 99], yHat1.T))
print(rssError(abY[0 : 99], yHat10.T))

56.81549669032892
429.8905618702056
549.1181708826584
  • Too small a value of $k$ overfits the training data
abX, abY = loadDataSet("abalone.txt")
abY = np.array(abY)
yHat01 = lwlrTest(abX[100 : 199], abX[0 : 99], abY[0 : 99], 0.1)
yHat1 = lwlrTest(abX[100 : 199], abX[0 : 99], abY[0 : 99], 1)
yHat10 = lwlrTest(abX[100 : 199], abX[0 : 99], abY[0 : 99], 10)

print(rssError(abY[100 : 199], yHat01.T))
print(rssError(abY[100 : 199], yHat1.T))
print(rssError(abY[100 : 199], yHat10.T))

40659.27596541478
573.5261441898057
517.5711905382693
  • Comparison with standard linear regression:
ws = standRegres(abX[0 : 99], abY[0 : 99])
yHat = np.dot(np.matrix(abX[100 : 199]), ws)
print(rssError(abY[100 : 199], yHat.A.T))
518.6363153249081

In this example, standard linear regression and locally weighted linear regression perform almost identically on the held-out data.

8.4 Shrinking Coefficients to Understand Our Data

When the number of features $n$ is larger than the number of samples $m$, $\left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1}$ does not exist. Statisticians introduced ridge regression, which shrinks the coefficients, to deal with this problem. Another shrinkage method is lasso regression; the lasso is hard to compute directly, but it can be approximated with forward stagewise regression.

8.4.1 Ridge Regression

*Ridge regression* adds the term $\lambda \mathbf{I}$ to the matrix $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ so that it becomes non-singular:

$\mathbf{X}^{\mathrm{T}} \mathbf{X} + \lambda \mathbf{I}$

where $\mathbf{I}$ is the $n \times n$ identity matrix ($n$ is the feature dimension, matching np.eye(n) in the code below) and $\lambda$ is a user-defined scalar. The weight vector is then given by:

$\mathbf{w}^{\ast} = \left( \mathbf{X}^{\mathrm{T}} \mathbf{X} + \lambda \mathbf{I} \right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{y}$

Ridge regression limits the size of the weights and thus helps prevent overfitting. In statistics, reducing the influence of unimportant parameters by introducing a penalty term is called shrinkage.
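
Equivalently, the ridge solution minimizes a penalized squared error (a standard reformulation, stated here for clarity rather than taken from the book):

$L_{\text{ridge}} = \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right)^{\mathrm{T}} \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right) + \lambda \mathbf{w}^{\mathrm{T}} \mathbf{w}$

Setting the gradient to zero gives $\left( \mathbf{X}^{\mathrm{T}} \mathbf{X} + \lambda \mathbf{I} \right) \mathbf{w} = \mathbf{X}^{\mathrm{T}} \mathbf{y}$, which is exactly the formula above; the larger $\lambda$ is, the more the weights are shrunk toward zero.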

# Listing 8.3 Ridge regression

def ridgeRegres(xMat, yMat, lam=0.2):
    
    m, n = xMat.shape
    xTx = np.dot(xMat.T, xMat)
    # add lambda * I (n x n) so that the matrix to be inverted is non-singular
    denom = xTx + lam * np.eye(n)
    
    # with lam = 0 the matrix can still be singular
    if np.linalg.det(denom) == 0:
        print("This matrix is singular, can not do inverse")
        return None
    
    # w = (X^T X + lambda * I)^{-1} X^T y
    ws = denom.I * np.dot(xMat.T, yMat)
    return ws

def ridgeTest(xArr, yArr):
    
    xMat = np.matrix(xArr)
    yMat = np.matrix(yArr).T
    
    m, n = xMat.shape
    # standardize: center the labels, give the features zero mean and unit variance
    yMat = yMat - yMat.mean()
    xMat = (xMat - xMat.mean(axis=0)) / xMat.var(axis=0)
    
    # compute ridge weights for 30 exponentially spaced values of lambda: exp(-10) ... exp(19)
    numTestPts = 30
    wMat = np.zeros((numTestPts, n))
    for i in range(numTestPts):
        ws = ridgeRegres(xMat, yMat, np.exp(i - 10))
        wMat[i, :] = ws.T
        
    return wMat

abX, abY = loadDataSet("abalone.txt")

ridgeWeights = ridgeTest(abX, abY)
log_lam = [item - 10 for item in range(30)]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(log_lam, ridgeWeights)
ax.set_xlabel("log($\\lambda$)")
plt.show()

[Figure: ridge regression coefficients as a function of log(lambda)]

8.4.2 The Lasso

Ridge regression differs from ordinary least-squares regression by imposing the constraint:

$\sum_{k = 1}^{n} w_{k}^{2} \leq \lambda$

that is, the sum of the squared coefficients may not exceed $\lambda$. When two or more features are correlated, ordinary least squares can assign a very large positive weight to one feature and a very large negative weight to a correlated one; ridge regression avoids this problem.
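
A small illustration of that correlated-feature effect on synthetic data (a sketch, not from the book; the data and the choice of lambda = 0.1 are arbitrary):

# Sketch: two nearly identical features; compare OLS and ridge weights
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(0, 1, 200)
x2 = x1 + rng.normal(0, 1e-3, 200)         # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(0, 0.1, 200)           # the target really depends only on x1

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + 0.1 * np.eye(2), X.T @ y)
print("OLS:  ", w_ols)                     # typically a large positive/negative pair
print("ridge:", w_ridge)                   # both weights stay near 0.5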

The lasso imposes the constraint:

$\sum_{k = 1}^{n} \left| w_{k} \right| \leq \lambda$

This constraint makes the weight vector sparse, which helps with interpreting the data. The lasso is hard to solve directly; forward stagewise regression can be used as an approximation.
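
If scikit-learn is available (it is not used in the book), the lasso can also be fit directly through its penalized formulation; a sketch on synthetic data, where the penalty strength alpha = 0.1 is an arbitrary choice:

# Sketch: fitting the lasso with scikit-learn; most coefficients are driven exactly to zero
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 8))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(0, 0.1, 200)  # only features 0 and 3 matter

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)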

8.4.3 Forward Stagewise Regression

Forward stagewise regression is a greedy algorithm: on every step it makes the single small change that most reduces the current error. The weight vector is initialized to all zeros, and in each iteration every weight is tentatively increased or decreased by a small step.

  • Forward stagewise regression pseudocode:
1. Standardize the data

2. FOR each iteration:

3.   SET lowestError = +inf

4.   FOR each feature:

5.     FOR increasing and decreasing the feature's weight:

6.       change that weight by a small step to get a new weight vector W

7.       compute the error Error of the new W

8.       IF Error < lowestError:

9.         SET lowestError = Error; SET Wbest = W

10.   update: SET W = Wbest

# Listing 8.4 Forward stagewise linear regression

def stageWise(xArr, yArr, eps=0.01, numIt=100):
    
    xMat = np.matrix(xArr)
    yMat = np.matrix(yArr).T
    m, n = xMat.shape
    
    # standardize the features and center the labels
    xMat = (xMat - xMat.mean(axis=0)) / xMat.var(axis=0)
    yMat -= yMat.mean()
    
    returnMat = np.zeros((numIt, n))
    ws = np.zeros((n, 1))
    wsMax = ws.copy()
    
    for i in range(numIt):
        # print(ws.T)
        lowestError = np.inf
        
        # greedily try a small step up and down on each feature's weight
        for j in range(n):
            
            for sign in [-1, 1]:
                
                wsTest = ws.copy()
                wsTest[j] += eps * sign
                yTest = np.dot(xMat, wsTest)
                rssE = rssError(yMat.A, yTest.A)
                
                if rssE < lowestError:
                    lowestError = rssE
                    wsMax = wsTest
                    
        ws = wsMax.copy()
        returnMat[i, :] = ws.T
        
    return returnMat
        

Forward stagewise regression works in the spirit of coordinate descent: in each iteration ($i$), only the weight of the single feature whose update most reduces the error is changed.

xArr, yArr = loadDataSet("abalone.txt")
stageWise(xArr, yArr, 0.01, 200)
array([[ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       ...,
       [ 0.05,  0.  ,  0.09, ..., -0.64,  0.  ,  0.36],
       [ 0.04,  0.  ,  0.09, ..., -0.64,  0.  ,  0.36],
       [ 0.05,  0.  ,  0.09, ..., -0.64,  0.  ,  0.36]])
stageWise(xArr, yArr, 0.001, 5000)
array([[ 0.   ,  0.   ,  0.   , ...,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   , ...,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   , ...,  0.   ,  0.   ,  0.   ],
       ...,
       [ 0.043, -0.011,  0.12 , ..., -0.963, -0.105,  0.187],
       [ 0.044, -0.011,  0.12 , ..., -0.963, -0.105,  0.187],
       [ 0.043, -0.011,  0.12 , ..., -0.963, -0.105,  0.187]])
wMat = stageWise(xArr, yArr, 0.005, 1000)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(list(range(1000)), wMat)
plt.show()

[Figure: stagewise regression coefficients over 1000 iterations with eps = 0.005]

xMat = np.matrix(xArr)
yMat = np.matrix(yArr).T

xMat = (xMat - xMat.mean(axis=0)) / xMat.var(axis=0)
yMat -= yMat.mean(axis=0)

weights = standRegres(xMat, yMat.T)

print(weights.T)

[[ 0.0430442  -0.02274163  0.13214087  0.02075182  2.22403814 -0.99895312
  -0.11725427  0.16622915]]

Shrinkage methods trade an increase in model bias for a reduction in model variance.

8.5 The Bias/Variance Tradeoff

The prediction error is usually viewed as the sum of three components: bias, variance, and random noise.
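
Stated as a formula (the standard decomposition of the expected squared error; $f$ is the true function, $\hat{f}$ the fitted model, and $\sigma^2$ the noise variance, notation introduced here rather than in the chapter):

$\mathbb{E}\left[ \left( y - \hat{f}(\mathbf{x}) \right)^2 \right] = \left( \mathbb{E}\left[ \hat{f}(\mathbf{x}) \right] - f(\mathbf{x}) \right)^2 + \mathbb{E}\left[ \left( \hat{f}(\mathbf{x}) - \mathbb{E}\left[ \hat{f}(\mathbf{x}) \right] \right)^2 \right] + \sigma^2$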

Locally weighted linear regression increases model complexity and therefore increases the model's variance; coefficient shrinkage reduces model complexity and therefore increases the model's bias.

Measuring a model's variance: draw two random subsets from the training data, fit the same model on each subset, and compare the two resulting sets of parameters; the total difference between them reflects the model's variance.
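
A minimal sketch of that measurement on the abalone data (assumes loadDataSet and standRegres defined above; the half/half split and the squared parameter difference are my own illustrative choices):

# Sketch: estimate model variance by fitting the same model on two random halves
# of the training data and comparing the resulting parameter vectors
import numpy as np

abX, abY = loadDataSet("abalone.txt")
abX, abY = np.array(abX), np.array(abY)

rng = np.random.default_rng(3)
idx = rng.permutation(len(abY))
half = len(abY) // 2
ws1 = standRegres(abX[idx[:half]], abY[idx[:half]])
ws2 = standRegres(abX[idx[half:]], abY[idx[half:]])

print(np.sum(np.square(ws1 - ws2)))   # total difference between the two parameter sets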

8.6 Example: Forecasting the Price of LEGO Sets

8.7 Summary
