一些常用的回归模型

最新推荐文章于 2024-06-22 18:43:34 发布

littletomatodonkey

最新推荐文章于 2024-06-22 18:43:34 发布

阅读量2w

点赞数 2

分类专栏：机器学习 python相关文章标签：线性回归梯度下降正则化线性模型 Logistic回归

本文链接：https://blog.csdn.net/u012526003/article/details/79073401

版权

机器学习同时被 2 个专栏收录

24 篇文章 4 订阅

订阅专栏

python相关

24 篇文章 0 订阅

订阅专栏

线性回归模型

线性回归

线性回归在求解时，一般需要给所有样本添加一个常数项，作为回归模型的偏置
线性回归模型可以表述为
$y^= h θ (x) = θ T x$ $\hat y = {h_\theta }(x) = {\theta ^T}x$
该方程有封闭解，利用最小二乘法可以有
$θ^= (x T x) - 1 x \cdot y$ $\hat \theta = {({x^T}x)^{ - 1}}x \cdot y$

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# .T是转置，dot是矩阵乘法
X = 2*np.random.rand(100,1)
y = 4+3*X + np.random.randn(X.shape[0], X.shape[1])
X_b = np.c_[np.ones(X.shape), X]
theta_best = np.linalg.inv( X_b.T.dot( X_b ) ).dot( X_b.T ).dot(y)
print( theta_best )
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2,1)), X_new]
y_predict = X_new_b.dot( theta_best )
plt.plot( X, y, 'r.' )
plt.plot( X_new, y_predict, 'b-' )
plt.show()

[[ 3.80963489]
 [ 3.17287426]]

这里写图片描述

sklearn中的线性模型也可以用于求解
这个线性模型在求解时，会自动给数据添加一列常数项作为偏置
对于线性模型来说，数据量过大时，求解过程会变得非常长，因为X矩阵变大了
在对线性模型进行预测的时候，预测的时间只与要预测的变量成正比

python from sklearn.linear_model import LinearRegression lin_reg = LinearRegression() lin_reg.fit( X, y ) b = lin_reg.intercept_ k = lin_reg.coef_ print(k, b) print( lin_reg.predict( X_new ) )

[[ 3.17287426]] [ 3.80963489]
[[ 3.80963489]
[ 10.15538341]]

梯度下降(Gradient Descend)

梯度下降的目标是最小化代价函数，当代价函数是凸函数的时候，梯度下降会有唯一解，权重更新的公式为
$\nabla θ M S E (θ) = 2 m X T (X θ - y)$ ${\nabla _\theta }MSE(\theta ) = \frac{2}{m}{X^T}(X\theta - y)$
$θ (n e x t_s t e p) = θ - η \nabla θ M S E (θ)$ ${\theta ^{(next\_step)}} = \theta - \eta {\nabla _\theta }MSE(\theta )$
其中 $\eta$ 是学习率
学习率太大，容易发生震荡，学习率太小，达到最优解的速度太慢，需要根据情况选择合适的学习率
如果使用上面的梯度下降方法(批量梯度下降BGD)进行训练，则每次训练的时候会使用全部的训练集数据，对于数据量很大的训练集来说，这种训练方法过于耗时，因此可以采用一种随机梯度下降(SGD)的方法，每次只采用一个训练样本进行梯度下降
如果代价函数有局部最小值，而不是全局最小值，则BGD一般难以跳出局部最优，而SGD常常可以跳出全局最优
BGD一般到达最优解之后很少跳动，而SGD一般在到达最优解之后仍然会小幅跳动，即它可以保证解较好，但是不一定是最优的
动态的学习率可以使的在迭代初期，模型快速收敛，在迭代的后期，模型可以在最优解附近以很小的步长寻找最优解
不同于以上只采用一个训练样本进行训练的SGD方法，**Mini-batch G**D是在每次迭代中随机选择部分的训练样本，这样的话可以通过使用GPU大大加快GD训练的速度，同时到达最优解的稳定性也比SGD要好

# BGD代码
eta = 0.1
n_iterations = 500
m = X_b.shape[0]
theta = np.random.randn( 2, n_iterations)

for iteration in range(1,n_iterations):
    gradients = 2/m * X_b.T.dot( X_b.dot(theta[:,iteration-1:iteration]) - y )
    theta[:,iteration:iteration+1] = theta[:,iteration-1:iteration] - eta * gradients
plt.plot( theta.T )
plt.show()

这里写图片描述

# 动态学习指数的SGD
n_epochs = 50
t0, t1 = 5, 50

all_theta = np.zeros((2, n_epochs))
theta = np.random.randn(2,1)

def learning_schedule(t):
    return t0 / ( t + t1 )

for epoch in range( n_epochs ):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2* xi.T.dot( xi.dot(theta) - yi )
        eta = learning_schedule(epoch*m+i)
        theta = theta - eta*gradients
    all_theta[:,epoch:epoch+1] = theta
print( theta )
plt.plot( all_theta.T )
plt.show()

[[ 4.09826145]
 [ 3.16957123]]

这里写图片描述

# sklearn自带的SGD回归
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=50, penalty=None, eta0=0.1)
# SGD要求y是一个1d的矩阵
sgd_reg.fit( X, y.ravel() )
print( sgd_reg.intercept_ )
print( sgd_reg.coef_ )

[ 3.79193051]
[ 3.15907091]

多项式回归

上述所有内容均是真对线性模型，如果y与x有非常明显的非线性关系，则可能需要用到非线性回归，如多项式回归，通用多项式回归方程如下
$y = a n x n + a n - 1 x n - 1 + \dots + a 0$ $y = {a_{n}}{x^n} + {a_{n - 1}}{x^{n - 1}} + \cdots + {a_0}$
我们可以首先构建出 $\left[ {{x^n},{x^{n - 1}}, \cdots ,x} \right]$ ，然后进行线性回归即可
在进行多项式回归时，如果阶次选的太大，很容易发生过拟合现象，即训练误差远小于测试（或者验证）误差

from sklearn.preprocessing import PolynomialFeatures
m = 100
X = 6 * np.random.rand(m,1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

poly_features = PolynomialFeatures( degree=2, include_bias=False )
X_poly = poly_features.fit_transform( X )
print( X[0], X_poly[0] )

lin_reg = LinearRegression()
lin_reg.fit( X_poly, y )
print( lin_reg.intercept_, lin_reg.coef_ )

# min是找出矩阵最小值，argmin是找出矩阵最小值所在的index
X_pred = np.linspace( np.min(X), np.max(X), 50 )
# coef所对应的阶数是从1往n排的
y_poly_pred = lin_reg.coef_[0,1] * X_pred**2 + lin_reg.coef_[0,0] * X_pred + lin_reg.intercept_

plt.plot( X, y, 'r+' )
plt.plot( X_pred, y_poly_pred, 'b-' )
plt.show()

[-0.59004635] [-0.59004635  0.3481547 ]
[ 2.32067214] [[ 1.02368889  0.42819181]]

这里写图片描述

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split( X, y, test_size=0.2 )
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit( X_train[:m], y_train[:m] )
        y_train_predict = model.predict( X_train[:m] )
        y_val_predict = model.predict( X_val )
        train_errors.append( mean_squared_error( y_train_predict, y_train[:m] ) )
        val_errors.append( mean_squared_error( y_val_predict, y_val ) )
    plt.plot( np.sqrt(train_errors), "r-+", label="train" )
    plt.plot( np.sqrt(val_errors), "b-", label="validation" )
    plt.legend()

lin_reg = LinearRegression()
plot_learning_curves( lin_reg, X, y )
plt.show()

这里写图片描述

from sklearn.pipeline import Pipeline
polynomial_reg = Pipeline( (("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
                           ("sgd_reg", LinearRegression()), ))
plot_learning_curves( polynomial_reg, X, y )

这里写图片描述

过拟合与欠拟合

机器学习中模型的误差可以用偏差、方差或者不可约束的误差进行描述，如果一个模型的偏差(bias)很大，则一般认为该模型是欠拟合的；如果一个模型的方差(variance)很大，则一般认为这个模型是过拟合的；不可约束的误差来源于数据本身，减小它的方式只有数据清洗
增加模型的复杂度可以增大一个模型的方差，降低其偏差。因此一般需要对模型进行折衷考虑。

正则化(Regularized Lineral Models)

为了减小模型过拟合的程度，可以引入惩罚线性回归，即对线性模型的参数进行限制。这就是正则化线性模型，又称为惩罚线性模型。有4种常用的正则化模型
下面两种模型中的 $\alpha$ 都是惩罚项的系数，为0时，化简为单纯的线性回归模型；为 $\infty$ 时，表示模型只受到参数项的影响

岭回归

岭回归的代价函数是
$J (θ) = M S E (θ) + 1 2 α \sum i = 1 n θ 2 i$ $J(\theta ) = MSE(\theta ) + \frac{1}{2}\alpha \sum\limits_{i = 1}^n {\theta _i^2}$
封闭解为
$θ^= (X T X + α A) - 1 X T y$ $\hat \theta = {({X^T}X + \alpha A)^{ - 1}}{X^T}y$

Lasso回归

岭回归的代价函数是
$J (θ) = M S E (θ) + α \sum i = 1 n | θ i |$ $J(\theta ) = MSE(\theta ) + \alpha \sum\limits_{i = 1}^n {\left| {{\theta _i}} \right|}$

弹性网络(Elastic Net)

弹性网络是将岭回归与Lasso回归的代价函数进行加权平均，引入一个权重 $r$ ， $r=0$ 时，只有岭回归起作用； $r=1$ 时，只有Lasso回归起作用。
$J (θ) = M S E (θ) + α r \sum i = 1 n | θ i | 1 - r 2 α \sum i = 1 n θ 2 i$ $J(\theta ) = MSE(\theta ) + \alpha r\sum\limits_{i = 1}^n {\left| {{\theta _i}} \right|} \frac{{1 - r}}{2}\alpha \sum\limits_{i = 1}^n {\theta _i^2}$

Early Stopping

Early Stopping指的是迭代学习训练，如果GD，只要验证集误差到达极小值，立刻停止训练
SGDRegressor中的warm_start设置为true时，表示每次调用fit时，都会在之前的参数的基础上进行训练，而不是用随机初始参数进行训练

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
# X = np.linspace(-3, 3, 20).reshape(20,1)
# y = 2 * X + 1 + np.random.randn(X.shape[0], 1)

plt.subplot(121)
plt.plot(X, y, 'r+')
for a in [0, 50, 100]:
    ridge_reg = Ridge( alpha=a, solver="cholesky" )
    ridge_reg.fit( X, y )
    x_pred = np.linspace( np.min(X), np.max(X), 50 ).reshape((50, 1))
    y_pred = ridge_reg.predict( x_pred )
    plt.plot( x_pred, y_pred, label="alpha = %d" % a )

plt.legend()
poly_10 = PolynomialFeatures(degree=10, include_bias=False)
plt.subplot(122)
plt.plot(X, y, 'r+')
for a in [0, 10, 1000]:
    ridge_reg = Ridge( alpha=a, solver="cholesky" )
    x_pred = np.linspace( np.min(X), np.max(X), 100 ).reshape((100, 1))
    model = Pipeline([
                ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
                ("std_scaler", StandardScaler()),
                ("regul_reg", ridge_reg),
            ])

    model.fit( X, y )
    y_pred = model.predict( x_pred )
    plt.plot( x_pred, y_pred, label="alpha = %d" % a )

plt.legend()
plt.show()

这里写图片描述

from sklearn.linear_model import Lasso
plt.subplot(121)
plt.plot(X, y, 'r+')
for a in [0, 50, 100]:
    ridge_reg = Ridge( alpha=a, solver="cholesky" )
    ridge_reg.fit( X, y )
    x_pred = np.linspace( np.min(X), np.max(X), 50 ).reshape((50, 1))
    y_pred = ridge_reg.predict( x_pred )
    plt.plot( x_pred, y_pred, label="alpha = %d" % a )

plt.legend()
poly_10 = PolynomialFeatures(degree=10, include_bias=False)
plt.subplot(122)
plt.plot(X, y, 'r+')
for a in [0, 0.1, 2]:
    lasso_reg = Lasso( alpha=a, tol=1e-4 )
    x_pred = np.linspace( np.min(X), np.max(X), 100 ).reshape((100, 1))
    model = Pipeline([
                ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
                ("std_scaler", StandardScaler()),
                ("regul_reg", lasso_reg),
            ])

    model.fit( X, y )
    y_pred = model.predict( x_pred )
    plt.plot( x_pred, y_pred, label="alpha = %d" % a )

plt.legend()
plt.show()

E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\pipeline.py:250: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  self._final_estimator.fit(Xt, y, **fit_params)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)

这里写图片描述

from sklearn.linear_model import ElasticNet
elastic_nmet = ElasticNet( alpha=0.1, l1_ratio=0.5 )
elastic_nmet.fit( X, y )
res = elastic_nmet.predict( [[1.5]] )
print( res )

[ 5.11098612]

from sklearn.base import clone
poly_scaler = Pipeline([
        ("poly_features", PolynomialFeatures(degree=90, include_bias=False)),
        ("std_scaler", StandardScaler()),
    ])

X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled = poly_scaler.fit_transform(X_val)
sgd_reg = SGDRegressor( max_iter=1, warm_start=True, penalty=None, learning_rate="constant", eta0=5e-4 )

minimum_val_error = float( "inf" )
best_epoch = None
best_model = None
all_errors = []
for epoch in range(1000):
    sgd_reg.fit( X_train_poly_scaled, y_train.ravel() )
    y_val_predict = sgd_reg.predict( X_val_poly_scaled )
    val_error = mean_squared_error( y_val_predict, y_val )
    all_errors.append( val_error )
    if val_error < minimum_val_error:
        minimum_val_error = val_error
        best_epoch = epoch
        best_model = clone( sgd_reg )

plt.plot( all_errors )
plt.show()

这里写图片描述

Logistic回归

Logisitic回归指的是将分类问题视为回归问题，即将样本属于某个类别的问题化为属于某个类别的概率问题，获得最大概率的类别就是样本所属的类别。

估计概率(Estimating Probabilities)

p^= h θ (x) = σ (θ T x)

$\hat p = {h_\theta }(x) = \sigma ({\theta ^T}x)$

σ(⋅) $\sigma (\cdot)$ 是sigmoid函数，取值在(0,1)之间

σ (t) = 1 1 + e - t

$\sigma (t) = \frac{1}{{1 + {e^{ - t}}}}$
Logistic回归模型预测

y^={0,ifp^<0.5,1,ifp^≥0.5.

$\hat y = \left\{ \begin{array}{l} 0,{\kern 8pt} if{\kern 4pt} \hat p < 0.5,\\ 1,{\kern 8pt} if{\kern 4pt} \hat p \ge 0.5. \end{array} \right.$
* Logistic回归模型中，一个样本(instance)的训练代价函数为

y^={−log(p^),ify=1,−log(1−p^),ify=0.

$\hat y = \left\{ \begin{array}{l} - \log (\hat p), {\kern 8pt} if{\kern 1pt} y = 1,\\ - log(1 - \hat p), if{\kern 1pt} y = 0. \end{array} \right.$
因此Logistic回归代价函数为

J (θ) = - 1 m \sum i = 1 m [y (i) l o g (p^(i)) + (1 - y (i)) l o g (1 - p^(i))]

$J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m {[{y^{(i)}}log({{\hat p}^{(i)}}) + (1 - {y^{(i)}})log(1 - {{\hat p}^{(i)}})]}$
利用 链式法则，对齐求偏导数，得到

\partial \partial θ j J (θ) = 1 m \sum i = 1 m (σ (θ T \cdot x (i)) - y (i)) x (i) j

$\frac{\partial }{{\partial {\theta _j}}}J(\theta ) = \frac{1}{m}\sum\limits_{i = 1}^m {(\sigma ({\theta ^T} \cdot {x^{(i)}}) - {y^{(i)}})x_j^{(i)}}$
因此利用GD方法就可以进行求解

决策边界(Decision Boundaries)

虽然Logistic回归无法通过选择阈值来直接输出预测值，但是我们可以首先返回概率值，再手动选择阈值即可

Softmax回归

Logistic回归常用于二分类的回归，当类别多于两个分类时(多分类问题)，模型称为Softmax回归或多项式Logisitic回归
对于样本 $x$ ，其关于类 $k$ 的得分为
$s k (x) = θ T k \cdot x$ ${s_k}(x) = \theta _k^T \cdot x$
将其归一化，得到样本 $k$ 的归一化得分为
$p^k = σ (s (x)) k = exp ( s k ( x ) ) \sum j = 1 K exp ( s j ( x ) )$ ${\hat p_k} = \sigma {(s(x))_k} = \frac{{\exp ({s_k}(x))}}{{\sum\limits_{j = 1}^K {\exp ({s_j}(x))} }}$
其中 $K$ 是类别的个数， $s(x)$ 是x关于每一类的得分向量， $\sigma (s(x))_k$ 是样本属于类别k的估计概率，最终最大概率所属的类别就是样本所属类别的预测结果
Softmax回归的代价函数为
$J (θ) = - 1 m \sum i = 1 m \sum k = 1 K y (i) k log (p^(i) k)$ $J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m {\sum\limits_{k = 1}^K {y_k^{(i)}\log (\hat p_k^{(i)})} }$
偏导数为
$\nabla θ k J (θ) = 1 m \sum i = 1 m (p (i) k - y (i) k) x (i)$ ${\nabla _{{\theta _k}}}J(\theta ) = \frac{1}{m}\sum\limits_{i = 1}^m {(p_k^{(i)} - y_k^{(i)})^{x(i)}}$
Softmax回归中，当solver设置为lbfgs时，表示采用权重衰减的Softmax回归
参考文献:http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92
推导偏微分的参考链接：http://www.cnblogs.com/Deep-Learning/p/7073744.html
- 注：主要是估计概率的偏微分过程，将所有的 $y_j ^i$ ，即标签值，化为1和0，简化其中一项的微分过程。

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
iris = datasets.load_iris()
# print( iris["DESCR"] )
X = iris["data"][:,3:] # petal width
y = (iris["target"] == 2).astype(np.int)  # binarize

log_reg = LogisticRegression()
log_reg.fit( X, y )
X_new = np.linspace( 0, 5, 1000 ).reshape(-1, 1)
y_proba = log_reg.predict_proba( X_new ) # 得到的是一个NX2的矩阵，每一行表示属于0和1的概率
plt.plot( X_new, y_proba[:,1], "g-", label="Iris-Virginica" ) 
plt.plot( X_new, y_proba[:,0], "b--", label="Not Iris-Virginica" )
plt.legend(loc="center left")
plt.show()

这里写图片描述

X = iris.data[:,(2,3)]
y = iris.target

softmax_reg = LogisticRegression( multi_class="multinomial", solver="lbfgs", C=10 )
softmax_reg.fit(X, y)

res = softmax_reg.predict( [[5, 2]] )
print(res)
res = softmax_reg.predict_proba( [[5, 2]] )
print(res)

[2]
[[  6.33134077e-07   5.75276067e-02   9.42471760e-01]]

littletomatodonkey

关注

2
点赞
踩
44

收藏

觉得还不错? 一键收藏
打赏
1
评论
一些常用的回归模型

线性回归模型线性回归线性回归在求解时，一般需要给所有样本添加一个常数项，作为回归模型的偏置线性回归模型可以表述为 y^=hθ(x)=θTx\hat y = {h_\theta }(x) = {\theta ^T}x 该方程有封闭解，利用最小二乘法可以有 θ^=(xTx)−1x⋅y\hat \theta = {({x^T}x)^{ - 1}}x \cdot y%ma
复制链接

扫一扫