线性回归算法

小黑student

已于 2023-07-29 12:00:10 修改

阅读量313

点赞数

文章标签：线性回归机器学习算法

于 2023-07-29 00:19:36 首次发布

本文链接：https://blog.csdn.net/m0_69321141/article/details/130233767

版权

线性回归

评估方法

R-square(决定系数)

Adjusted R-square(调整的R方)

线性回归

线性回归属于回归问题，是监督学习的一种，简单来说，给定一个带有标签值的训练样本(可以包含多个属性值)，通过构造一个函数，拟合样本点构造出一条直线（曲线），从而对未知的样本点进行预测。在介绍两种方法以前，先介绍几种回归分析评估方法。

评估方法

R-square(决定系数)

总平方和 $SST=\sum{(y_i - \bar{y})^2}$

回归平方和 $SSR=\sum{(\hat{y_i}-\bar{y})^2}$

残差平方和 $SSE=\sum{(y_i-\hat{y_i})^2}$

$R=\frac{SSR}{SST}$

Adjusted R-square(调整的R方)

$R^2-adjusted = 1-\frac{(1-R^2)(m-1)}{m-p-1}$ ，其中p为特征数量

MSE(均方差)

$MSE=\frac{1}{m}\sum_{i=1}^{m}{(y_i-\hat{y_i})^2}$

梯度下降法

原理

1、构造函数
通常使用 $y=\omega^Tx+b$ ，这里 $\omega$ 是列向量 $\omega$ 的大小等于属性值个数，b代表偏置

2、定义损失函数
$J(\omega ,b) = \frac{1}{2m}\sum_{i=1}^{m} (f(x_i)-y_i)^2$
代表的是均方误差， $f(x_i)$ 表示第i个训练样本点预测值， $y_i$ 表示第i个训练样本点的真实值

3、优化函数

即找到能使损失函数最小的 $\omega$ 和b，使用的是梯度下降法

利用求偏导找到最小值，求偏导的过程其实就是求斜率。斜率为正，说明函数值右边比左边大，向左移动；斜率为负，函数值左边比右边大，向右移动。于是得到

$\omega = \omega - \alpha \ast \frac{d{J(\omega ,b)}}{d\omega }$

其中 $\alpha$ 是学习率，决定梯度下降的快慢，需要自己设置(也叫超参数)。求导的结果放到下面了

4、收敛条件

达到迭代次数
设置阈值 $\epsilon$ ，当 Cost < $\epsilon$ 时停止

代码

数据集：数据集

提取码：tnme

class LinearModel():
    def __init__(self, fit_intercept=True, normalize=True, verbose=False):
        self.fit_intercept = fit_intercept
        self.normalize = normalize
        self.verbose = verbose
        self.coef_ = 0
        self.intercept_ = 0
        
    def fit(self, X, y):
        if self.normalize:
            X = self.z_score(X)
        assert X.ndim>1, "Expected 2D array, got 1D array instead"
        n = X.shape[1]
        w = np.zeros(n)
        b = 0
        self.coef_, b_out, J_hist = self.gradient_descent(X, y, w, b)
        if self.fit_intercept:
            self.intercept_ = b_out
        if self.verbose:
            self.plot(J_hist)
     
    #z-score标准化
    def z_score(self, X):
        mean = np.mean(X, axis=0)
        std = np.std(X, axis=0)
        return (X - mean) / std
       
    #梯度下降
    def gradient_descent(self, X, y, w, b, alpha=0.01, num_iters=20000):
        J_history = []
        for i in range(num_iters):
            dw, db = self.compute_gradient(X, y, w, b)
            w = w - alpha * dw
            b = b - alpha * db
            
            cost = self.compute_cost(X, y, w, b)
            J_history.append(cost)
        return w, b, J_history
        
    #代价函数
    def compute_cost(self, X, y, w, b):
        m = X.shape[0]
        f_wb = np.dot(X, w) + b
        cost = sum((f_wb - y)**2) / (2*m)
        return cost
    
    #计算梯度
    def compute_gradient(self, X, y, w, b):
        m = X.shape[0]
        f_wb = np.dot(X, w) + b
        dw = (f_wb - y) @ X
        db = sum(f_wb - y)
        dw = dw / m
        db = db / m
        return dw, db
    
    #预测
    def predict(self, X):
        X = self.z_score(X)
        return (self.coef_ * X + self.intercept_).reshape(-1)
        
    def plot(self, J_history):
        plt.figure()
        plt.plot(J_history, c='orange', lw=3)
        plt.xlabel("iter")
        plt.ylabel("Cost")
        plt.grid(True)
        plt.show()

X, y = df['x'].values, df['y'].values
X = X.reshape(len(X), -1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)
linear = LinearModel(verbose=True)
linear.fit(x_train, y_train)

n, p = x_test.shape
yhat = linear.predict(x_test)
SST = sum((y_test - np.mean(y_test)) ** 2)
SSR = sum((yhat - np.mean(y_test)) ** 2)
SSE = sum((y_test - yhat) ** 2)
R2 = SSR/ SST
R2_adjusted = 1 - ((1-R2) * (n - 1)) / (n - p - 1)
print(f"R-square: {R2}")
print(f"Adjusted R-square: {R2_adjusted}")
print(f"mean_squared_error: {mean_squared_error(y_test, yhat)}")

正规方程式

原理

不同于上面的算法，这次我们将b加到向量 $\omega$ 当中，即 $\hat{\omega} = (\omega ;b)$ ，为了保持一致，我们对X添加新的一列，其值全为1，即

$X=\begin{pmatrix} {x_1}^T & 1\\ {x_2}^T & 1\\ \vdots & \vdots\\ {x_m}^T & 1 \end{pmatrix}$

这时我们的损失函数就变成 $Loss=(y-X\hat{\omega })^T(y-X\hat{\omega })$ ，需要找到 ${\hat{\omega}}^*$ 使 $Loss$ 最小。

这里的计算需要利用矩阵求导，详细的内容可以参考资料[1]

简单写下计算过程

注意，这里的X为满秩矩阵或正定矩阵（保证矩阵可逆）

不可逆的情况：1、矩阵各行（列）线性相关。2、 $m\leqslant n$ （m为行数，n为列数）

相较于梯度下降法，

优点：1、不用设置学习率 $\alpha$ 2、不需要迭代

缺点：1、如果特征的个数很大的时候，计算效率会很慢

代码

X2 = np.c_[x_train, np.ones(len(x_train))] #添加一值为1的列，表示偏置
y = np.expand_dims(y_train, axis=-1)

w = np.linalg.pinv(X2.T @ X2) @ X2.T @ y
yhat = w[0] * x_test + w[1]
yhat = yhat.reshape(-1) #转换成一维
SST = sum((y_test - np.mean(y_test)) ** 2)
SSR = sum((yhat - np.mean(y_test)) ** 2)
SSE = sum((y_test - yhat) ** 2)
R2 = SSR/ SST
R2_adjusted = 1 - ((1-R2) * (n - 1)) / (n - p - 1)
print(f"R-square: {R2}")
print(f"Adjusted R-square: {R2_adjusted}")
print(f"mean_squared_error: {mean_squared_error(y_test, yhat)}")

局部加权回归

原理

对于上面这些点，如果我们使用线性回归，很明显看出拟合的效果不好，因为我们能直观看出这些点拟合出一条曲线更好，这时候就不能用线性回归了。

局部加权回归的原理：对于某一点x，我们只需要根据其附近的样本点拟合，从而得到x的预测值

比如我们要预测红色点的值，只需根据附近绿色点拟合出的直线进行预测即可。

于是重新定义损失函数

$E=\frac{1}{2}\sum_{i=1}^{m}\omega_i(\hat{y_i}-y_i)^2$

$\omega_i = exp(-\frac{||x_i - x||_2}{2\sigma^2})$

其中 $\omega_i$ 又称为高斯核，现在想想，为什么损失函数定义为这样？

因为我们只需要依据预测点附近的样本点进行预测就ok了，对于远离预测点的那些点来说，他们不影响预测，因此添加权重到损失函数.对于附近的样本点来说，他们对损失函数的大小起着重要作用。这就要来到第二小问了：为什么权重使用高斯核？。

对于离x近的的样本点 $x_i$ ： $||x_i-x||_2$ 的值越接近于0，得到的 $\omega_i$ 也就越接近1，所占权重大

对于离x远的的样本点 $x_i$ ： $||x_i-x||_2$ 的值越接近于 $\infty$ ，得到的 $\omega_i$ 也就越接近0，所占权重小

所以只需要求出

$arg \min_{\theta}E$

对J求导即可求出 $\theta$ ，常数项不影响求导，可以忽略

代码

class LWLR():
    def __init__(self, sigma=1):
        self.sigma = sigma
        
    def fit(self, X, y, x):
        assert X.ndim > 1, "Except the dimension should > 1"
       
        X = np.c_[X, np.ones(len(X))] #添加常数值1的列，对应偏置
        y = np.expand_dims(y,axis=-1) #扩张维度
        m = len(x)
        x = np.c_[x, np.ones(m)] #这里也要添加一常数列
        yhat = np.zeros(m) #存放预测数据
        
        for i in range(m):
            W = self.get_weight(X, x[i])
            #利用公式求出theta
            theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y + 1e-5 #防止theta值过小
            yhat[i] = np.dot(x[i], theta)
        return yhat
        
    def gaussian_kernel(self, X, x0):
        diff = X - x0
        return np.exp(np.linalg.norm(diff, axis=1)**2 / (-2 * self.sigma ** 2))
    
    def get_weight(self, X, x_test):
        w = self.gaussian_kernel(X, x_test)
        W = np.diag(w) #对角矩阵
        return W
    
X, y = df['x'].values, df['y'].values
#按列排序
x_test = np.sort(X, axis=0)
# x_test = np.linspace(min(X), max(X), 500)
#转换成矩阵
X = X.reshape(len(X), -1)

#测试不同sigma下的效果
sigmas = [1, 0.1, 0.01, 0.001]
fig = plt.figure(figsize=(15, 8))
for i, sigma in enumerate(sigmas):
    ax = fig.add_subplot(2, 2, i+1)
    lwlr = LWLR(sigma=sigma)
    yhat = lwlr.fit(X,y, x_test)
    
    plt.scatter(df['x'], df['y'])
    plt.plot(x_test, yhat, c="r", lw=2)
    ax.set_title(f"sigma = {sigma}")
plt.show()