ensemble learning - Task 2 (Regression)

Latest recommended article published on 2024-11-26 00:47:56

my319

Article tags: python, machine learning

Article link: https://blog.csdn.net/my319/article/details/118770736

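This article examines the method of least squares for the linear regression model in detail, explaining how differentiation yields the parameters that minimize the sum of squared errors. It also discusses the relationship between least squares estimation and maximum likelihood estimation in linear regression. In addition, it analyzes the overfitting problem that polynomial regression may face in practice and its effect on prediction stability. Finally, it shows how to fit a linear regression model.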

(1) A detailed formulation of the method of least squares for the linear regression model

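To measure the gap between the true value $y_i$ and the linear regression prediction $w^Tx_i$, we use the sum of squares of $y_i- w^Tx_i$, denoted $L(w)$. Here $X$ is the $N \times p$ data matrix whose $i$-th row is $x_i^T$, and $Y$ is the column vector of the $N$ true values: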

$L(w) =\sum\limits_{i=1}^{N}(w^Tx_i-y_i)^2 \\ = (w^TX^T-Y^T)(w^TX^T-Y^T)^T \\ = w^TX^TXw - 2w^TX^TY+Y^TY$

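We seek the parameter $w$ that minimizes $L(w)$, that is:

$\hat{w} = \underset{w}{\operatorname{argmin}}\;L(w)$

Taking the derivative with respect to $w$ and setting it to zero:
$\frac{\partial L(w)}{\partial w} = 2X^TXw-2X^TY = 0$

Provided $X^TX$ is invertible, this gives:

$\hat{w} = (X^TX)^{-1}X^TY$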
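The closed-form solution can be checked numerically. The following is a minimal sketch on synthetic data (the data, `w_true`, and all variable names are assumptions made for illustration, not part of the original post). It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

# Synthetic data (hypothetical, for illustration only): N samples, p features.
rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))
w_true = np.array([2.0, -1.0, 0.5])
Y = X @ w_true + rng.normal(scale=0.1, size=N)

# Normal equations: solve (X^T X) w = X^T Y rather than inverting X^T X.
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The gradient 2 X^T X w - 2 X^T Y should vanish at w_hat.
grad = 2 * X.T @ X @ w_hat - 2 * X.T @ Y

print("normal equations:", w_hat)
print("np.linalg.lstsq :", w_lstsq)
print("max |gradient|  :", np.max(np.abs(grad)))
```

Solving the linear system is usually preferred over computing $(X^TX)^{-1}$ directly, since it is cheaper and less sensitive to rounding error.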

(2) Differences and connections between maximum likelihood estimation and least squares estimation in linear regression

The least squares estimate for linear regression is equivalent to the maximum likelihood estimate when the noise is Gaussian, $\epsilon\sim N(0,\sigma^2)$.
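The equivalence can be sketched as a short derivation. It assumes the model $y_i = w^Tx_i + \epsilon_i$ with the $\epsilon_i$ independent and identically distributed as $N(0,\sigma^2)$; the log-likelihood symbol $\ell(w)$ is introduced here for illustration. Under this assumption each observation has density

$p(y_i \mid x_i; w) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-w^Tx_i)^2}{2\sigma^2}\right)$

so the log-likelihood of the whole sample is

$\ell(w) = \sum\limits_{i=1}^{N}\log p(y_i \mid x_i; w) = -N\log(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2}\sum\limits_{i=1}^{N}(y_i-w^Tx_i)^2$

The first term does not depend on $w$, and the second is a negative multiple of $L(w)$, hence

$\hat{w}_{MLE} = \underset{w}{\operatorname{argmax}}\;\ell(w) = \underset{w}{\operatorname{argmin}}\;\sum\limits_{i=1}^{N}(w^Tx_i-y_i)^2 = \underset{w}{\operatorname{argmin}}\;L(w)$

The difference lies in the assumptions: least squares only minimizes a squared-error criterion and needs no distribution for the noise, whereas maximum likelihood starts from an explicit noise model and coincides with least squares only when that model is Gaussian.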

(3) Why polynomial regression performs poorly on practical problems

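If the degree is too high, polynomial regression easily overfits the training set. Near the boundaries of X the confidence intervals widen markedly, so the predictions become less stable.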
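Both effects can be illustrated with a small experiment. The sketch below uses synthetic data (a noisy sine curve; the data, the degrees 3 and 10, and the helper names `train_mse` and `variance_factor` are all assumptions chosen for illustration). It compares the training error of a low-degree and a high-degree fit, and evaluates the factor $x_0^T(X^TX)^{-1}x_0$, which is the variance of the prediction at $x_0$ divided by $\sigma^2$, at the boundary and at the center of the interval.

```python
import numpy as np

# Synthetic data (hypothetical): a noisy sine curve sampled on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.size)


def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)


def variance_factor(x0, degree):
    """Return x0^T (X^T X)^{-1} x0, i.e. Var[prediction at x0] / sigma^2."""
    V = np.vander(x, degree + 1)                    # polynomial design matrix
    v0 = np.vander(np.array([x0]), degree + 1)[0]   # features of the query point
    return float(v0 @ np.linalg.solve(V.T @ V, v0))


low, high = 3, 10
mse_low, mse_high = train_mse(low), train_mse(high)
edge_low, edge_high = variance_factor(1.0, low), variance_factor(1.0, high)
center_high = variance_factor(0.0, high)

print(f"training MSE: degree {low} = {mse_low:.4f}, degree {high} = {mse_high:.4f}")
print(f"variance factor at x = 1: degree {low} = {edge_low:.3f}, degree {high} = {edge_high:.3f}")
print(f"variance factor at x = 0: degree {high} = {center_high:.3f}")
```

The higher degree always fits the training points at least as well, but its prediction variance at the boundary is larger than both its own variance at the center and the boundary variance of the lower-degree fit, which is what widens the confidence interval there.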

(7) Fitting a model with linear regression

from sklearn import datasets
import numpy as np


def hypothesis(X, theta):
    # Linear model prediction: X @ theta
    return np.dot(X, theta)


def computeCost(X, y, theta):
    # Half mean squared error: (1 / 2n) * sum((X @ theta - y)^2)
    n = X.shape[0]
    y_pred = hypothesis(X, theta)
    return np.sum(np.square(y_pred - y.reshape(-1, 1))) / (2 * n)


def gradientDescent(X, y, theta, alpha, epoch):
    # Batch gradient descent; returns the fitted theta and the cost after each epoch
    theta = theta.copy()  # avoid modifying the caller's array in place
    cost = np.zeros(epoch)
    for i in range(epoch):
        y_pred = hypothesis(X, theta)
        residues = y_pred - y.reshape(-1, 1)
        grad = np.dot(X.T, residues) / X.shape[0]
        theta -= alpha * grad
        cost[i] = computeCost(X, y, theta)
    return theta, cost


# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older version.
boston = datasets.load_boston()
X = boston.data
y = boston.target

# Standardize each feature, then append a column of ones for the intercept
X_std = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
X_std = np.concatenate([X_std, np.ones((X.shape[0], 1))], axis=1)
theta = np.zeros((X_std.shape[1], 1))
alpha = 1e-4    # learning rate
epoch = 100000  # number of gradient descent iterations

# perform linear regression on the data set
g, cost = gradientDescent(X_std, y, theta, alpha, epoch)
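
One way to sanity-check a gradient descent fit is to compare it with the exact least squares solution. The following is a minimal, self-contained sketch on synthetic data (the data, the function name `gradient_descent_sketch`, and the chosen `alpha` and `epochs` are assumptions for illustration, not the settings used above). It uses the same update rule and the same intercept column of ones.

```python
import numpy as np


def gradient_descent_sketch(X, y, alpha, epochs):
    """Batch gradient descent on the cost (1 / 2n) * ||X theta - y||^2."""
    n = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        residuals = X @ theta - y
        theta -= alpha * (X.T @ residuals) / n
    return theta


# Synthetic, roughly standardized features plus an intercept column (hypothetical data).
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 4))
X = np.concatenate([features, np.ones((300, 1))], axis=1)
y = X @ np.array([1.5, -2.0, 0.3, 0.0, 4.0]) + rng.normal(scale=0.1, size=300)

theta_gd = gradient_descent_sketch(X, y, alpha=0.1, epochs=2000)
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
max_gap = np.max(np.abs(theta_gd - theta_exact))

print("gradient descent:", theta_gd)
print("exact solution  :", theta_exact)
print("largest gap     :", max_gap)
```

If the gap is not small, the learning rate or the number of iterations is the usual suspect. With standardized features, a larger step size than `1e-4` typically converges in far fewer iterations.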