Neural Network Basics - Supplementary Neural Network Concepts - 33 - Bias and Variance

丰。。

Published 2023-08-16 10:48:40


Article link: https://blog.csdn.net/CSDNXXCQ/article/details/132314570

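Concepts

Bias:
Bias is the gap between the model's predictions, averaged over possible training sets, and the true values; it reflects how well the model is able to fit the data. High bias means the model cannot fit even the training data well, which usually leads to underfitting. An underfit model is too simple to capture the complex patterns in the data, so it performs poorly on both the training set and the test set.

Variance:
Variance is how much the model's predictions change when it is trained on different training sets; it reflects how sensitive the model is to the training data. High variance typically means the model is too complex and reacts strongly to small changes in the training data, which usually leads to overfitting. An overfit model performs well on the training set but poorly on unseen test data.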
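
The two definitions above fit together in the standard decomposition of expected squared error. The following is a sketch under stated assumptions: the data are generated as y = f(x) + ε with zero-mean noise of variance σ², and the expectation is taken over training sets and noise. The symbols f, f̂, ε and σ² are notation introduced here, not taken from the original post.

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term does not depend on the model, so only the first two terms can be reduced by changing model complexity.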

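Trade-off

Bias-Variance Trade-off:
In practical machine learning, we usually look for a model complexity that balances bias against variance, so that the model generalizes well. An ideal model is complex enough to fit the training data adequately, yet not so sensitive to it that it fails to adapt to unseen data.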
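
The trade-off can be made visible by estimating both quantities empirically. The sketch below is one possible illustration, not the original author's method. It assumes a known quadratic ground truth (`true_fn`, a hypothetical function chosen for illustration), repeatedly draws fresh training sets, fits polynomial models of several degrees, and measures how far the average prediction is from the truth (squared bias) and how much predictions scatter across training sets (variance). The sample sizes, noise level and degrees are likewise arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def true_fn(x):
    # Assumed ground-truth function, chosen only for illustration
    return 4 * (x - 0.5) ** 2 + 1


def bias_variance(degree, n_trials=200, n_train=30, noise=0.2, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_eval = np.linspace(0.05, 0.95, 50).reshape(-1, 1)  # fixed evaluation points
    preds = np.empty((n_trials, len(x_eval)))
    for t in range(n_trials):
        # Draw a fresh noisy training set for every trial
        x = rng.random((n_train, 1))
        y = true_fn(x).ravel() + rng.normal(0, noise, n_train)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x, y)
        preds[t] = model.predict(x_eval)
    mean_pred = preds.mean(axis=0)
    # Squared bias: average prediction versus the true function
    bias_sq = np.mean((mean_pred - true_fn(x_eval).ravel()) ** 2)
    # Variance: spread of predictions across training sets
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (1, 2, 10)}
for d, (b, v) in results.items():
    print(f"degree={d:2d}  bias^2={b:.4f}  variance={v:.4f}")
```

In this setup the degree-1 model should show the largest squared bias and the degree-10 model the largest variance, with degree 2 sitting between the two extremes.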

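Solutions

Ways to address bias and variance include:

Reducing bias: increase model complexity, for example by using more features or a deeper network, to give the model more expressive power.
Reducing variance: use regularization methods such as L1/L2 regularization or Dropout to keep the model from overfitting the training data.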
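
As an illustration of the variance-reduction point above, the sketch below compares plain least squares with L2-regularized (ridge) regression on the same high-degree polynomial features. The dataset, the degree of 12, and `alpha=1.0` are assumptions chosen for illustration; in practice `alpha` would be tuned on validation data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small noisy quadratic dataset (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.random((40, 1))
y = 4 * (X.ravel() - 0.5) ** 2 + 1 + rng.normal(0, 0.2, 40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def make_model(regressor):
    # Deliberately over-flexible features; scaling makes the L2 penalty
    # treat every polynomial term on a comparable footing
    return make_pipeline(
        PolynomialFeatures(degree=12, include_bias=False),
        StandardScaler(),
        regressor,
    )


plain = make_model(LinearRegression()).fit(X_train, y_train)
ridge = make_model(Ridge(alpha=1.0)).fit(X_train, y_train)

for name, model in (("plain", plain), ("ridge", ridge)):
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    coef_norm = np.linalg.norm(model[-1].coef_)
    print(f"{name}: train MSE={train_mse:.4f}  test MSE={test_mse:.4f}  |coef|={coef_norm:.2f}")
```

The penalty shrinks the coefficients, so the ridge model gives up a little training accuracy in exchange for a smoother, less data-sensitive fit.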
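Summary:

Bias reflects how well the model can fit the training data; high bias usually leads to underfitting.
Variance reflects how sensitive the model is to changes in the training data; high variance usually leads to overfitting.
Bias and variance trade off against each other, so good generalization requires finding an appropriate model complexity.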

Code implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate noisy data from a quadratic function, so that a straight line underfits
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 * (X - 0.5) ** 2 + 1 + np.random.randn(100, 1) * 0.2

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create polynomial features: degree 2 matches the data, degree 15 is far too flexible
poly_proper = PolynomialFeatures(degree=2)
poly_over = PolynomialFeatures(degree=15)
X_train_proper = poly_proper.fit_transform(X_train)
X_test_proper = poly_proper.transform(X_test)
X_train_over = poly_over.fit_transform(X_train)
X_test_over = poly_over.transform(X_test)

# Fit models of different complexity
model_underfit = LinearRegression()
model_properfit = LinearRegression()
model_overfit = LinearRegression()

model_underfit.fit(X_train, y_train)      # degree 1: raw feature only
model_properfit.fit(X_train_proper, y_train)
model_overfit.fit(X_train_over, y_train)

# Evaluate each model on a sorted grid so the curves are drawn left to right
X_plot = np.linspace(X_train.min(), X_train.max(), 200).reshape(-1, 1)

# Plot the fitted curves
plt.figure(figsize=(12, 6))

plt.subplot(1, 3, 1)
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.plot(X_plot, model_underfit.predict(X_plot), color='red', label='Underfitting')
plt.legend()
plt.title('Underfitting')

plt.subplot(1, 3, 2)
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.plot(X_plot, model_properfit.predict(poly_proper.transform(X_plot)), color='red', label='Properfitting')
plt.legend()
plt.title('Properfitting')

plt.subplot(1, 3, 3)
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.plot(X_plot, model_overfit.predict(poly_over.transform(X_plot)), color='red', label='Overfitting')
plt.legend()
plt.title('Overfitting')

plt.tight_layout()
plt.show()

# Compute the mean squared error on the test set
y_pred_underfit = model_underfit.predict(X_test)
y_pred_properfit = model_properfit.predict(X_test_proper)
y_pred_overfit = model_overfit.predict(X_test_over)

mse_underfit = mean_squared_error(y_test, y_pred_underfit)
mse_properfit = mean_squared_error(y_test, y_pred_properfit)
mse_overfit = mean_squared_error(y_test, y_pred_overfit)

print("MSE Underfit:", mse_underfit)
print("MSE Properfit:", mse_properfit)
print("MSE Overfit:", mse_overfit)

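In this example, we generate noisy data for a polynomial regression problem and fit linear regression models of different complexity to it. Plotting the fits and computing the mean squared error on the test set shows three cases:

Underfitting: the model is too simple to capture the pattern in the data, so it performs poorly on both the training set and the test set.

Properfitting (appropriate fit): linear regression on polynomial features of a suitable degree fits the data well, being neither too simple nor too complex.

Overfitting: the model is too complex and very sensitive to small changes in the training data, so it performs well on the training set but tends to perform worse on the test set.

By adjusting model complexity, we can balance bias and variance within a suitable range and achieve better generalization.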
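
One common way to choose the complexity in practice is cross-validation. The sketch below is a minimal illustration under assumed settings (a synthetic quadratic dataset, candidate degrees 1 through 10, 5 folds): it scores each polynomial degree by cross-validated mean squared error and picks the degree with the lowest score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic noisy quadratic data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 4 * (X.ravel() - 0.5) ** 2 + 1 + rng.normal(0, 0.2, 100)

cv_mse = {}
for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # scikit-learn reports negated MSE so that higher is better; flip the sign back
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
for degree, mse in cv_mse.items():
    print(f"degree={degree:2d}  CV MSE={mse:.4f}")
print("Selected degree:", best_degree)
```

Because every candidate is scored on data it was not trained on, the selected degree reflects generalization rather than training fit alone.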
