Computer Vision: 2.3 Regularization

俯仰天地

Article link: https://blog.csdn.net/weixin_43669978/article/details/120897194

3 Regularization

Many strategies used in machine learning are explicitly designed to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as regularization.

-----Goodfellow et al.

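This is how Goodfellow, a leading figure in the field, puts it: strategies that reduce the test error, possibly at the cost of a higher training error, are called regularization.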

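The loss function tells us how well our parameters perform on a classification task, but it does not take into account what the weight matrices themselves look like.

In the space we work in, there are infinitely many parameter sets that achieve good accuracy on our dataset. So how do we pick a set of parameters that lets the model generalize, or at least reduces overfitting?

The answer is regularization. Second only to the learning rate, regularization is the most important parameter of your model that you can tune.

There are many types of regularization techniques, for example L1 regularization and L2 regularization. They also affect how the weight matrix is updated, but they do so by adding an extra term to the loss that constrains the capacity of the model.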

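Beyond these, there are two further kinds of regularization methods:

One kind is built directly into the network architecture: dropout is a typical example.

The other kind is applied indirectly during training: for example, data augmentation and early stopping.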
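To make the two kinds more concrete, here is a minimal pure-Python sketch of one method from each: inverted dropout applied to a list of activations, and an early-stopping rule driven by validation loss. The function names, the `patience` parameter, and the sample numbers are assumptions made for illustration, not something taken from a particular library.

```python
import random


def inverted_dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected value of each activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training stops.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran through every epoch


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
dropped = inverted_dropout(activations, drop_prob=0.5, rng=rng)

# Hypothetical validation losses: they improve up to epoch 2, then get worse.
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.7], patience=2)
print(dropped, stop_epoch)
```

Dropout changes what the network computes in the forward pass, while early stopping only decides when to stop updating the weights, which is why the first counts as part of the architecture and the second as part of the training procedure.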

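So what exactly is regularization?

Regularization is what helps us control model capacity.

And what is model capacity? In my view, it can be roughly understood as the number of parameters in the model.

Put simply, the more parameters a model has, the more easily it can describe an underlying pattern precisely. The catch is that training aims to maximize accuracy on the training data, and by that measure more parameters always look better. The resulting model can reach very high accuracy on the training set while its accuracy on the test set drops sharply. This is the overfitting problem.

However, regularizing too heavily causes a problem as well: the model can no longer predict even the training data accurately, that is, it fails to capture the pattern underlying the training set. This problem is called underfitting.

So what we need to do is find the balance point between overfitting and underfitting, so that the model learns a reasonably accurate pattern from the training data without that pattern applying only to the training data and losing the ability to generalize.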

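The loss function before regularization:

$L=\frac{1}{N}\sum_{i=1}^{N} L_i$
The loss function after adding regularization:
$L=\frac{1}{N}\sum_{i=1}^{N} L_i+\lambda R(W)$
Here $L_i$ is the loss on the i-th training sample, N is the number of samples, λ controls the strength of the regularization, and R(W) is the regularization term. Commonly used regularization terms are:

L1 regularization:
$R(W)=\sum_i\sum_j |W_{i,j}|$
L2 regularization:
$R(W)=\sum_i\sum_j W_{i,j}^2$
Elastic Net:
$R(W)=\sum_i\sum_j \left(\beta W_{i,j}^2+ |W_{i,j}|\right)$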
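The three regularization terms can be evaluated by hand on a small weight matrix. The following is a minimal sketch in plain Python; the function names, the 2x2 matrix `W`, the per-sample losses, and the values of λ and β are all made up for illustration.

```python
def l1_penalty(W):
    """R(W) = sum over i, j of |W[i][j]|."""
    return sum(abs(w) for row in W for w in row)


def l2_penalty(W):
    """R(W) = sum over i, j of W[i][j]**2."""
    return sum(w * w for row in W for w in row)


def elastic_net_penalty(W, beta):
    """R(W) = sum over i, j of (beta * W[i][j]**2 + |W[i][j]|)."""
    return sum(beta * w * w + abs(w) for row in W for w in row)


def regularized_loss(sample_losses, W, lam, penalty):
    """Average per-sample loss plus lam * R(W)."""
    data_loss = sum(sample_losses) / len(sample_losses)
    return data_loss + lam * penalty(W)


W = [[1.0, -2.0],
     [0.5, 0.0]]

print(l1_penalty(W))                     # 1 + 2 + 0.5 + 0 = 3.5
print(l2_penalty(W))                     # 1 + 4 + 0.25 + 0 = 5.25
print(elastic_net_penalty(W, beta=0.5))  # 0.5 * 5.25 + 3.5 = 6.125

# Hypothetical per-sample losses; their mean is the unregularized loss.
total = regularized_loss([0.2, 0.4, 0.6], W, lam=0.1, penalty=l2_penalty)
print(total)
```

Note how the L2 term is dominated by the largest weight (the entry -2.0 contributes 4 out of 5.25), while the L1 term grows only linearly with each weight's magnitude.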

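For a concrete problem, which regularization method to use should be treated as a hyperparameter to optimize, chosen and confirmed through experiments.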
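One common way to run such experiments is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with `SGDClassifier`. The synthetic dataset from `make_classification` and the grid values are assumptions chosen only to keep the example self-contained, and the classifier uses its default hinge loss rather than the logistic loss.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; any feature matrix and labels would do).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Candidate regularizers and strengths; these grid values are illustrative only.
param_grid = {
    "penalty": ["l1", "l2", "elasticnet"],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
}

# 3-fold cross-validation over every (penalty, alpha) combination.
search = GridSearchCV(
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=42),
    param_grid,
    cv=3,
)
search.fit(X, y)

print("best setting:", search.best_params_)
print("cross-validated accuracy: {:.2f}%".format(search.best_score_ * 100))
```

Which combination wins depends on the data, so the result on this synthetic set says nothing about what will work best on a real image dataset.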

Code implementation:

from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from pyimagesearch.preprocessing.simplepreprocessor import SimplePreprocessor
from pyimagesearch.datasets.simpledatasetsloader import SimpleDatasetLoader
from imutils import paths

# Path to the image dataset
dataset = r"E:\PycharmProjects\DLstudy\data\animal"
print("[INFO] loading datasets...")
imagePaths = list(paths.list_images(dataset))

# Resize every image to 32x32, then flatten it into a 32*32*3 = 3072-dim vector
sp = SimplePreprocessor(32, 32)
sdl = SimpleDatasetLoader(preprocessor=[sp])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.reshape((data.shape[0], 3072))

# Encode the string labels as integers
le = LabelEncoder()
labels = le.fit_transform(labels)
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.25, random_state=5)

# Train one classifier per setting: no regularization, L1, and L2
for r in (None, "l1", "l2"):
    print("[INFO] training model with {} penalty".format(r))
    # loss="log_loss" selects logistic regression; scikit-learn releases before 1.1 spell it loss="log"
    model = SGDClassifier(loss="log_loss", penalty=r, max_iter=100, learning_rate="constant", eta0=0.01,
                          random_state=42)
    model.fit(trainX, trainY)
    # Accuracy on the held-out 25% test split
    acc = model.score(testX, testY)
    print("[INFO] {} penalty accuracy:{:.2f}%".format(r, acc * 100))

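The code above compares training without regularization, with L1 regularization, and with L2 regularization. Here λ = 0.0001 (the default value of SGDClassifier's alpha parameter), the number of epochs is at most 100 (max_iter), and the learning rate is 0.01 (eta0).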
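To see how λ and the learning rate interact, a single gradient step under the L2 term R(W) defined earlier can be sketched in a few lines of plain Python. This is an illustration of the general idea only, not a description of how SGDClassifier is implemented internally. The function name and the sample weights are hypothetical.

```python
def sgd_step_l2(weights, grads, learning_rate, lam):
    """One gradient step on data_loss + lam * sum(w**2)."""
    # The derivative of lam * w**2 with respect to w is 2 * lam * w,
    # so the penalty pulls every weight a little toward zero.
    return [w - learning_rate * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0, 0.5]
zero_grads = [0.0, 0.0, 0.0]  # pretend the data loss is flat, to isolate the penalty

shrunk = sgd_step_l2(weights, zero_grads, learning_rate=0.01, lam=0.0001)
print(shrunk)
```

With these values each weight is multiplied by 1 - 2 * 0.01 * 0.0001 per step, a very gentle shrinkage, which is why the L2 penalty is often called weight decay.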

Run results:

E:\DLstudy\Scripts\python.exe E:/PycharmProjects/DLstudy/run/regularization.py
[INFO] loading datasets...
[INFO] processed 500/8932
[INFO] processed 1000/8932
[INFO] processed 1500/8932
[INFO] processed 2000/8932
[INFO] processed 2500/8932
[INFO] processed 3000/8932
[INFO] processed 3500/8932
[INFO] processed 4000/8932
[INFO] processed 4500/8932
[INFO] processed 5000/8932
[INFO] processed 5500/8932
[INFO] processed 6000/8932
[INFO] processed 6500/8932
[INFO] processed 7000/8932
[INFO] processed 7500/8932
[INFO] processed 8000/8932
[INFO] processed 8500/8932
[INFO] training model with None penalty
[INFO] None penalty accuracy:61.44%
[INFO] training model with l1 penalty
[INFO] l1 penalty accuracy:49.08%
[INFO] training model with l2 penalty
[INFO] l2 penalty accuracy:59.70%
libpng warning: iCCP: known incorrect sRGB profile

Process finished with exit code 0

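As the results show (61.44% without a penalty, 49.08% with L1, 59.70% with L2), neither using regularization nor any particular regularization method is inherently stronger or weaker. The choice has to be made according to the actual situation.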
