An Introduction to the Swish and Hardswish Activation Functions

coder1479

Article link: https://blog.csdn.net/m0_48742971/article/details/123438626

Preface

Study notes on the Swish and Hardswish activation functions.

The Swish Paper

Searching for Activation Functions, from Google.

Abstract: Translation and Commentary

"The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance."

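Training dynamics: how a model's performance metrics change as training iterations proceed. Many factors affect training dynamics, and every network architecture has its own, but some factors influence the training dynamics of all kinds of networks, for example the activation function and the learning rate.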

"Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU)."

"Although various hand-designed alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains."

"In this work, we propose to leverage automatic search techniques to discover new activation functions."

"Using a combination of exhaustive and reinforcement learning-based search, we discover multiple novel activation functions."

"We verify the effectiveness of the searches by conducting an empirical evaluation with the best discovered activation function."

Empirical evaluation means the result is validated by experiments rather than by theoretical analysis.

"Our experiments show that the best discovered activation function, f(x)=x⋅sigmoid(βx), which we name Swish, tends to work better than ReLU on deeper models across a number of challenging datasets."

$\mathrm{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$
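The formula above translates almost directly into code. The following is a minimal sketch in plain Python, not the paper's implementation; the function names `sigmoid` and `swish` and the default `beta = 1.0` are choices made here for illustration. The sigmoid is split into two branches only so that `math.exp` is never called with a large positive argument.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) stays in (0, 1)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish as written in the abstract: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)
```

For example, `swish(0.0)` is 0 for any β, and for large positive inputs the sigmoid factor is essentially 1, so `swish(x)` returns approximately `x`.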

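β is either a constant or a learnable parameter.
If β = 1, f(x) = x⋅sigmoid(x), which is equivalent to the Sigmoid-weighted Linear Unit (SiL, now more commonly written SiLU).
If β = 0, Swish becomes the scaled linear function f(x) = x/2.
As β → ∞, the sigmoid component approaches a 0-1 step function, so Swish becomes like ReLU.
This suggests that Swish can be loosely viewed as a smooth function that nonlinearly interpolates between a linear function and ReLU. If β is set as a trainable parameter, the model itself can control the degree of interpolation.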
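The three cases of β listed above can be checked numerically. The sketch below is only an illustration: the sample points `xs` and the value `beta = 100.0` standing in for "β → ∞" are arbitrary choices made here, and the sigmoid is written in its equivalent tanh form, sigmoid(z) = (1 + tanh(z/2)) / 2, purely because that form cannot overflow.

```python
import math


def swish(x: float, beta: float) -> float:
    # x * sigmoid(beta * x), using sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    return x * 0.5 * (1.0 + math.tanh(0.5 * beta * x))


def relu(x: float) -> float:
    return max(0.0, x)


xs = [-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0]

# beta = 0: sigmoid(0) = 1/2, so Swish is the scaled linear function x / 2.
linear_case = [swish(x, 0.0) for x in xs]

# beta = 1: the SiL / SiLU case, x * sigmoid(x).
silu_case = [swish(x, 1.0) for x in xs]

# Large beta: sigmoid(beta * x) is nearly a 0-1 step, so Swish is nearly ReLU.
relu_like_case = [swish(x, 100.0) for x in xs]
max_gap_to_relu = max(abs(s - relu(x)) for s, x in zip(relu_like_case, xs))
```

On these sample points `linear_case` equals `x / 2` for every `x`, and `max_gap_to_relu` is tiny, which matches the reading of Swish as an interpolation between a linear function and ReLU.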

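The figure below shows the function curve for different values of β. By β = 10 the curve is already very close to ReLU.
[Figure: Swish curves for different values of β]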
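The remark that Swish is already close to ReLU at β = 10 can be made quantitative by measuring the largest vertical distance between the two curves. The sketch below is an illustration under assumptions chosen here: the window [-5, 5], the grid of 10001 points, the β values, and the helper name `max_gap_to_relu` are not from the original post.

```python
import math


def swish(x: float, beta: float) -> float:
    # x * sigmoid(beta * x), using the overflow-free tanh form of the sigmoid.
    return x * 0.5 * (1.0 + math.tanh(0.5 * beta * x))


def max_gap_to_relu(beta: float, lo: float = -5.0, hi: float = 5.0,
                    steps: int = 10001) -> float:
    """Largest |swish(x, beta) - relu(x)| over an evenly spaced grid on [lo, hi]."""
    gap = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(swish(x, beta) - max(0.0, x)))
    return gap


# Gap to ReLU for a few values of beta; it shrinks as beta grows.
gaps = {beta: max_gap_to_relu(beta) for beta in (1.0, 2.0, 5.0, 10.0)}
```

On this grid the gap is about 0.28 for β = 1 and about 0.028 for β = 10, that is, it shrinks roughly in proportion to 1/β, which is why the β = 10 curve is hard to tell apart from ReLU by eye.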

"For example, simply replacing ReLUs with Swish units improves top-1 classification accuracy on ImageNet by 0.9% for Mobile NASNet-A and 0.6% for Inception-ResNet-v2."