ReLU (Rectified Linear Units) Activation Function

From the traditional sigmoid family to ReLU, activation functions play a central role in neural networks. Thanks to its fast convergence and computational efficiency, ReLU has become the default choice in deep learning. This article walks through how Sigmoid, Tanh, Softplus, ReLU, and several ReLU variants work and what advantages they offer.


1. Traditional Sigmoid-family Activation Functions

The two most widely used activation functions in traditional neural networks belong to the sigmoid family (the logistic sigmoid and the tanh sigmoid), and they were long regarded as the core of neural networks.

Mathematically, these non-linear sigmoid functions have a large signal gain in the central region and a small gain in the two saturated tails, which works well for mapping signals into a feature space.

From a neuroscience perspective, the central region resembles the excited state of a neuron while the two tails resemble its inhibited state, so during learning the network can push important features toward the central region and unimportant features toward the tails.

Under either interpretation, this looks considerably smarter than the early linear activation (y = x) and step activations (-1/1 or 0/1).
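To make the gain argument concrete, here is a minimal NumPy sketch (the helper names are ours, not from any library) that evaluates the logistic sigmoid and its derivative: the gain peaks at 0.25 at x = 0 and decays rapidly in the tails, which is exactly the saturation behavior discussed above.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: large gain near x = 0, saturating tails
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), maximal (0.25) at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  gain={sigmoid_grad(x):.6f}")
# gain shrinks from 0.25 at the center to ~4.5e-05 at x = 10
```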

1.1 Approximating the biological neural activation: Softplus & ReLU

In 2001, the neuroscientists Dayan and Abbott, working from a biological standpoint, modeled a more precise activation function for how a brain neuron responds to incoming signals (shown in the left panel of the original figure, not reproduced here):

Compared with the sigmoid family, this model differs in three main ways: (1) one-sided suppression; (2) a relatively wide excitation boundary; (3) sparse activation (the key point: in the region marked by the red box, the front part is not activated at all).

In the same year, Charles Dugas et al. happened to use the Softplus function in a paper on regression for predicting positive quantities. Softplus is an antiderivative of the logistic sigmoid; that is, the derivative of softplus is the logistic function.

According to the paper, they originally wanted to use an exponential function (naturally positive) as the activation for the regression, but its gradient grew far too large later in training, making the model hard to train, so they wrapped it in a log to slow the growth.

The added 1 keeps the output non-negative, giving softplus(x) = log(1 + e^x). In the same year, in their NIPS paper, Charles Dugas et al. also remarked that Softplus can be viewed as a smoothed version of the non-negative rectification function (Rectified Linear Units), f(x) = max(0, x).
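A minimal sketch of the relationship just described (assuming NumPy; the function names are ours): the numerical derivative of softplus matches the logistic sigmoid, and softplus approaches max(0, x) as |x| grows.

```python
import numpy as np

def softplus(x):
    # softplus(x) = log(1 + e^x), a smooth version of max(0, x)
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-5
numeric_grad = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)

# The derivative of softplus is the logistic function
print(np.allclose(numeric_grad, sigmoid(x), atol=1e-6))  # True

# The gap between softplus and ReLU shrinks for large |x|
print(np.round(softplus(x) - relu(x), 4))
```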

Coincidentally, also in 2001, the Softplus/Rectifier activation functions from the machine-learning community and the neuron firing-rate function proposed in neuroscience turned out to look remarkably alike, and this spurred research into new activation functions.

Another function, the softmax function (also called the normalized exponential), is a generalization of the logistic function:

softmax(z)_j = exp(z_j) / Σ_k exp(z_k),  for j = 1, ..., K

The softmax function is commonly used in various probabilistic multiclass classification methods, such as multinomial logistic regression, multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. In Andrew Ng's machine learning course, softmax regression uses exactly this softmax function. We mention it here only in passing; the focus of this article remains the rectified linear function.
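As a quick illustration (a minimal NumPy sketch under our own naming, not tied to any particular course or library), softmax exponentiates the scores and normalizes them into a probability distribution; subtracting the maximum first is a standard trick for numerical stability.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability, then normalize the exponentials
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(np.round(probs, 3))  # [0.659 0.242 0.099]
print(probs.sum())         # sums to 1 (up to floating point)
```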

2. Several Variants

Noisy ReLUs
Gaussian noise can be added to obtain noisy ReLUs, f(x) = max(0, x + Y) with Y ~ N(0, σ(x)); they are commonly used in restricted Boltzmann machines for computer-vision tasks.
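A minimal sketch of the idea (assuming NumPy; for simplicity the noise scale sigma is a fixed constant here, whereas the formulation above allows it to depend on x):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_relu(x, sigma=0.1):
    # Add zero-mean Gaussian noise to the pre-activation, then rectify
    noise = rng.normal(loc=0.0, scale=sigma, size=np.shape(x))
    return np.maximum(0.0, x + noise)

x = np.array([-1.0, 0.0, 0.5, 2.0])
print(noisy_relu(x))  # small perturbations around max(0, x)
```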

Leaky ReLUs
Leaky ReLUs allow a small, non-zero gradient when the unit is not active (a small slope on the negative side instead of a hard zero).
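A minimal sketch (assuming NumPy; the slope value 0.01 is a commonly used default, not something fixed by the original text):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Identity for positive inputs, a small slope alpha on the negative side,
    # so the gradient is never exactly zero and units are less likely to "die"
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # approximately [-0.02 -0.005 0. 1.5]
```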

3. Advantages

(Softplus is the smooth version of ReLU, with formula g(x) = log(1 + e^x); judging from the reported comparison results, it performs slightly worse than ReLU.)
ReLU performs about equally well with or without unsupervised pre-training, whereas other activation functions perform much worse without pre-training. ReLU without pre-training matches, and sometimes beats, sigmoid with pre-training.
By comparison, ReLU is much faster and also more accurate.
For these reasons, ReLU has gradually replaced sigmoid as the mainstream choice in deep networks.
The derivative of ReLU (piecewise; see the short sketch at the end of this section):
x <= 0: derivative is 0
x > 0: derivative is 1
In early multi-layer networks, if sigmoid or the hyperbolic tangent was used as the activation function and no pre-training was performed, the network failed to converge because of the vanishing gradient problem.
Pre-training served several purposes: regularization to prevent overfitting; compressing the data and removing redundancy; strengthening features and reducing error; speeding up convergence. With ReLU, no pre-training is needed.
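To tie the piecewise derivative above to code, here is a minimal sketch (assuming NumPy; the helper names are ours) of the ReLU forward pass and its gradient:

```python
import numpy as np

def relu(x):
    # Forward pass: f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Piecewise derivative: 0 for x <= 0, 1 for x > 0
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(relu(x))       # [0.  0.  0.  0.1 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```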