Deep Learning Study Notes: The Gradient of the Sigmoid Function

JasonLiu1919

Article link: https://blog.csdn.net/ljp1919/article/details/78075731

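Background:

We want to compute the gradient of the sigmoid function $\sigma(x)$.
Since $sigmoid(x) = \sigma(x) = \frac{1}{1+e^{-x}}$,
it can be implemented in Python with the numpy module: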

# GRADED FUNCTION: sigmoid

import numpy as np
# this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-x))
    ### END CODE HERE ###

    return s
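
As a quick sanity check, the function above can be called on a scalar and on an array. The following is a minimal sketch that repeats the same `sigmoid` definition so it runs on its own; the input values are chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    # Same formula as above: 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

# A scalar input: sigmoid(0) is exactly 0.5
at_zero = sigmoid(0)
print(at_zero)

# An array input: numpy applies the formula to each element
values = sigmoid(np.array([1, 2, 3]))
print(values)
```

Because `np.exp` works element-wise, the same one-line implementation handles scalars and arrays of any shape.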

Next, compute the corresponding derivative:

$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{1}$
How is this derived?

$\sigma(x) = \frac{1}{1+e^{-x}}$
Introduce a temporary variable

$t={1+e^{-x}}$, so that $\sigma(x) = t^{-1}$. By the chain rule for composite functions (where $(t^{-1})^{'}$ is taken with respect to $t$ and $t^{'}$ with respect to $x$),

$\sigma'(x)=(t^{-1})^{'}\cdot t^{'}=-t^{-2}\cdot (-e^{-x})=\frac{1}{(1+e^{-x})^{2}} \cdot e^{-x}=\frac{1}{1+e^{-x}}(\frac{e^{-x}}{1+e^{-x}})=\frac{1}{1+e^{-x}}(\frac{1+e^{-x}-1}{1+e^{-x}})=\frac{1}{1+e^{-x}}(1-\frac{1}{1+e^{-x}})=\sigma(x)\cdot (1-\sigma(x))$
which proves equation (1).
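
The result $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can also be checked numerically. The sketch below compares the analytic formula against a central finite-difference approximation; the helper `numerical_derivative`, the step size `h`, and the sample points are assumptions made for this illustration and are not part of the original notes.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def numerical_derivative(f, x, h=1e-5):
    # Hypothetical helper: central difference (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

# Sample points chosen for illustration only
x = np.linspace(-5, 5, 11)

analytic = sigmoid(x) * (1 - sigmoid(x))     # formula derived above
numeric = numerical_derivative(sigmoid, x)   # finite-difference estimate

max_abs_error = np.max(np.abs(analytic - numeric))
print("max abs error:", max_abs_error)
```

The two results should agree up to the small approximation error of the finite-difference scheme.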

Python implementation:

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """

    ### START CODE HERE ### (≈ 2 lines of code)
    s = 1 / (1 + np.exp(-x))  # sigmoid(x), written in the same form as the definition
    ds = s * (1 - s)
    ### END CODE HERE ###

    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

Output:

sigmoid_derivative(x) = [ 0.19661193 0.10499359 0.04517666]
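
Two properties follow directly from $\sigma'(x) = \sigma(x)(1 - \sigma(x))$: the derivative reaches its maximum of 0.25 at $x = 0$ (where $\sigma(0) = 0.5$), and it is symmetric, $\sigma'(-x) = \sigma'(x)$. The following is a small sketch that checks both on a few sample points; the inputs are chosen for illustration, and the function is repeated so the block runs on its own.

```python
import numpy as np

def sigmoid_derivative(x):
    # Store the sigmoid output, then reuse it: ds = s * (1 - s)
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

# Sample points symmetric around 0 (illustrative values)
x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
ds = sigmoid_derivative(x)
print(ds)

# Value at x = 0, where sigma(0) = 0.5
peak = sigmoid_derivative(0.0)

# Symmetry: reversing the result for symmetric inputs leaves it unchanged
is_symmetric = np.allclose(ds, ds[::-1])
print(peak, is_symmetric)
```

The values for x = 1 and x = 3 match the first and third entries of the output shown above, and the gradient shrinks quickly as |x| grows.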