Gradient Checking
Gradient definition:

\frac{dJ}{d\theta}=\lim_{\epsilon\rightarrow0}\frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}
1-D gradient: by analogy with the limit definition of the derivative, given a known cost function J(theta), the two-sided difference above gives a numerical approximation of dJ/dtheta.
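A minimal sketch of the 1-D check, assuming a toy cost J(theta) = theta * x; the function names and the toy cost are illustrative, not from the original notes:

```python
def forward_propagation(x, theta):
    """Toy 1-D cost: J(theta) = theta * x."""
    return theta * x

def backward_propagation(x, theta):
    """Analytic gradient dJ/dtheta = x."""
    return x

def gradient_check(x, theta, epsilon=1e-7):
    """Relative difference between the analytic and two-sided numerical gradients."""
    grad_approx = (forward_propagation(x, theta + epsilon)
                   - forward_propagation(x, theta - epsilon)) / (2 * epsilon)
    grad = backward_propagation(x, theta)
    # A relative difference below roughly 1e-7 suggests the analytic gradient is correct.
    return abs(grad - grad_approx) / (abs(grad) + abs(grad_approx))

difference = gradient_check(x=2.0, theta=4.0)
```

Because the toy cost is linear in theta, the two-sided difference matches the analytic gradient almost exactly, so `difference` is tiny.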
N-dimensional gradient:
- forward propagation
- backward propagation
The helper functions dictionary_to_vector(), vector_to_dictionary(), and gradients_to_vector() convert the parameter dictionaries into a single concatenated vector (and back):
import numpy as np

def dictionary_to_vector(parameters):
    """
    Roll all our parameters dictionary into a single vector satisfying our specific required shape.
    """
    keys = []
    count = 0
    for key in ["W1", "b1", "W2", "b2", "W3", "b3"]:
        # Flatten each parameter into a column vector.
        new_vector = np.reshape(parameters[key], (-1, 1))
        keys = keys + [key] * new_vector.shape[0]

        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta, keys
def vector_to_dictionary(theta):
    """
    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.
    """
    parameters = {}
    parameters["W1"] = theta[:20].reshape((5, 4))
    parameters["b1"] = theta[20:25].reshape((5, 1))
    parameters["W2"] = theta[25:40].reshape((3, 5))
    parameters["b2"] = theta[40:43].reshape((3, 1))
    parameters["W3"] = theta[43:46].reshape((1, 3))
    parameters["b3"] = theta[46:47].reshape((1, 1))

    return parameters
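The slice boundaries above are hard-coded for one particular 3-layer network. A more generic round-trip sketch that records each array's shape instead; the names `roll`/`unroll` and the example shapes are illustrative, not from the original assignment:

```python
import numpy as np

def roll(params, keys):
    """Concatenate the arrays in `params` (in `keys` order) into one column vector."""
    shapes = {k: params[k].shape for k in keys}
    theta = np.concatenate([params[k].reshape(-1, 1) for k in keys], axis=0)
    return theta, shapes

def unroll(theta, keys, shapes):
    """Rebuild the dictionary from the column vector using the recorded shapes."""
    params, offset = {}, 0
    for k in keys:
        size = int(np.prod(shapes[k]))
        params[k] = theta[offset:offset + size].reshape(shapes[k])
        offset += size
    return params

rng = np.random.default_rng(0)
keys = ["W1", "b1", "W2", "b2"]
params = {"W1": rng.standard_normal((5, 4)), "b1": np.zeros((5, 1)),
          "W2": rng.standard_normal((3, 5)), "b2": np.zeros((3, 1))}
theta, shapes = roll(params, keys)
restored = unroll(theta, keys, shapes)
```

Recording shapes at roll time means the unroll step needs no hard-coded indices, so the same pair works for any network architecture.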
def gradients_to_vector(gradients):
    """
    Roll all our gradients dictionary into a single vector satisfying our specific required shape.
    """
    count = 0
    for key in ["dW1", "db1", "dW2", "db2", "dW3", "db3"]:
        # Flatten each gradient into a column vector.
        new_vector = np.reshape(gradients[key], (-1, 1))

        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta
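With the parameters rolled into one vector, the N-dimensional check perturbs one component at a time, recomputing the cost by forward propagation only. A self-contained sketch of that loop; the quadratic toy cost and the function name `gradient_check_n` are illustrative assumptions:

```python
import numpy as np

def gradient_check_n(J, theta, grad, epsilon=1e-7):
    """Verify an analytic gradient `grad` of cost `J` at the rolled vector `theta`.

    Perturbs one component at a time and compares the two-sided difference
    with the supplied gradient via a relative norm.
    """
    gradapprox = np.zeros_like(grad)
    for i in range(theta.shape[0]):
        theta_plus = np.copy(theta)
        theta_plus[i] += epsilon
        theta_minus = np.copy(theta)
        theta_minus[i] -= epsilon
        gradapprox[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Toy check: J(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([[1.0], [2.0], [-3.0]])
difference = gradient_check_n(lambda t: 0.5 * np.sum(t ** 2), theta, theta)
```

In the assignment setting, `J` would wrap forward propagation on the vector returned by dictionary_to_vector(), and `grad` would come from gradients_to_vector().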
Notes:
- Gradient checking verifies how close the gradients from backpropagation are to a numerical approximation of the gradient (computed using forward propagation).
- Gradient checking is slow, so we do not run it at every training iteration. Typically you run it only to confirm the code is correct, then turn it off and use backprop for the actual learning process.