Reading Notes: The Gradient of a Neural Network, gradient_simplenet.py (from Koki Saito)

hnjzsyjyj

Published 2023-01-27 22:14:15

Article link: https://blog.csdn.net/hnjzsyjyj/article/details/128773403

The gradient of a neural network is the gradient of the loss function with respect to the weight parameters.
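Written out for a 2×3 weight matrix, the shape used by simpleNet in this post, the gradient has the same shape as the weights, and each entry is the partial derivative of the loss with respect to the matching weight. The symbols L (loss) and w_ij (weight entries) are notation chosen here for illustration:

```latex
W =
\begin{pmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23}
\end{pmatrix},
\qquad
\frac{\partial L}{\partial W} =
\begin{pmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\
\frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}}
\end{pmatrix}
```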
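Note that this code prints a different result on every run, because the weights W are initialized randomly with np.random.randn. For example, here are the outputs of three separate runs.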

# Output of run 1
[[ 0.26126144  0.31254288 -0.57380431]
 [ 0.39189215  0.46881432 -0.86070647]]

# Output of run 2
[[ 0.23076882  0.04199472 -0.27276354]
 [ 0.34615324  0.06299208 -0.40914532]]

# Output of run 3
[[ 0.09425202  0.31184225 -0.40609428]
 [ 0.14137804  0.46776338 -0.60914141]]
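If a repeatable result is wanted, one option is to seed NumPy's global random generator before the weights are drawn. The following is a minimal sketch under that assumption; the helper name draw_weights and the seed value 0 are illustrative and are not part of the original script.

```python
import numpy as np

def draw_weights(seed):
    # Hypothetical helper: seed NumPy's global RNG, then draw a 2x3 weight matrix
    np.random.seed(seed)
    return np.random.randn(2, 3)

W_a = draw_weights(0)
W_b = draw_weights(0)

# The same seed yields the same weights, so any gradient computed from them repeats too
print(np.array_equal(W_a, W_b))
```

In the script itself, the equivalent would be a single np.random.seed call placed before simpleNet() is constructed.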

[Neural network gradient: gradient_simplenet.py]

import sys, os
sys.path.append(os.pardir)  # allows imports from the parent directory (unused in this self-contained listing)
import numpy as np

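def softmax(x):
    # Batch input: normalize each row separately
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)  # subtract the max to avoid overflow
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # subtract the max to avoid overflow
    return np.exp(x) / np.sum(np.exp(x))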

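def cross_entropy_error(y, t):
    # Treat a single sample as a batch of size 1
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # If t is one-hot encoded, convert it to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]
    # The 1e-7 term guards against log(0)
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size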

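def numerical_gradient_no_batch(f, x):
    h = 1e-4  # step size for the central difference
    grad = np.zeros_like(x)

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)

        x[idx] = tmp_val  # restore the original value

    return grad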
 
def numerical_gradient(f, X):
    if X.ndim == 1:
        return numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X)

        # Each row x is a view into X, so perturbing x in place also perturbs X
        for idx, x in enumerate(X):
            grad[idx] = numerical_gradient_no_batch(f, x)

        return grad

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)  # weights drawn from a standard normal distribution

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)

        return loss

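x = np.array([0.6, 0.9])  # input
t = np.array([0, 0, 1])   # one-hot label (the correct class is index 2)

net = simpleNet()

# The argument w is a dummy: the loss is computed from net.W, which numerical_gradient perturbs in place
f = lambda w: net.loss(x, t)
dW = numerical_gradient(f, net.W)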

print(dW)
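As a sanity check on the numerical result, it can be compared with the closed-form gradient. For a softmax output y followed by cross-entropy with a one-hot label t, the gradient with respect to W is the outer product of the input x and (y - t). The sketch below is a self-contained illustration, not the original script: the names loss and numerical_grad are chosen here, and the 1e-7 term inside the logarithm is left out so that the two gradients agree up to finite-difference error.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract the max to avoid overflow
    return np.exp(z) / np.sum(np.exp(z))

def loss(W, x, t):
    # Cross-entropy loss of a single sample with a one-hot label t
    # (assumption: no 1e-7 guard, unlike the listing above)
    y = softmax(np.dot(x, W))
    return -np.sum(t * np.log(y))

def numerical_grad(W, x, t, h=1e-4):
    # Central difference over every element of W
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            orig = W[i, j]
            W[i, j] = orig + h
            up = loss(W, x, t)
            W[i, j] = orig - h
            down = loss(W, x, t)
            W[i, j] = orig  # restore the original value
            grad[i, j] = (up - down) / (2 * h)
    return grad

x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])
W = np.random.randn(2, 3)

analytic = np.outer(x, softmax(np.dot(x, W)) - t)  # dL/dW = outer(x, y - t)
numeric = numerical_grad(W, x, t)

print(np.allclose(analytic, numeric, atol=1e-6))
# Row 2 is row 1 scaled by x[1] / x[0] = 0.9 / 0.6 = 1.5
print(np.allclose(numeric[1], 1.5 * numeric[0], atol=1e-6))
```

The outer-product form also explains a pattern visible in all three outputs listed earlier: the second row is always 1.5 times the first, because the rows differ only by the factors x[0] = 0.6 and x[1] = 0.9.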