Stanford Coursera assignment: training a neural network for digit recognition with Backpropagation

Latest recommended article published on 2019-03-03 20:41:41

xiaoling_000666

Article link: https://blog.csdn.net/xiaoling_000666/article/details/79294644

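The main idea of the backpropagation algorithm for a neural network is:

1. Generate random weight matrices theta1 and theta2, then use forward propagation to compute, for each example, the probability that it is each digit. In this exercise that gives 5000 vectors of size 10*1.

2. Using the known labels, compute the difference between the true values and the computed values, then work backward through the formulas to obtain the gradients for theta1 and theta2.

3. Update theta by gradient descent with theta=theta+α*delta, where delta is the descent direction (the negative of the gradient from step 2). The step size α is not known in advance. It is like walking: we already know which direction leads downhill, but not how far to walk to descend the most. A one-dimensional search finds the α that reduces the cost the most along that direction, and that gives the updated theta.

4. With the new theta, set a number of iterations and repeat steps 2 and 3 until you obtain the final theta1 and theta2, which are the model parameters.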
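
Step 1 can be sketched roughly as follows. This is only an illustration on random data, assuming the layer sizes of this exercise (400 inputs, 25 hidden units, 10 outputs). The helper names `sigmoid` and `forward`, the 5-example batch, and the initialization range of 0.12 are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(theta1, theta2, x):
    """Return the m*num_labels matrix of output activations."""
    m = x.shape[0]
    a1 = np.column_stack((np.ones(m), x))                        # add bias: m*401
    a2 = np.column_stack((np.ones(m), sigmoid(a1 @ theta1.T)))   # add bias: m*26
    return sigmoid(a2 @ theta2.T)                                # m*10

rng = np.random.default_rng(0)
x = rng.random((5, 400))                       # 5 fake examples instead of 5000
theta1 = rng.uniform(-0.12, 0.12, (25, 401))   # random initial weights
theta2 = rng.uniform(-0.12, 0.12, (10, 26))
h = forward(theta1, theta2, x)
print(h.shape)  # (5, 10)
```

Each row of `h` holds 10 sigmoid outputs, one per digit. Every value lies between 0 and 1, but a row is not normalized to sum to 1.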

I only wrote the code up to computing the gradient, because that is all the assignment asks for.

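One more note on the formula theta=theta+α*delta: the step size α can also be a fixed value. Choosing the α that reduces the cost the most at each step simply makes the iterations converge much faster.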
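
To make the one-dimensional search concrete, here is a small sketch on a toy cost, not on the network itself. The quadratic `J`, the candidate grid, and the helper `best_alpha` are all assumptions for illustration, and a crude grid search stands in for a proper line search method.

```python
import numpy as np

def J(theta):
    # Hypothetical toy cost: an elongated quadratic bowl with its minimum at the origin
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def grad(theta):
    return np.array([theta[0], 10.0 * theta[1]])

def best_alpha(theta, delta, candidates):
    """Pick the candidate step size that gives the lowest cost along delta."""
    costs = [J(theta + a * delta) for a in candidates]
    return candidates[int(np.argmin(costs))]

theta = np.array([10.0, 1.0])
candidates = np.linspace(0.0, 1.0, 101)   # 0 is included, so a step never goes uphill
history = [J(theta)]
for _ in range(20):
    delta = -grad(theta)                  # descent direction
    alpha = best_alpha(theta, delta, candidates)
    theta = theta + alpha * delta         # theta = theta + alpha * delta
    history.append(J(theta))
print(history[0], history[-1])
```

The trade-off is that every search evaluates the cost many times, so each iteration costs more even though fewer iterations are needed.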

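If α is fixed at 1, the cost function J still decreases, but slowly. If α is fixed at a very large value, the steps are too long and the cost function may keep growing.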
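
The effect of a fixed step size is easy to see on a one-variable toy cost. The sketch below assumes J(theta) = theta**2, and the function `run` is a made-up helper. The specific values of α that count as "small" or "too large" here belong to this toy cost only and do not carry over to the neural network.

```python
def run(alpha, steps=10, theta=1.0):
    """Gradient descent on the toy cost J(theta) = theta**2 with a fixed step size."""
    costs = [theta ** 2]
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta   # the gradient of theta**2 is 2*theta
        costs.append(theta ** 2)
    return costs

small = run(0.01)   # cost shrinks, but only a little per step
large = run(1.1)    # every step overshoots the minimum and the cost grows
print(small[-1] < small[0], large[-1] > large[0])  # True True
```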

import math

import numpy as np
import scipy.io as sio


def nnCostFunction(theta1, theta2, num_labels, x, y, lamda):
    m = len(x)
    # Forward propagation: a1 is m*401, a2 is 26*m, htheta is m*10
    a1 = np.column_stack((np.ones(m), x))
    z2 = np.dot(theta1, a1.T)
    a2 = np.vstack((np.ones(m), 1 / (1 + np.exp(-z2))))
    z3 = np.dot(theta2, a2).T
    htheta = 1 / (1 + np.exp(-z3))
    # One-hot encode the labels: label k goes to column k-1, so label 10 goes to column 9
    y_summary = np.zeros((m, num_labels))
    y_summary[np.arange(m), np.ravel(y).astype(int) - 1] = 1
    # Cross-entropy cost summed over all examples and all classes
    cost = -np.sum(y_summary * np.log(htheta) + (1 - y_summary) * np.log(1 - htheta))
    # Backpropagation, output layer
    delta3_small = htheta - y_summary                     # m*10
    delta2_big = np.dot(delta3_small.T, a2.T)             # 10*26
    theta2_new = np.column_stack((np.zeros((len(theta2), 1)), theta2[:, 1:]))
    D2 = delta2_big / m + lamda * theta2_new / m
    # Hidden layer: drop the whole bias column of theta2 (theta2[:, 1:], not theta2[:, 1])
    delta2_small = np.dot(theta2[:, 1:].T, delta3_small.T) * sigmoidGradient(z2)   # 25*m
    delta1_big = np.dot(delta2_small, a1)                 # 25*401
    theta1_new = np.column_stack((np.zeros((len(theta1), 1)), theta1[:, 1:]))
    D1 = delta1_big / m + lamda * theta1_new / m
    # Regularization term; the bias columns are not regularized
    theta_sum = np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2)
    cost = cost + 0.5 * lamda * theta_sum
    return D1, D2, cost / m


def sigmoidGradient(z):
    g = 1 / (1 + np.exp(-z))
    g_diff = g * (1 - g)
    return g_diff


def initialization(l_in, l_out):
    # Uniform random weights in [-epsilon, epsilon], including the bias column
    epsilon = math.sqrt(6.0 / (l_in + l_out))
    return np.random.rand(l_out, l_in + 1) * 2 * epsilon - epsilon


data = sio.loadmat('ex4data1.mat')       # x is 5000*400, y is 5000*1, theta1 is 25*401, theta2 is 10*26
y = data['y']
x = data['X']
lamda = 1
num_labels = 10
input_layer_size = 400
hidden_layer_size = 25
theta1 = initialization(input_layer_size, hidden_layer_size)    # Part 6: Initializing Parameters
theta2 = initialization(hidden_layer_size, num_labels)
max_iteration = 100
for i in range(max_iteration):
    D1, D2, J = nnCostFunction(theta1, theta2, num_labels, x, y, lamda)
    # Gradient descent with a fixed step size of 1
    theta2 = theta2 - D2
    theta1 = theta1 - D1
    print(J)
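
One way to gain confidence in gradients like D1 and D2 is to compare them against finite differences of the cost. The sketch below does that on a tiny made-up network (3 inputs, 4 hidden units, 3 classes) with random data. `cost_and_grads` and `numerical_grad` are hypothetical helpers written for this check, not the functions above, although they are meant to follow the same formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(theta1, theta2, x, y_onehot, lam):
    """Regularized cost and backpropagation gradients for one hidden layer."""
    m = x.shape[0]
    a1 = np.column_stack((np.ones(m), x))
    a2 = np.column_stack((np.ones(m), sigmoid(a1 @ theta1.T)))
    h = sigmoid(a2 @ theta2.T)
    reg = np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2)
    cost = -np.sum(y_onehot * np.log(h) + (1 - y_onehot) * np.log(1 - h)) / m
    cost += lam * reg / (2 * m)
    d3 = h - y_onehot                                          # output error
    d2 = (d3 @ theta2[:, 1:]) * a2[:, 1:] * (1 - a2[:, 1:])    # hidden error, bias dropped
    g1 = d2.T @ a1 / m
    g2 = d3.T @ a2 / m
    g1[:, 1:] += lam * theta1[:, 1:] / m                       # bias column not regularized
    g2[:, 1:] += lam * theta2[:, 1:] / m
    return cost, g1, g2

def numerical_grad(f, theta, eps=1e-4):
    """Central-difference estimate of the gradient of f with respect to theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        old = theta[idx]
        theta[idx] = old + eps
        up = f()
        theta[idx] = old - eps
        down = f()
        theta[idx] = old               # restore the original weight
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.random((5, 3))                            # 5 examples, 3 features
y_onehot = np.eye(3)[rng.integers(0, 3, 5)]       # random labels, 3 classes
theta1 = rng.uniform(-0.5, 0.5, (4, 4))
theta2 = rng.uniform(-0.5, 0.5, (3, 5))
f = lambda: cost_and_grads(theta1, theta2, x, y_onehot, 1.0)[0]
_, g1, g2 = cost_and_grads(theta1, theta2, x, y_onehot, 1.0)
err1 = np.max(np.abs(g1 - numerical_grad(f, theta1)))
err2 = np.max(np.abs(g2 - numerical_grad(f, theta2)))
print(err1, err2)
```

If the two errors are tiny, the backpropagation formulas agree with the cost. The check is far too slow for the full 25*401 and 10*26 matrices on every iteration, so it is best run once on a small network.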
