Training MNIST with fully-connected layers implemented in NumPy (to be continued)

PS: This post isn't completely finished yet; the full code is at the end. I'll fill in the detailed explanations bit by bit as time permits.
Aside: I used to build networks by calling libraries, then found that studying newer, more complex architectures was a struggle, so I came back to solidify the basics. On top of these fully-connected layers I'll later add convolution, pooling, dropout, normalization layers, and so on; if you need that too, feel free to follow along. So, on to the main topic.

Contents

Network structure

Code implementation

Parameter initialization

Forward propagation:

L_model_forward

linear_activation_forward

linear_forward

activation


MNIST dataset download link: to be added later
PS: a quick web search will also turn it up

Network structure

PS: The fully-connected layer functions here are based on the programming assignments from Andrew Ng's deep learning course. Someone has put together a very detailed Chinese compilation of it, which I recommend: deeplearning_目录
        The network is a simple 3-layer fully-connected model: the two hidden layers have 256 and 64 neurons, and the output layer has 10 classes. (PS: My ability is limited (or rather, I was too lazy to refactor), so the functions are not wrapped into a class.) The instantiation below shows the resulting layer shapes. Next, let's implement each small piece of the fully-connected network one by one.
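For concreteness, this is how the network is instantiated later in the full code (linear_initialize_parameters is defined in the next section); the weight shapes are annotated for reference:

para = linear_initialize_parameters(28 * 28, 256, 64, 10)
# W1: (256, 784)  hidden layer I,  ReLU
# W2: (64, 256)   hidden layer II, ReLU
# W3: (10, 64)    output layer,    softmax over 10 classes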

Code implementation

I'll start the explanation from the top-level functions; I think this is more organized than starting from the small building blocks. If you find it hard to follow, leave me a comment and I'll revise further.

Parameter initialization

def linear_initialize_parameters(n_x,n_h1,n_h2,n_y):

    # Hidden layer I: small random weights, zero biases
    W1 = np.random.randn(n_h1, n_x) * 0.1
    b1 = np.zeros((n_h1, 1))
    # Hidden layer II: He initialization (this layer is ReLU-activated)
    # W2 = np.random.randn(n_h2, n_h1) * 0.1
    W2 = he_init(n_h1,n_h2)
    b2 = np.zeros((n_h2, 1))
    # Output layer: softmax activation
    W3 = np.random.randn(n_y, n_h2) * 0.1
    b3 = np.zeros((n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    return parameters

Inputs:
n_x ---------- number of input neurons
n_h1 --------- number of neurons in hidden layer I
n_h2 --------- number of neurons in hidden layer II
n_y ---------- number of output neurons (the number of classes)

Outputs:
parameters --- a dictionary holding all weight matrices and bias vectors, used in the propagation steps that follow

In this code W2 uses He initialization. The layer that W2 feeds is ReLU-activated, and for ReLU-activated layers He initialization converges faster and reduces the chance of exploding gradients (I won't go into the theory here). In practice plain np.random.randn initialization also works, but the final accuracy came out about 0.4% lower (model randomness not ruled out).
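For reference, the he_init helper called above is defined in the full code at the end of the post; it samples from a normal distribution with standard deviation sqrt(2 / fan_in):

def he_init(input_dim, output_dim):
    # He initialization for a ReLU layer: N(0, sqrt(2 / fan_in))
    stddev = np.sqrt(2.0 / input_dim)
    W = np.random.normal(0, stddev, size=(output_dim, input_dim))
    return W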
PS: If you want to add or remove layers, this function has to change accordingly. For example, to drop to 2 layers: the inputs become n_x, n_h, n_y, you only initialize W1, b1, W2, b2, and parameters holds 4 entries, as in the sketch below.
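A minimal sketch of that 2-layer case (the name linear_initialize_parameters_2layer is mine, not part of the original code):

def linear_initialize_parameters_2layer(n_x, n_h, n_y):
    W1 = he_init(n_x, n_h)                  # hidden layer, ReLU-activated
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.1    # output layer, softmax-activated
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

Because L_model_forward below computes the layer count as len(parameters) // 2, a 4-entry dictionary gives L = 2 and the forward pass works without further changes.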

Forward propagation:

L_model_forward

Forward pass through all L layers of the model

def L_model_forward(X, parameters):
    # parameters holds a (W, b) pair for every layer
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)],activation="relu")
        caches.append(cache)
    # output layer
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)],activation="softmax")
    # AL has shape (10, 1)
    caches.append(cache)
    # assert (AL.shape == (1, X.shape[1]))
    return AL, caches

Inputs:
X ------------------ the flattened image (the original 28*28 image reshaped into a (784, 1) array)
parameters ------ holds every W (weight matrix) and b (bias vector). PS: for example, with 3 layers parameters stores 6 entries, W1, b1, W2, b2, W3, b3, so L reflects the number of fully-connected layers.
Outputs:
AL ------------------- the output after forward-propagating through all L layers; in this example, the predicted score for each digit 0-9
caches ------------- values produced during the forward pass that will be needed again during backpropagation
After obtaining the number of layers, we loop over every layer except the output layer. PS: since I index W starting from 1, the loop range is (1, L); the short sketch below shows how it unrolls.
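A small usage sketch (assuming numpy is imported and the functions in this post are in scope; the random input is just a stand-in for a flattened image):

para = linear_initialize_parameters(28 * 28, 256, 64, 10)
L = len(para) // 2                           # 6 entries (W1..b3) -> L = 3
# layers 1 and 2 go through ReLU inside the loop; layer 3 is the softmax output
AL, caches = L_model_forward(np.random.rand(784, 1), para)
print(AL.shape, len(caches))                 # (10, 1) 3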

A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)],activation="relu")

This line calls linear_activation_forward with ReLU as the activation (ReLU is the usual choice for hidden layers); the function performs the following operation:
$A_{l} = \mathrm{ReLU}(W_{l} A_{l-1} + b_{l})$
For the output layer, simply clipping with ReLU would be a very poor choice. For multi-class tasks there is a standard combination, softmax activation + cross-entropy loss (this pairing gives a good measure of model quality while having a very simple derivative, which makes backpropagation easy), so the last layer, layer L, uses the softmax activation.
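To see why that derivative is simple: with softmax output $a = \mathrm{softmax}(z)$ and a one-hot label $y$, the cross-entropy loss and its gradient with respect to the pre-activation $z$ are

$\mathcal{L} = -\sum_{k} y_{k} \log a_{k}, \qquad \frac{\partial \mathcal{L}}{\partial z} = a - y$

which is exactly what softmax_backward in the full code returns (dZ = activation_cache - Y).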

linear_activation_forward

Linear layer + activation function

def linear_activation_forward(A_prev, W, b,activation):
    if activation == "softmax":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = softmax(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

Inputs:
A_prev ---------- the output of the previous layer (for the first layer, A_prev is the original image)
W ----------------- weight matrix
b ------------------ bias vector
activation ------- which activation function to use

Outputs:
A ----------------- output matrix
cache ----------- (linear_cache, activation_cache), where linear_cache holds the backprop-related values produced inside linear_forward and activation_cache holds the backprop-related values produced by the activation function

        The purpose of this function is clear: after the linear forward step of the fully-connected layer, it applies the chosen activation to make the result non-linear. Only ReLU and softmax are implemented here; if you need more activations, such as the various ReLU variants, you can add them.
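A quick single-layer usage sketch (assuming numpy is imported and he_init plus the functions above are defined):

np.random.seed(0)
A_prev = np.random.rand(784, 1)             # a flattened 28x28 image
W, b = he_init(784, 256), np.zeros((256, 1))
A, cache = linear_activation_forward(A_prev, W, b, activation="relu")
print(A.shape)                              # (256, 1)
linear_cache, activation_cache = cache      # (A_prev, W, b) and Z, respectively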

So, let's keep going. The function above involves 2 (well, 3) new functions; let's look at them one by one.

linear_forward

Forward pass of a single fully-connected (linear) layer

def linear_forward(A, W, b):

    # e.g. W: (64, 784), A: (784, 1)  ->  Z: (64, 1)
    # print(W.shape, A.shape)
    Z = np.dot(W, A) + b
    cache = (A, W, b)
    return Z, cache

Inputs:
A, W, b ---------- no need to explain these again...

Outputs:
Z ------------------- output matrix
cache --------------- (A, W, b); note that this is the linear_cache inside the cache returned by linear_activation_forward
       The functionality is obvious at a glance, so I won't say much: it just computes $Z_{l} = W_{l} A_{l-1} + b_{l}$.

activation

        Two activation functions are involved, relu and softmax; they are fairly simple, so they are shown together.

def relu(Z):
    A = np.maximum(0, Z)
    cache = Z
    return A, cache

def softmax(Z):
    m = Z.shape[-1]
    A = np.zeros_like(Z)
    for i in range(m):
        a = np.exp(Z[:, i])/np.sum(np.exp(Z[:, i]))
        A[:,i] = a
    cache = A
    return A,cache

        It is worth noting that the two differ slightly in the backprop-related value they return (the activation_cache): relu returns its input Z, while softmax returns its output A (returning A alone would be enough, but a cache object is returned to keep the interface uniform). The Z passed to softmax has shape (number of classes, batch size); however, since I do not use mini-batch gradient descent for backpropagation here, m will be 1.
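A tiny numeric check of the softmax above (assuming numpy is imported; values are approximate):

Z = np.array([[1.0], [2.0], [0.5]])   # shape (classes, batch) with m = 1
A, cache = softmax(Z)
print(A.ravel())                      # approx. [0.231 0.629 0.140]
print(A.sum())                        # approx. 1.0 -- each column sums to 1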

        OK, with that the forward pass is complete. Here is the top-level function from the beginning again; seen now, it should look much clearer:

def L_model_forward(X, parameters):
    # parameters holds a (W, b) pair for every layer
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)],activation="relu")
        caches.append(cache)
    # output layer
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)],activation="softmax")
    # AL has shape (10, 1)
    caches.append(cache)
    # assert (AL.shape == (1, X.shape[1]))
    return AL, caches

To do:
Loss computation + backpropagation... hmm, I'll write those up tomorrow.

Full code
PS: To run this locally, make sure numpy, pandas, and sklearn are installed (sklearn is only used to randomly split the training and validation sets; I was too lazy to implement that by hand). You also need to change the paths used for train and X_test to where your MNIST data files live.

import numpy as np
import pandas as pd
def he_init(input_dim,output_dim):
    # He initialization: N(0, sqrt(2 / fan_in)), suited to ReLU layers
    stddev = np.sqrt(2.0/input_dim)
    W = np.random.normal(0, stddev, size=(output_dim, input_dim))
    return W
def linear_initialize_parameters(n_x,n_h1,n_h2,n_y):

    # Hidden layer I
    W1 = np.random.randn(n_h1, n_x) * 0.1
    b1 = np.zeros((n_h1, 1))
    # Hidden layer II: He initialization (ReLU-activated)
    # W2 = np.random.randn(n_h2, n_h1) * 0.1
    W2 = he_init(n_h1,n_h2)
    b2 = np.zeros((n_h2, 1))
    # Output layer: softmax activation
    W3 = np.random.randn(n_y, n_h2) * 0.1
    b3 = np.zeros((n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    return parameters


def softmax(Z):
    #Z = (10,1)
    m = Z.shape[-1]
    A = np.zeros_like(Z)
    for i in range(m):
        a = np.exp(Z[:, i])/np.sum(np.exp(Z[:, i]))
        A[:,i] = a
    cache = A
    return A,cache

def relu(Z):
    A = np.maximum(0, Z)
    cache = Z
    return A, cache

def relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.
    dZ[Z <= 0] = 0
    return dZ

def linear_forward(A, W, b):

    #(64,784) * (784,1)
    # print(W.shape,A.shape)
    Z = np.dot(W, A) + b  #A 784,1  W:10,784
    cache = (A, W, b)
    return Z, cache

def linear_activation_forward(A_prev, W, b,activation):
    if activation == "softmax":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = softmax(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

#return AL, cache
def L_model_forward(X, parameters):
    # parameters holds a (W, b) pair for every layer
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)],activation="relu")
        caches.append(cache)
    # output layer
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)],activation="softmax")
    # AL has shape (10, 1)
    caches.append(cache)
    # assert (AL.shape == (1, X.shape[1]))
    return AL, caches
#return cost

### has issues -- binary cross-entropy version, kept for reference but not used below
def compute_cost(AL, Y):
    # AL: predictions (10, 1); Y: labels (10, 1)
    m = Y.shape[1]
    # np.multiply is element-wise: (10, 1) * (10, 1) -> (10, 1)
    NEAR_0 = 1e-10
    cost = -np.sum(np.multiply(np.log(AL + NEAR_0), Y) + np.multiply(np.log(1 - AL + NEAR_0), 1 - Y)) / m

    cost = np.squeeze(cost)
    assert (cost.shape == ())

    return cost

def softmax_backward(Y,activation_cache):
    """
    Y: the true (one-hot) label
    activation_cache: the prediction after softmax
    """
    dZ = activation_cache - Y
    return  dZ

# multi-class cross-entropy loss
def compute_cost_multi(AL,Y):
    """
    AL: predictions, shape (10, 1)
    Y: labels, shape (10, 1)
    """
    ### m = 1 here (single sample), so the loop below runs once
    m = AL.shape[-1]
    for i in range(m):
        cost = -(1/m) * np.sum(np.multiply(np.log(AL[:,i]),Y[:,i]))
    return cost
#return dA_prev, dW, db   # e.g. dZ: (10, 1), A_prev: (100, 1), W: (10, 100)
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    return dA_prev, dW, db

#return dA_prev, dW, db
def linear_activation_backward(dA, cache,Y,activation="relu"):
    # dA is dL/dA of this layer; dZ = dL/dA * dA/dZ (derivative through the activation)
    # For the softmax branch the dA argument is not used: dZ comes straight from Y and the softmax output
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    elif activation == "softmax":
        dZ = softmax_backward( Y, activation_cache)
        # activation_cache is the softmax output (the predicted probabilities)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db

#return grads
def L_model_backward(AL, Y, caches):
    ## 1. start from the derivative of the loss at the output layer
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    epsilon = 1e-10  # or pick another suitable value
    AL = np.clip(AL, epsilon, 1 - epsilon)
    # dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))  # dL/dA for binary cross-entropy (not used here)

    # Lth layer (SOFTMAX -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: grads["dAL"], grads["dWL"], grads["dbL"]
    current_cache = caches[L - 1]
    # dA is not needed for the softmax branch, so pass None
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(None, current_cache,Y,activation="softmax")

    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: grads["dA" + str(l + 2)], caches. Outputs: grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        current_cache = caches[l]
        # Y is not needed for the relu branch, so pass None
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache,None,activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads


def update_parameters(parameters, grads, learning_rate):

    L = len(parameters) // 2  # number of layers in the neural network

    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
        # print(parameters["b" + str(l + 1)])
    return parameters




################## data preparation
digital_map = {
    "0": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "1": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "2": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "3": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "4": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "5": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "6": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "7": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "8": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "9": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
}
digital_map = {int(key): value for key, value in digital_map.items()}
train = pd.read_csv(r'CNN_Minst/data/train.csv')
X_test = pd.read_csv(r'CNN_Minst/data/test.csv')
X = train.drop('label',axis=1)
y = train['label']
X = X / 255.0   # scale pixel values to [0, 1]
X = X.values.reshape(-1, 1, 28, 28)

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=42)

_,_,H,W = X_train.shape
learning_rate = 0.05
train_images = X_train[:30000]
# shape (30000, 1, 28, 28)
train_labels = y_train[:30000]
# becomes a list of 30000 one-hot vectors of length 10
train_labels = [digital_map[num] for num in train_labels]
test_images = X_val[:1000]
# shape (1000, 1, 28, 28)
# print(X_val.shape)
test_labels = y_val[:1000]
test_labels = test_labels.map(digital_map)
para = linear_initialize_parameters(28 * 28, 256, 64, 10)
################train
for epoch in range(3):
    print('--- Epoch %d ---' % (epoch + 1))

    permutation = np.random.permutation(len(train_images))
    train_images = train_images[permutation]
    train_labels = np.array(train_labels)[permutation]

    loss = 0
    num_correct = 0
    for i, (im, label) in enumerate(zip(train_images, train_labels)):
        label = label[:,np.newaxis]
        # print(im.shape,label.shape)  #(1,28,28) (10,)
        if i > 0 and i % 1000 == 999:
            print(
                '[Step %d] Past 1000 steps: Average Loss %.3f | Accuracy: %d%%' %
                (i + 1, loss / 1000, num_correct/10)
            )
            loss = 0
            num_correct = 0
    ########### fully-connected network
        im = im.reshape(28*28,1)
        # forward pass
        AL,chaches = L_model_forward(im,parameters=para)  # AL: probabilities after softmax
        # accuracy
        acc = 1 if np.argmax(AL) == np.argmax(label) else 0
        num_correct += acc
        # loss
        cost = compute_cost_multi(AL, label)
        loss += cost
        # backward pass to collect the gradients
        grads = L_model_backward(AL,label,chaches)
        # update the weight matrices with the gradients
        para = update_parameters(para,grads,0.01)

#######predict
permutation = np.random.permutation(len(test_images))
test_images = test_images[permutation]
test_labels = np.array(test_labels)[permutation]
num_correct = 0
loss = 0
for i,(im,label) in enumerate(zip(test_images,test_labels)):
    label = np.array(label).reshape(-1, 1)   # one-hot label as a (10, 1) column, matching the training loop
    im = im.reshape(28*28,1)
    AL,chaches = L_model_forward(im,parameters=para)
    acc = 1 if np.argmax(AL) == np.argmax(label) else 0
    num_correct += acc
    cost = compute_cost_multi(AL, label)
    loss += cost
print("testdata  accuary:",num_correct/10,"  |loss:",loss/1000)
