Deep Learning for Beginners

狄洺

Last modified: 2024-08-03 15:25:00

Category: Deep Learning. Tags: deep learning, artificial intelligence

First published: 2024-08-01 14:57:58

Article link: https://blog.csdn.net/weixin_43952169/article/details/140846121

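Introduction

Differences between deep learning and machine learning:

Machine learning is the broader field, and deep learning is a part of it
In classical machine learning, the features are designed by hand and given to the model
Deep learning, on the other hand, learns the features directly from the data

First, we import the packages.

The data used in this article is available at:
https://github.com/dm24530/Self-learning-code/tree/master/DeepLearningTutorialForBeginners/input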

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')
from subprocess import check_output
# List all files in the input folder
print(check_output(["ls", "input"]).decode('utf8'))

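Dataset overview

I use the "Sign Language Digits Dataset", which contains 2062 images of sign-language digits from 0 to 9. To keep things simple, we only use the signs for 0 and 1. In the dataset, the sign for 0 occupies indices 204 to 408 (205 images) and the sign for 1 occupies indices 822 to 1027 (206 images). We therefore use 205 samples from each class (label).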

x_l = np.load('input/X.npy')
Y_l = np.load('input/Y.npy')
img_size = 64

plt.subplot(1, 2, 1)
plt.imshow(x_l[260].reshape(img_size, img_size))
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(x_l[900].reshape(img_size, img_size))
plt.axis('off')
plt.show()

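[Image: the two sample images, a sign for 0 on the left and a sign for 1 on the right]
A walkthrough of the code above:

Lines 1 and 2 load the data from the input folder
Line 3 sets the image size
Line 5 creates a grid of subplots with 1 row and 2 columns and activates the first subplot
Line 6 draws the 261st image in x_l (index 260), reshaped to img_size x img_size
Line 7 hides the axes of the current subplot
Line 8 activates the second subplot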

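Preparing the dataset

To build the image array, I concatenate the sign-0 images and the sign-1 images.
Then I create the labels: 0 for the sign-0 images and 1 for the sign-1 images.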

X = np.concatenate((x_l[204:409], x_l[822:1027]), axis=0)  # rows 0 to 204 are zero signs, rows 205 to 409 are one signs
z = np.zeros(205)
o = np.ones(205)
Y = np.concatenate((z, o), axis=0).reshape(X.shape[0],1)
print("X shape: " , X.shape)
print("Y shape: " , Y.shape)

X has shape (410, 64, 64): 410 is the number of images and 64 x 64 is the size of each image in pixels
Y has shape (410, 1): 410 is the number of labels (0 or 1)
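
The index ranges and shapes above can be checked without the real files. The sketch below is only an illustration: it assumes a blank stand-in array with the same shape as X.npy (2062 images of 64 x 64 pixels) and repeats the slicing and labelling steps.

```python
import numpy as np

# Stand-in for X.npy: 2062 blank 64x64 images (real pixel values are not needed here)
x_l = np.zeros((2062, 64, 64), dtype=np.uint8)

# Python slices exclude the end index, so 204:409 covers indices 204..408
# and 822:1027 covers indices 822..1026
print(len(range(204, 409)), len(range(822, 1027)))

X = np.concatenate((x_l[204:409], x_l[822:1027]), axis=0)
Y = np.concatenate((np.zeros(205), np.ones(205)), axis=0).reshape(X.shape[0], 1)
print(X.shape, Y.shape)
```

Both slices contain 205 images, which is why the label vectors are built with 205 zeros and 205 ones.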

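Splitting the dataset

The test set takes 15% of the data and the training set 85%.
random_state fixes the seed used when shuffling. If you use the same random_state value across experiments, you get the same split, that is, the same training and test sets. This matters for reproducibility, because a different random split can change the measured performance of the model.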

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)
number_of_train = X_train.shape[0]
number_of_test = X_test.shape[0]

After the split, the training set has 348 samples and the test set has 62.
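
To make the role of the seed concrete, here is a minimal NumPy sketch of a seeded shuffle-and-cut split. It is not sklearn's implementation: seeded_split is a hypothetical helper, and rounding the test count up is an assumption chosen because it reproduces the 348/62 counts above.

```python
import math
import numpy as np

def seeded_split(n_samples, test_size, seed):
    """Hypothetical helper: shuffle the indices with a fixed seed, then cut off the test part."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)  # assumption: the test count is rounded up
    return indices[n_test:], indices[:n_test]

train_a, test_a = seeded_split(410, 0.15, seed=42)
train_b, test_b = seeded_split(410, 0.15, seed=42)
train_c, test_c = seeded_split(410, 0.15, seed=7)

print(len(train_a), len(test_a))        # 348 62
print(np.array_equal(test_a, test_b))   # True: same seed, same split
print(np.array_equal(test_a, test_c))   # a different seed will almost surely give a different split
```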

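Flattening the dataset

We now know that the input array X is three-dimensional, (410, 64, 64), so we need to flatten each image into a vector before using it as input.
The array Y is already two-dimensional, so it needs no processing.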

X_train_flatten = X_train.reshape(number_of_train,X_train.shape[1]*X_train.shape[2])
X_test_flatten = X_test.reshape(number_of_test,X_test.shape[1]*X_test.shape[2])
print("X train flatten",X_train_flatten.shape)
print("X test flatten",X_test_flatten.shape)

After flattening, X_train_flatten has shape (348, 4096)
X_test_flatten has shape (62, 4096)
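
As a small illustration of what the reshape does, the sketch below uses a synthetic array (X_demo is an assumed stand-in for X_train with only 5 images) and shows that passing -1 lets NumPy infer the 4096 columns.

```python
import numpy as np

# Synthetic stand-in for X_train: 5 images of 64x64 pixels
X_demo = np.arange(5 * 64 * 64, dtype=np.float32).reshape(5, 64, 64)

flat_explicit = X_demo.reshape(X_demo.shape[0], X_demo.shape[1] * X_demo.shape[2])
flat_auto = X_demo.reshape(X_demo.shape[0], -1)  # -1 lets NumPy infer 64*64 = 4096

print(flat_explicit.shape)
print(np.array_equal(flat_explicit, flat_auto))
# Row-major flattening: pixel (row r, column c) ends up in column r*64 + c
print(flat_auto[2, 3 * 64 + 10] == X_demo[2, 3, 10])
```

Each row of the flattened array is one image, with its 64 pixel rows laid end to end.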

Transpose

x_train = X_train_flatten.T
x_test = X_test_flatten.T
y_train = Y_train.T
y_test = Y_test.T

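In the end, the inputs (images) and outputs (labels or classes) have the following shapes: x_train (4096, 348), x_test (4096, 62), y_train (1, 348), y_test (1, 62).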

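Logistic Regression

Since we are doing binary classification, logistic regression is the natural first choice. Let's look at its computation graph.
[Image: computation graph of logistic regression]

The parameters are the weights and the bias
Weights: one coefficient per pixel
z = (w.T)x + b
z = b + px1*w1 + px2*w2 + ... + px4096*w4096
y_head = sigmoid(z)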
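
A toy version of these formulas may help. The sketch below assumes images with only 3 "pixels" and made-up values; it computes z once with the dot product and once as the explicit sum, then applies the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy case: 3 "pixels" per image, 2 images stored as columns
x = np.array([[0.0, 1.0],
              [0.5, 0.5],
              [1.0, 0.0]])      # shape (3, 2)
w = np.full((3, 1), 0.01)       # one weight per pixel
b = 0.0

z = np.dot(w.T, x) + b          # shape (1, 2): one score per image
z_first = b + sum(x[i, 0] * w[i, 0] for i in range(3))  # explicit sum for the first image

y_head = sigmoid(z)
print(z.shape, y_head.shape)
print(np.isclose(z[0, 0], z_first))
```

Storing one image per column is what allows a single dot product to score every image at once.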

Initializing the parameters

Every weight in w is set to 0.01 and the bias b is set to 0; the function returns w and b.

def initialize_weights_and_bias(dimension):
    w = np.full((dimension,1),0.01)
    b = 0.0
    return w, b

The sigmoid activation function

def sigmoid(z):
    y_head = 1/(1+np.exp(-z))
    return y_head

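Forward propagation

The loss function is given below. We plug it into the forward pass to compute the cost, which is the average loss over all training samples.
loss = -y*log(y_head) - (1-y)*log(1-y_head)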

def forward_propagation(w, b, x_train, y_train):
    z = np.dot(w.T, x_train) + b
    y_head = sigmoid(z)
#     print(y_head.shape, y_train.shape, )
    loss = -y_train*np.log(y_head) - (1-y_train)*np.log(1-y_head)
#     print(loss.shape)
    cost = (np.sum(loss))/x_train.shape[1]
    
    return cost
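
To get a feeling for the cost, here is a small worked example with made-up predictions. The eps clipping is an addition that is not part of the function above; it is an assumed guard so that a prediction of exactly 0 or 1 does not produce log(0).

```python
import numpy as np

def cross_entropy_cost(y_head, y, eps=1e-12):
    """Mean binary cross-entropy; eps is an assumed guard against log(0)."""
    y_head = np.clip(y_head, eps, 1 - eps)
    loss = -y * np.log(y_head) - (1 - y) * np.log(1 - y_head)
    return np.sum(loss) / y.shape[1]

y = np.array([[1.0, 0.0]])
mostly_right = np.array([[0.9, 0.2]])
mostly_wrong = np.array([[0.1, 0.8]])

print(cross_entropy_cost(mostly_right, y))            # small cost
print(cross_entropy_cost(mostly_wrong, y))            # much larger cost
print(cross_entropy_cost(np.array([[1.0, 0.0]]), y))  # stays finite thanks to the clipping
```

Predictions close to the true labels give a cost near 0, while confident wrong predictions are penalised heavily.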

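Optimization with gradient descent

[Image: the derivatives of the cost J with respect to w and b]

Based on the formulas in the figures, we can add the gradient computation to the forward-propagation step.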

def forward_backward_propagation(w, b, x_train, y_train):
    
    z = np.dot(w.T, x_train) + b
    
    y_head = sigmoid(z)
    
    loss = -y_train*np.log(y_head)- (1-y_train)*np.log(1-y_head)
    
    cost = (np.sum(loss))/x_train.shape[1]
    
    derivative_weight = (np.dot(x_train, ((y_head-y_train).T)))/x_train.shape[1]
    derivative_bias = np.sum(y_head-y_train) / x_train.shape[1]
    
    gradients = {"derivative_weight":derivative_weight, 
                "derivative_bias":derivative_bias}
    return cost, gradients

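Compared with forward_propagation above, this function computes two extra values, derivative_weight and derivative_bias, which are the derivatives of J with respect to w and with respect to b shown in the figure above. Both are stored in the gradients dictionary, and the function returns cost and gradients.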
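
One way to convince yourself that these two derivatives are right is a finite-difference check. The sketch below is only a sanity check on synthetic data (5 features, 8 samples, all values made up); cost_fn and analytic_gradients are helper names introduced here and repeat the formulas from the function above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, x, y):
    y_head = sigmoid(np.dot(w.T, x) + b)
    loss = -y * np.log(y_head) - (1 - y) * np.log(1 - y_head)
    return np.sum(loss) / x.shape[1]

def analytic_gradients(w, b, x, y):
    y_head = sigmoid(np.dot(w.T, x) + b)
    m = x.shape[1]
    return np.dot(x, (y_head - y).T) / m, np.sum(y_head - y) / m

rng = np.random.default_rng(0)
x = rng.random((5, 8))                          # 5 features, 8 samples (synthetic)
y = (rng.random((1, 8)) > 0.5).astype(float)
w = np.full((5, 1), 0.01)
b = 0.0

dw, db = analytic_gradients(w, b, x, y)

# Central finite differences as an independent estimate of dJ/dw and dJ/db
h = 1e-6
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = h
    dw_num[i, 0] = (cost_fn(w + step, b, x, y) - cost_fn(w - step, b, x, y)) / (2 * h)
db_num = (cost_fn(w, b + h, x, y) - cost_fn(w, b - h, x, y)) / (2 * h)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```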

Updating the parameters (weights and bias)

# w: weights, b: bias, x_train: training data, y_train: training labels
# learning_rate: learning rate, number_of_iterarion: number of iterations
def update(w, b, x_train, y_train, learning_rate, number_of_iterarion):
    # Cost of every iteration
    cost_list = []
    # Cost of every 10th iteration, used later for plotting
    cost_list2 = []
    # Iteration numbers whose cost was stored, used as the x axis of the plot
    index = []
    # Loop for number_of_iterarion iterations
    for i in range(number_of_iterarion):
        # Compute the current cost and gradients with forward_backward_propagation
        cost, gradients = forward_backward_propagation(w, b, x_train, y_train)
        # Record the cost of this iteration
        cost_list.append(cost)
        # Update the weights and bias using the computed gradients
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        # Every 10 iterations
        if i % 10 == 0:
            # Store the current cost in cost_list2
            cost_list2.append(cost)
            # Store the current iteration number in index for plotting
            index.append(i)
            print("cost after iteration %i: %f"%(i, cost))
    # Pack the updated weights and bias into a dictionary
    parameters = {"weight":w, "bias":b}
    # Plot cost_list2 against the iteration number
    plt.plot(index, cost_list2)
    # Use index as the x-axis ticks and rotate the tick labels vertically for readability
    plt.xticks(index, rotation='vertical')
    plt.xlabel("Number of Iterations")
    plt.ylabel("cost")
    plt.show()
    return parameters, gradients, cost_list
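
The update rule in the loop above is easier to see on a one-dimensional toy problem. The sketch below is an illustration, not part of the model: it minimises the made-up function J(w) = (w - 3)^2 with the same rule, using 150 steps and two assumed learning rates.

```python
def gradient_descent_1d(start, learning_rate, steps):
    """Minimise J(w) = (w - 3)^2 with the rule w = w - learning_rate * dJ/dw."""
    w = start
    history = []
    for _ in range(steps):
        grad = 2 * (w - 3)              # dJ/dw
        w = w - learning_rate * grad
        history.append((w - 3) ** 2)    # cost after the update
    return w, history

w_small, costs_small = gradient_descent_1d(start=0.0, learning_rate=0.01, steps=150)
w_large, costs_large = gradient_descent_1d(start=0.0, learning_rate=0.1, steps=150)

print(round(w_small, 4), round(w_large, 4))
```

With the smaller learning rate the cost falls at every step but w is still short of the minimum at 3 after 150 steps; with the larger one it gets there well within the same budget.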

Prediction

def predict(w, b, x_test):
    # z = w.T x + b, then the sigmoid turns z into a probability
    z = sigmoid(np.dot(w.T, x_test) + b)
    # Array that stores the predictions, initialized to 0
    Y_prediction = np.zeros((1, x_test.shape[1]))
    # z.shape[1] is the number of samples; the loop handles them one by one
    for i in range(z.shape[1]):
        # If the predicted probability z[0, i] is less than or equal to 0.5, predict 0
        # If it is greater than 0.5, predict 1
        if z[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
            
    return Y_prediction
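
The thresholding loop can also be written as a single array comparison. The sketch below compares the two on made-up probabilities; predict_loop and predict_vectorized are names introduced here for illustration.

```python
import numpy as np

def predict_loop(probabilities):
    """Threshold as in the loop above: <= 0.5 gives 0, anything larger gives 1."""
    out = np.zeros((1, probabilities.shape[1]))
    for i in range(probabilities.shape[1]):
        out[0, i] = 0 if probabilities[0, i] <= 0.5 else 1
    return out

def predict_vectorized(probabilities):
    """The same rule expressed as one array comparison."""
    return (probabilities > 0.5).astype(float)

probs = np.array([[0.1, 0.5, 0.51, 0.99]])
print(predict_loop(probs))
print(predict_vectorized(probs))
```

Both versions treat a probability of exactly 0.5 as class 0.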

The logistic regression model

def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
    # Number of features
    dimension = x_train.shape[0]
    # Initialize the weights and bias
    w, b = initialize_weights_and_bias(dimension)
    # Learn the parameters with gradient descent
    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate, num_iterations)
    # Predict the labels of the test data x_test with the learned weights and bias
    y_prediction_test = predict(parameters["weight"], parameters["bias"], x_test)
    # Predict the labels of the training data x_train
    y_prediction_train = predict(parameters["weight"], parameters["bias"], x_train)
    # Accuracy in percent: 100 minus 100 times the mean absolute difference between predictions and true labels
    print("train accuracy:{}%".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("test accuracy:{}%".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
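
The accuracy expression in the two print lines works because the labels are 0 or 1: the absolute difference is 1 for a wrong prediction and 0 for a correct one. A tiny example with made-up labels:

```python
import numpy as np

y_true = np.array([[1, 0, 0, 1]])
y_pred = np.array([[1, 0, 1, 1]])   # one mistake out of four

accuracy_abs = 100 - np.mean(np.abs(y_pred - y_true)) * 100
accuracy_eq = np.mean(y_pred == y_true) * 100

print(accuracy_abs, accuracy_eq)    # 75.0 75.0
```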

Running everything

logistic_regression(x_train, y_train, x_test, y_test, learning_rate=0.01, num_iterations=150)

Output of the run:
[Image: cost curve and printed train/test accuracy]

Logistic regression with sklearn

from sklearn import linear_model
# logreg is a LogisticRegression instance with random seed 42 and at most 150 iterations
logreg = linear_model.LogisticRegression(random_state=42, max_iter=150)
# Train the model once; sklearn expects one sample per row and a 1-D label vector
logreg.fit(x_train.T, y_train.T.ravel())
# .score evaluates the model: it returns the accuracy on the given data and labels,
# that is, the fraction of samples that are predicted correctly
print("test accuracy: {} ".format(logreg.score(x_test.T, y_test.T.ravel())))
print("train accuracy: {} ".format(logreg.score(x_train.T, y_train.T.ravel())))

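Artificial Neural Network (ANN)

What is a neural network? Essentially, it takes logistic regression and repeats it at least twice.

Logistic regression has an input layer and an output layer. A neural network has at least one hidden layer between the input layer and the output layer.

We will study the neural network shown in the following figure.
[Image: diagram of a 2-layer neural network]
This is a 2-layer neural network. Why is it called a 2-layer network when the figure shows three columns? Because the input layer is not counted when counting layers. In this network the flow is input => hidden layer => output, and the hidden layer can be seen as the output of the first part and the input of the second part.

2-layer neural network

Steps:

Layer sizes and initialization of the weights and biases
Forward propagation
Loss function and cost function
Backward propagation
Updating the parameters
Prediction with the learned weights and biases
Creating the model

Layer sizes and initialization of the weights and biases

Next we carry out the following steps. The weights are initialized to small random values (standard normal samples scaled by 0.1) and the biases to 0.
[Image: layer sizes and parameter initialization]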

def initialize_parameters_and_layer_sizes_NN(x_train, y_train):
    parameters = {"weight1":np.random.randn(3, x_train.shape[0])*0.1,
                 "bias1":np.zeros((3,1)),
                 "weight2":np.random.randn(y_train.shape[0], 3) * 0.1,
                 "bias2":np.zeros((y_train.shape[0], 1))}
    
    return parameters

Forward propagation

def forward_propagation_NN(x_train, parameters):
    Z1 = np.dot(parameters["weight1"], x_train) + parameters["bias1"]
    A1 = np.tanh(Z1)
    Z2 = np.dot(parameters["weight2"], A1) + parameters["bias2"]
    A2 = sigmoid(Z2)
    
    cache = {"Z1":Z1, 
            "A1":A1,
            "Z2":Z2,
            "A2":A2}
    
    return A2, cache
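
It can help to follow the array shapes through the forward pass. The sketch below uses synthetic inputs (5 made-up samples with 4096 features) and a hidden layer of 3 units as in the initialization above; the variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_samples = 4096, 3, 5      # 5 synthetic samples

x = rng.random((n_features, n_samples))
W1 = rng.standard_normal((n_hidden, n_features)) * 0.1
b1 = np.zeros((n_hidden, 1))
W2 = rng.standard_normal((1, n_hidden)) * 0.1
b2 = np.zeros((1, 1))

Z1 = np.dot(W1, x) + b1            # (3, 5): b1 is broadcast across the sample columns
A1 = np.tanh(Z1)                   # hidden activations, bounded by -1 and 1
Z2 = np.dot(W2, A1) + b2           # (1, 5)
A2 = 1 / (1 + np.exp(-Z2))         # one output probability per sample

for name, arr in [("Z1", Z1), ("A1", A1), ("Z2", Z2), ("A2", A2)]:
    print(name, arr.shape)
```

The hidden layer turns each 4096-pixel column into 3 numbers, and the output layer turns those 3 numbers into a single probability.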

Loss function and Cost function

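def compute_cost_NN(A2, Y, parameters):
    # Binary cross-entropy: both the y*log(a) term and the (1-y)*log(1-a) term are needed,
    # which matches the gradient dZ2 = A2 - Y used in backward propagation
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = -np.sum(logprobs) / Y.shape[1]
    
    return cost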

Backward propagation

# Backward Propagation
def backward_propagation_NN(parameters, cache, X, Y):
 
    dZ2 = cache["A2"]-Y
    # dA2 / dw2 = (dA2 / dZ2) * (dZ2 / dw2)
    # Z2 = w2*A1 + b2
    # dZ2 / dw2 = A1
    # dw2 = dZ2 * A1
    dW2 = np.dot(dZ2,cache["A1"].T)/X.shape[1]
    db2 = np.sum(dZ2,axis =1,keepdims=True)/X.shape[1]
    # dZ1 = (dA2 / dZ2) * (dZ2 / dA1) * (dA1 / dZ1)
    dZ1 = np.dot(parameters["weight2"].T,dZ2)*(1 - np.power(cache["A1"], 2))
    # dW1 = dZ1 * x
    dW1 = np.dot(dZ1,X.T)/X.shape[1]
    db1 = np.sum(dZ1,axis =1,keepdims=True)/X.shape[1]
    grads = {"dweight1": dW1,
             "dbias1": db1,
             "dweight2": dW2,
             "dbias2": db2}
    return grads
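
The factor (1 - np.power(cache["A1"], 2)) in dZ1 is the derivative of tanh, since d/dz tanh(z) = 1 - tanh(z)^2. A quick numerical check on a few made-up points:

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 9)
a = np.tanh(z)

analytic = 1 - np.power(a, 2)                            # the factor used for dZ1
h = 1e-6
numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)    # central difference

print(np.allclose(analytic, numeric, atol=1e-8))
```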

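Updating the parameters

Each parameter is updated by subtracting the learning rate times its gradient.
For example, weight1 becomes weight1 - learning_rate * dweight1, where dweight1 is the gradient of the cost with respect to weight1.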

# update parameters
def update_parameters_NN(parameters, grads, learning_rate = 0.01):
    parameters = {"weight1": parameters["weight1"]-learning_rate*grads["dweight1"],
                  "bias1": parameters["bias1"]-learning_rate*grads["dbias1"],
                  "weight2": parameters["weight2"]-learning_rate*grads["dweight2"],
                  "bias2": parameters["bias2"]-learning_rate*grads["dbias2"]}
    
    return parameters

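Prediction with the learned weights and biases

First run forward propagation: A2 is the final output of the forward pass, and cache holds the result of each intermediate step.
Then initialize an array Y_prediction to store the predictions.
The remaining steps are the same as in the prediction function for logistic regression above.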

def predict_NN(parameters, x_test):
    A2, cache = forward_propagation_NN(x_test, parameters)
    Y_prediction = np.zeros((1, x_test.shape[1]))
    
    for i in range(A2.shape[1]):
        if A2[0,i] <=0.5:
            Y_prediction[0,i] = 0
        else:
            Y_prediction[0, i] = 1
            
    return Y_prediction

Creating the model

def two_layer_neural_network(x_train, y_train, x_test, y_test, num_iterations):
    
    cost_list = []
    index_list = []
    # initialize
    parameters = initialize_parameters_and_layer_sizes_NN(x_train, y_train)
    
    for i in range(0, num_iterations):
        # forward propagation
        A2, cache = forward_propagation_NN(x_train, parameters)
        # compute cost
        cost = compute_cost_NN(A2, y_train, parameters)
        # backward propagation
        grads = backward_propagation_NN(parameters, cache, x_train, y_train)
        # update parameters
        parameters = update_parameters_NN(parameters, grads)
        
        if i % 100 == 0:
            cost_list.append(cost)
            index_list.append(i)
            print("Cost after iteration %i: %f" % (i, cost))
            
    plt.plot(index_list, cost_list)
    plt.xticks(index_list, rotation='vertical')
    plt.xlabel("Number of Iterations")
    plt.ylabel("Cost")
    plt.show()
        
    # predict
    y_prediction_test = predict_NN(parameters,x_test)
    y_prediction_train = predict_NN(parameters,x_train)

    # Print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
    return parameters
parameters = two_layer_neural_network(x_train, y_train,x_test,y_test, num_iterations=2500)

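[Image: cost curve and printed train/test accuracy of the 2-layer network]

L-layer neural network

Implementation with the keras library

Try both of the first two imports to see which one works in your environment. The second one did not work for me, so I used the first.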

from scikeras.wrappers import KerasClassifier
# from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential # initialize neural network library
from keras.layers import Dense # build our layers library

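Building the network

def build_classifier():
    # The usual way to create a network in keras: a Sequential model lets us add layers one by one
    classifier = Sequential() 
    # First layer: a fully connected (Dense) layer with 8 units, uniform weight initialization and relu activation
    # The input dimension is the number of features; x_train stores one sample per column, so this is x_train.shape[0]
    classifier.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu', input_dim = x_train.shape[0]))
    # The second layer has 4 units; no input size is needed, keras infers it
    classifier.add(Dense(units = 4, kernel_initializer = 'uniform', activation = 'relu'))
    # This is a binary classification task, so the sigmoid squashes the output into the range 0 to 1
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    # Adam optimizer and binary cross-entropy, a common loss for binary classification
    # Accuracy is reported as the metric
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
# Wrap the Keras model as a scikit-learn compatible classifier
# (scikeras names this argument model; the older keras wrapper called it build_fn)
classifier = KerasClassifier(model = build_classifier, epochs = 100)
# Evaluate the classifier with cross-validation; estimator is the model created above
# X and y are the training data and labels, transposed so that each row is one sample
# cv = 3 means 3-fold cross-validation: the data is split into 3 parts, each part is used once for validation while the other two are used for training
accuracies = cross_val_score(estimator = classifier, X = x_train.T, y = y_train.T.ravel(), cv = 3)
# Mean of the accuracies obtained during cross-validation
mean = accuracies.mean()
# Standard deviation of the accuracies, a measure of how stable the model is across folds
std = accuracies.std()
print("Accuracy mean: "+ str(mean))
print("Accuracy std: "+ str(std))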
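
To show what 3-fold cross-validation does with the 348 training samples, here is a simplified sketch. k_fold_indices is a hypothetical helper that performs plain, unshuffled k-fold splitting; scikit-learn's cross_val_score typically uses a stratified variant for classifiers, so treat this only as an illustration of the idea. The fold accuracies at the end are made-up numbers.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Hypothetical helper: yield (train_idx, val_idx) pairs for plain k-fold splitting."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(348, 3):
    print(len(train_idx), len(val_idx))   # 232 116 on every fold

fold_scores = np.array([0.90, 0.95, 0.85])    # made-up accuracies, for illustration only
print(fold_scores.mean(), fold_scores.std())
```

Every sample is used for validation exactly once, and the mean and standard deviation summarise the three fold scores.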