CNN Week 1: Convolutional Model Step by Step

Programming assignment for week 1 of Andrew Ng's Convolutional Neural Networks course

1. Zero padding

Figure 1

Code:

import numpy as np
import matplotlib.pyplot as plt

# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.
    
    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions
    
    Returns:
    X_pad -- padded image of shape (m, n_H + 2 * pad, n_W + 2 * pad, n_C)
    """
    
    #(≈ 1 line)
    # X_pad = None
    # YOUR CODE STARTS HERE
    # (the code you write goes here; the answer is worked out in the analysis)
    # YOUR CODE ENDS HERE
   
    
    return X_pad
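
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)  # x has shape (4, 3, 3, 2): 4 images of 3x3 pixels with 2 channels
x_pad = zero_pad(x, 3)  # call our zero_pad() to add 3 rows/columns of zeros on every side
print ("x.shape =\n", x.shape)  # print the shape of x
print ("x_pad.shape =\n", x_pad.shape)  # print the shape of x_pad
print ("x[1,1] =\n", x[1, 1])  # second row of the second image, a 3x2 array (3 pixels, 2 channels)
print ("x_pad[1,1] =\n", x_pad[1, 1])  # second row of the second padded image: a 9x2 array of zeros, since row 1 lies in the padding

# continue only if x_pad is an np.ndarray
assert type(x_pad) == np.ndarray, "Output must be a np array"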
assert x_pad.shape == (4, 9, 9, 2), f"Wrong shape: {x_pad.shape} != (4, 9, 9, 2)"
print(x_pad[0, 0:2,:, 0])
assert np.allclose(x_pad[0, 0:2,:, 0], [[0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0]], 1e-15), "Rows are not padded with zeros"
assert np.allclose(x_pad[0, :, 7:9, 1].transpose(), [[0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0]], 1e-15), "Columns are not padded with zeros"
assert np.allclose(x_pad[:, 3:6, 3:6, :], x, 1e-15), "Internal values are different"

fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0, :, :, 0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0, :, :, 0])
zero_pad_test(zero_pad)

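Analysis:
Code segment 2 (the test code):

  1. The test code first creates an array x of shape (4, 3, 3, 2), i.e. 4 images of 3x3 pixels with 2 channels each.
  2. x_pad is the zero-padded result; producing it is the part we have to fill in, explained below.
  3. It then prints some information and checks the result with assert. The checks use np.allclose(), which tests whether two arrays are element-wise equal within a tolerance. The defaults are rtol=1e-05 and atol=1e-08; the 1e-15 passed here as the third argument overrides rtol.
  4. Finally it uses matplotlib plotting functions to display the images.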
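
To make the tolerance behaviour of np.allclose() concrete, here is a small sketch. The numbers are made up for illustration and are not part of the assignment.

```python
import numpy as np

# np.allclose(a, b, rtol=1e-05, atol=1e-08) tests |a - b| <= atol + rtol * |b| element-wise.
close_default = np.allclose([1.0, 2.0], [1.0, 2.0 + 1e-9])   # tiny difference, within tolerance
far_default = np.allclose([1.0], [1.001])                     # difference of 1e-3, too large

# Passing 1e-15 as the third positional argument tightens rtol only.
# atol keeps its default of 1e-08, so values within 1e-08 of each other still pass.
near_zero = np.allclose([0.0], [1e-9], 1e-15)

print(close_default, far_default, near_zero)   # True False True
```

So the asserts in the test code are strict about relative error but still accept absolute differences up to 1e-08.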

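Code segment 1 (the function):

  1. This is the body of the zero-padding function.
  2. From code segment 2 we know that the first argument is x, the (4, 3, 3, 2) array, and the second argument is 3, which according to the docstring is the amount of padding.
  3. From the docstring, X holds 4 images of size 3x3 with 2 channels.
  4. We use np.pad() to do the zero padding. Its signature is: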

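numpy.pad(array, pad_width, mode='constant', **kwargs)

Parameters:

array: the array to pad, here the image data
pad_width: the number of values to add at the edges of each axis, given as a (before, after) pair per axis
mode: the padding mode; we use 'constant', and constant_values=(0, 0) sets the fill values placed before and after each axis

A small example shows how pad_width works: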

import numpy as np

x=[[1,1],
   [1,1]]
print(np.array(x))
x_d=np.pad(x,((2,1),(1,2)),'constant',constant_values=(2,-2))
print('----------')
print(x_d)

Output:

[[1 1]
 [1 1]]
----------
[[ 2  2  2 -2 -2]
 [ 2  2  2 -2 -2]
 [ 2  1  1 -2 -2]
 [ 2  1  1 -2 -2]
 [ 2 -2 -2 -2 -2]]
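  5. For the assignment, each 3x3 image needs a border of pad (here 3) zeros on every side:
    X_pad = np.pad(X,                        # input batch: 4 images of shape 3x3x2
                   ((0, 0),                  # sample axis: no padding, still 4 samples
                    (pad, pad),              # height: pad rows on top, pad rows at the bottom
                    (pad, pad),              # width: pad columns on the left, pad columns on the right
                    (0, 0)),                 # channel axis: no padding
                   'constant',               # mode: fill with constant values
                   constant_values=(0, 0))   # the constants used before and after each axis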
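
As a quick sanity check, the answer can be run on its own. This is a minimal sketch that reuses the shapes from the test code and skips the notebook's zero_pad_test helper.

```python
import numpy as np

def zero_pad(X, pad):
    """Pad the height and width axes of a batch X of shape (m, n_H, n_W, n_C) with zeros."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode='constant', constant_values=(0, 0))

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 3)

print(x_pad.shape)                                        # (4, 9, 9, 2)
print(np.array_equal(x_pad[:, 3:6, 3:6, :], x))           # True: the interior is untouched
print(np.count_nonzero(x_pad) == np.count_nonzero(x))     # True: everything added is zero
```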

2. Single Step of Convolution

Code segment 1

# GRADED FUNCTION: conv_single_step

def conv_single_step(a_slice_prev, W, b):
    """
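    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation of the previous layer.
    
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- weight parameters of shape (f, f, n_C_prev); this is the filter
    b -- bias parameter of shape (1, 1, 1)
    
    Returns:
    Z -- a scalar value: since the input slice and the filter have the same size, the convolution yields a single number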
    """

    #(≈ 3 lines of code)
    # Element-wise product between a_slice_prev and W. Do not add the bias yet.
    # s = None
    # Sum over all entries of the volume s.
    # Z = None
    # Add bias b to Z. Cast b to a float() so that Z results in a scalar value.
    # Z = None
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE

    return Z

Code segment 2:

np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print("Z =", Z)
conv_single_step_test(conv_single_step)

assert (type(Z) == np.float64 or type(Z) == np.float32), "You must cast the output to float"
assert np.isclose(Z, -6.999089450680221), "Wrong value"

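Analysis:

  1. a_slice_prev is the input slice, a 4x4x3 volume; W is also 4x4x3, and b is 1x1x1.
  2. In code segment 1 we apply the 4x4x3 filter to the 4x4x3 slice and then add the bias b.
  3. numpy.multiply() multiplies the two arrays element by element, numpy.sum() then adds all the products together, and the bias b has to be converted to a float before it is added.
    First, let's look at what numpy.multiply() and numpy.sum() produce: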
import numpy as np

np.random.seed(1)
input = np.random.randn(2, 2, 2)
W = np.random.randn(2, 2, 2)
b = np.random.randn(1, 1, 1)
print('input:\n',input)
print('W:\n',W)
conv1= np.multiply(input,W)
print('conv1:\n',conv1)
conv2=np.sum(conv1)
print('conv2:\n',conv2)
output=conv2+float(b)
print('output:\n',output)

A 2x2x2 input is convolved with a 2x2x2 filter and the bias is added.
Result:

input:
 [[[ 1.62434536 -0.61175641]
  [-0.52817175 -1.07296862]]

 [[ 0.86540763 -2.3015387 ]
  [ 1.74481176 -0.7612069 ]]]
W:
 [[[ 0.3190391  -0.24937038]
  [ 1.46210794 -2.06014071]]

 [[-0.3224172  -0.38405435]
  [ 1.13376944 -1.09989127]]]
conv1:
 [[[ 0.51822968  0.15255393]
  [-0.77224411  2.21046634]]

 [[-0.27902231  0.88391596]
  [ 1.97821426  0.83724482]]]
conv2:
 5.529358565096128
output:
 5.356930357545693

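As the output shows, multiply() only multiplies corresponding elements; sum() then adds every number in the array together into a single value.

So the code to fill in for this exercise is: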

s=np.multiply(a_slice_prev,W)
Z=np.sum(s)+ float(b)
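
To double-check the two lines, here is a self-contained sketch that compares them with an explicit triple loop. One deliberate change: it uses b.item() instead of float(b). My assumption is that this avoids the deprecation warning recent NumPy versions emit when float() is applied to an array with more than zero dimensions; the result is the same.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    s = np.multiply(a_slice_prev, W)   # element-wise product
    Z = np.sum(s)                      # sum over all entries of the volume
    return Z + b.item()                # b.item() extracts the single bias value as a Python float

np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)

# Cross-check: accumulate the same products one entry at a time.
Z_loop = 0.0
for i in range(4):
    for j in range(4):
        for k in range(3):
            Z_loop += a_slice_prev[i, j, k] * W[i, j, k]
Z_loop += b[0, 0, 0]

print(np.isclose(Z, Z_loop))   # True
```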

3. conv_forward

np.random.seed(1)
A_prev = np.random.randn(2, 5, 7, 4)
W = np.random.randn(3, 3, 4, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad" : 1,
               "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z's mean =\n", np.mean(Z))
print("Z[0,2,1] =\n", Z[0, 2, 1])
print("cache_conv[0][1][2][3] =\n", cache_conv[0][1][2][3])

conv_forward_test(conv_forward)

# GRADED FUNCTION: conv_forward

def conv_forward(A_prev, W, b, hparameters):
    """
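    Implements the forward propagation for a convolution function
    
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
        
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function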
    """
    
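    # Retrieve dimensions from A_prev's shape (≈1 line)  
    # (m, n_H_prev, n_W_prev, n_C_prev) = None
    
    # Retrieve dimensions from W's shape (≈1 line)
    # (f, f, n_C_prev, n_C) = None
    
    # Retrieve information from "hparameters" (≈2 lines)
    # stride = None
    # pad = None
    
    # Compute the height and width of the CONV output volume.
    # Hint: use int() to apply the 'floor' operation. (≈2 lines)
    # n_H = None
    # n_W = None
    
    # Initialize the output volume Z with zeros. (≈1 line)
    # Z = None
    
    # Create A_prev_pad by padding A_prev
    # A_prev_pad = None
    
    # for i in range(None):               # loop over the batch of training examples
        # a_prev_pad = None               # select the ith training example's padded activation
        # for h in range(None):           # loop over the vertical axis of the output volume
            # Find the vertical start and end of the current "slice" (≈2 lines)
            # vert_start = None
            # vert_end = None
            
            # for w in range(None):       # loop over the horizontal axis of the output volume
                # Find the horizontal start and end of the current "slice" (≈2 lines)
                # horiz_start = None
                # horiz_end = None
                
                # for c in range(None):   # loop over the channels (= number of filters) of the output volume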
                                        
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    # a_slice_prev = None
                    
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈3 line)
                    # weights = None
                    # biases = None
                    # Z[i, h, w, c] = None
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache

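Analysis:

  1. Start with code segment 1 (the test code): A_prev is the input batch, W holds the filters, b the biases, and hparameters contains the two hyperparameters. All of them are passed to conv_forward() to run the forward pass.
  2. The hint # (m, n_H_prev, n_W_prev, n_C_prev) = None asks for the dimensions of the previous layer's output, i.e. of the first argument A_prev; A_prev.shape returns them.
  3. Likewise, (f, f, n_C_prev, n_C) = None takes the dimensions of W from W.shape, and # stride = None and # pad = None are read from the hparameters dictionary.
  4. Next we compute n_H and n_W. The hint says to use int() to round down. My first attempt used numpy.floor(), which caused an error later because it returns a float, and a float cannot be used as an array dimension or in range().
  5. The output Z is initialized with numpy.zeros(), which creates an array of the requested shape filled with zeros. Then A_prev_pad is created by padding A_prev with zero_pad(); it still has four dimensions, (m, n_H_prev + 2 * pad, n_W_prev + 2 * pad, n_C_prev).
  6. We then loop over the m examples; the hint says to select the ith padded example, i.e. A_prev_pad[i]. Next we loop over the vertical axis, whose range is n_H, and the horizontal axis works the same way with n_W. The notebook gives a hint for vert_start and vert_end:
    the vertical start is 0 for the first window and moves by the stride on each iteration, so vert_start is h * stride, and the end position is the start position plus the filter size f.
  7. Inside the three loops (n_H, n_W, n_C) we extract the values covered by the filter window into a_slice_prev (the notebook gives a hint for this), select the W and b that belong to the current output channel, and finally call the single-step convolution function conv_single_step().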
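
To see the output-size formula and the window corners with real numbers, here is a small sketch using the shapes from the test code (input 5x7, f = 3, pad = 1, stride = 2). The helper name conv_output_size is my own, not part of the notebook, and it uses integer division, which gives the same result as int(... / stride) for non-negative values.

```python
def conv_output_size(n_prev, f, pad, stride):
    """Hypothetical helper: floor((n_prev - f + 2 * pad) / stride) + 1."""
    return (n_prev - f + 2 * pad) // stride + 1

f, pad, stride = 3, 1, 2
n_H = conv_output_size(5, f, pad, stride)   # output height
n_W = conv_output_size(7, f, pad, stride)   # output width
print(n_H, n_W)                             # 3 4

# Vertical window corners on the padded input (padded height = 5 + 2 * 1 = 7).
vert_windows = [(h * stride, h * stride + f) for h in range(n_H)]
print(vert_windows)                         # [(0, 3), (2, 5), (4, 7)]
```

The last window ends exactly at row 7, the padded height, so every window stays inside the padded image.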

So the answer to this exercise is:

def conv_forward(A_prev, W, b, hparameters):
    # Retrieve dimensions from A_prev's shape (≈1 line)  
    # (m, n_H_prev, n_W_prev, n_C_prev) = None
    (m, n_H_prev, n_W_prev, n_C_prev)=A_prev.shape
    # Retrieve dimensions from W's shape (≈1 line)
    # (f, f, n_C_prev, n_C) = None
    (f, f, n_C_prev, n_C)=W.shape
    # Retrieve information from "hparameters" (≈2 lines)
    # stride = None
    # pad = None
    stride=hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above. 
    # Hint: use int() to apply the 'floor' operation. (≈2 lines)
    # n_H = None
    # n_W = None
    n_H=int(((n_H_prev-f+2*pad)/stride))+1
    n_W=int(((n_W_prev-f+2*pad)/stride))+1
    # Initialize the output volume Z with zeros. (≈1 line)
    # Z = None
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    # A_prev_pad = None
    A_prev_pad=zero_pad(A_prev,pad)
    # YOUR CODE STARTS HERE
    for i in range(m):
        a_prev_pad= A_prev_pad[i]
        for h in range(n_H):
            vert_start = h * stride
            vert_end = vert_start + f
            
            for w in range(n_W): 
                horiz_start = w * stride
                horiz_end = horiz_start + f
            
                for c in range(n_C): 
                    a_slice_prev = a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,:]
                
                    weights = W[:,:,:,c]
                    biases =  b[:,:,:,c]
                    Z[i, h, w, c] = conv_single_step(a_slice_prev,weights,biases)
    # YOUR CODE ENDS HERE
    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache
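
One way to gain confidence in the four nested loops is to compare them with a vectorized version. The sketch below is my own cross-check, not part of the assignment. It assumes NumPy 1.20 or newer for sliding_window_view, and the function names conv_forward_loops and conv_forward_vectorized are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pad_hw(A, pad):
    # Zero-pad only the height and width axes of an (m, H, W, C) batch.
    return np.pad(A, ((0, 0), (pad, pad), (pad, pad), (0, 0)))

def conv_forward_loops(A_prev, W, b, stride, pad):
    m, n_H_prev, n_W_prev, _ = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1
    A_pad = pad_hw(A_prev, pad)
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                window = A_pad[i, h * stride:h * stride + f, w * stride:w * stride + f, :]
                for c in range(n_C):
                    Z[i, h, w, c] = np.sum(window * W[..., c]) + b[0, 0, 0, c]
    return Z

def conv_forward_vectorized(A_prev, W, b, stride, pad):
    f = W.shape[0]
    A_pad = pad_hw(A_prev, pad)
    # All f x f windows, shape (m, H - f + 1, W - f + 1, C, f, f); keep every stride-th one.
    windows = sliding_window_view(A_pad, (f, f), axis=(1, 2))[:, ::stride, ::stride]
    # Sum over window rows i, window columns j and input channels c for each filter n.
    return np.einsum("mhwcij,ijcn->mhwn", windows, W) + b

np.random.seed(1)
A_prev = np.random.randn(2, 5, 7, 4)
W = np.random.randn(3, 3, 4, 8)
b = np.random.randn(1, 1, 1, 8)

Z_loops = conv_forward_loops(A_prev, W, b, stride=2, pad=1)
Z_vec = conv_forward_vectorized(A_prev, W, b, stride=2, pad=1)
print(Z_loops.shape)                 # (2, 3, 4, 8)
print(np.allclose(Z_loops, Z_vec))   # True
```

The loop version is the one the assignment asks for; the vectorized one is only there to confirm that the window arithmetic is right.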

4. Pooling Layer

Code segment 1

# Case 1: stride of 1
np.random.seed(1)
A_prev = np.random.randn(2, 5, 5, 3)
hparameters = {"stride" : 1, "f": 3}

A, cache = pool_forward(A_prev, hparameters, mode = "max")
print("mode = max")
print("A.shape = " + str(A.shape))
print("A[1, 1] =\n", A[1, 1])
print()
A, cache = pool_forward(A_prev, hparameters, mode = "average")
print("mode = average")
print("A.shape = " + str(A.shape))
print("A[1, 1] =\n", A[1, 1])

pool_forward_test(pool_forward)

Code segment 2

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer
    
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """
    
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              
    
    # for i in range(None):                         # loop over the training examples
        # for h in range(None):                     # loop on the vertical axis of the output volume
            # Find the vertical start and end of the current "slice" (≈2 lines)
            # vert_start = None
            # vert_end = None
            
            # for w in range(None):                 # loop on the horizontal axis of the output volume
                # Find the horizontal start and end of the current "slice" (≈2 lines)
                # horiz_start = None
                # horiz_end = None
                
                # for c in range (None):            # loop over the channels of the output volume
                    
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    # a_prev_slice = None
                    
                    # Compute the pooling operation on the slice. 
                    # Use an if statement to differentiate the modes. 
                    # Use np.max and np.mean.
                    # if mode == "max":
                        # A[i, h, w, c] = None
                    # elif mode == "average":
                        # A[i, h, w, c] = None
    
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    
    # Making sure your output shape is correct
    #assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache

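Analysis:

  1. The inputs to the pooling layer are A_prev and hparameters, which holds the stride and the window size f.
  2. Inside the pooling layer we first read these basic parameters and use the formula to get n_H and n_W; n_C stays the same as n_C_prev. We create an output array of zeros, then loop over the m examples, n_H, n_W and n_C. For each window, numpy.max() gives the max-pooling result and numpy.mean() gives the average-pooling result.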
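
A tiny worked example makes the difference between the two modes visible. This sketch pools a single 4x4 channel with f = 2 and stride = 2; the input values are made up for illustration.

```python
import numpy as np

A = np.arange(16, dtype=float).reshape(4, 4)   # [[0..3], [4..7], [8..11], [12..15]]
f, stride = 2, 2
n_out = (A.shape[0] - f) // stride + 1         # 2

pooled_max = np.zeros((n_out, n_out))
pooled_avg = np.zeros((n_out, n_out))
for h in range(n_out):
    for w in range(n_out):
        window = A[h * stride:h * stride + f, w * stride:w * stride + f]
        pooled_max[h, w] = np.max(window)      # max pooling keeps the largest value
        pooled_avg[h, w] = np.mean(window)     # average pooling keeps the mean

print(pooled_max)   # [[ 5.  7.]
                    #  [13. 15.]]
print(pooled_avg)   # [[ 2.5  4.5]
                    #  [10.5 12.5]]
```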

Answer:

def pool_forward(A_prev, hparameters, mode = "max"):
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    f = hparameters["f"]
    stride = hparameters["stride"]

    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              

    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range (n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_prev_slice = A_prev[i,vert_start:vert_end,horiz_start:horiz_end,c]
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    cache = (A_prev, hparameters)

    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache

5. Backpropagation in Convolutional Neural Networks (OPTIONAL / UNGRADED)

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """    
    
        
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))                          
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))
    
    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       # loop over the training examples
        
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,:]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i,h,w,c]
                    dW[:,:,:,c] += a_slice * dZ[i,h,w,c]
                    db[:,:,:,c] += dZ[i,h,w,c]
                    
        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db

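Analysis:

  1. The notebook gives plenty of hints for this exercise. Replacing each None placeholder as the hints direct completes most of the function.
  2. The remaining lines, the gradient updates, are filled in from the formulas explained in the notebook.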
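
One detail worth knowing about the unpadding line da_prev_pad[pad:-pad, pad:-pad, :]: it only works when pad is at least 1, because with pad = 0 the slice 0:-0 is empty. The sketch below shows the effect together with a guard. The helper unpad is hypothetical and not part of the notebook.

```python
import numpy as np

def unpad(x_pad, pad):
    """Hypothetical helper: strip `pad` rows/columns from each side of an (H, W, C) array."""
    if pad == 0:
        return x_pad                      # x_pad[0:-0] would be an empty slice
    return x_pad[pad:-pad, pad:-pad, :]

x = np.ones((5, 5, 2))
print(x[0:-0, 0:-0, :].shape)   # (0, 0, 2): the naive slice loses everything when pad = 0
print(unpad(x, 0).shape)        # (5, 5, 2)
print(unpad(x, 1).shape)        # (3, 3, 2)
```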

6. create_mask_from_window

Hint

def create_mask_from_window(x):
 
    # (≈1 line)
    mask = (x == np.max(x))
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    return mask

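Analysis:

  1. The key to this exercise is the hint in the notebook. We build a mask function that marks the largest number in a matrix: the position of the maximum is True and all others are False. The returned mask has the same shape as the argument x.
  2. The notebook suggests two tools. The first is numpy.max(), which computes the maximum of an array. The second is the expression A = (X == x). Since I was not familiar with it, I tried it out in PyCharm: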
import numpy as np
# first create a 3x3 matrix
x=np.array([[1,2,3],
            [4,5,6],
            [7,8,9]])
print(x)
print('----------')
y=(x==5)
print(y)

Result

[[1 2 3]
 [4 5 6]
 [7 8 9]]
----------
[[False False False]
 [False  True False]
 [False False False]]

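As the result shows, A = (X == x) compares every entry of X with x: entries equal to x become True in A and all others become False.
  3. So to get the mask we use the same expression and compare against the maximum of the matrix: mask = (x == np.max(x)).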
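
Here is the mask function run on two small windows. The inputs are made up; the second one shows a property that is easy to miss, namely that a window with a repeated maximum produces more than one True.

```python
import numpy as np

def create_mask_from_window(x):
    # True where x equals its maximum, False elsewhere; same shape as x.
    return x == np.max(x)

x = np.array([[1, 3],
              [4, 2]])
mask = create_mask_from_window(x)
print(mask)
# [[False False]
#  [ True False]]

ties = np.array([[5, 5],
                 [1, 2]])
tie_mask = create_mask_from_window(ties)
print(tie_mask.sum())   # 2: both maxima are marked
```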

7. distribute_value

Hint

def distribute_value(dz, shape):
   
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    
    # Compute the value to distribute on the matrix (≈1 line)
    average = dz/(n_H*n_W)
    
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.ones(shape) * average
    
    return a

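Analysis:

  1. First, the notebook's explanation: in max pooling, each output value comes only from the maximum of its input window, whereas in average pooling every value in the input window contributes equally to the output. So the backward pass for average pooling needs its own helper.
  2. The arguments are dz and shape: the value dz is to be spread evenly over a matrix of the given shape. So we first compute the number of entries in the matrix, n_H * n_W, and divide dz by it to get the value of each entry, average.
  3. The second step creates a matrix of size shape in which every entry is average. I was stuck here for a long time because I assumed NumPy had a function that builds a matrix directly from a value and a shape, and I did not understand the numpy.ones() hint at first. In the end it is enough to build the matrix with numpy.ones() and multiply it by average.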
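
As it turns out, NumPy does have a function that builds an array from a shape and a fill value: np.full(shape, fill_value). The sketch below puts the notebook's numpy.ones() approach next to an np.full() variant; distribute_value_full is my own name for the alternative, and the notebook itself only hints at numpy.ones().

```python
import numpy as np

def distribute_value(dz, shape):
    n_H, n_W = shape
    average = dz / (n_H * n_W)          # share of dz for each entry
    return np.ones(shape) * average     # the approach hinted at in the notebook

def distribute_value_full(dz, shape):
    n_H, n_W = shape
    return np.full(shape, dz / (n_H * n_W))   # same result, built in one call

a = distribute_value(2, (2, 2))
a_full = distribute_value_full(2, (2, 2))
print(a)
# [[0.5 0.5]
#  [0.5 0.5]]
print(np.array_equal(a, a_full))   # True
```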

8. pool_backward

np.random.seed(1)
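# A_prev has shape (5, 5, 3, 2)
A_prev = np.random.randn(5, 5, 3, 2)
# stride = 1, f = 2
hparameters = {"stride" : 1, "f": 2}
# the pooling forward pass returns the output tensor A and cache, which holds the input A_prev and hparameters
A, cache = pool_forward(A_prev, hparameters)
print(A.shape)  # prints (5, 4, 2, 2)
print(cache[0].shape)  # the first item in cache is the input, so this prints (5, 5, 3, 2)
dA = np.random.randn(5, 4, 2, 2)  # dA has shape (5, 4, 2, 2), the same as the forward output
# backward pass: takes dA of shape (5, 4, 2, 2) and the cache (original input and hyperparameters); mode is max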
dA_prev1 = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev1[1,1] = ', dA_prev1[1, 1])  
print()
dA_prev2 = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev2[1,1] = ', dA_prev2[1, 1]) 

assert type(dA_prev1) == np.ndarray, "Wrong type"
assert dA_prev1.shape == (5, 5, 3, 2), f"Wrong shape {dA_prev1.shape} != (5, 5, 3, 2)"
assert np.allclose(dA_prev1[1, 1], [[0, 0], 
                                    [ 5.05844394, -1.68282702],
                                    [ 0, 0]]), "Wrong values for mode max"
assert np.allclose(dA_prev2[1, 1], [[0.08485462,  0.2787552], 
                                    [1.26461098, -0.25749373], 
                                    [1.17975636, -0.53624893]]), "Wrong values for mode average"
print("\033[92m All tests passed.")

Code segment 2:

def pool_backward(dA, cache, mode = "max"):
    # dA has the same shape as the forward-pass output; cache holds the forward-pass input and hyperparameters
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache  # A_prev is the original input, shape (5, 5, 3, 2) in the test
    
    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']  # stride and pooling window size
    f = hparameters['f']
    
    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape  # input dimensions: number of examples, height, width, channels
    (m, n_H, n_W, n_C) = dA.shape  # output dimensions
    
    # Initialize dA_prev with zeros (≈1 line)
    # dA_prev has the same shape as A_prev, (5, 5, 3, 2) in the test, with every element 0
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    
    for i in range(m): # loop over the training examples
        
        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]  # the ith example, i.e. the (i+1)-th input image
        
        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)
        
                    # Find the corners of the current "slice" (≈4 lines)
                    # corners of the current window
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        
                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        # a_prev_slice is the f x f window of the input for channel c
                        a_prev_slice = a_prev[vert_start:vert_end,horiz_start:horiz_end,c]
                        
                        # Create the mask from a_prev_slice (≈1 line)
                        # create_mask_from_window() returns a boolean mask, the same size as a_prev_slice, that is True at the window's maximum
                        mask = create_mask_from_window(a_prev_slice)

                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        # add dA[i, h, w, c] to dA_prev at the position of the window's maximum; other positions get 0
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += np.multiply(mask,dA[i,h,w,c])
                        
                    elif mode == "average":
                        
                        # Get the value da from dA (≈1 line)
                        da = dA[i,h,w,c]
                        
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f,f)

                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da,shape)
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)
    
    return dA_prev

Analysis:

  1. The comments explain the code in detail. The one thing I did not fully understand was np.multiply(mask, dA[i,h,w,c]), so I checked it in PyCharm:
import numpy as np
a= np.array([[0,0],
             [0,4]])
# turn a into a boolean array that is True only at position [1][1]
a=np.array(a,dtype=bool)
print(a)
b=np.multiply(a,7)
print(b)

Result

[[False False]
 [False  True]]
[[0 0]
 [0 7]]

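In other words, the gradient value dA[i,h,w,c] is passed only to the position of the window's maximum in dA_prev, and dA_prev has the same shape as the original input.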
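
To watch both modes route gradients, here is a stripped-down sketch for a single image with a single channel. The function pool_backward_2d and the input values are my own simplification for illustration, not the notebook's code.

```python
import numpy as np

def pool_backward_2d(dA, A_prev, f, stride, mode):
    """Hypothetical single-image, single-channel version of pool_backward."""
    dA_prev = np.zeros_like(A_prev, dtype=float)
    n_H, n_W = dA.shape
    for h in range(n_H):
        for w in range(n_W):
            rows = slice(h * stride, h * stride + f)
            cols = slice(w * stride, w * stride + f)
            if mode == "max":
                window = A_prev[rows, cols]
                mask = (window == np.max(window))
                dA_prev[rows, cols] += mask * dA[h, w]        # only the max position receives dA[h, w]
            elif mode == "average":
                dA_prev[rows, cols] += np.ones((f, f)) * dA[h, w] / (f * f)   # equal shares
    return dA_prev

A_prev = np.arange(16, dtype=float).reshape(4, 4)   # the max of every 2x2 window is its bottom-right entry
dA = np.ones((3, 3))                                # upstream gradient for f = 2, stride = 1

grad_max = pool_backward_2d(dA, A_prev, f=2, stride=1, mode="max")
grad_avg = pool_backward_2d(dA, A_prev, f=2, stride=1, mode="average")
print(grad_max)                         # zeros in row 0 and column 0, ones elsewhere
print(grad_max.sum(), grad_avg.sum())   # 9.0 9.0
```

In both modes the total gradient flowing back equals the total of dA (9 here); the modes differ only in where it lands.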
