0. Python basics
https://www.codecademy.com/catalog
1. GitHub
https://github.com/iamtrask/Grokking-Deep-Learning
2. AI
https://www.codecademy.com/catalog/subject/artificial-intelligence
————————————————
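10.1 Another way to combat overfitting
Overfitting is closely tied to the ratio between the number of weights in a model and the number of data points available for learning those weights. So there is a better way to prevent overfitting: where possible, give the model what can loosely be called structure (network structure).
Structure means selectively reusing weights for multiple purposes in a neural network, because we believe the same pattern needs to be detected in more than one place. As we will see, this can significantly reduce overfitting and lead to more accurate models, because it lowers the ratio of weights to data.
Removing parameters normally makes a model less expressive (less able to learn patterns). If the weights are reused cleverly, however, the model can stay just as expressive while becoming more robust to overfitting.
This technique also tends to make the model smaller. Why? A shared weight is stored only once no matter how many positions it is applied to, so there are fewer actual parameters to store.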
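To make the size argument concrete, the following sketch counts the weights needed to produce the same number of hidden values in two ways: with one dense layer, and with a small bank of shared kernels. The sizes are borrowed from the 28×28 MNIST setup in section 10.3; the variable names `positions`, `dense_weights`, and `shared_weights` are illustrative and not taken from the notes.

```python
# Parameter count: one dense layer vs. a bank of shared 3x3 kernels.
# Sizes follow section 10.3; the names below are illustrative assumptions.
input_rows, input_cols = 28, 28
kernel_rows, kernel_cols = 3, 3
num_kernels = 16

# Number of positions each kernel visits (same convention as section 10.3).
positions = (input_rows - kernel_rows) * (input_cols - kernel_cols)  # 25 * 25
hidden_size = positions * num_kernels

# Dense layer: every pixel connects to every hidden unit.
dense_weights = input_rows * input_cols * hidden_size
# Shared kernels: one tiny 9-input layer per kernel, reused at every position.
shared_weights = kernel_rows * kernel_cols * num_kernels

print(hidden_size)     # 10000
print(dense_weights)   # 7840000
print(shared_weights)  # 144
```

Both variants feed 10,000 hidden values into the next layer, but the shared version stores 144 weights where the dense version stores 7.84 million.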
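10.2 The convolutional layer
Instead of a single large layer, many very small linear layers are reused at every position.
Each of these small layers is called a convolutional kernel, but it is really just a tiny linear layer that takes a small number of inputs and produces a single output.
Training lets each kernel learn one particular pattern and then look for that pattern anywhere in the image. A single small set of weights can learn from a much larger set of training examples: even though the dataset has not changed, each small kernel is forward propagated many times over many segments of the data, which changes the ratio of weights to the data those weights are trained on. This has a strong effect on the network, greatly reducing its tendency to overfit the training data and improving its ability to generalize.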
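The idea of a kernel as "a tiny linear layer reused everywhere" can be sketched for a single image and a single kernel. This is a minimal illustration, not the listing from section 10.3: the 6×6 toy image, the random kernel, and the name `feature_map` are assumptions made for the example, and the loop visits every valid position (output size `input - kernel + 1`).

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((6, 6))         # toy 6x6 "image" (assumed size)
kernel = rng.random((3, 3)) - 0.5  # 3x3 kernel: a linear layer with 9 inputs, 1 output

out_rows = image.shape[0] - kernel.shape[0] + 1
out_cols = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_rows, out_cols))

for r in range(out_rows):
    for c in range(out_cols):
        patch = image[r:r + 3, c:c + 3]
        # The same 9 weights are applied at every position.
        feature_map[r, c] = np.sum(patch * kernel)

print(feature_map.shape)  # (4, 4)
```

Each entry of `feature_map` says how strongly the kernel's pattern is present at that location, and all 16 entries were produced by the same 9 weights.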
10.3 A simple implementation in NumPy
import sys
import numpy as np
from keras.datasets import mnist

np.random.seed(1)

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Use the first 1000 training images, flattened and scaled to [0, 1].
images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255,
                  y_train[0:1000])

# One-hot encode the training labels.
one_hot_labels = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

# Prepare the test set the same way.
test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

alpha, iterations = (2, 300)
pixels_per_image, num_labels = (784, 10)
batch_size = 128

input_rows = 28
input_cols = 28

kernel_rows = 3
kernel_cols = 3
num_kernels = 16

hidden_size = ((input_rows - kernel_rows) *
               (input_cols - kernel_cols)) * num_kernels

# weights_0_1 = 0.02*np.random.random((pixels_per_image,hidden_size))-0.01
kernels = 0.02*np.random.random((kernel_rows*kernel_cols,
                                 num_kernels))-0.01

weights_1_2 = 0.2*np.random.random((hidden_size,
                                    num_labels)) - 0.1

def get_image_section(layer, row_from, row_to, col_from, col_to):
    # Select the same subregion from every image in a batch.
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to-row_from, col_to-col_from)

for j in range(iterations):
    correct_cnt = 0
    for i in range(len(images) // batch_size):
        batch_start, batch_end = ((i * batch_size), ((i+1)*batch_size))
        layer_0 = images[batch_start:batch_end]
        # layer_0 is a batch of 28x28 images. The loops below visit each
        # (kernel_rows x kernel_cols) subregion and collect the pieces in the
        # list sects, which is then concatenated and reshaped so that every
        # subregion becomes one row of flattened_input.
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

        sects = list()
        # Each call selects one small window from the whole batch.
        for row_start in range(layer_0.shape[1] - kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start+kernel_rows,
                                         col_start,
                                         col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1], -1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        # Dropout: zero half of the activations and double the rest.
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask * 2
        layer_2 = softmax(np.dot(layer_1, weights_1_2))

        for k in range(batch_size):
            labelset = labels[batch_start+k:batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1]) ==
                       np.argmax(labelset))
            correct_cnt += _inc

        layer_2_delta = (labels[batch_start:batch_end]-layer_2)\
                        / (batch_size * layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * \
                        tanh2deriv(layer_1)
        layer_1_delta *= dropout_mask
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(l1d_reshape)
        # The deltas are (target - prediction), so the kernels move in the
        # same direction as weights_1_2.
        kernels += alpha * k_update

    test_correct_cnt = 0

    for i in range(len(test_images)):
        layer_0 = test_images[i:i+1]
        # layer_1 = tanh(np.dot(layer_0,weights_0_1))
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

        sects = list()
        for row_start in range(layer_0.shape[1] - kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start+kernel_rows,
                                         col_start,
                                         col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1], -1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        # No dropout and no softmax at test time; argmax is unaffected.
        layer_2 = np.dot(layer_1, weights_1_2)

        test_correct_cnt += int(np.argmax(layer_2) ==
                                np.argmax(test_labels[i:i+1]))

    # Report after every iteration; raise the modulus to report less often.
    if j % 1 == 0:
        sys.stdout.write("\n" +
                         "I:" + str(j) +
                         " Test-Acc:" + str(test_correct_cnt/float(len(test_images))) +
                         " Train-Acc:" + str(correct_cnt/float(len(images))))

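Assume that each subregion can be treated as an image in its own right. If the batch holds 8 images and each image has 100 subregions, you can treat it as a batch of 800 smaller images. Forward propagating them through a linear layer with a single output neuron is then the same as applying that linear layer to every subregion of every image in the batch.
Reusing weights is one of the most important innovations in deep learning.
When a neural network needs to use the same idea in several places, try to use the same weights in all of those places. The weights then have more samples to learn from, which improves generalization and makes them more intelligent.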
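The "8 images × 100 subregions = 800 small images" picture can be checked directly on array shapes. The sketch below assumes toy 12×12 images so that a 3×3 window has exactly 10 × 10 = 100 valid positions; the random data and the names `batch`, `expanded`, and `flattened` are illustrative, not from the listing above.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((8, 12, 12))  # 8 toy images (assumed size 12x12)
kernel_rows = kernel_cols = 3
num_kernels = 16
kernels = 0.02 * rng.random((kernel_rows * kernel_cols, num_kernels)) - 0.01

# Collect every valid 3x3 window, taken from the whole batch at once.
sects = []
for r in range(batch.shape[1] - kernel_rows + 1):
    for c in range(batch.shape[2] - kernel_cols + 1):
        sect = batch[:, r:r + kernel_rows, c:c + kernel_cols]
        sects.append(sect.reshape(-1, 1, kernel_rows, kernel_cols))

expanded = np.concatenate(sects, axis=1)
flattened = expanded.reshape(expanded.shape[0] * expanded.shape[1], -1)
kernel_output = flattened.dot(kernels)  # one small linear layer for all windows
layer_1_input = kernel_output.reshape(expanded.shape[0], -1)

print(expanded.shape)       # (8, 100, 3, 3)
print(flattened.shape)      # (800, 9)
print(kernel_output.shape)  # (800, 16)
print(layer_1_input.shape)  # (8, 1600)
```

Row `100 * i + p` of `flattened` is window `p` of image `i`, so a single matrix product handles all 800 windows. Note that the loops in section 10.3 use `range(size - kernel_size)` without the `+ 1`, so they visit 25 × 25 positions per MNIST image rather than all 26 × 26.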
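One way to see why shared weights get "more samples to learn from" is to look at the kernel update itself: the matrix product used for the update is a sum of one contribution per subregion. The sketch below uses random stand-ins for the flattened patches and the backpropagated delta (both are assumptions for illustration, with the 800 × 9 shapes of an 8-image, 100-window batch).

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, patch_size, num_kernels = 800, 9, 16

# Stand-ins for the flattened patches and the backpropagated delta.
flattened_input = rng.random((num_patches, patch_size))
delta = rng.random((num_patches, num_kernels)) - 0.5

# Vectorized update, in the same form as k_update in section 10.3.
k_update = flattened_input.T.dot(delta)

# Equivalent explicit form: one outer product per patch, all summed
# into the same small set of kernel weights.
k_update_loop = np.zeros((patch_size, num_kernels))
for p in range(num_patches):
    k_update_loop += np.outer(flattened_input[p], delta[p])

print(k_update.shape)                        # (9, 16)
print(np.allclose(k_update, k_update_loop))  # True
```

Each of the 144 kernel weights receives a learning signal from all 800 patches in a single batch, whereas a weight in a dense layer receives one contribution per image.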