NLP Self-Study Notes 04: Sentence Classification with Convolutional Neural Networks
Convolutional Neural Networks
CNN Basics
A CNN is built from convolution layers, pooling layers, and fully connected layers.
A convolution layer slides a kernel over the input and computes a convolution at each position; the kernel weights are shared across all positions. Convolution layers can be interleaved with pooling (subsampling) layers to reduce the dimensionality of the input.
Convolution Operation
Consider an input of size $n \times n$ and a kernel of size $m \times m$, where $m \le n$. Denote the input by $X$, the weights by $W$, and the output by $H$. At each position $(i, j)$ the output is computed as:
$$h_{i,j}=\sum_{k=1}^{m}\sum_{l=1}^{m} w_{k,l}\, x_{i+k-1,\,j+l-1}, \quad \text{where } 1 \le i,j \le n-m+1$$
After convolution, the output becomes smaller than the input. To avoid this, the input can be padded with zeros along its borders, which makes the output the same size as the input.
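As a minimal illustration of the formula above, a 2D convolution and its zero-padded variant can be sketched in NumPy; the helper names conv2d_valid and conv2d_same are purely illustrative and not part of the original notes:
import numpy as np
def conv2d_valid(X, W):
    # 'Valid' convolution as defined above: the output shrinks to (n-m+1) x (n-m+1)
    n, m = X.shape[0], W.shape[0]
    H = np.zeros((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            H[i, j] = np.sum(W * X[i:i + m, j:j + m])
    return H
def conv2d_same(X, W):
    # Zero-pad the borders (assuming an odd kernel size) so the output matches the input size
    pad = W.shape[0] // 2
    return conv2d_valid(np.pad(X, pad, mode='constant'), W)
X = np.arange(16, dtype=np.float32).reshape(4, 4)
W = np.ones((3, 3), dtype=np.float32)
print(conv2d_valid(X, W).shape)  # (2, 2)
print(conv2d_same(X, W).shape)   # (4, 4)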
Pooling Operation (Subsampling)
Pooling is introduced to reduce the size of the intermediate outputs and to give the CNN a degree of translation invariance. The most common variants are max pooling and average pooling.
Max pooling selects the largest element inside the pooling window as the output; as the window slides over the input, the maximum within each window position is taken. Its mathematical definition is:
$$h_{i,j}=\max\left(x_{i,j},\, x_{i,j+1},\, \ldots,\, x_{i,j+m-1},\, x_{i+1,j},\, \ldots,\, x_{i+1,j+m-1},\, \ldots,\, x_{i+m-1,j},\, \ldots,\, x_{i+m-1,j+m-1}\right), \quad \text{where } 1 \le i,j \le n-m+1$$
Average pooling works the same way as max pooling, except that instead of taking the maximum it takes the mean of all inputs in the window.
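To make the sliding-window behaviour concrete, here is a small NumPy sketch of both pooling variants; pool2d is an illustrative helper name, and with stride=1 it matches the index range in the formula above:
import numpy as np
def pool2d(X, m, stride=1, mode='max'):
    # Slide an m x m window over X and take the max (or mean) inside each window
    reduce_fn = np.max if mode == 'max' else np.mean
    n = X.shape[0]
    out_size = (n - m) // stride + 1
    H = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            H[i, j] = reduce_fn(X[i * stride:i * stride + m, j * stride:j * stride + m])
    return H
X = np.arange(16, dtype=np.float32).reshape(4, 4)
print(pool2d(X, 2, stride=2, mode='max'))   # max of each non-overlapping 2x2 block
print(pool2d(X, 2, stride=2, mode='mean'))  # mean of each non-overlapping 2x2 block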
Fully Connected Layers
A fully connected layer is a set of weights that fully connects its inputs to its outputs. It allows the features learned by the convolution layers to be combined globally to produce a meaningful output.
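Conceptually, a fully connected layer is just a matrix multiplication over the flattened feature maps. A minimal NumPy sketch is shown below; the shapes mirror the fulcon1 layer used in the code that follows, and fully_connected is an illustrative helper name:
import numpy as np
def fully_connected(X, W, b):
    # X: [batch_size, height, width, channels] feature maps from the last conv/pool layer
    # W: [height*width*channels, n_outputs] weights, b: [n_outputs] biases
    flat = X.reshape(X.shape[0], -1)       # flatten each sample into one long vector
    return np.maximum(0, flat.dot(W) + b)  # linear transform followed by ReLU
X = np.random.rand(2, 7, 7, 32).astype(np.float32)
W = np.random.rand(7 * 7 * 32, 128).astype(np.float32)
b = np.zeros(128, dtype=np.float32)
print(fully_connected(X, W, b).shape)  # (2, 128)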
Classifying Images on the MNIST Dataset with a CNN
MNIST is a dataset of labeled handwritten digit images from 0 to 9, split into training, validation, and test sets.
import tensorflow as tf
from matplotlib import pylab
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
# Required for data download and preparation
import struct
import gzip
import os
from six.moves.urllib.request import urlretrieve
batch_size = 100 # This is the typical batch size we've been using
image_size = 28 # This is the width/height of a single image
# Number of color channels in an image. These are black and white images
n_channels = 1
# Number of different digits we have images for (i.e. classes)
n_classes = 10
n_train = 55000 # Train dataset size
n_valid = 5000 # Validation dataset size
n_test = 10000 # Test dataset size
# Layers in the CNN in the order from input to output
cnn_layer_ids = ['conv1','pool1','conv2','pool2','fulcon1','softmax']
# Hyperparameters of each layer (e.g. filter size of each convolution layer)
layer_hyperparameters = {'conv1':{'weight_shape':[3,3,n_channels,16],'stride':[1,1,1,1],'padding':'SAME'},
'pool1':{'kernel_shape':[1,3,3,1],'stride':[1,2,2,1],'padding':'SAME'},
'conv2':{'weight_shape':[3,3,16,32],'stride':[1,1,1,1],'padding':'SAME'},
'pool2':{'kernel_shape':[1,3,3,1],'stride':[1,2,2,1],'padding':'SAME'},
'fulcon1':{'weight_shape':[7*7*32,128]},
'softmax':{'weight_shape':[128,n_classes]}
}
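# Note: with 'SAME' padding, each stride-2 pooling layer halves the spatial size
# (rounding up), so the 28x28 input becomes 14x14 after pool1 and 7x7 after pool2.
# With 32 channels coming out of conv2, fulcon1 therefore expects 7*7*32 inputs per image.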
# Inputs (Images) and Outputs (Labels) Placeholders
tf_inputs = tf.placeholder(shape=[batch_size, image_size, image_size, n_channels],dtype=tf.float32,name='tf_mnist_images')
tf_labels = tf.placeholder(shape=[batch_size, n_classes],dtype=tf.float32,name='tf_mnist_labels')
# Global step for decaying the learning rate
global_step = tf.Variable(0, trainable=False)
# Initializing the variables
layer_weights = {}
layer_biases = {}
for layer_id in cnn_layer_ids:
    if 'pool' not in layer_id:
        layer_weights[layer_id] = tf.Variable(
            initial_value=tf.random_normal(shape=layer_hyperparameters[layer_id]['weight_shape'],
                                           stddev=0.02, dtype=tf.float32), name=layer_id + '_weights')
        layer_biases[layer_id] = tf.Variable(
            initial_value=tf.random_normal(shape=[layer_hyperparameters[layer_id]['weight_shape'][-1]],
                                           stddev=0.01, dtype=tf.float32), name=layer_id + '_bias')
print('Variables initialized')
# Calculating Logits
h = tf_inputs
for layer_id in cnn_layer_ids:
    if 'conv' in layer_id:
        # For each convolution layer, compute the output with the conv2d function
        # This operation results in a [batch_size, output_height, output_width, out_channels]
        # sized 4 dimensional tensor
        h = tf.nn.conv2d(h, layer_weights[layer_id], layer_hyperparameters[layer_id]['stride'],
                         layer_hyperparameters[layer_id]['padding']) + layer_biases[layer_id]
        h = tf.nn.relu(h)
    elif 'pool' in layer_id:
        # For each pooling layer, compute the output by max pooling
        # This operation results in a [batch_size, output_height, output_width, out_channels]
        # sized 4 dimensional tensor
        h = tf.nn.max_pool(h, layer_hyperparameters[layer_id]['kernel_shape'], layer_hyperparameters[layer_id]['stride'],
                           layer_hyperparameters[layer_id]['padding'])
    elif layer_id == 'fulcon1':
        # At the first fulcon layer we need to reshape the 4 dimensional output to a
        # 2 dimensional output to be processed by fully connected layers
        # Note this should only be done once, before
        # computing the output of the first fulcon layer
        h = tf.reshape(h, [batch_size, -1])
        h = tf.matmul(h, layer_weights[layer_id]) + layer_biases[layer_id]
        h = tf.nn.relu(h)
    elif layer_id == 'softmax':
        # Note that here we do not perform the same reshaping we did for fulcon1
        # We only perform the matrix multiplication on the previous output
        h = tf.matmul(h, layer_weights[layer_id]) + layer_biases[layer_id]
print('Calculated logits')
tf_logits = h
# Calculating the softmax cross entropy loss with the computed logits and true labels (one hot encoded)
tf_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=tf_logits,labels=tf_labels)
print('Loss defined')
# Optimization
# Here we define the function to decay the learning rate exponentially.
# Every time the global step increases, the learning rate decreases
tf_learning_rate = tf.train.exponential_decay(learning_rate=0.001,global_step=global_step,decay_rate=0.5,decay_steps=1,staircase=True)
tf_loss_minimize = tf.train.RMSPropOptimizer(learning_rate=tf_learning_rate, momentum=0.9).minimize(tf_loss)
print('Loss minimization defined')
tf_predictions = tf.nn.softmax(tf_logits)
print('Prediction defined')
tf_tic_toc = tf.assign(global_step, global_step + 1)
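# Note: with decay_steps=1 and decay_rate=0.5, every increment of global_step
# (performed by the tf_tic_toc op above) halves the learning rate,
# i.e. learning_rate = 0.001 * 0.5 ** global_step.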
def accuracy(predictions, labels):
    '''
    Accuracy of a given set of predictions of size (N x n_classes) and
    labels of size (N x n_classes)
    '''
    return np.sum(np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)) * 100.0 / labels.shape[0]
def maybe_download(url, filename, expected_bytes, force=False):
    """Download a file if not present, and make sure it's the right size."""
    if force or not os.path.exists(filename):
        print('Attempting to download:', filename)
        filename, _ = urlretrieve(url + filename, filename)
        print('\nDownload Complete!')
    statinfo = os.stat(filename)
    if statinfo.st_size == expected_bytes:
        print('Found and verified', filename)
    else:
        raise Exception(
            'Failed to verify ' + filename + '. Can you get to it with a browser?')
    return filename
def read_mnist(fname_img, fname_lbl, one_hot=False):
    print('\nReading files %s and %s' % (fname_img, fname_lbl))
    # Processing images
    with gzip.open(fname_img) as fimg:
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        print(num, rows, cols)
        img = (np.frombuffer(fimg.read(num * rows * cols), dtype=np.uint8).reshape(num, rows, cols, 1)).astype(
            np.float32)
        print('(Images) Returned a tensor of shape ', img.shape)
        # img = (img - np.mean(img)) / np.std(img)
        img *= 1.0 / 255.0
    # Processing labels
    with gzip.open(fname_lbl) as flbl:
        # flbl.read(8) reads the first 8 bytes (magic number and label count)
        magic, num = struct.unpack(">II", flbl.read(8))
        lbl = np.frombuffer(flbl.read(num), dtype=np.int8)
        if one_hot:
            one_hot_lbl = np.zeros(shape=(num, 10), dtype=np.float32)
            one_hot_lbl[np.arange(num), lbl] = 1.0
        print('(Labels) Returned a tensor of shape: %s' % lbl.shape)
        print('Sample labels: ', lbl[:10])
    if not one_hot:
        return img, lbl
    else:
        return img, one_hot_lbl
# Download data if needed
url = 'http://yann.lecun.com/exdb/mnist/'
# training data
maybe_download(url, 'train-images-idx3-ubyte.gz', 9912422)
maybe_download(url, 'train-labels-idx1-ubyte.gz', 28881)
# testing data
maybe_download(url, 't10k-images-idx3-ubyte.gz', 1648877)
maybe_download(url, 't10k-labels-idx1-ubyte.gz', 4542)
# Read the training and testing data
train_inputs, train_labels = read_mnist('train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz', True)
test_inputs, test_labels = read_mnist('t10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz', True)
valid_inputs, valid_labels = train_inputs[-n_valid:, :, :, :], train_labels[-n_valid:, :]
train_inputs, train_labels = train_inputs[:-n_valid, :, :, :], train_labels[:-n_valid, :]
print('\nTrain size: ', train_inputs.shape[0])
print('\nValid size: ', valid_inputs.shape[0])
print('\nTest size: ', test_inputs.shape[0])
train_index, valid_index, test_index = 0,0,0
def get_train_batch(images, labels, batch_size):
    global train_index
    batch = images[train_index:train_index+batch_size,:,:,:], labels[train_index:train_index+batch_size,:]
    train_index = (train_index + batch_size) % (images.shape[0] - batch_size)
    return batch
def get_valid_batch(images, labels, batch_size):
    global valid_index
    batch = images[valid_index:valid_index+batch_size,:,:,:], labels[valid_index:valid_index+batch_size,:]
    valid_index = (valid_index + batch_size) % (images.shape[0] - batch_size)
    return batch
def get_test_batch(images, labels, batch_size):
    global test_index
    batch = images[test_index:test_index+batch_size,:,:,:], labels[test_index:test_index+batch_size,:]
    test_index = (test_index + batch_size) % (images.shape[0] - batch_size)
    return batch
# Makes sure we only collect 10 samples for each
correct_fill_index, incorrect_fill_index = 0, 0
# Visualization purposes
correctly_predicted = np.empty(shape=(10, 28, 28, 1), dtype=np.float32)
correct_predictions = np.empty(shape=(10, n_classes), dtype=np.float32)
incorrectly_predicted = np.empty(shape=(10, 28, 28, 1), dtype=np.float32)
incorrect_predictions = np.empty(shape=(10, n_classes), dtype=np.float32)
def collect_samples(test_batch_predictions, test_images, test_labels):
    global correctly_predicted, correct_predictions
    global incorrectly_predicted, incorrect_predictions
    global correct_fill_index, incorrect_fill_index
    correct_indices = np.where(np.argmax(test_batch_predictions, axis=1) == np.argmax(test_labels, axis=1))[0]
    incorrect_indices = np.where(np.argmax(test_batch_predictions, axis=1) != np.argmax(test_labels, axis=1))[0]
    if correct_indices.size > 0 and correct_fill_index < 10:
        print('\nCollecting Correctly Predicted Samples')
        chosen_index = np.random.choice(correct_indices)
        correctly_predicted[correct_fill_index, :, :, :] = test_images[chosen_index, :].reshape(
            1, image_size, image_size, n_channels)
        correct_predictions[correct_fill_index, :] = test_batch_predictions[chosen_index, :]
        correct_fill_index += 1
    if incorrect_indices.size > 0 and incorrect_fill_index < 10:
        print('Collecting Incorrectly Predicted Samples')
        chosen_index = np.random.choice(incorrect_indices)
        incorrectly_predicted[incorrect_fill_index, :, :, :] = test_images[chosen_index, :].reshape(
            1, image_size, image_size, n_channels)
        incorrect_predictions[incorrect_fill_index, :] = test_batch_predictions[chosen_index, :]
        incorrect_fill_index += 1
# Parameters related to learning rate decay
# counts how many consecutive times the validation accuracy has not increased
v_acc_not_increased_for = 0
# if the above count is above this value, decrease the learning rate
v_acc_threshold = 3
# currently recorded best validation accuracy
max_v_acc = 0.0
config = tf.ConfigProto(allow_soft_placement=True)
# Good practice to use this to avoid any surprising errors thrown by TensorFlow
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9 # Making sure Tensorflow doesn't overflow the GPU
n_epochs = 25 # Number of epochs the training runs for
session = tf.InteractiveSession(config=config)
# Initialize all variables
tf.global_variables_initializer().run()
# Run training loop
for epoch in range(n_epochs):
    loss_per_epoch = []
    # Training phase. We train with all training data
    # processing one batch at a time
    for i in range(n_train // batch_size):
        # Get the next batch of MNIST dataset
        batch = get_train_batch(train_inputs, train_labels, batch_size)
        # Run TensorFlow operations
        l, _ = session.run([tf_loss, tf_loss_minimize],
                           feed_dict={tf_inputs: batch[0].reshape(batch_size, image_size, image_size, n_channels),
                                      tf_labels: batch[1]})
        # Add the loss value to a list
        loss_per_epoch.append(l)
    print('Average loss in epoch %d: %.5f' % (epoch, np.mean(loss_per_epoch)))
    # Validation phase. We compute validation accuracy
    # processing one batch at a time
    valid_accuracy_per_epoch = []
    for i in range(n_valid // batch_size):
        # Get the next validation data batch
        vbatch_images, vbatch_labels = get_valid_batch(valid_inputs, valid_labels, batch_size)
        # Compute validation predictions
        valid_batch_predictions = session.run(
            tf_predictions, feed_dict={tf_inputs: vbatch_images}
        )
        # Compute and add the validation accuracy to a python list
        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions, vbatch_labels))
    # Compute and print average validation accuracy
    mean_v_acc = np.mean(valid_accuracy_per_epoch)
    print('\tAverage Valid Accuracy in epoch %d: %.5f' % (epoch, np.mean(valid_accuracy_per_epoch)))
    # Learning rate decay logic
    if mean_v_acc > max_v_acc:
        max_v_acc = mean_v_acc
    else:
        v_acc_not_increased_for += 1
    # Time to decrease learning rate
    if v_acc_not_increased_for >= v_acc_threshold:
        print('\nDecreasing Learning rate\n')
        session.run(tf_tic_toc)  # Increase global_step
        v_acc_not_increased_for = 0
    # Testing phase. We compute test accuracy
    # processing one batch at a time
    accuracy_per_epoch = []
    for i in range(n_test // batch_size):
        btest_images, btest_labels = get_test_batch(test_inputs, test_labels, batch_size)
        test_batch_predictions = session.run(tf_predictions, feed_dict={tf_inputs: btest_images})
        accuracy_per_epoch.append(accuracy(test_batch_predictions, btest_labels))
        # Collect samples for visualization only in the last epoch
        if epoch == n_epochs - 1:
            collect_samples(test_batch_predictions, btest_images, btest_labels)
    print('\tAverage Test Accuracy in epoch %d: %.5f\n' % (epoch, np.mean(accuracy_per_epoch)))
session.close()
# Defining the plot related settings
pylab.figure(figsize=(25, 20)) # in inches
width = 0.5 # Width of a bar in the barchart
padding = 0.05 # Padding between two bars
labels = list(range(0, 10)) # Class labels
# Defining X axis
x_axis = np.arange(0, 10)
# We create 4 rows and 7 column set of subplots
# We choose these to put the titles in
# First row middle
pylab.subplot(4, 7, 4)
pylab.title('Correctly Classified Samples', fontsize=24)
# Second row middle
pylab.subplot(4, 7, 11)
pylab.title('Softmax Predictions for Correctly Classified Samples', fontsize=24)
# For 7 steps
for sub_i in range(7):
    # Draw the top row (digit images)
    pylab.subplot(4, 7, sub_i + 1)
    pylab.imshow(np.squeeze(correctly_predicted[sub_i]), cmap='gray')
    pylab.axis('off')
    # Draw the second row (prediction bar chart)
    pylab.subplot(4, 7, 7 + sub_i + 1)
    pylab.bar(x_axis + padding, correct_predictions[sub_i], width)
    pylab.ylim([0.0, 1.0])
    pylab.xticks(x_axis, labels)
# Set titles for the third and fourth rows
pylab.subplot(4, 7, 18)
pylab.title('Incorrectly Classified Samples', fontsize=26)
pylab.subplot(4, 7, 25)
pylab.title('Softmax Predictions for Incorrectly Classified Samples', fontsize=24)
# For 7 steps
for sub_i in range(7):
    # Draw the third row (incorrectly classified digit images)
    pylab.subplot(4, 7, 14 + sub_i + 1)
    pylab.imshow(np.squeeze(incorrectly_predicted[sub_i]), cmap='gray')
    pylab.axis('off')
    # Draw the fourth row (incorrect predictions bar chart)
    pylab.subplot(4, 7, 21 + sub_i + 1)
    pylab.bar(x_axis + padding, incorrect_predictions[sub_i], width)
    pylab.ylim([0.0, 1.0])
    pylab.xticks(x_axis, labels)
# Save the figure
pylab.savefig('mnist_results.png')
pylab.show()