AlexNet

Paper Information

Original paper: ImageNet Classification with Deep Convolutional Neural Networks

Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Achievement: the model won the ImageNet ILSVRC-2012 competition with a top-5 error rate of 15.3%, compared with 26.2% for the second-place entry.

AlexNet Network Architecture

The AlexNet network contains eight weighted layers: the first five are convolutional layers and the remaining three are fully connected layers. (See the layer topology diagram.)

AlexNet Algorithmic Improvements

  • ReLU non-linear activation function: speeds up training (see the sketch after this list)

  • Local Response Normalization (LRN): improves accuracy

  • Overlapping pooling: improves accuracy and is less prone to overfitting

  • Data augmentation and Dropout: reduce overfitting
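
Both ReLU and LRN map directly onto TensorFlow 1.x ops. The minimal sketch below uses the same calls and LRN hyperparameters that appear in the network definition later in this post; the input shape is a hypothetical conv-layer output:

import tensorflow as tf

# Hypothetical convolution output, e.g. the 55x55x96 feature map after conv1.
x = tf.placeholder(tf.float32, [None, 55, 55, 96])

# ReLU: max(0, x); trains considerably faster than saturating tanh/sigmoid units.
relu_out = tf.nn.relu(x, name='relu')

# Local Response Normalization with the hyperparameters used in the code below.
lrn_out = tf.nn.local_response_normalization(relu_out, depth_radius=2,
                                             alpha=2e-05, beta=0.75, bias=1.0,
                                             name='lrn')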

Overlapping Pooling

A pooling layer can be thought of as a grid of pooling units spaced $S$ pixels apart, each summarizing a neighborhood of size $Z \times Z$ centered at the pooling unit's location.

Setting $S = Z$ gives the traditional local pooling commonly used in CNNs, while setting $S < Z$ gives overlapping pooling. AlexNet uses overlapping pooling with $S = 2$ and $Z = 3$, which reduces the top-1 and top-5 error rates by 0.4% and 0.3% respectively compared with the non-overlapping scheme $S = 2$, $Z = 2$, whose output has the same dimensions.

The authors generally observed during training that models with overlapping pooling found it slightly more difficult to overfit.
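
As a concrete check of the "same output dimensions" claim, here is a minimal TensorFlow 1.x sketch; the 55 × 55 × 96 input is an assumption matching AlexNet's conv1 output:

import tensorflow as tf

# Hypothetical conv1 output: 55x55x96.
x = tf.placeholder(tf.float32, [None, 55, 55, 96])

# Overlapping pooling used in AlexNet: Z = 3, S = 2.
overlap = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='VALID', name='overlap_pool')

# Non-overlapping scheme: Z = 2, S = 2.
non_overlap = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                             padding='VALID', name='non_overlap_pool')

print(overlap.get_shape())      # (?, 27, 27, 96)
print(non_overlap.get_shape())  # (?, 27, 27, 96)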

Data Augmentation

For image data, the most common way to reduce overfitting is data augmentation.

  • Image translations and horizontal reflections
    At training time, random 224 × 224 patches (and their horizontal reflections) are extracted from the 256 × 256 images and the network is trained on these patches. At test time, the network extracts five 224 × 224 patches from each 256 × 256 image (the four corner patches and the center patch) together with their horizontal reflections (ten patches in all), and averages the predictions made by the softmax layer over the ten patches.

  • Altering the intensities of the RGB channels of training images
    Specifically, PCA is performed on the set of RGB pixel values over the entire ImageNet training set.
    For each training image, multiples of the found principal components are added, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1. That is, to each RGB image pixel $I_{xy} = [I^R_{xy}, I^G_{xy}, I^B_{xy}]^T$ the following quantity is added:
    $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\,[\alpha_1\lambda_1, \alpha_2\lambda_2, \alpha_3\lambda_3]^T$
    where $\mathbf{p}_i$ and $\lambda_i$ are the $i$-th eigenvector and eigenvalue of the 3 × 3 covariance matrix of RGB pixel values, and $\alpha_i$ is the random variable mentioned above. Each $\alpha_i$ is drawn only once for all the pixels of a particular training image, and is re-drawn the next time that image is used for training. This scheme approximately captures an important property of natural images: object identity is invariant to changes in the intensity and color of the illumination. It reduces the top-1 error rate by more than 1%. (A NumPy sketch of this scheme follows this list.)
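
A minimal NumPy sketch of this color-PCA scheme is shown below. fit_pca and pca_color_augment are hypothetical helper names, and here the PCA is fitted on whatever image array is passed in rather than on the full ImageNet training set as in the paper:

import numpy as np

def fit_pca(images):
    """Eigendecomposition of the 3x3 covariance matrix of RGB pixel values.
    `images` is a float array of shape (N, H, W, 3)."""
    pixels = images.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)          # 3x3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(cov)      # lambda_i and p_i (as columns)
    return eigvals, eigvecs

def pca_color_augment(image, eigvals, eigvecs, sigma=0.1):
    """Add [p1, p2, p3][a1*l1, a2*l2, a3*l3]^T to every pixel of one image."""
    alphas = np.random.normal(0.0, sigma, size=3)   # drawn once per image
    delta = eigvecs @ (alphas * eigvals)            # 3-vector added to each RGB pixel
    return image + delta                            # broadcasts over H x W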

Dropout

Dropout sets the output of each hidden neuron to zero with a given probability (e.g., $dropout = 0.5$). The neurons that are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights.

This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is therefore forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons.

At test time, all neurons are used, but their outputs are multiplied by the dropout probability, which is a reasonable approximation to the geometric mean of the predictive distributions produced by the exponentially many dropped-out networks (sketched below).
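
The train/test behaviour described above can be written down in a few lines. The sketch below is a plain NumPy illustration of this formulation (zeroing at training time, scaling by the keep probability at test time); note that tf.nn.dropout, used later in this post, instead applies the "inverted" variant that rescales activations during training:

import numpy as np

def dropout_forward(activations, drop_prob=0.5, training=True):
    if training:
        # Zero each hidden activation independently with probability drop_prob.
        mask = np.random.rand(*activations.shape) >= drop_prob
        return activations * mask
    # Test time: use every neuron but scale its output by the keep probability,
    # approximating the geometric mean over the exponentially many thinned
    # networks sampled during training.
    return activations * (1.0 - drop_prob)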

In AlexNet, dropout is applied to the first two fully connected layers. The authors found that without dropout the network exhibits substantial overfitting.

With $dropout = 0.5$, dropout roughly doubles the number of iterations required to converge.

Fine-tuning AlexNet

AlexNet pretrained weights (bvlc_alexnet.npy) can be downloaded from http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/.

AlexNet network definition:

import tensorflow as tf
import numpy as np

class AlexNet(object):

	def __init__(self, x, keep_prob, num_classes, skip_layer,
	             weights_path='DEFAULT'):

		# Parse input arguments into class variables
		self.X = x
		self.NUM_CLASSES = num_classes
		self.KEEP_PROB = keep_prob
		self.SKIP_LAYER = skip_layer

		if weights_path == 'DEFAULT':
			self.WEIGHTS_PATH = 'bvlc_alexnet.npy'
		else:
			self.WEIGHTS_PATH = weights_path

		# Call the create function to build the computational graph of AlexNet
		self.create()

	def create(self):

		# 1st Layer: Conv (w ReLu) -> Pool -> Lrn
		conv1 = conv(self.X, 11, 11, 96, 4, 4, padding='VALID', name='conv1')
		pool1 = max_pool(conv1, 3, 3, 2, 2, padding='VALID', name='pool1')
		norm1 = lrn(pool1, 2, 2e-05, 0.75, name='norm1')

		# 2nd Layer: Conv (w ReLu) -> Pool -> Lrn with 2 groups
		conv2 = conv(norm1, 5, 5, 256, 1, 1, groups=2, name='conv2')
		pool2 = max_pool(conv2, 3, 3, 2, 2, padding='VALID', name='pool2')
		norm2 = lrn(pool2, 2, 2e-05, 0.75, name='norm2')

		# 3rd Layer: Conv (w ReLu)
		conv3 = conv(norm2, 3, 3, 384, 1, 1, name='conv3')

		# 4th Layer: Conv (w ReLu) split into two groups
		conv4 = conv(conv3, 3, 3, 384, 1, 1, groups=2, name='conv4')

		# 5th Layer: Conv (w ReLu) -> Pool split into two groups
		conv5 = conv(conv4, 3, 3, 256, 1, 1, groups=2, name='conv5')
		pool5 = max_pool(conv5, 3, 3, 2, 2, padding='VALID', name='pool5')

		# 6th Layer: Flatten -> FC (w ReLu) -> Dropout
		flattened = tf.reshape(pool5, [-1, 6 * 6 * 256])
		fc6 = fc(flattened, 6 * 6 * 256, 4096, name='fc6')
		dropout6 = dropout(fc6, self.KEEP_PROB)

		# 7th Layer: FC (w ReLu) -> Dropout
		fc7 = fc(dropout6, 4096, 4096, name='fc7')
		dropout7 = dropout(fc7, self.KEEP_PROB)

		# 8th Layer: FC and return unscaled activations (for tf.nn.softmax_cross_entropy_with_logits)
		self.fc8 = fc(dropout7, 4096, self.NUM_CLASSES, relu=False, name='fc8')

	def load_initial_weights(self, session):
		"""
    As the weights from http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/ come 
    as a dict of lists (e.g. weights['conv1'] is a list) and not as dict of 
    dicts (e.g. weights['conv1'] is a dict with keys 'weights' & 'biases') we
    need a special load function
    """
		# Load the weights into memory
		weights_dict = np.load(self.WEIGHTS_PATH, encoding='bytes', allow_pickle=True).item()  # allow_pickle is needed with newer NumPy versions

		# Loop over all layer names stored in the weights dict
		for op_name in weights_dict:

			# Check if the layer is one of the layers that should be reinitialized
			if op_name not in self.SKIP_LAYER:

				with tf.variable_scope(op_name, reuse=True):

					# Loop over list of weights/biases and assign them to their corresponding tf variable
					for data in weights_dict[op_name]:

						# Biases
						if len(data.shape) == 1:
							var = tf.get_variable('biases', trainable=False)
							session.run(var.assign(data))

						# Weights
						else:
							var = tf.get_variable('weights', trainable=False)
							session.run(var.assign(data))


#################################################
# Predefine all necessary layer for the AlexNet #
#################################################
def conv(x, filter_height, filter_width, num_filters, stride_y, stride_x, name,
         padding='SAME', groups=1):
	"""
  Adapted from: https://github.com/ethereon/caffe-tensorflow
  """
	# Get number of input channels
	input_channels = int(x.get_shape()[-1])

	# Create lambda function for the convolution
	convolve = lambda i, k: tf.nn.conv2d(i, k,
	                                     strides=[1, stride_y, stride_x, 1],
	                                     padding=padding)

	with tf.variable_scope(name) as scope:
		# Create tf variables for the weights and biases of the conv layer
		weights = tf.get_variable('weights', shape=[filter_height, filter_width, input_channels // groups, num_filters])
		biases = tf.get_variable('biases', shape=[num_filters])

		if groups == 1:
			conv = convolve(x, weights)

		# In the case of multiple groups, split inputs & weights
		else:
			# Split input and weights and convolve them separately
			input_groups = tf.split(axis=3, num_or_size_splits=groups, value=x)
			weight_groups = tf.split(axis=3, num_or_size_splits=groups, value=weights)
			output_groups = [convolve(i, k) for i, k in zip(input_groups, weight_groups)]

			# Concat the convolved output together again
			conv = tf.concat(axis=3, values=output_groups)

		# Add biases
		bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape().as_list())

		# Apply relu function
		relu = tf.nn.relu(bias, name=scope.name)

		return relu

def fc(x, num_in, num_out, name, relu=True):
	with tf.variable_scope(name) as scope:

		# Create tf variables for the weights and biases
		weights = tf.get_variable('weights', shape=[num_in, num_out], trainable=True)
		biases = tf.get_variable('biases', [num_out], trainable=True)

		# Matrix multiply weights and inputs and add bias
		act = tf.nn.xw_plus_b(x, weights, biases, name=scope.name)

		if relu == True:
			# Apply ReLu non linearity
			relu = tf.nn.relu(act)
			return relu
		else:
			return act

def max_pool(x, filter_height, filter_width, stride_y, stride_x, name, padding='SAME'):
	return tf.nn.max_pool(x, ksize=[1, filter_height, filter_width, 1],
	                      strides=[1, stride_y, stride_x, 1],
	                      padding=padding, name=name)

def lrn(x, radius, alpha, beta, name, bias=1.0):
	return tf.nn.local_response_normalization(x, depth_radius=radius, alpha=alpha,
	                                          beta=beta, bias=bias, name=name)

def dropout(x, keep_prob):
	return tf.nn.dropout(x, keep_prob)

Fine-tuning script:

import os
import numpy as np
import tensorflow as tf
from datetime import datetime
from alexnet import AlexNet
from get_traffic_dataset import TrafficImageDataGenerator

"""
Configuration settings
"""
# Path to the textfiles for the trainings and validation set
current_path = os.path.abspath(os.path.dirname(__file__))
train_file = './citySpace/outData/train/'
val_file = './citySpace/outData/val/'

# Learning params
learning_rate = 0.01
num_epochs = 10
batch_size = 128

# Network params
dropout_rate = 0.5
num_classes = 3
train_layers = ['fc8', 'fc7']

# How often we want to write the tf.summary data to disk
display_step = 16

# Path for tf.summary.FileWriter and to store model checkpoints
filewriter_path = current_path + "/tmp/finetune_alexnet/traffic"
checkpoint_path = current_path + "/tmp/finetune_alexnet/"
org_model = current_path + '/bvlc_alexnet.npy'

# Create parent path if it doesn't exist
if not os.path.isdir(checkpoint_path):
    os.makedirs(checkpoint_path)

# TF placeholder for graph input and output
x = tf.placeholder(tf.float32, [batch_size, 227, 227, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)

# Initialize model
model = AlexNet(x, keep_prob, num_classes, train_layers, weights_path=org_model)

# Link variable to model output
score = model.fc8

# List of trainable variables of the layers we want to train
var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]

# Op for calculating the loss
with tf.name_scope("cross_ent"):
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = score, labels = y))  

# Train op
with tf.name_scope("train"):
  # Get gradients of all trainable variables
  gradients = tf.gradients(loss, var_list)
  gradients = list(zip(gradients, var_list))
  
  # Create optimizer and apply gradient descent to the trainable variables
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  train_op = optimizer.apply_gradients(grads_and_vars=gradients)

# Add gradients to summary  
for gradient, var in gradients:
  tf.summary.histogram(var.name + '/gradient', gradient)

# Add the variables we train to the summary  
for var in var_list:
  tf.summary.histogram(var.name, var)
  
# Add the loss to summary
tf.summary.scalar('cross_entropy', loss)
  
  
# Evaluation op: Accuracy of the model
with tf.name_scope("accuracy"):
  correct_pred = tf.equal(tf.argmax(score, 1), tf.argmax(y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
  
# Add the accuracy to the summary
tf.summary.scalar('accuracy', accuracy)

# Merge all summaries together
merged_summary = tf.summary.merge_all()

# Initialize the FileWriter
writer = tf.summary.FileWriter(filewriter_path)

# Initialize a saver to store model checkpoints
saver = tf.train.Saver()

# Initialize the data generator separately for the training and validation set
train_generator = TrafficImageDataGenerator(train_file, horizontal_flip = True, shuffle = True)
val_generator = TrafficImageDataGenerator(val_file, shuffle = False)

# Get the number of training/validation steps per epoch
train_batches_per_epoch = np.floor(train_generator.data_size / batch_size).astype(np.int16)
val_batches_per_epoch = np.floor(val_generator.data_size / batch_size).astype(np.int16)

# Start Tensorflow session
with tf.Session() as sess:
 
  # Initialize all variables
  sess.run(tf.global_variables_initializer())
  
  # Add the model graph to TensorBoard
  writer.add_graph(sess.graph)
  
  # Load the pretrained weights into the non-trainable layer
  model.load_initial_weights(sess)
  
  print("{} Start training...".format(datetime.now()))
  print("{} Open Tensorboard at --logdir {}".format(datetime.now(), 
                                                    filewriter_path))
  
  # Loop over number of epochs
  for epoch in range(num_epochs):
    
        print("{} Epoch number: {}".format(datetime.now(), epoch+1))
        
        step = 1
        
        while step < train_batches_per_epoch:
            
            # Get a batch of images and labels
            batch_xs, batch_ys, labels = train_generator.next_batch(batch_size)
            
            # And run the training op
            sess.run(train_op, feed_dict={x: batch_xs, 
                                          y: batch_ys, 
                                          keep_prob: dropout_rate})
            
            # Generate summary with the current batch of data and write to file
            if step%display_step == 0:
                s = sess.run(merged_summary, feed_dict={x: batch_xs, 
                                                        y: batch_ys, 
                                                        keep_prob: 1.})
                writer.add_summary(s, epoch*train_batches_per_epoch + step)
                
            step += 1
            
        # Validate the model on the entire validation set
        print("{} Start validation".format(datetime.now()))
        test_acc = 0.
        test_count = 0
        for _ in range(val_batches_per_epoch):
            batch_tx, batch_ty, labels = val_generator.next_batch(batch_size)
            acc = sess.run(accuracy, feed_dict={x: batch_tx, 
                                                y: batch_ty, 
                                                keep_prob: 1.})
            test_acc += acc
            test_count += 1
        test_acc /= test_count
        print("{} Validation Accuracy = {:.4f}".format(datetime.now(), test_acc))
        
        # Reset the file pointer of the image data generator
        val_generator.reset_pointer()
        train_generator.reset_pointer()
        
        print("{} Saving checkpoint of model...".format(datetime.now()))  
        
        #save checkpoint of the model
        checkpoint_name = os.path.join(checkpoint_path, 'model_epoch'+str(epoch+1)+'.ckpt')
        save_path = saver.save(sess, checkpoint_name)  
        
        print("{} Model checkpoint saved at {}".format(datetime.now(), checkpoint_name))
        

Data generator:

import numpy as np
import cv2,os

class TrafficImageDataGenerator:
    def __init__(self, data_folder, horizontal_flip=False, shuffle=False,
                 mean = np.array([104., 117., 124.]), scale_size=(227, 227),
                 nb_classes = 3):


        # Init params
        self.horizontal_flip = horizontal_flip
        self.n_classes = nb_classes
        self.shuffle = shuffle
        self.mean = mean
        self.scale_size = scale_size
        self.pointer = 0
        self.classes=['person','car','cycle']
        
        self.read_class_list(data_folder)
        
        if self.shuffle:
            self.shuffle_data()

    def read_class_list(self, data_folder):
        """
        Scan the image file and get the image paths and labels
        """
        self.images = []
        self.labels = []

        for i in range(len(self.classes)):
            folder = data_folder + self.classes[i]
            # Skip missing class folders instead of aborting the scan
            if not os.path.exists(folder):
                continue

            files = os.listdir(folder)
            for f in files:
                npath = folder + '/' + f
                fsize = os.path.getsize(npath)
                if fsize == 0:
                    os.remove(npath)
                    continue

                self.images.append(npath)
                self.labels.append(i)
        self.data_size = len(self.labels)
        
    def shuffle_data(self):
        """
        Random shuffle the images and labels
        """
        images = self.images.copy()
        labels = self.labels.copy()
        self.images = []
        self.labels = []
        
        # create a list of permuted indices and shuffle the data according to it
        idx = np.random.permutation(len(labels))
        for i in idx:
            self.images.append(images[i])
            self.labels.append(labels[i])
                
    def reset_pointer(self):
        """
        reset pointer to begin of the list
        """
        self.pointer = 0
        if self.shuffle:
            self.shuffle_data()

    def next_batch(self, batch_size):
        """
        This function gets the next n (= batch_size) image paths and labels
        from the lists and loads the corresponding images into memory
        """
        # Get next batch of image (path) and labels
        imgs = self.images[self.pointer:self.pointer + batch_size]
        labels = self.labels[self.pointer:self.pointer + batch_size]

        # update pointer
        self.pointer += batch_size

        # Read images
        images = np.zeros([batch_size, self.scale_size[0], self.scale_size[1], 3], dtype=np.float32)

        try:
            for i in range(len(imgs)):

                img = cv2.imread(imgs[i])

                if img is None:
                    continue

                # rescale image
                img = cv2.resize(img, (self.scale_size[0], self.scale_size[1]))
                img = img.astype(np.float32)

                # subtract mean
                img -= self.mean

                if self.horizontal_flip and np.random.random() < 0.5:
                    img = cv2.flip(img, 1)

                images[i] = img

        except Exception as e:
            print(e)

        # Expand labels to one hot encoding
        one_hot_labels = np.zeros((batch_size, self.n_classes))
        for i in range(len(labels)):
            one_hot_labels[i][labels[i]] = 1

        # return array of images and labels
        return images, one_hot_labels, labels