VGG

Paper information

Original paper: Very Deep Convolutional Networks for Large-Scale Image Recognition

Authors: Karen Simonyan, Andrew Zisserman

Results: In the ImageNet Challenge 2014, the model won first place in the localisation task and second place in the classification task (behind only GoogLeNet).

Naming: The name VGG comes from the Visual Geometry Group at the University of Oxford, where the paper originated.

VGG network configurations

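As more layers are added, the depth of the configurations increases from left (A) to right (E); the added layers are shown in bold. Convolutional layer parameters are denoted "conv⟨receptive field size⟩-⟨number of channels⟩". The ReLU activation function is not shown for brevity. See the VGG16 network layer topology.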

Number of parameters (in millions):
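The layer counts and parameter counts can be approximately reproduced from the layer configurations. The sketch below encodes configurations A to E as described in the paper and tallies weight layers and parameters (weights plus biases). The `block`, `CONFIGS`, and `summarize` names are illustrative and not from the original post; a 224×224×3 input, 1000 output classes, and two 4096-unit fully connected layers are assumed.

```python
# Each entry is (kernel_size, out_channels) for a conv layer, or "M" for 2x2 max-pooling.
def block(n3x3, channels, n1x1=0):
    """One conv block: n3x3 3x3 convs, optionally n1x1 1x1 convs, then max-pooling."""
    return [(3, channels)] * n3x3 + [(1, channels)] * n1x1 + ["M"]

CONFIGS = {
    "A": block(1, 64) + block(1, 128) + block(2, 256) + block(2, 512) + block(2, 512),
    "B": block(2, 64) + block(2, 128) + block(2, 256) + block(2, 512) + block(2, 512),
    "C": block(2, 64) + block(2, 128) + block(2, 256, 1) + block(2, 512, 1) + block(2, 512, 1),
    "D": block(2, 64) + block(2, 128) + block(3, 256) + block(3, 512) + block(3, 512),
    "E": block(2, 64) + block(2, 128) + block(4, 256) + block(4, 512) + block(4, 512),
}

def summarize(layers, input_size=224, in_channels=3, num_classes=1000):
    """Return (number of weight layers, number of parameters) for one configuration."""
    params, depth, size, channels = 0, 0, input_size, in_channels
    for layer in layers:
        if layer == "M":
            size //= 2  # each 2x2, stride-2 pooling halves the spatial size
        else:
            k, out = layer
            params += k * k * channels * out + out  # weights + biases
            channels = out
            depth += 1
    # Three fully connected layers on top of the flattened feature map
    fc_dims = [size * size * channels, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_dims, fc_dims[1:]):
        params += n_in * n_out + n_out
        depth += 1
    return depth, params

for name, layers in CONFIGS.items():
    depth, params = summarize(layers)
    print(f"{name}: {depth} weight layers, {params / 1e6:.0f}M parameters")
```

Under these assumptions, configuration D (VGG16) comes out at 16 weight layers and roughly 138 million parameters, most of which sit in the first fully connected layer.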

VGG performance evaluation

Accuracy of each model:

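  • First, local response normalisation (LRN) does not help: model A-LRN, which uses LRN, is no more accurate than model A, which has no normalisation layers. The authors therefore do not use normalisation in the deeper architectures (B to E).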

  • Second, the classification error decreases as ConvNet depth increases, from 11 layers in A to 19 layers in E.

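  • Notably, despite having the same depth, model C (which contains three 1×1 convolutional layers) performs slightly worse than model D, which uses 3×3 convolutions throughout the network. This indicates that while the additional non-linearity does help (C is better than B), it is also important to capture spatial context by using convolutional filters with non-trivial receptive fields (D is better than C).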

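  • When the depth reaches 19 layers (model E), the error rate of the architecture saturates, but even deeper models might be beneficial for larger datasets.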

Small filters with 3×3 receptive fields

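The authors use very small 3×3 receptive fields throughout the whole network, convolved with the input at every pixel (stride 1). It is easy to see that a stack of two 3×3 convolutional layers (without spatial pooling in between) has an effective receptive field of 5×5; three such layers have an effective receptive field of 7×7.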
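The receptive field claim can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming plain (undilated) convolutions; the function name `effective_receptive_field` is illustrative and not from the paper.

```python
def effective_receptive_field(kernel_sizes, strides=None):
    """Effective receptive field of a stack of conv layers (no dilation)."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(effective_receptive_field([3, 3]))     # 5
print(effective_receptive_field([3, 3, 3]))  # 7
```

With stride 1 throughout, each additional 3×3 layer widens the receptive field by 2 pixels, so n stacked layers see a (2n+1)×(2n+1) window.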

Replacing a single 7×7 layer with a stack of three 3×3 convolutional layers has the following effects:

  • First, it incorporates three non-linear rectification layers instead of a single one, which makes the decision function more discriminative.

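  • Second, it reduces the number of parameters: assuming that both the input and the output of a three-layer 3×3 convolution stack have $C$ channels, the stack is parameterised by $3(3^2C^2) = 27C^2$ weights, whereas a single 7×7 convolutional layer would require $7^2C^2 = 49C^2$ parameters, i.e. 81% more.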

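  • At the same time, the replacement can be seen as imposing a regularisation on the 7×7 convolutional filters, forcing them to have a decomposition through 3×3 filters (with non-linearity injected in between).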
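The parameter comparison in the list above can be reproduced numerically. This is a small sketch that counts weights only (no biases), as in the formulas; the helper names and the choice of C = 512 are illustrative.

```python
def stacked_3x3_weights(channels, num_layers=3):
    """Weights in a stack of 3x3 conv layers with `channels` inputs and outputs each."""
    return num_layers * (3 * 3 * channels * channels)

def single_conv_weights(channels, kernel=7):
    """Weights in one kernel x kernel conv layer with `channels` inputs and outputs."""
    return kernel * kernel * channels * channels

C = 512  # example channel count
stacked = stacked_3x3_weights(C)
single = single_conv_weights(C)
print(stacked, single)
print(f"single 7x7 layer needs {100 * (single / stacked - 1):.0f}% more weights")
```

The ratio 49/27 does not depend on C, so the saving holds for every block of the network.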

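The authors also compared network B with a shallow network with 5×5 convolutional layers, derived from B by replacing each pair of 3×3 convolutional layers with a single 5×5 convolutional layer. The top-1 error of the shallow network was measured to be 7% higher than that of B (on centre-cropped images), which confirms that a deep network with small filters outperforms a shallow network with larger filters.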

Fine-tuning VGG

VGG16 weights file download: link

VGG16 network structure:

import tensorflow as tf
import numpy as np

class VggNet(object):

	def __init__(self, x, keep_prob, num_classes, skip_layer, weights_path):

		# Parse input arguments into class variables
		self.X = x
		self.NUM_CLASSES = num_classes
		self.KEEP_PROB = keep_prob
		self.SKIP_LAYER = skip_layer
		self.WEIGHTS_PATH = weights_path

		# Call the create function to build the computational graph of Network
		self.create()

	def create(self):

		conv1_1 = conv(self.X, 3, 3, 64, 1, 1, padding='SAME', name='conv1_1')
		conv1_2 = conv(conv1_1, 3, 3, 64, 1, 1, padding='SAME', name='conv1_2')
		pool1 = max_pool(conv1_2, 2, 2, 2, 2, padding='SAME', name='pool1')

		conv2_1 = conv(pool1, 3, 3, 128, 1, 1, padding='SAME', name='conv2_1')
		conv2_2 = conv(conv2_1, 3, 3, 128, 1, 1, padding='SAME', name='conv2_2')
		pool2 = max_pool(conv2_2, 2, 2, 2, 2, padding='SAME', name='pool2')

		conv3_1 = conv(pool2, 3, 3, 256, 1, 1, padding='SAME', name='conv3_1')
		conv3_2 = conv(conv3_1, 3, 3, 256, 1, 1, padding='SAME', name='conv3_2')
		conv3_3 = conv(conv3_2, 3, 3, 256, 1, 1, padding='SAME', name='conv3_3')
		pool3 = max_pool(conv3_3, 2, 2, 2, 2, padding='SAME', name='pool3')

		conv4_1 = conv(pool3, 3, 3, 512, 1, 1, padding='SAME', name='conv4_1')
		conv4_2 = conv(conv4_1, 3, 3, 512, 1, 1, padding='SAME', name='conv4_2')
		conv4_3 = conv(conv4_2, 3, 3, 512, 1, 1, padding='SAME', name='conv4_3')
		pool4 = max_pool(conv4_3, 2, 2, 2, 2, padding='SAME', name='pool4')

		conv5_1 = conv(pool4, 3, 3, 512, 1, 1, padding='SAME', name='conv5_1')
		conv5_2 = conv(conv5_1, 3, 3, 512, 1, 1, padding='SAME', name='conv5_2')
		conv5_3 = conv(conv5_2, 3, 3, 512, 1, 1, padding='SAME', name='conv5_3')
		pool5 = max_pool(conv5_3, 2, 2, 2, 2, padding='SAME', name='pool5')

		# 6th Layer: Flatten -> FC (w ReLu) -> Dropout
		flattened = tf.reshape(pool5, [-1, 7 * 7 * 512])
		fc6 = fc(flattened, 7 * 7 * 512, 4096, name='fc6')
		dropout6 = dropout(fc6, self.KEEP_PROB)

		# 7th Layer: FC (w ReLu) -> Dropout
		fc7 = fc(dropout6, 4096, 4096, name='fc7')
		dropout7 = dropout(fc7, self.KEEP_PROB)

		# 8th Layer: FC and return unscaled activations (for tf.nn.softmax_cross_entropy_with_logits)
		self.fc8 = fc(dropout7, 4096, self.NUM_CLASSES, relu=False, name='fc8')

	def load_initial_weights(self, session, encoding):
		"""
    As the weights from http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/ come 
    as a dict of lists (e.g. weights['conv1'] is a list) and not as dict of 
    dicts (e.g. weights['conv1'] is a dict with keys 'weights' & 'biases') we
    need a special load function
    """

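		# Load the weights into memory (allow_pickle=True is needed on newer
		# NumPy versions because the file stores a pickled dict)
		weights_dict = np.load(self.WEIGHTS_PATH, encoding=encoding, allow_pickle=True).item()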

		# Loop over all layer names stored in the weights dict

		try:
			for op_name in weights_dict:

				# Check if the layer is one of the layers that should be reinitialized
				if op_name not in self.SKIP_LAYER:

					with tf.variable_scope(op_name, reuse=True):

						# Loop over list of weights/biases and assign them to their corresponding tf variable
						for data in weights_dict[op_name]:

							# Biases
							if len(data.shape) == 1:

								var = tf.get_variable('biases', trainable=False)
								session.run(var.assign(data))

							# Weights
							else:

								var = tf.get_variable('weights', trainable=False)
								session.run(var.assign(data))
		except Exception as e:
			print(e)

#################################################
# Predefine all necessary layer for the VGGNet #
#################################################
def conv(x, filter_height, filter_width, num_filters, stride_y, stride_x, name,
         padding='SAME', groups=1):
	"""
  Adapted from: https://github.com/ethereon/caffe-tensorflow
  """
	# Get number of input channels
	input_channels = int(x.get_shape()[-1])

	# Create lambda function for the convolution
	convolve = lambda i, k: tf.nn.conv2d(i, k,
	                                     strides=[1, stride_y, stride_x, 1],
	                                     padding=padding)

	with tf.variable_scope(name) as scope:
		# Create tf variables for the weights and biases of the conv layer
		# Use integer division so the shape stays an int under Python 3
		weights = tf.get_variable('weights', shape=[filter_height, filter_width, input_channels // groups, num_filters])
		biases = tf.get_variable('biases', shape=[num_filters])

		conv = convolve(x, weights)

		# Add biases
		# (bias_add keeps the shape of conv, so no reshape is needed; this also
		# works when the batch dimension is unknown)
		bias = tf.nn.bias_add(conv, biases)

		# Apply relu function
		relu = tf.nn.relu(bias, name=scope.name)

		return relu


def fc(x, num_in, num_out, name, relu=True):
	with tf.variable_scope(name) as scope:

		# Create tf variables for the weights and biases
		weights = tf.get_variable('weights', shape=[num_in, num_out], trainable=True)
		biases = tf.get_variable('biases', [num_out], trainable=True)

		# Matrix multiply weights and inputs and add bias
		act = tf.nn.xw_plus_b(x, weights, biases, name=scope.name)

		if relu:
			# Apply ReLu non linearity
			relu = tf.nn.relu(act)
			return relu
		else:
			return act


def max_pool(x, filter_height, filter_width, stride_y, stride_x, name, padding='SAME'):
	return tf.nn.max_pool(x, ksize=[1, filter_height, filter_width, 1],
	                      strides=[1, stride_y, stride_x, 1],
	                      padding=padding, name=name)


def dropout(x, keep_prob):
	return tf.nn.dropout(x, keep_prob)
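
The loader in `load_initial_weights` relies on the layout of the `.npy` file. The sketch below illustrates that layout without TensorFlow, assuming (as the loop over `weights_dict[op_name]` implies) a dict that maps each layer name to a list of arrays, with one-dimensional arrays being biases. The toy file, its shapes, and the `assigned` dict are made up for illustration only.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the pretrained file: layer name -> [weights, biases]
toy_weights = {
    "conv1_1": [np.zeros((3, 3, 3, 64), dtype=np.float32), np.zeros(64, dtype=np.float32)],
    "fc8": [np.zeros((4096, 3), dtype=np.float32), np.zeros(3, dtype=np.float32)],
}

path = os.path.join(tempfile.mkdtemp(), "toy_vgg.npy")
np.save(path, toy_weights)  # a dict is stored as a pickled 0-d object array

# .item() unwraps the 0-d object array back into the dict
loaded = np.load(path, allow_pickle=True).item()

skip_layers = ["fc8"]  # layers to leave at their fresh initialisation
assigned = {}
for op_name, arrays in loaded.items():
    if op_name in skip_layers:
        continue
    for data in arrays:
        # Same rule as the loader: 1-D arrays are biases, everything else weights
        kind = "biases" if data.ndim == 1 else "weights"
        assigned[(op_name, kind)] = data.shape

print(assigned)
```

In the real loader the `assigned[...] = ...` step corresponds to `session.run(var.assign(data))` on the variable fetched from the matching scope.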

VGG fine-tuning:

import os
import numpy as np
import tensorflow as tf
from datetime import datetime
from vgg_net import VggNet
from get_traffic_dataset import TrafficImageDataGenerator

"""
Configuration settings
"""

# Paths to the training and validation sets
current_path = os.path.abspath(os.path.dirname(__file__))
train_file = './citySpace/outData/train/'
val_file = './citySpace/outData/val/'

# Learning params
learning_rate = 0.01
num_epochs = 10
batch_size = 32

# Network params
dropout_rate = 0.5  # fed to keep_prob below, i.e. used as the keep probability
num_classes = 3
train_layers = ['fc8','fc7','fc6']

# How often we want to write the tf.summary data to disk
display_step = 100

# Path for tf.summary.FileWriter and to store model checkpoints
filewriter_path = current_path + "/tmp/finetune_vggNet/traffic"
checkpoint_path = current_path + "/tmp/finetune_vggNet/"
org_model = current_path + '/vgg16.npy'

# Create parent path if it doesn't exist
if not os.path.isdir(checkpoint_path):
    os.makedirs(checkpoint_path)


# TF placeholder for graph input and output
x = tf.placeholder(tf.float32, [batch_size, 224, 224, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)

# Initialize model
model = VggNet(x, keep_prob, num_classes, train_layers, weights_path=org_model)

# Link variable to model output
score = model.fc8

# List of trainable variables of the layers we want to train
var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]

# Op for calculating the loss
with tf.name_scope("cross_ent"):
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = score, labels = y))  

# Train op
with tf.name_scope("train"):
  # Get gradients of all trainable variables
  gradients = tf.gradients(loss, var_list)
  gradients = list(zip(gradients, var_list))
  
  # Create optimizer and apply gradient descent to the trainable variables
  optimizer = tf.train.RMSPropOptimizer(learning_rate)
  train_op = optimizer.apply_gradients(grads_and_vars=gradients)

# Add gradients to summary  
for gradient, var in gradients:
  tf.summary.histogram(var.name + '/gradient', gradient)

# Add the variables we train to the summary  
for var in var_list:
  tf.summary.histogram(var.name, var)
  
# Add the loss to summary
tf.summary.scalar('cross_entropy', loss)
  

# Evaluation op: Accuracy of the model
with tf.name_scope("accuracy"):
  correct_pred = tf.equal(tf.argmax(score, 1), tf.argmax(y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
  
# Add the accuracy to the summary
tf.summary.scalar('accuracy', accuracy)

# Merge all summaries together
merged_summary = tf.summary.merge_all()

# Initialize the FileWriter
writer = tf.summary.FileWriter(filewriter_path)

# Initialize a saver for storing model checkpoints
saver = tf.train.Saver()

# Initialize the data generators separately for the training and validation sets
train_generator = TrafficImageDataGenerator(train_file, horizontal_flip = True, shuffle = True)
val_generator = TrafficImageDataGenerator(val_file, shuffle = False)

# Get the number of training/validation steps per epoch
# (plain Python ints avoid a possible int16 overflow in later step arithmetic)
train_batches_per_epoch = int(np.floor(train_generator.data_size / batch_size))
val_batches_per_epoch = int(np.floor(val_generator.data_size / batch_size))

# Start Tensorflow session
with tf.Session() as sess:
 
  # Initialize all variables
  sess.run(tf.global_variables_initializer())
  
  # Add the model graph to TensorBoard
  writer.add_graph(sess.graph)
  
  # Load the pretrained weights into the non-trainable layer
  model.load_initial_weights(sess, 'latin1')
  
  print("{} Start training...".format(datetime.now()))
  print("{} Open Tensorboard at --logdir {}".format(datetime.now(), 
                                                    filewriter_path))
  
  # Loop over number of epochs
  for epoch in range(num_epochs):
    
        print("{} Epoch number: {}".format(datetime.now(), epoch+1))
        
        step = 1
        
        while step < train_batches_per_epoch:
            
            # Get a batch of images and labels
            batch_xs, batch_ys, labels = train_generator.next_batch(batch_size)
            
            # And run the training op
            sess.run(train_op, feed_dict={x: batch_xs, 
                                          y: batch_ys, 
                                          keep_prob: dropout_rate})
            
            # Generate summary with the current batch of data and write to file
            if step%display_step == 0:
                s = sess.run(merged_summary, feed_dict={x: batch_xs, 
                                                        y: batch_ys, 
                                                        keep_prob: 1.})
                writer.add_summary(s, epoch*train_batches_per_epoch + step)
                
            step += 1
            
        # Validate the model on the entire validation set
        print("{} Start validation".format(datetime.now()))
        test_acc = 0.
        test_count = 0
        for _ in range(val_batches_per_epoch):
            batch_tx, batch_ty, labels = val_generator.next_batch(batch_size)
            acc = sess.run(accuracy, feed_dict={x: batch_tx, 
                                                y: batch_ty, 
                                                keep_prob: 1.})
            test_acc += acc
            test_count += 1
        test_acc /= test_count
        print("{} Validation Accuracy = {:.4f}".format(datetime.now(), test_acc))
        
        # Reset the file pointer of the image data generator
        val_generator.reset_pointer()
        train_generator.reset_pointer()
        
        print("{} Saving checkpoint of model...".format(datetime.now()))  
        
        # Save a checkpoint of the model
        checkpoint_name = os.path.join(checkpoint_path, 'model_epoch'+str(epoch+1)+'.ckpt')
        save_path = saver.save(sess, checkpoint_name)  
        
        print("{} Model checkpoint saved at {}".format(datetime.now(), checkpoint_name))
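
The `var_list` filter in the script above selects variables by the first component of their name, which is the variable scope set in `conv` and `fc`. The following sketch shows the same rule on plain strings, without TensorFlow; the example variable names are assumed to follow the `scope/name:0` pattern.

```python
train_layers = ["fc8", "fc7", "fc6"]

# Example names in the "scope/name:0" form that TensorFlow 1.x gives to variables
variable_names = [
    "conv1_1/weights:0",
    "conv5_3/biases:0",
    "fc6/weights:0",
    "fc7/biases:0",
    "fc8/weights:0",
]

# Keep only variables whose scope is one of the layers to train
selected = [name for name in variable_names if name.split("/")[0] in train_layers]
print(selected)  # ['fc6/weights:0', 'fc7/biases:0', 'fc8/weights:0']
```

Note that the script passes the same `train_layers` list as `skip_layer` to `VggNet`, so fc6, fc7 and fc8 are not loaded from the pretrained file and are the only layers that receive gradient updates.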
        

Data accessor: for the TrafficImageDataGenerator class, see link
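The data generator class itself is not shown here. As a rough guide to the interface the training script depends on (`data_size`, `next_batch`, `reset_pointer`), the following is a hypothetical in-memory stand-in. The class name `InMemoryDataGenerator`, its constructor arguments, and the meaning of the third return value (raw class ids) are assumptions, not the author's implementation; file loading and horizontal flipping are omitted.

```python
import numpy as np

class InMemoryDataGenerator:
    """Hypothetical stand-in exposing the interface the training script uses."""

    def __init__(self, images, class_ids, num_classes, shuffle=False, seed=0):
        self.images = np.asarray(images, dtype=np.float32)
        self.class_ids = np.asarray(class_ids, dtype=np.int64)
        self.num_classes = num_classes
        self.data_size = len(self.class_ids)
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.reset_pointer()

    def reset_pointer(self):
        """Rewind to the first sample, reshuffling if requested."""
        self.pointer = 0
        if self.shuffle:
            self.order = self.rng.permutation(self.data_size)
        else:
            self.order = np.arange(self.data_size)

    def next_batch(self, batch_size):
        """Return (images, one-hot labels, class ids) for the next batch."""
        idx = self.order[self.pointer:self.pointer + batch_size]
        self.pointer += batch_size
        labels = self.class_ids[idx]
        one_hot = np.eye(self.num_classes, dtype=np.float32)[labels]
        return self.images[idx], one_hot, labels

# Usage with tiny dummy data: 5 images of size 4x4x3, 3 classes
gen = InMemoryDataGenerator(np.zeros((5, 4, 4, 3)), [0, 1, 2, 1, 0], num_classes=3)
xs, ys, labels = gen.next_batch(2)
print(xs.shape, ys.shape, labels)
```

A real generator would read and resize images from disk to 224×224×3 before batching; only the method names and the three-value return of `next_batch` are taken from the script.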
