A convolutional neural network (CNN) is built mainly from convolutional layers followed by pooling layers, and this architecture works especially well on images.
Quick note: images come in color and grayscale. Colors are mixed from the three RGB primaries, so a color image has three channels while a grayscale image has a single channel.
Let's walk through the grayscale case:
Put simply, an image can be viewed as a matrix, say 24×24. A convolution kernel is a small matrix whose size you choose and whose values start out random, say 2×2. It is applied to the image as follows:
1. Place the kernel over the 2×2 patch in the top-left corner of the 24×24 image, multiply elementwise and sum; that single number becomes the top-left entry of the output matrix.
2. Slide the kernel to the right by the stride (a setting you choose, usually 1 or 2) and repeat, continuing until the kernel has slid from the top-left of the image all the way to the bottom-right. This whole pass is one convolution, and its output is again a matrix (a feature map).
3. To reduce the dimensionality, apply pooling, usually average pooling or max pooling. Suppose the convolution output is 4×4 and you use 2×2 average pooling: the mean of the top-left 2×2 block becomes the top-left entry of the result, the mean of the top-right 2×2 block becomes the top-right entry, and likewise for the bottom-left and bottom-right. Max pooling is the same except you take the maximum instead of the mean.
4. Suppose pooling yields a 2×2 result; flatten it with tf.reshape into a 1-D vector (2×2 becomes 1×4) and feed that into a fully connected network to get the classification result.
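The slide-multiply-sum and pooling steps above can be sketched in plain NumPy. The 5×5 image, 2×2 kernel, stride of 1, and no padding here are just illustrative choices, not the settings of the TensorFlow code further down:

```python
import numpy as np

def conv2d_valid(img, kernel, stride=1):
    """Slide `kernel` over `img` (no padding); each step is an
    elementwise multiply followed by a sum."""
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def avg_pool_2x2(fmap):
    """Non-overlapping 2x2 average pooling."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "grayscale image"
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])                  # toy 2x2 kernel
feat = conv2d_valid(img, kernel)                 # 5x5 image -> 4x4 feature map
pooled = avg_pool_2x2(feat)                      # 4x4 -> 2x2
flat = pooled.reshape(1, -1)                     # flatten 2x2 -> 1x4 for the FC layer
print(feat.shape, pooled.shape, flat.shape)
```

In a real CNN the kernel values are learned, not fixed; this sketch only shows the shape bookkeeping.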
The main learnable parameters here are the kernel weights: backpropagation keeps updating them until the loss converges or a preset threshold is reached. The above describes only a single convolution-plus-pooling stage; you can stack further conv+pool stages after it before attaching the fully connected layers.
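Stacking stages is mostly shape bookkeeping. As a rough illustration (assuming SAME-padded convolutions with stride 1 and 2×2 pooling with stride 2, as in the TensorFlow code further down), each stage halves the spatial size, and the learnable parameters of a conv layer are just the kernel entries plus one bias per output channel:

```python
def stage_output_size(size):
    # SAME conv with stride 1 keeps the size; a 2x2 pool with stride 2
    # halves it (SAME pooling rounds up: ceil(size / 2)).
    return -(-size // 2)

def conv_param_count(kh, kw, c_in, c_out):
    # Learnable parameters of one conv layer: kernel weights + one bias per map.
    return kh * kw * c_in * c_out + c_out

size = 24
for _ in range(2):                      # two stacked conv+pool stages
    size = stage_output_size(size)
print(size)                             # 24 -> 12 -> 6

print(conv_param_count(5, 5, 3, 64))    # first-layer kernel [5,5,3,64]: 5*5*3*64 + 64
```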
For more detailed explanations, see these posts:
https://blog.csdn.net/laingliang/article/details/53073591
https://blog.csdn.net/qq_33414271/article/details/79337141
Code:

```python
# -*- coding: utf-8 -*-
"""
Description: uses cifar10_input data as the example.
To download the CIFAR-10 data and helper code:
    git clone https://github.com/tensorflow/models.git
The models/tutorials/image/cifar10 folder is the working directory.
Create a .py file there containing:
    import cifar10
    cifar10.maybe_download_and_extract()
and run it once to fetch the data.
"""
import cifar10_input
import tensorflow as tf
import numpy as np

batch_size = 128
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'
print("begin")
images_train, labels_train = cifar10_input.inputs(eval_data=False,
                                                  data_dir=data_dir,
                                                  batch_size=batch_size)
images_test, labels_test = cifar10_input.inputs(eval_data=True,
                                                data_dir=data_dir,
                                                batch_size=batch_size)
print("begin data")

def weight_variable(shape):
    # Small truncated-normal initialization for the kernel weights.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    # Stride 1 in both spatial directions; SAME padding keeps the size.
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                          padding='SAME')

def avg_pool_6x6(x):
    return tf.nn.avg_pool(x, ksize=[1, 6, 6, 1], strides=[1, 6, 6, 1],
                          padding='SAME')

x = tf.placeholder(tf.float32, [None, 24, 24, 3])  # CIFAR-10 crops: 24x24 RGB
y = tf.placeholder(tf.float32, [None, 10])         # one-hot labels, 10 classes

w_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])
x_image = tf.reshape(x, [-1, 24, 24, 3])
# The bias belongs inside the ReLU: relu(conv + b).
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)                    # 24x24 -> 12x12

w_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)                    # 12x12 -> 6x6

w_conv3 = weight_variable([5, 5, 64, 10])
b_conv3 = bias_variable([10])
h_conv3 = tf.nn.relu(conv2d(h_pool2, w_conv3) + b_conv3)
# Global 6x6 average pooling collapses each 6x6 feature map to one value.
h_pool3 = avg_pool_6x6(h_conv3)                    # 6x6x10 -> 1x1x10
h_pool3_flat = tf.reshape(h_pool3, [-1, 10])

y_conv = tf.nn.softmax(h_pool3_flat)
cross_entropy = -tf.reduce_sum(y * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess = tf.Session()
sess.run(tf.global_variables_initializer())
tf.train.start_queue_runners(sess=sess)
for i in range(15000):
    image_batch, label_batch = sess.run([images_train, labels_train])
    label_b = np.eye(10, dtype=float)[label_batch]  # integer labels -> one-hot
    train_step.run(feed_dict={x: image_batch, y: label_b}, session=sess)
    if i % 200 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: image_batch, y: label_b},
                                       session=sess)
        print("step %d, training accuracy %g" % (i, train_accuracy))
```
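Two pieces of the training loop are worth a quick NumPy sanity check: `np.eye(10)[label_batch]` turns integer labels into one-hot rows, and the loss has the form `-Σ y·log(y_conv)` (softmax cross-entropy summed over the batch). The logits below are made up purely for illustration:

```python
import numpy as np

labels = np.array([3, 0])                  # two integer class labels
one_hot = np.eye(10, dtype=float)[labels]  # rows of the identity -> one-hot
assert one_hot[0, 3] == 1.0                # sample 0 marks class 3

logits = np.zeros((2, 10))
logits[0, 3] = 5.0                         # confidently predicts the true class
logits[1, 7] = 5.0                         # confidently predicts a wrong class

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)
# Same form as the cross_entropy tensor above, summed over the batch;
# the confident wrong prediction (sample 1) dominates the total loss.
loss = -np.sum(one_hot * np.log(probs))
print(loss)
```

Note that computing `softmax` and `log` separately, as the listing above does, can be numerically unstable when a probability underflows to 0; in practice `tf.nn.softmax_cross_entropy_with_logits` fuses the two steps safely.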