TensorFlow Learning: Handwritten Digit Recognition with a CNN

d4shman

Article link: https://blog.csdn.net/wusuopubupt/article/details/71516835

1. Network Architecture

The network is built from two convolutional layers, two pooling layers, and two fully connected layers:

Input → Conv → ReLU → Max pooling → Conv → ReLU → Max pooling → FC (1024 units, ReLU, dropout) → FC (10 logits) → Output

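Input
A 4-D tensor of shape [batch_size, image_height, image_width, channels]: the number of examples in each mini-batch processed by one gradient-descent step, the image height, the image width, and the number of color channels (3 for color images [Red, Green, Blue], 1 for monochrome images).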

 
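    import tensorflow as tf               # TensorFlow 1.x (the r1.1 tutorial API)
    from tensorflow.contrib import learn  # used below for ModeKeys, Estimator and MetricSpec

    # Input Layer
    # Reshape X to 4-D tensor: [batch_size, height, width, channels]
    # MNIST images are 28x28 pixels, and have one color channel
    # -1 lets TensorFlow infer batch_size from the size of `features`
    input_layer = tf.reshape(features, [-1, 28, 28, 1])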
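The -1 in tf.reshape asks TensorFlow to infer that dimension from the total number of elements, which is how the batch size is recovered here. Below is a minimal pure-Python sketch of that inference rule, not TensorFlow's implementation; `infer_shape` is a hypothetical helper:

```python
from math import prod

def infer_shape(num_elements, shape):
    """Resolve at most one -1 entry in `shape` so the product equals num_elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    known = prod(d for d in shape if d != -1)  # product of the explicit dimensions
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    return [num_elements // known if d == -1 else d for d in shape]

# 100 flattened MNIST images of 784 (= 28 * 28) values each
print(infer_shape(100 * 784, [-1, 28, 28, 1]))  # [100, 28, 28, 1]
```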

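Convolutional layer #1
Applies 32 filters (kernels) of size 5×5 to the raw input image, so each output unit only looks at a local 5×5 patch (local receptive field) and the output has 32 channels. Zero padding ("same") keeps the output width and height equal to the input's, and ReLU is used as the activation function to introduce non-linearity.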

 
    # Convolutional Layer #1
    # Computes 32 features using a 5x5 filter with ReLU activation.
    # Padding is added to preserve width and height.
    # Input Tensor Shape: [batch_size, 28, 28, 1]
    # Output Tensor Shape: [batch_size, 28, 28, 32]
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
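To make "same" padding concrete, here is a hedged pure-Python sketch of one filter acting on a single-channel image: a stride-1 2-D cross-correlation (what conv2d computes; the kernel is not flipped) with k // 2 zeros of padding on each side, followed by ReLU. It assumes an odd, square kernel; `conv2d_same` and `relu` are illustrative helpers, and the real layer also adds a per-filter bias.

```python
def conv2d_same(image, kernel):
    """Stride-1 2-D cross-correlation of one channel with one odd-sized square
    kernel; positions outside the image count as zeros ("same" padding)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2  # for k = 5 this is 2 zeros on every side, as in conv1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # skip padded (zero) positions
                        acc += image[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out

def relu(fmap):
    """Element-wise max(0, x), the activation applied after each conv layer."""
    return [[max(0.0, v) for v in row] for row in fmap]

image = [[1.0] * 3 for _ in range(3)]
kernel = [[1.0] * 3 for _ in range(3)]
print(relu(conv2d_same(image, kernel)))
# [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```

The output keeps the input's 3×3 size; border values are smaller because part of each window falls on the zero padding.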

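Pooling layer #1
Applies 2×2 max pooling with stride 2 to the output of convolutional layer #1, keeping the maximum of each 2×2 window. This halves the width and height (28×28 → 14×14), which reduces the amount of data and computation and helps reduce overfitting.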

 
    # Pooling Layer #1
    # First max pooling layer with a 2x2 filter and stride of 2
    # Input Tensor Shape: [batch_size, 28, 28, 32]
    # Output Tensor Shape: [batch_size, 14, 14, 32]
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
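A minimal pure-Python sketch of 2×2 max pooling with stride 2 on one channel, assuming even height and width as in this network (`max_pool_2x2` is an illustrative helper, not a TensorFlow function):

```python
def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 4, 6]]
print(max_pool_2x2(fmap))  # [[4, 5], [3, 7]]
```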

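Convolutional layer #2
Applies 64 filters of size 5×5 (again with "same" padding) to the output of pooling layer #1, with ReLU as the activation function.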

 
    # Convolutional Layer #2
    # Computes 64 features using a 5x5 filter.
    # Padding is added to preserve width and height.
    # Input Tensor Shape: [batch_size, 14, 14, 32]
    # Output Tensor Shape: [batch_size, 14, 14, 64]
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
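Because conv2 reads all 32 channels of pool1, each of its 64 filters is really 5×5×32, which is why it has far more weights than conv1. A small sketch of the parameter count, assuming the default use_bias=True of tf.layers.conv2d (one bias per filter); `conv2d_param_count` is a hypothetical helper:

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)

print(conv2d_param_count(5, 5, 1, 32))   # conv1: 832
print(conv2d_param_count(5, 5, 32, 64))  # conv2: 51264
```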

Pooling layer #2
Applies 2×2 max pooling with stride 2 to the output of convolutional layer #2 (14×14 → 7×7).

 
    # Pooling Layer #2
    # Second max pooling layer with a 2x2 filter and stride of 2
    # Input Tensor Shape: [batch_size, 14, 14, 64]
    # Output Tensor Shape: [batch_size, 7, 7, 64]
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
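The shape comments above follow from two output-size rules: "same" padding gives ceil(size / stride), and max_pooling2d's default "valid" padding gives floor((size - pool) / stride) + 1. A pure-Python sketch that traces the per-example (height, width, channels) shape through the four layers (all helper names are illustrative):

```python
def same_conv_out(size, stride=1):
    """'same' padding: output size = ceil(size / stride)."""
    return -(-size // stride)

def valid_pool_out(size, pool=2, stride=2):
    """'valid' padding (max_pooling2d default): floor((size - pool) / stride) + 1."""
    return (size - pool) // stride + 1

def trace_shapes(size=28, channels=1):
    """Return (layer name, (height, width, channels)) after each layer."""
    layers = [("conv1", "conv", 32), ("pool1", "pool", None),
              ("conv2", "conv", 64), ("pool2", "pool", None)]
    shapes = [("input", (size, size, channels))]
    for name, kind, filters in layers:
        if kind == "conv":
            size, channels = same_conv_out(size), filters
        else:
            size = valid_pool_out(size)
        shapes.append((name, (size, size, channels)))
    return shapes

for name, shape in trace_shapes():
    print(name, shape)
# input (28, 28, 1), conv1 (28, 28, 32), pool1 (14, 14, 32),
# conv2 (14, 14, 64), pool2 (7, 7, 64)
```

The final 7 × 7 × 64 = 3136 values per image are what the next step flattens.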

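Fully connected layer #1
The output of pooling layer #2 is first flattened into a 2-D [batch_size, 7*7*64] matrix, which is then fully connected to 1024 neurons with ReLU activation. Dropout with rate=0.4 is applied during training (40% of the units are randomly dropped and 60% are kept), which helps prevent overfitting.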

 
    # Flatten tensor into a batch of vectors
    # Input Tensor Shape: [batch_size, 7, 7, 64]
    # Output Tensor Shape: [batch_size, 7 * 7 * 64]
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

    # Dense Layer
    # Densely connected layer with 1024 neurons
    # Input Tensor Shape: [batch_size, 7 * 7 * 64]
    # Output Tensor Shape: [batch_size, 1024]
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

    # Add dropout operation; 0.6 probability that element will be kept.
    # Dropout is only active in TRAIN mode; in eval/predict mode it passes values through.
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == learn.ModeKeys.TRAIN)
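tf.layers.dropout uses inverted dropout: in training, each element is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged and nothing needs rescaling at inference. A pure-Python sketch of that behavior (this `dropout` function is illustrative, not TensorFlow's code):

```python
import random

def dropout(values, rate, training, rng=None):
    """Inverted dropout on a flat list of activations."""
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 - rate
    # keep each value with probability keep_prob and scale it up to preserve the mean
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

out = dropout([1.0] * 10000, rate=0.4, training=True, rng=random.Random(0))
kept_fraction = sum(1 for v in out if v != 0.0) / len(out)
print(kept_fraction, sum(out) / len(out))  # roughly 0.6 and 1.0
```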

Output
A dense layer of 10 neurons (logits), one for each digit from 0 to 9.

 
    # Logits layer
    # Input Tensor Shape: [batch_size, 1024]
    # Output Tensor Shape: [batch_size, 10]
    logits = tf.layers.dense(inputs=dropout, units=10)
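Both fully connected layers compute out[j] = sum_i x[i] * W[i][j] + b[j]; the logits layer simply has no activation. A pure-Python sketch for a single example, plus the weight counts this implies for the two layers, assuming tf.layers.dense's default use_bias=True (`dense` and `dense_param_count` are illustrative helpers):

```python
def dense(inputs, weights, bias):
    """Fully connected layer for one example; `weights` has shape [len(inputs), units]."""
    return [sum(x * weights[i][j] for i, x in enumerate(inputs)) + bias[j]
            for j in range(len(bias))]

def dense_param_count(in_features, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return in_features * units + units

print(dense([1.0, 2.0], [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]], [0.0, 0.0, 1.0]))  # [2.0, 2.0, 0.0]
print(dense_param_count(7 * 7 * 64, 1024))  # dense: 3212288
print(dense_param_count(1024, 10))          # logits: 10250
```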

2. Model Training

One-hot encode the labels:

 
    # tf.one_hot takes two arguments:
    # indices: the position set to 1 after one-hot encoding (all other positions are 0)
    # depth: the number of target classes (digits 0-9 here, so depth=10)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
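What the encoding produces, as a pure-Python sketch for in-range labels (`one_hot` here is an illustrative helper, not tf.one_hot):

```python
def one_hot(indices, depth):
    """Each label becomes a length-`depth` vector with 1.0 at its index and 0.0 elsewhere."""
    return [[1.0 if k == idx else 0.0 for k in range(depth)] for idx in indices]

print(one_hot([3, 0], depth=10))
# [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
#  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
```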

Compute the softmax cross-entropy loss:

 
    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
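tf.losses.softmax_cross_entropy applies softmax to the logits and takes the cross-entropy -sum_k y_k * log(p_k) against the one-hot labels, averaged over the batch (assuming the default weights of 1.0). A numerically stable pure-Python sketch using the log-sum-exp trick (`softmax_cross_entropy` here is illustrative):

```python
import math

def softmax_cross_entropy(onehot_labels, logits):
    """Mean over the batch of -sum_k y_k * log(softmax(z)_k)."""
    losses = []
    for y, z in zip(onehot_labels, logits):
        m = max(z)  # subtract the max before exponentiating to avoid overflow
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        # log softmax(z)_k = z_k - log_sum_exp
        losses.append(-sum(yk * (zk - log_sum_exp) for yk, zk in zip(y, z)))
    return sum(losses) / len(losses)

label_3 = [1.0 if k == 3 else 0.0 for k in range(10)]
print(softmax_cross_entropy([label_3], [[0.0] * 10]))  # log(10), about 2.3026
```

With all logits equal, the model is uniform over 10 classes, so the loss is log(10); an untrained network typically starts near this value.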

Configure the training op, with a learning rate of 0.001 and stochastic gradient descent (SGD) as the optimizer:

 
    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.contrib.framework.get_global_step(),
        learning_rate=0.001,
        optimizer="SGD")
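Each SGD step moves every weight against its gradient, scaled by the learning rate: p ← p - learning_rate * dL/dp. In training the gradient is computed on a mini-batch, which is what makes the descent stochastic. A minimal sketch of the update rule on a toy one-parameter loss with an exact gradient (`sgd_step` is illustrative; optimize_loss additionally increments the global step):

```python
def sgd_step(params, grads, learning_rate=0.001):
    """One plain gradient-descent update for every parameter."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w = [0.0]
for _ in range(5000):
    w = sgd_step(w, [2 * (w[0] - 3.0)], learning_rate=0.01)
print(round(w[0], 4))  # 3.0
```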

Model predictions:

 
    # Generate Predictions
    # classes: the predicted class, a value from 0 to 9
    # probabilities: the probability of each class, obtained by applying softmax to the logits
    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }
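A pure-Python analogue of the predictions dict for a batch of logit vectors (`predict` is an illustrative helper). Because softmax is monotonic, the argmax of the logits is also the most probable class, so "classes" does not need the softmax output:

```python
import math

def predict(logits_batch):
    """Return {"classes": argmax per row, "probabilities": softmax per row}."""
    classes, probabilities = [], []
    for z in logits_batch:
        m = max(z)
        exps = [math.exp(v - m) for v in z]  # shift by the max for numerical stability
        total = sum(exps)
        probabilities.append([e / total for e in exps])
        classes.append(max(range(len(z)), key=lambda k: z[k]))  # index of the largest logit
    return {"classes": classes, "probabilities": probabilities}

preds = predict([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]])
print(preds["classes"])  # [1, 0]
```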

Create an Estimator, which wraps the model into a classifier that can be trained and evaluated:

 
    # Create the Estimator
    # cnn_model_fn is essentially a wrapper around all of the code above; see:
    # https://www.tensorflow.org/tutorials/layers#building_the_cnn_mnist_classifier
    mnist_classifier = learn.Estimator(
        model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

Training:

 
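    # Set up logging for predictions: log the "softmax_tensor" values every 50 steps
    tensors_to_log = {"probabilities": "softmax_tensor"}
    logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)

    # Train the model
    mnist_classifier.fit(
        x=train_data,
        y=train_labels,
        batch_size=100,
        steps=20000,
        monitors=[logging_hook])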
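With batch_size=100 and steps=20000, training consumes 2,000,000 examples. Assuming the usual 55,000-image MNIST training split used by TF 1.x tutorials, that is about 36 passes (epochs) over the data. The sketch below shows one common way to draw shuffled mini-batches; it is an illustration only, not necessarily how learn.Estimator.fit feeds data internally, and `batches` is a hypothetical helper:

```python
import random

def batches(n_examples, batch_size, steps, seed=0):
    """Yield `steps` mini-batches of example indices, reshuffling whenever
    fewer than batch_size unseen examples remain in the current pass."""
    rng = random.Random(seed)
    order = list(range(n_examples))
    pos = n_examples  # forces a shuffle before the first batch
    for _ in range(steps):
        if pos + batch_size > n_examples:
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + batch_size]
        pos += batch_size

# Assumption: 55,000 training images.
n_train, batch_size, steps = 55000, 100, 20000
print(steps * batch_size / n_train)  # about 36.4 epochs
print(len(next(batches(n_train, batch_size, steps))))  # 100
```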

3. Model Evaluation

Configure the evaluation metric and run the evaluation:

 
    # Configure the accuracy metric for evaluation
    metrics = {
        "accuracy":
            learn.MetricSpec(
                metric_fn=tf.metrics.accuracy, prediction_key="classes"),
    }

    # Evaluate the model and print results
    eval_results = mnist_classifier.evaluate(
        x=eval_data, y=eval_labels, metrics=metrics)
    print(eval_results)
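tf.metrics.accuracy keeps running totals of correct predictions and examples across evaluation batches and reports their ratio, so the final value is the fraction of evaluation images whose predicted "classes" value matches the label (assuming the default weights). A pure-Python sketch of that ratio (`accuracy` is an illustrative helper):

```python
def accuracy(labels, predicted_classes):
    """Fraction of examples whose predicted class equals the true label."""
    if len(labels) != len(predicted_classes):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predicted_classes) if y == p)
    return correct / len(labels)

print(accuracy([7, 2, 1, 0, 4], [7, 2, 1, 6, 4]))  # 0.8
```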

4. Source Code

Full code: https://www.github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/examples/tutorials/layers/cnn_mnist.py

Original tutorial: https://www.tensorflow.org/tutorials/layers
