Article link: https://blog.csdn.net/m0_37520918/article/details/101292395

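This post shows how to use TensorFlow for the handwritten digit recognition project on Kaggle. It builds a convolutional neural network (CNN) with convolutional, pooling, fully connected, and Dropout layers to reach high recognition accuracy. The training results show per-digit recognition accuracy ranging from 90.6% to 98.7%.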

Kaggle-Digit Recognizer

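Implementing Kaggle's handwritten digit recognition project with TensorFlow
CNN architecture: convolutional layer 1 + pooling layer 1 + convolutional layer 2 + pooling layer 2 + fully connected layer 1 + Dropout layer + output layer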
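
As a rough illustration of the layer stack listed above, the sketch below traces how a single 28x28 grayscale image would change shape as it passes through the network. The filter counts (32 and 64), SAME padding, 2x2 pooling, and the 1024-unit fully connected layer are assumptions chosen for illustration; this post does not confirm them at this point.

```python
def conv_same(shape, filters):
    """Stride-1 convolution with SAME padding: height and width are unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_2x2(shape):
    """2x2 max pooling with stride 2: height and width are halved (rounded up)."""
    height, width, channels = shape
    return ((height + 1) // 2, (width + 1) // 2, channels)


def trace_shapes(input_shape=(28, 28, 1), conv1_filters=32, conv2_filters=64,
                 fc_units=1024, num_classes=10):
    """Return (layer name, output shape) pairs for the assumed layer stack."""
    shapes = [("input", input_shape)]
    shape = conv_same(input_shape, conv1_filters)
    shapes.append(("conv1", shape))
    shape = max_pool_2x2(shape)
    shapes.append(("pool1", shape))
    shape = conv_same(shape, conv2_filters)
    shapes.append(("conv2", shape))
    shape = max_pool_2x2(shape)
    shapes.append(("pool2", shape))
    flat_size = shape[0] * shape[1] * shape[2]
    shapes.append(("flatten", (flat_size,)))
    shapes.append(("fc1", (fc_units,)))
    shapes.append(("dropout", (fc_units,)))  # dropout does not change the shape
    shapes.append(("output", (num_classes,)))
    return shapes


layer_shapes = trace_shapes()
for name, shape in layer_shapes:
    print(f"{name:8s} {shape}")
```

Under these assumptions, the second pooling layer leaves a 7x7x64 feature map, so the fully connected layer receives 3136 inputs per image.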

The code is as follows:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plot  # plotting
from tensorflow.examples.tutorials.mnist import input_data  # TensorFlow 1.x only; removed in TensorFlow 2.x
import pandas as pd

# Load the datasets
train=pd.read_csv(r'E:\kaggle\DigitRecognizer\train.csv')
test=pd.read_csv(r'E:\kaggle\DigitRecognizer\test.csv')

# Inspect the training set
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42000 entries, 0 to 41999
Columns: 785 entries, label to pixel783
dtypes: int64(785)
memory usage: 251.5 MB

# Inspect the test set; it has no label column
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28000 entries, 0 to 27999
Columns: 784 entries, pixel0 to pixel783
dtypes: int64(784)
memory usage: 167.5 MB
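
The memory figures reported by `info()` can be checked by hand: every value is an int64 (8 bytes), and pandas divides by 1024 per unit step when formatting the size. A small sketch of that arithmetic follows; the helper name `frame_megabytes` is made up for this example, and the tiny overhead of the index is ignored.

```python
def frame_megabytes(rows, cols, bytes_per_value=8):
    """Approximate size of an all-int64 DataFrame, using 1024-based units."""
    return rows * cols * bytes_per_value / 1024 ** 2


train_mb = frame_megabytes(42000, 785)  # training set: 785 columns including the label
test_mb = frame_megabytes(28000, 784)   # test set: 784 pixel columns only

print(f"train: {train_mb:.1f} MB")
print(f"test:  {test_mb:.1f} MB")
```

Both results round to the values shown in the `info()` output above (251.5 MB and 167.5 MB).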

# Check the training set for missing values; the result shows there are none
train.isnull().any().describe()

count 785
unique 1
top False
freq 785
dtype: object

# Check the test set for missing values; the result shows there are none
test.isnull().any().describe()

count 784
unique 1
top False
freq 784
dtype: object
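
To see why this output means "no missing values": `isnull().any()` yields one boolean per column, and `describe()` then reports a single unique value, `False`, for all of them. The sketch below runs the same check on a small made-up DataFrame (the `toy` data is hypothetical, not from the Kaggle files) that does contain a gap, and shows two other ways to read the result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing pixel value
toy = pd.DataFrame({
    "label": [1, 0, 7],
    "pixel0": [0, 255, 12],
    "pixel1": [3.0, np.nan, 9.0],
})

# One boolean per column: True if that column has at least one missing value
has_missing = toy.isnull().any()
print(has_missing)

# Names of the columns that contain missing values
columns_with_missing = has_missing[has_missing].index.tolist()
print(columns_with_missing)

# Total number of missing cells in the whole frame
total_missing = int(toy.isnull().sum().sum())
print(total_missing)
```

For the real train and test sets, the equivalent of `columns_with_missing` would be an empty list.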

# Check the number of rows and columns in the training and test sets
print(train.shape)
print(test.shape)

(42000, 785)
(28000, 784)

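# Extract the pixel data and preprocess it
x_train=train.iloc[:,1:].values  # every column except the label
x_train=x_train.astype(np.float64)  # convert to float (the np.float alias was removed in NumPy 1.24)
# The grayscale values range from 0 to 255; scale them into the range 0-1
x_train=np.multiply(x_train,1.0/255.0)
# Compute the image width and height
image_size=x_train.shape[1]  # number of pixels in each flattened image
image_width=image_height=np.ceil(np.sqrt(image_size)).astype(np.uint8)
print('Image sample shape: (%g  %g)' %x_train.shape)
print('Pixels per image=>{0}'.format(image_size))
#print('Image width=>{0}\nImage height=>{1}'.format(image_width,image_height))

Image sample shape: (42000  784)
Pixels per image=>784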
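
The preprocessing above can be tried without the Kaggle CSV files. The following is a minimal sketch that uses random integers in place of real pixel data (an assumption for illustration only): it scales the values into the range 0-1, derives the side length of the square image from the pixel count, and reshapes each flat row into a 2-D image.

```python
import numpy as np

# Stand-in for train.iloc[:, 1:].values: 5 fake images of 784 pixels each
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, 784))

# Scale grayscale values from 0-255 into 0-1
scaled = pixels.astype(np.float64) / 255.0

# Each row is a flattened square image, so the side is the square root of the pixel count
side = int(np.ceil(np.sqrt(scaled.shape[1])))

# Restore the 2-D layout: (number of images, height, width)
images = scaled.reshape(-1, side, side)

print(side)
print(images.shape)
```

With 784 pixels per row the side length is 28, so each row becomes a 28x28 image.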

# Extract the labels from the dataset
#labels_flat=train[[0]].values.ravel()   # get the labels; ravel() flattens a multi-dimensional NumPy array into a 1-D array
labels_flat=train.iloc[:, 0].values.ravel()
labels_count=np.unique(labels_flat).shape[0]  #获取label中