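TensorFlow MNIST with a CNN

Training the MNIST CNN network

I started from the code in TensorFlow's official/mnist example. It did not run correctly the first time, so I edited the .bashrc file in my home directory and added this line: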

export CUDA_VISIBLE_DEVICES="1"

Then run source ~/.bashrc to reload it.
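
To confirm that the new setting actually reaches Python, a small check like the one below can help. It is only a sketch: the helper name visible_devices is my own, and it simply splits the comma-separated value of CUDA_VISIBLE_DEVICES without asking the driver which GPUs really exist.

```python
import os


def visible_devices(env=None):
    """Return the entries listed in CUDA_VISIBLE_DEVICES.

    None means the variable is unset (CUDA may use every GPU);
    an empty list means every GPU is hidden from the process.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Entries are kept as strings because the variable may also hold GPU UUIDs.
    return [entry.strip() for entry in value.split(",") if entry.strip()]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES ->", visible_devices())
```

Run it in the same terminal where .bashrc was sourced; a result of None means the export did not take effect in that shell.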

Then run python mnist.py; training picks up the GPU automatically.
Start TensorBoard to watch the training curves:

tensorboard --logdir mnist_cnn_model

Then open http://zc-GE62-2QD:6006 in a browser (the URL depends on your machine's hostname) to bring up the training curves and the Graph.
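
Since the host part of that URL is just the machine's hostname, the address for another machine can be worked out with a few lines of Python. This is a minimal sketch; tensorboard_url is a made-up helper, and 6006 is the port that appears in the URL above.

```python
import socket

TENSORBOARD_PORT = 6006  # port used in the URL above


def tensorboard_url(host=None, port=TENSORBOARD_PORT):
    """Build the TensorBoard address for this machine (or for a given host)."""
    host = host or socket.gethostname()
    return "http://{}:{}".format(host, port)


if __name__ == "__main__":
    print(tensorboard_url())
```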

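nvidia-smi not recognized after a reboot

After rebooting, the machine no longer recognized the NVIDIA driver I had installed earlier. Following the install blog, I applied the configuration below:
1. Check the BIOS boot settings and turn some options off
In the Security section of the BIOS setup, check whether UEFI is enabled; if it is, turn it off right away (important)
In the Boot section of the BIOS setup, check whether Secure Boot is enabled; if it is, turn it off right away (important)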
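
Once the system is back up, the firmware mode can also be checked from Linux instead of the BIOS screen. The sketch below relies on the directory /sys/firmware/efi, which exists only when Linux was booted in UEFI mode; the helper name is my own and the efi_dir parameter exists only to make the function easy to test.

```python
import os


def booted_via_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the running Linux system was booted in UEFI mode."""
    return os.path.isdir(efi_dir)


if __name__ == "__main__":
    if booted_via_uefi():
        print("UEFI boot")
    else:
        print("legacy BIOS boot (or not a Linux system)")
```

If the mokutil tool happens to be installed, mokutil --sb-state reports the Secure Boot state as well.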

2. Install the required dependencies

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

If a graphical desktop is running, stop the X window service first:

sudo service lightdm stop
or
sudo /etc/init.d/lightdm stop

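3. Disable nouveau
With the dependencies installed, nouveau has to be disabled; the NVIDIA driver only installs cleanly once nouveau is out of the way. To do that, open /etc/modprobe.d/blacklist.conf (for example with sudo vim /etc/modprobe.d/blacklist.conf) and add the following lines: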

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

Run the following command:

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Finally, rebuild the initramfs and reboot the machine:

sudo update-initramfs -u
reboot
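
After the reboot it is worth confirming that nouveau really is gone. The sketch below reads /proc/modules, where each line starts with the name of a loaded kernel module; the function names are my own and this is just one way to do the check.

```python
def module_loaded(name, modules_text):
    """Return True if `name` is the first field of any /proc/modules-style line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def nouveau_loaded(path="/proc/modules"):
    """Return True/False for nouveau, or None if the module list cannot be read."""
    try:
        with open(path) as handle:
            return module_loaded("nouveau", handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("nouveau loaded:", nouveau_loaded())
```

A result of False means the blacklist took effect and the NVIDIA driver installation can go ahead.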

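After that, install the NVIDIA driver as described in the earlier blog post.

GPU not recognized from the command line

With the steps above done, running mnist.py inside PyCharm trains on the GPU, but running the same script from the command line prints: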

2018-08-20 00:06:22.191504: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-20 00:06:22.337492: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-08-20 00:06:22.337549: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zc-GE62-2QD
2018-08-20 00:06:22.337558: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zc-GE62-2QD
2018-08-20 00:06:22.337611: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 396.26.0
2018-08-20 00:06:22.337639: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 396.26.0
2018-08-20 00:06:22.337647: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 396.26.0
0
I0820 00:06:22.339188 140026981222144 tf_logging.py:116] Using default config.
I0820 00:06:22.339440 140026981222144 tf_logging.py:116] Using config: {'_model_dir': './mnist_cnn_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f5a2f8b9278>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0820 00:06:22.387485 140026981222144 tf_logging.py:116] Calling model_fn.
I0820 00:06:22.727335 140026981222144 tf_logging.py:116] Done calling model_fn.
I0820 00:06:22.728235 140026981222144 tf_logging.py:116] Create CheckpointSaverHook.
I0820 00:06:22.891083 140026981222144 tf_logging.py:116] Graph was finalized.
I0820 00:06:22.892676 140026981222144 tf_logging.py:116] Restoring parameters from ./mnist_cnn_model/model.ckpt-62405
I0820 00:06:22.968533 140026981222144 tf_logging.py:116] Running local_init_op.
I0820 00:06:22.973660 140026981222144 tf_logging.py:116] Done running local_init_op.
2018-08-20 00:06:23.323585: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
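
The last line of that log says the CPU max-pooling kernel only accepts NHWC input, that is, tensors laid out as (batch, height, width, channels) rather than NCHW (batch, channels, height, width). My reading, which is an assumption and not something the log states outright, is that the graph was built for the channels-first layout and then ended up on the CPU because no GPU was visible. The sketch below only shows how the two layouts order the same axes, using a hypothetical batch of 100 MNIST images.

```python
def nchw_to_nhwc(shape):
    """Reorder a (batch, channels, height, width) shape to channels-last."""
    n, c, h, w = shape
    return (n, h, w, c)


def nhwc_to_nchw(shape):
    """Reorder a (batch, height, width, channels) shape to channels-first."""
    n, h, w, c = shape
    return (n, c, h, w)


if __name__ == "__main__":
    batch = (100, 1, 28, 28)  # 100 single-channel 28x28 images, channels first
    print(batch, "->", nchw_to_nhwc(batch))
```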

Solution:
Run the following in the command line:

export CUDA_VISIBLE_DEVICES=0
python mnist.py
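
The same override can be applied to a single run without touching the shell or .bashrc. The sketch below launches a child process with CUDA_VISIBLE_DEVICES set only for that child; run_with_gpu is a hypothetical helper, and the child here merely echoes the variable so the example runs anywhere. Index 0 is used because the successful log in this post lists the GPU as device 0.

```python
import os
import subprocess
import sys

# Stand-in child program: prints the value it received.
ECHO_VAR = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"


def run_with_gpu(argv, gpu_ids="0"):
    """Run argv with CUDA_VISIBLE_DEVICES overridden for that child only."""
    env = dict(os.environ)  # copy, so the parent process stays untouched
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True)


if __name__ == "__main__":
    result = run_with_gpu([sys.executable, "-c", ECHO_VAR])
    print(result.stdout.strip())
```

To launch the training script this way, replace the argument list with [sys.executable, "mnist.py"].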

Output:
the normal training log

(tensorflow) zc@zc-GE62-2QD:~/workspace/go_elife/AI_Demo/mnist_cnn$ python mnist.py
2018-08-20 00:27:17.136794: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-20 00:27:17.400050: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-20 00:27:17.400468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2018-08-20 00:27:17.400488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-20 00:27:29.237605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-20 00:27:29.237665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-08-20 00:27:29.237684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-08-20 00:27:29.237926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 1687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
1
I0820 00:27:29.270864 140173382854400 tf_logging.py:116] Using default config.
I0820 00:27:29.271144 140173382854400 tf_logging.py:116] Using config: {'_model_dir': './mnist_cnn_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec':

Running a test with the saved model

Export the trained model as a SavedModel (see: SavedModel):

python mnist.py --export_dir ./mnist_cnn_saved_model
The command above runs one round of training and then saves the trained model.
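
Each export ends up in its own numeric subdirectory of the export directory (1534951616 and 1534953492 in this post). Assuming those names are timestamps, so that the largest number is the newest export, the directory to pass to --dir can be found with a sketch like this; latest_export is my own helper name.

```python
import os


def latest_export(export_dir):
    """Return the path of the newest numeric subdirectory, or None if there is none."""
    stamps = [name for name in os.listdir(export_dir)
              if name.isdigit() and os.path.isdir(os.path.join(export_dir, name))]
    if not stamps:
        return None
    return os.path.join(export_dir, max(stamps, key=int))


if __name__ == "__main__":
    export_dir = "./mnist_cnn_saved_model"
    if os.path.isdir(export_dir):
        print(latest_export(export_dir))
```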
saved_model_cli run --dir ./mnist_cnn_saved_model/1534951616/ --tag_set serve --signature_def classify --inputs image=examples.npy
Output:
2018-08-22 23:48:29.194109: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-22 23:48:29.396440: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-22 23:48:29.396802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2018-08-22 23:48:29.396817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-22 23:48:34.662749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-22 23:48:34.662780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-08-22 23:48:34.662792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-08-22 23:48:34.662945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
Result for output key classes:
[3 3]
Result for output key probabilities:
[[2.8152197e-18 5.6790460e-20 6.9913247e-18 1.0000000e+00 0.0000000e+00
  6.2972307e-09 2.4830664e-19 3.2860752e-29 1.6266099e-26 5.6905280e-09]
 [5.3256835e-13 1.2014016e-10 2.4255571e-09 9.9993646e-01 3.9114351e-15
  4.1689669e-05 1.8050653e-07 6.5699896e-23 5.9275825e-19 2.1656551e-05]]

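This prints the model's predictions. The result above is wrong, most likely because too many training iterations led to overfitting. I retrained the model once more, and the predictions are now: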

(tensorflow) zc@zc-GE62-2QD:~/workspace/go_elife/AI_Demo/mnist_cnn$ saved_model_cli run --dir ./mnist_cnn_saved_model/1534953492/ --tag_set serve --signature_def classify --inputs image=examples.npy
2018-08-22 23:59:09.114953: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-22 23:59:09.317977: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-22 23:59:09.318344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2018-08-22 23:59:09.318364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-22 23:59:09.828057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-22 23:59:09.828108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-08-22 23:59:09.828121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-08-22 23:59:09.828241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
Result for output key classes:
[5 3]
Result for output key probabilities:
[[1.4278330e-03 4.7121157e-10 5.4629195e-06 4.7545331e-03 2.8239561e-12
  5.6043041e-01 4.3334922e-01 6.1847216e-12 3.2551703e-05 1.2838069e-08]
 [2.0283554e-02 7.5934164e-05 1.5059297e-02 7.2744995e-01 8.3420676e-04
  2.1087909e-01 2.5073985e-02 3.2057675e-12 3.4292994e-04 9.9194108e-07]]
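
The two outputs are related: each entry of classes is the index of the largest value in the matching row of probabilities. The sketch below re-derives the classes from the probabilities printed above; treating classes as an argmax is my reading of the output, and it agrees with the numbers shown.

```python
# Probabilities copied from the saved_model_cli output above.
probabilities = [
    [1.4278330e-03, 4.7121157e-10, 5.4629195e-06, 4.7545331e-03, 2.8239561e-12,
     5.6043041e-01, 4.3334922e-01, 6.1847216e-12, 3.2551703e-05, 1.2838069e-08],
    [2.0283554e-02, 7.5934164e-05, 1.5059297e-02, 7.2744995e-01, 8.3420676e-04,
     2.1087909e-01, 2.5073985e-02, 3.2057675e-12, 3.4292994e-04, 9.9194108e-07],
]


def argmax(row):
    """Return the index of the largest value in row."""
    return max(range(len(row)), key=row.__getitem__)


classes = [argmax(row) for row in probabilities]
print(classes)  # [5, 3]
```

For the first image the runner-up, digit 6 at about 0.43, is close behind digit 5 at about 0.56, so this prediction is far less confident than the 1.0 reported by the earlier model.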

Predicting from an image

Reading images requires OpenCV, which can be installed directly with conda:

conda install opencv
I do not recommend installing it with this command: it downgraded my TensorFlow to 1.5.0, which was infuriating.

Switching the TensorFlow version:

conda install --channel https://conda.anaconda.org/anaconda tensorflow-gpu=1.8.0
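
Because a conda install can silently change the TensorFlow version, it helps to check which one is active afterwards. This is a minimal sketch with made-up helper names; it returns None when TensorFlow cannot be imported, and 1.8.0 is the version this post switches to.

```python
def tensorflow_version():
    """Return the installed TensorFlow version string, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


def matches_expected(version, expected="1.8.0"):
    """Return True only if a version was found and equals the expected one."""
    return version is not None and version == expected


if __name__ == "__main__":
    found = tensorflow_version()
    print("TensorFlow:", found, "| matches 1.8.0:", matches_expected(found))
```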

I ran into this again: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Run: sudo apt-get install cuda-drivers
nvidia-smi
This fixes the problem for now, but it does not feel like a fix for the root cause.
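
Since the problem tends to come back after reboots or package changes, a quick scripted check can run before each training session. The sketch below assumes that nvidia-smi exits with a non-zero status when it cannot talk to the driver; driver_responds is my own helper name.

```python
import shutil
import subprocess


def driver_responds(timeout=30):
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        completed = subprocess.run([exe], stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver reachable:", driver_responds())
```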
