Running TensorFlow GPU (ROCm) on an AMD Vega 56 under Ubuntu: notes and benchmarks

CHN悠远

Tags: amd vega tensorflow rocm cuda

Article link: https://blog.csdn.net/qadzhangc/article/details/102161517

My machine is fairly old: i5-4570, HDD, 16 GB RAM.

Ubuntu 18.04.2, kernel 5.0.0-31-generic

How to install?

First, it has to be Linux.

Then follow https://rocm.github.io/ROCmInstall.html#ubuntu-support---installing-from-a-debian-repository

sudo apt update
sudo apt dist-upgrade
sudo apt install libnuma-dev
sudo reboot

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list


sudo apt update
sudo apt install rocm-dkms


sudo usermod -a -G video $LOGNAME

That completes the basic installation.
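As a quick sanity check after this step, something like the following can confirm that the ROCm kernel driver exposed /dev/kfd and that the current login is in the video group. This is only a sketch: the function names are made up for illustration, and the group change made by usermod only takes effect after logging out and back in (or rebooting).

```python
import grp
import os


def user_in_group(group_name, user_gids):
    """Return True if the named group exists and its gid is in user_gids."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    return gid in user_gids


def rocm_access_report(kfd_path="/dev/kfd", group_name="video"):
    """Collect two basic facts: is the KFD device node there, are we in the group."""
    return {
        "kfd_present": os.path.exists(kfd_path),
        "in_group": user_in_group(group_name, os.getgroups()),
    }


print(rocm_access_report())
```

If kfd_present is False, the rocm-dkms module probably did not load; if in_group is False, log in again before trying TensorFlow.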

Next:

sudo apt install rocm-libs miopen-hip cxlactivitylogger

sudo apt-get update && sudo apt-get install -y --allow-unauthenticated rocm-dkms rocm-dev rocm-libs rccl rocm-device-libs hsa-ext-rocr-dev hsakmt-roct-dev hsa-rocr-dev rocm-opencl rocm-opencl-dev rocm-utils rocm-profiler cxlactivitylogger miopen-hip miopengemm

Then:

pip3 install tensorflow-rocm -i https://pypi.tuna.tsinghua.edu.cn/simple

This installs 1.14 by default. Trying to install TF 1.13 (tensorflow-rocm==1.13.1) instead fails with an error, and I don't know how to fix that.
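Before running any models, it may be worth checking that the installed wheel actually sees the GPU. A minimal sketch, assuming the TF 1.x internal module tensorflow.python.client.device_lib is available in the installed build; the helper names gpu_device_names and main are hypothetical, and the import is guarded so the script still runs where TensorFlow is missing.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def main():
    try:
        # Imported lazily so the helper above works without TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("tensorflow is not installed in this environment")
        return
    names = gpu_device_names(device_lib.list_local_devices())
    print("GPUs visible to TensorFlow:", names if names else "none")


if __name__ == "__main__":
    main()
```

On a working setup the list should contain an entry such as /device:GPU:0, matching the "Created TensorFlow device" lines in the logs.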

Below I paste the output from running TF 1.14 with models r1.5.

Output for /home/zc/models-r1.5/official/mnist/:


INFO:tensorflow:Done calling model_fn.
I1005 14:23:32.420413 140280443631424 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I1005 14:23:32.421550 140280443631424 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1005 14:23:32.561293 140280443631424 monitored_session.py:240] Graph was finalized.
2019-10-05 14:23:32.561613: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-10-05 14:23:32.588956: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3199815000 Hz
2019-10-05 14:23:32.589530: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf4c094920 executing computations on platform Host. Devices:
2019-10-05 14:23:32.589565: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-05 14:23:32.589793: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libhip_hcc.so
2019-10-05 14:23:32.626274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1651] Found device 0 with properties: 
name: Vega 10 XT [Radeon RX Vega 64]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:03:00.0
2019-10-05 14:23:32.663438: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-10-05 14:23:32.664629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-10-05 14:23:32.666046: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocfft.so
2019-10-05 14:23:32.666334: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocrand.so
2019-10-05 14:23:32.666455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-05 14:23:32.666551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-05 14:23:32.666571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-10-05 14:23:32.666579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-10-05 14:23:32.666744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega 10 XT [Radeon RX Vega 64], pci bus id: 0000:03:00.0)
2019-10-05 14:23:32.668748: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf486ba940 executing computations on platform ROCM. Devices:
2019-10-05 14:23:32.668783: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Vega 10 XT [Radeon RX Vega 64], AMDGPU ISA version: gfx900
WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1005 14:23:32.670347 140280443631424 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/mnist_model/model.ckpt-24000
I1005 14:23:32.678996 140280443631424 saver.py:1280] Restoring parameters from /tmp/mnist_model/model.ckpt-24000
WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W1005 14:23:40.458618 140280443631424 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
2019-10-05 14:23:40.496950: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
INFO:tensorflow:Running local_init_op.
I1005 14:23:40.501688 140280443631424 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1005 14:23:40.514181 140280443631424 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 24000 into /tmp/mnist_model/model.ckpt.
I1005 14:23:40.762727 140280443631424 basic_session_run_hooks.py:606] Saving checkpoints for 24000 into /tmp/mnist_model/model.ckpt.
2019-10-05 14:23:40.947216: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-10-05 14:23:48.989702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-10-05 14:23:55.898632: I tensorflow/core/kernels/conv_grad_input_ops.cc:981] running auto-tune for Backward-Data
2019-10-05 14:23:55.948926: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter
2019-10-05 14:23:56.168817: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter
INFO:tensorflow:train_accuracy = 1.0
I1005 14:23:56.406840 140280443631424 basic_session_run_hooks.py:262] train_accuracy = 1.0
INFO:tensorflow:loss = 0.0004839369, step = 24000
I1005 14:23:56.407395 140280443631424 basic_session_run_hooks.py:262] loss = 0.0004839369, step = 24000
INFO:tensorflow:global_step/sec: 73.6445
I1005 14:23:57.764089 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 73.6445
INFO:tensorflow:train_accuracy = 1.0 (1.358 sec)
I1005 14:23:57.764740 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (1.358 sec)
INFO:tensorflow:loss = 0.0016695356, step = 24100 (1.358 sec)
I1005 14:23:57.764930 140280443631424 basic_session_run_hooks.py:260] loss = 0.0016695356, step = 24100 (1.358 sec)
INFO:tensorflow:global_step/sec: 161.989
I1005 14:23:58.381400 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 161.989
INFO:tensorflow:train_accuracy = 1.0 (0.617 sec)
I1005 14:23:58.382167 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.617 sec)
INFO:tensorflow:loss = 0.0026494868, step = 24200 (0.617 sec)
I1005 14:23:58.382354 140280443631424 basic_session_run_hooks.py:260] loss = 0.0026494868, step = 24200 (0.617 sec)
INFO:tensorflow:global_step/sec: 166.851
I1005 14:23:58.980767 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 166.851
INFO:tensorflow:train_accuracy = 1.0 (0.599 sec)
I1005 14:23:58.981556 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.599 sec)
INFO:tensorflow:loss = 0.00032299932, step = 24300 (0.599 sec)
I1005 14:23:58.981797 140280443631424 basic_session_run_hooks.py:260] loss = 0.00032299932, step = 24300 (0.599 sec)
INFO:tensorflow:global_step/sec: 168.752
I1005 14:23:59.573339 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 168.752
INFO:tensorflow:train_accuracy = 1.0 (0.592 sec)
I1005 14:23:59.574037 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.592 sec)
INFO:tensorflow:loss = 0.0003701407, step = 24400 (0.592 sec)
I1005 14:23:59.574180 140280443631424 basic_session_run_hooks.py:260] loss = 0.0003701407, step = 24400 (0.592 sec)
INFO:tensorflow:global_step/sec: 167.43
I1005 14:24:00.170599 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.43
INFO:tensorflow:train_accuracy = 1.0 (0.597 sec)
I1005 14:24:00.171491 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.597 sec)
INFO:tensorflow:loss = 0.0006009388, step = 24500 (0.597 sec)
I1005 14:24:00.171680 140280443631424 basic_session_run_hooks.py:260] loss = 0.0006009388, step = 24500 (0.597 sec)
INFO:tensorflow:global_step/sec: 161.194
I1005 14:24:00.790968 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 161.194
INFO:tensorflow:train_accuracy = 0.99857146 (0.620 sec)
I1005 14:24:00.791700 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99857146 (0.620 sec)
INFO:tensorflow:loss = 0.010238873, step = 24600 (0.620 sec)
I1005 14:24:00.792010 140280443631424 basic_session_run_hooks.py:260] loss = 0.010238873, step = 24600 (0.620 sec)
INFO:tensorflow:global_step/sec: 167.456
I1005 14:24:01.388126 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.456
INFO:tensorflow:train_accuracy = 0.99875 (0.597 sec)
I1005 14:24:01.388878 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99875 (0.597 sec)
INFO:tensorflow:loss = 1.1944725e-05, step = 24700 (0.597 sec)
I1005 14:24:01.389158 140280443631424 basic_session_run_hooks.py:260] loss = 1.1944725e-05, step = 24700 (0.597 sec)
INFO:tensorflow:global_step/sec: 167.955
I1005 14:24:01.983534 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.955
INFO:tensorflow:train_accuracy = 0.9988889 (0.596 sec)
I1005 14:24:01.984512 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.9988889 (0.596 sec)
INFO:tensorflow:loss = 3.371142e-05, step = 24800 (0.596 sec)
I1005 14:24:01.984758 140280443631424 basic_session_run_hooks.py:260] loss = 3.371142e-05, step = 24800 (0.596 sec)
INFO:tensorflow:global_step/sec: 169.656
I1005 14:24:02.572987 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 169.656
INFO:tensorflow:train_accuracy = 0.999 (0.589 sec)
I1005 14:24:02.573793 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.999 (0.589 sec)
INFO:tensorflow:loss = 6.105689e-05, step = 24900 (0.589 sec)
I1005 14:24:02.573983 140280443631424 basic_session_run_hooks.py:260] loss = 6.105689e-05, step = 24900 (0.589 sec)
INFO:tensorflow:global_step/sec: 167.571
I1005 14:24:03.169733 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.571
INFO:tensorflow:train_accuracy = 0.9990909 (0.597 sec)
I1005 14:24:03.170553 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.9990909 (0.597 sec)
INFO:tensorflow:loss = 0.0004118733, step = 25000 (0.597 sec)
I1005 14:24:03.170741 140280443631424 basic_session_run_hooks.py:260] loss = 0.0004118733, step = 25000 (0.597 sec)
INFO:tensorflow:global_step/sec: 169.609
I1005 14:24:03.759313 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 169.609
INFO:tensorflow:train_accuracy = 0.99916667 (0.590 sec)
I1005 14:24:03.760116 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99916667 (0.590 sec)
INFO:tensorflow:loss = 0.00070394913, step = 25100 (0.590 sec)
I1005 14:24:03.760275 140280443631424 basic_session_run_hooks.py:260] loss = 0.00070394913, step = 25100 (0.590 sec)
INFO:tensorflow:global_step/sec: 159.183
I1005 14:24:04.387524 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 159.183
INFO:tensorflow:train_accuracy = 0.99923074 (0.628 sec)
I1005 14:24:04.388160 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99923074 (0.628 sec)
INFO:tensorflow:loss = 9.452502e-05, step = 25200 (0.628 sec)
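To boil a log like this down to one number, the global_step/sec lines can be pulled out and averaged. The sketch below is an illustration, not part of the original run: it matches only the INFO:tensorflow lines (each value also appears in a duplicate absl-style line), and it assumes the first sample is warm-up and drops it. The function names are hypothetical.

```python
import re

# Match only the "INFO:tensorflow:" copy of each line to avoid double counting.
STEP_RATE = re.compile(r"^INFO:tensorflow:global_step/sec: ([0-9.]+)")


def step_rates(log_lines):
    """Extract every global_step/sec value from an iterable of log lines."""
    rates = []
    for line in log_lines:
        match = STEP_RATE.match(line)
        if match:
            rates.append(float(match.group(1)))
    return rates


def steady_state_mean(rates, warmup=1):
    """Average the rates after skipping the first `warmup` samples."""
    steady = rates[warmup:]
    if not steady:
        raise ValueError("not enough samples after warm-up")
    return sum(steady) / len(steady)


# A few lines copied from the log above.
sample = [
    "INFO:tensorflow:global_step/sec: 73.6445",
    "I1005 14:23:57.764089 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 73.6445",
    "INFO:tensorflow:global_step/sec: 161.989",
    "INFO:tensorflow:global_step/sec: 166.851",
]
rates = step_rates(sample)
print(rates, steady_state_mean(rates))
```

In practice the lines would come from a saved log file rather than a hard-coded list.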

Output for /home/zc/models-r1.5/tutorials/image/cifar10/:




2019-10-05 14:51:03.367600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1651] Found device 0 with properties: 
name: Vega 10 XT [Radeon RX Vega 64]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:03:00.0
2019-10-05 14:51:03.400899: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-10-05 14:51:03.401687: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-10-05 14:51:03.402580: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocfft.so
2019-10-05 14:51:03.402764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocrand.so
2019-10-05 14:51:03.402860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-05 14:51:03.402959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-05 14:51:03.402970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-10-05 14:51:03.402985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-10-05 14:51:03.403105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega 10 XT [Radeon RX Vega 64], pci bus id: 0000:03:00.0)
2019-10-05 14:51:03.405734: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55749a8daf80 executing computations on platform ROCM. Devices:
2019-10-05 14:51:03.405767: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Vega 10 XT [Radeon RX Vega 64], AMDGPU ISA version: gfx900
2019-10-05 14:51:03.498389: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
INFO:tensorflow:Running local_init_op.
I1005 14:51:11.534648 139922356148032 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1005 14:51:11.544094 139922356148032 session_manager.py:502] Done running local_init_op.
WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:875: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
W1005 14:51:11.583780 139922356148032 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:875: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt.
I1005 14:51:12.117074 139922356148032 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt.
2019-10-05 14:51:12.419670: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-10-05 14:51:19.892185: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-10-05 14:51:26.778551: I tensorflow/core/kernels/conv_grad_input_ops.cc:981] running auto-tune for Backward-Data
2019-10-05 14:51:26.828987: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter
2019-10-05 14:51:27.074274: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter
2019-10-05 14:51:27.292801: step 0, loss = 4.68 (53.1 examples/sec; 2.409 sec/batch)
2019-10-05 14:51:27.568947: step 10, loss = 4.65 (4635.0 examples/sec; 0.028 sec/batch)
2019-10-05 14:51:27.732518: step 20, loss = 4.48 (7825.4 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:27.893633: step 30, loss = 4.32 (7944.4 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:28.052451: step 40, loss = 4.34 (8059.8 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:28.205325: step 50, loss = 4.28 (8372.7 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:28.356338: step 60, loss = 4.29 (8476.2 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:28.505872: step 70, loss = 4.22 (8559.9 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:28.664742: step 80, loss = 4.31 (8056.9 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:28.829310: step 90, loss = 4.25 (7778.1 examples/sec; 0.016 sec/batch)
INFO:tensorflow:global_step/sec: 54.4963
I1005 14:51:29.126652 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 54.4963
2019-10-05 14:51:29.127789: step 100, loss = 4.17 (4288.3 examples/sec; 0.030 sec/batch)
2019-10-05 14:51:29.299471: step 110, loss = 4.11 (7455.8 examples/sec; 0.017 sec/batch)
2019-10-05 14:51:29.455985: step 120, loss = 3.94 (8178.1 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:29.612898: step 130, loss = 4.08 (8157.4 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:29.772487: step 140, loss = 3.97 (8020.5 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:29.950281: step 150, loss = 4.06 (7199.4 examples/sec; 0.018 sec/batch)
2019-10-05 14:51:30.111554: step 160, loss = 4.20 (7936.9 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:30.265593: step 170, loss = 3.87 (8309.5 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:30.416730: step 180, loss = 3.89 (8469.1 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:30.573256: step 190, loss = 3.91 (8177.6 examples/sec; 0.016 sec/batch)
INFO:tensorflow:global_step/sec: 57.5419
I1005 14:51:30.864464 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 57.5419
2019-10-05 14:51:30.865772: step 200, loss = 3.65 (4375.8 examples/sec; 0.029 sec/batch)
2019-10-05 14:51:31.046528: step 210, loss = 3.82 (7081.5 examples/sec; 0.018 sec/batch)
2019-10-05 14:51:31.203528: step 220, loss = 3.78 (8152.8 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:31.354960: step 230, loss = 3.69 (8452.6 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:31.514334: step 240, loss = 3.77 (8031.4 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:31.685647: step 250, loss = 3.71 (7471.7 examples/sec; 0.017 sec/batch)
2019-10-05 14:51:31.846451: step 260, loss = 3.74 (7960.0 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:32.009800: step 270, loss = 3.59 (7836.0 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:32.165830: step 280, loss = 3.70 (8203.6 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:32.392678: step 290, loss = 3.42 (5643.1 examples/sec; 0.023 sec/batch)
INFO:tensorflow:global_step/sec: 53.4814
I1005 14:51:32.734279 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 53.4814
2019-10-05 14:51:32.735830: step 300, loss = 3.59 (3729.9 examples/sec; 0.034 sec/batch)
2019-10-05 14:51:32.917374: step 310, loss = 3.49 (7050.6 examples/sec; 0.018 sec/batch)
2019-10-05 14:51:33.090716: step 320, loss = 3.49 (7384.3 examples/sec; 0.017 sec/batch)
2019-10-05 14:51:33.273003: step 330, loss = 3.58 (7022.1 examples/sec; 0.018 sec/batch)
2019-10-05 14:51:33.433509: step 340, loss = 3.35 (7974.5 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:33.594271: step 350, loss = 3.33 (7962.1 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:33.759240: step 360, loss = 3.37 (7759.2 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:33.909936: step 370, loss = 3.39 (8493.7 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:34.075028: step 380, loss = 3.55 (7753.4 examples/sec; 0.017 sec/batch)
2019-10-05 14:51:34.237601: step 390, loss = 3.50 (7873.3 examples/sec; 0.016 sec/batch)
INFO:tensorflow:global_step/sec: 55.9487
I1005 14:51:34.521626 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 55.9487
2019-10-05 14:51:34.522810: step 400, loss = 3.32 (4487.9 examples/sec; 0.029 sec/batch)
2019-10-05 14:51:34.686445: step 410, loss = 3.35 (7822.4 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:34.855793: step 420, loss = 3.59 (7558.3 examples/sec; 0.017 sec/batch)
2019-10-05 14:51:35.015741: step 430, loss = 3.30 (8002.6 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:35.180842: step 440, loss = 3.09 (7762.0 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:35.344060: step 450, loss = 3.19 (7832.9 examples/sec; 0.016 sec/batch)
2019-10-05 14:51:35.490957: step 460, loss = 3.17 (8714.1 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:35.637316: step 470, loss = 3.26 (8745.2 examples/sec; 0.015 sec/batch)
2019-10-05 14:51:35.831752: step 480, loss = 3.35 (6583.3 examples/sec; 0.019 sec/batch)
2019-10-05 14:51:35.995347: step 490, loss = 3.11 (7824.0 examples/sec; 0.016 sec/batch)
INFO:tensorflow:global_step/sec: 56.7619
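The cifar10 lines carry examples/sec directly, but step 0 and every 100th step are much slower than the rest, so a plain average is misleading. One option, sketched here as an assumption on my part rather than anything the tutorial does, is to parse the step lines and report the median. The names parse_cifar_log and median_examples_per_sec are hypothetical.

```python
import re
from statistics import median

# step N, loss = L (X examples/sec; Y sec/batch)
CIFAR_LINE = re.compile(
    r"step (\d+), loss = ([0-9.]+) \(([0-9.]+) examples/sec; ([0-9.]+) sec/batch\)"
)


def parse_cifar_log(lines):
    """Return (step, loss, examples_per_sec, sec_per_batch) for each step line."""
    rows = []
    for line in lines:
        match = CIFAR_LINE.search(line)
        if match:
            rows.append((
                int(match.group(1)),
                float(match.group(2)),
                float(match.group(3)),
                float(match.group(4)),
            ))
    return rows


def median_examples_per_sec(rows):
    """Median throughput, which is robust to the slow outlier steps."""
    return median(row[2] for row in rows)


# Lines copied from the log above.
sample = [
    "2019-10-05 14:51:27.292801: step 0, loss = 4.68 (53.1 examples/sec; 2.409 sec/batch)",
    "2019-10-05 14:51:27.568947: step 10, loss = 4.65 (4635.0 examples/sec; 0.028 sec/batch)",
    "2019-10-05 14:51:27.732518: step 20, loss = 4.48 (7825.4 examples/sec; 0.016 sec/batch)",
    "2019-10-05 14:51:27.893633: step 30, loss = 4.32 (7944.4 examples/sec; 0.016 sec/batch)",
    "2019-10-05 14:51:28.052451: step 40, loss = 4.34 (8059.8 examples/sec; 0.016 sec/batch)",
]
rows = parse_cifar_log(sample)
print(len(rows), median_examples_per_sec(rows))
```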

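Observations from watch -n 1 /opt/rocm/bin/rocm-smi:

cifar10 does not saturate the GPU; power draw is only around 80 W, while mnist draws 120 W to 150 W (ASRock reference Vega 56: the non-OC BIOS tops out at 150 W, the OC BIOS at 165 W).

Switching to a simple Docker setup, this time with TF 1.13, the speed improves somewhat.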

docker run --rm -it -v $HOME:/data --privileged --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.6-tf1.13-python3

Output for /home/zc/models-r1.5/official/mnist/:



2019-10-05 07:08:32.274088: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-10-05 07:08:32.369834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: 
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2019-10-05 07:08:32.369874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-10-05 07:08:32.369900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-05 07:08:32.369917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0 
2019-10-05 07:08:32.369923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N 
2019-10-05 07:08:32.369980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/mnist_model/model.ckpt.
2019-10-05 07:09:19.753740: I tensorflow/core/kernels/conv_grad_input_ops.cc:1027] running auto-tune for Backward-Data
2019-10-05 07:09:22.482913: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter
2019-10-05 07:09:27.833345: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter
INFO:tensorflow:train_accuracy = 0.17
INFO:tensorflow:loss = 2.2934916, step = 0
INFO:tensorflow:global_step/sec: 63.3507
INFO:tensorflow:train_accuracy = 0.53 (1.579 sec)
INFO:tensorflow:loss = 0.41081315, step = 100 (1.578 sec)
INFO:tensorflow:global_step/sec: 172.868
INFO:tensorflow:train_accuracy = 0.6566667 (0.578 sec)
INFO:tensorflow:loss = 0.2912456, step = 200 (0.579 sec)
INFO:tensorflow:global_step/sec: 178.331
INFO:tensorflow:train_accuracy = 0.7225 (0.561 sec)
INFO:tensorflow:loss = 0.24256901, step = 300 (0.561 sec)
INFO:tensorflow:global_step/sec: 176.009
INFO:tensorflow:train_accuracy = 0.762 (0.568 sec)
INFO:tensorflow:loss = 0.20687613, step = 400 (0.568 sec)
INFO:tensorflow:global_step/sec: 176.837
INFO:tensorflow:train_accuracy = 0.79833335 (0.566 sec)
INFO:tensorflow:loss = 0.087043725, step = 500 (0.566 sec)
INFO:tensorflow:global_step/sec: 171.856
INFO:tensorflow:train_accuracy = 0.82285714 (0.582 sec)
INFO:tensorflow:loss = 0.1064676, step = 600 (0.582 sec)
INFO:tensorflow:global_step/sec: 172.889
INFO:tensorflow:train_accuracy = 0.8375 (0.579 sec)
INFO:tensorflow:loss = 0.21137495, step = 700 (0.579 sec)
INFO:tensorflow:global_step/sec: 178.836
INFO:tensorflow:train_accuracy = 0.8522222 (0.559 sec)
INFO:tensorflow:loss = 0.18393756, step = 800 (0.559 sec)
INFO:tensorflow:global_step/sec: 179.369
INFO:tensorflow:train_accuracy = 0.866 (0.558 sec)
INFO:tensorflow:loss = 0.04732112, step = 900 (0.558 sec)
INFO:tensorflow:global_step/sec: 176.643
INFO:tensorflow:train_accuracy = 0.87454545 (0.566 sec)
INFO:tensorflow:loss = 0.07572386, step = 1000 (0.566 sec)
INFO:tensorflow:global_step/sec: 178.853
INFO:tensorflow:train_accuracy = 0.87916666 (0.559 sec)
INFO:tensorflow:loss = 0.14445473, step = 1100 (0.559 sec)
INFO:tensorflow:global_step/sec: 170.553
INFO:tensorflow:train_accuracy = 0.88769233 (0.586 sec)
INFO:tensorflow:loss = 0.046243306, step = 1200 (0.586 sec)
INFO:tensorflow:global_step/sec: 173.607
INFO:tensorflow:train_accuracy = 0.89285713 (0.576 sec)
INFO:tensorflow:loss = 0.077932455, step = 1300 (0.576 sec)
INFO:tensorflow:global_step/sec: 180.427
INFO:tensorflow:train_accuracy = 0.8986667 (0.554 sec)
INFO:tensorflow:loss = 0.04583888, step = 1400 (0.554 sec)
INFO:tensorflow:global_step/sec: 180.881
INFO:tensorflow:train_accuracy = 0.903125 (0.553 sec)
INFO:tensorflow:loss = 0.08700336, step = 1500 (0.553 sec)
INFO:tensorflow:global_step/sec: 173.771

Output for /home/zc/models-r1.5/tutorials/image/cifar10/:


Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.shuffle(min_after_dequeue).batch(batch_size)`.
2019-10-05 07:11:31.581090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-10-05 07:11:31.626012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: 
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2019-10-05 07:11:31.626041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-10-05 07:11:31.626064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-05 07:11:31.626081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0 
2019-10-05 07:11:31.626087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N 
2019-10-05 07:11:31.626166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py:809: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
2019-10-05 07:12:19.668639: I tensorflow/core/kernels/conv_grad_input_ops.cc:1027] running auto-tune for Backward-Data
2019-10-05 07:12:21.352489: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter
2019-10-05 07:12:26.600089: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter
2019-10-05 07:12:31.251288: step 0, loss = 4.67 (21.4 examples/sec; 5.977 sec/batch)
2019-10-05 07:12:31.494354: step 10, loss = 4.59 (5265.9 examples/sec; 0.024 sec/batch)
2019-10-05 07:12:31.629644: step 20, loss = 4.76 (9461.2 examples/sec; 0.014 sec/batch)
2019-10-05 07:12:31.759203: step 30, loss = 4.42 (9879.6 examples/sec; 0.013 sec/batch)
2019-10-05 07:12:31.879325: step 40, loss = 4.32 (10655.8 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.001538: step 50, loss = 4.40 (10473.7 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.122003: step 60, loss = 4.31 (10625.4 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.242860: step 70, loss = 4.44 (10590.9 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.365681: step 80, loss = 4.20 (10421.7 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.487062: step 90, loss = 4.08 (10545.3 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:32.740512: step 100, loss = 4.17 (5050.3 examples/sec; 0.025 sec/batch)
2019-10-05 07:12:32.870252: step 110, loss = 4.01 (9866.1 examples/sec; 0.013 sec/batch)
2019-10-05 07:12:32.993540: step 120, loss = 3.98 (10382.4 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.117593: step 130, loss = 3.87 (10317.8 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.239487: step 140, loss = 3.97 (10500.8 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.361031: step 150, loss = 4.08 (10531.4 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.485318: step 160, loss = 3.97 (10298.6 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.605257: step 170, loss = 3.99 (10672.5 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:33.733531: step 180, loss = 4.16 (9978.5 examples/sec; 0.013 sec/batch)
2019-10-05 07:12:33.860067: step 190, loss = 4.00 (10115.7 examples/sec; 0.013 sec/batch)
2019-10-05 07:12:34.112532: step 200, loss = 3.74 (5070.0 examples/sec; 0.025 sec/batch)
2019-10-05 07:12:34.243842: step 210, loss = 3.65 (9748.1 examples/sec; 0.013 sec/batch)
2019-10-05 07:12:34.365270: step 220, loss = 3.72 (10541.3 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:34.485716: step 230, loss = 3.79 (10627.1 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:34.608241: step 240, loss = 3.71 (10446.7 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:34.731073: step 250, loss = 3.81 (10420.7 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:34.871777: step 260, loss = 3.67 (9097.2 examples/sec; 0.014 sec/batch)
2019-10-05 07:12:34.993355: step 270, loss = 3.54 (10528.2 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:35.113886: step 280, loss = 3.56 (10619.6 examples/sec; 0.012 sec/batch)
2019-10-05 07:12:35.237534: step 290, loss = 3.57 (10352.2 examples/sec; 0.012 sec/batch)
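To put a rough number on the difference between the pip TF 1.14 install and the TF 1.13 Docker image, the relative change in throughput can be computed from figures in the two sets of logs. The sketch below picks one representative value from each log by hand, which is an arbitrary choice made for illustration; the mnist runs are also not strictly comparable, since one resumed from a checkpoint at step 24000 and the other started from step 0.

```python
def relative_speedup(baseline, candidate):
    """Fractional throughput gain of candidate over baseline (0.10 means +10%)."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate / baseline - 1.0


# (pip TF 1.14, Docker TF 1.13), values copied from the logs above.
runs = {
    "mnist global_step/sec": (167.43, 176.009),
    "cifar10 examples/sec (step 30)": (7944.4, 9879.6),
}

for name, (pip_tf114, docker_tf113) in runs.items():
    gain = relative_speedup(pip_tf114, docker_tf113)
    print(f"{name}: {gain:+.1%}")
```

For a fairer comparison, the hand-picked values could be replaced with averages or medians over the steady-state part of each log.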

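With the OC BIOS, mnist gets down to about 0.55 or 0.54 sec (the per-100-step time shown in the log). cifar10 does not saturate the GPU anyway, so it barely changes.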

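These are results from the card as it came: not flashed with a Vega 64 BIOS, and with no HBCC tuning (some mining posts claim there are ways to nearly double the compute throughput).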

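This is actually about the same score as my GTX 1660 Ti, so it is still roughly in GTX 1070 territory? No different from the design target it had back when it launched?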

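Performance looks mediocre, but considering the price of around 1800 yuan, the 8 GB of VRAM, and the claim that HBCC can stretch that to 16 GB, it is a reasonable budget option.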

Finally, the blower fan is really loud.