Is the RTX 3060 slower than the GTX 1060 for deep learning?

Copyright: CC 4.0 BY-SA

This post is part of the Deep Learning column.

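Summary: the author ran TensorFlow video object detection inference on a new RTX 3060 machine at work and got only 12 FPS, far below expectations. After switching to TensorRT, the RTX 3060 was still about 4x slower than the GTX 1060 in the author's laptop. The post includes a minimal benchmark script, plus GPU details and logs from both machines, and asks for optimization advice.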

Recently my workplace got a machine with an RTX 3060. I ran my previous project's code on it and found the speed was terrible...!!!!

For example: video object detection inference used to reach 60 FPS, but this card only manages 12 FPS!!!! (TensorFlow 1)
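Frame-rate figures like these are easiest to compare when every machine is measured the same way. A good approach is to time only steady-state frames after a few warm-up frames, because the first calls usually include one-time setup. The following is a minimal sketch. It assumes the per-frame detection step can be wrapped in a zero-argument callable. `measure_fps`, `fps_to_ms` and `infer_one_frame` are hypothetical names and are not part of the original project.

```python
import time
from typing import Callable


def measure_fps(infer_one_frame: Callable[[], object],
                n_frames: int = 100,
                warmup: int = 10) -> float:
    """Return steady-state frames per second for a per-frame callable.

    `infer_one_frame` is a hypothetical stand-in for the real detection call.
    The first `warmup` calls are run but not timed, since they typically
    include one-time setup work.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    for _ in range(warmup):
        infer_one_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer_one_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


def fps_to_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Frame rates quoted in the post.
    print(f"60 FPS -> {fps_to_ms(60):.1f} ms/frame")  # 16.7
    print(f"12 FPS -> {fps_to_ms(12):.1f} ms/frame")  # 83.3
    print(f"slowdown: {60 / 12:.1f}x")                # 5.0
    # Dummy workload, only to show how measure_fps is called.
    print(f"dummy: {measure_fps(lambda: sum(range(1000))):.0f} FPS")
```

At 60 FPS each frame takes about 16.7 ms. At 12 FPS each frame takes about 83.3 ms, which is a 5x difference in per-frame latency.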

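Then I switched frameworks (TensorRT + PyCUDA) and spent a lot of time on it, only to find that the RTX 3060 was 4 times slower than the GTX 1060 in my laptop!!!!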

This really opened my eyes, so I wrote a small TensorFlow demo:

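```python
# test.py: time a small matrix multiply on the GPU (TensorFlow 1.x, graph mode)
import time

import numpy as np
import tensorflow as tf

# np.random.rand returns float64 arrays, so the matmul below runs in float64
a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = tf.matmul(a, b)  # only builds a graph node; nothing runs until sess.run

# On newer 1.x releases tf.Session is deprecated in favor of tf.compat.v1.Session
with tf.Session() as sess:
    for i in range(10):
        t0 = time.perf_counter()
        sess.run(c)
        # elapsed time of one sess.run call, in milliseconds
        print('time cost:{:.4f}'.format((time.perf_counter() - t0) * 1000))
```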
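The demo's workload is very small. It multiplies two 100×100 matrices, and `np.random.rand` produces float64 data. The back-of-the-envelope sketch below estimates the cost of one `sess.run(c)` call. It uses the common 2·n³ FLOP estimate, which is an approximation rather than a measured count. It also counts the bytes of both inputs plus the output. `matmul_cost` and `achieved_gflops` are illustrative helpers, not TensorFlow APIs.

```python
from typing import Tuple


def matmul_cost(n: int, itemsize: int = 8) -> Tuple[int, int]:
    """Rough cost of one (n x n) @ (n x n) matrix multiply.

    FLOPs use the usual 2*n**3 estimate (one multiply and one add per
    inner-product term). Bytes count the two inputs plus the output.
    itemsize=8 matches float64, the default dtype of np.random.rand.
    """
    flops = 2 * n ** 3
    nbytes = 3 * n * n * itemsize
    return flops, nbytes


def achieved_gflops(n: int, millis: float) -> float:
    """Effective GFLOP/s if one n x n matmul takes `millis` milliseconds."""
    flops, _ = matmul_cost(n)
    return flops / (millis * 1e-3) / 1e9


if __name__ == "__main__":
    flops, nbytes = matmul_cost(100)
    print(flops, nbytes)  # 2000000 240000
    print(f"{achieved_gflops(100, 1.0):.1f} GFLOP/s at 1 ms per call")  # 2.0
```

One call is only about 2 MFLOP, so even a 1 ms run works out to roughly 2 GFLOP/s. That is far below what either GPU is rated for. The per-call timings below therefore presumably reflect fixed overheads, such as session dispatch and data movement, more than matrix-multiply throughput.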

Results on the RTX 3060 machine:

(AI) root@face-ai:~$ nvidia-smi
Thu Jul 15 10:48:43 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3060    Off  | 00000000:02:00.0 Off |                  N/A |
| 42%   49C    P2    43W / 170W |    849MiB / 12051MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1139      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A      6905      C   python3                           841MiB |
+-----------------------------------------------------------------------------+
(AI) root@face-ai:~$ python3 test.py 
2021-07-15 10:48:50.362846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From test.py:9: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-07-15 10:48:58.212358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-15 10:48:58.249094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.249440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.282163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.288839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.290773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.319544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.323162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.326224: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.331603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.421741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499825000 Hz
2021-07-15 10:48:58.423567: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c5fdcc20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.423802: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-15 10:48:58.919241: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c606faf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.919997: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3060, Compute Capability 8.6
2021-07-15 10:48:58.923105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.934999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.935367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.935458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.935535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.935604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.935679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.935753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.937903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.938317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:49:01.153241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:49:01.154207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-07-15 10:49:01.154511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-07-15 10:49:01.162712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9454 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6)
time cost:600.3177
time cost:17.2832
time cost:3.6066
time cost:2.5594
time cost:1.3814
time cost:1.4493
time cost:1.7078
time cost:2.7463
time cost:16.8326
time cost:3.1228
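The first `sess.run` on the RTX 3060 took about 600 ms, which most likely includes one-time setup. Two later runs also spike to roughly 17 ms, so a plain average of all ten numbers would be misleading. The sketch below uses the timings from the log above. It separates the first call and summarizes the rest with the median, min and max. Treating only the first run as warm-up is an assumption, and `split_warmup` is a hypothetical helper.

```python
import statistics

# Timings (ms) copied in order from the RTX 3060 run above.
rtx3060_ms = [600.3177, 17.2832, 3.6066, 2.5594, 1.3814,
              1.4493, 1.7078, 2.7463, 16.8326, 3.1228]


def split_warmup(times_ms, warmup=1):
    """Split off the first `warmup` runs (assumed one-time setup)."""
    return times_ms[:warmup], times_ms[warmup:]


first, steady = split_warmup(rtx3060_ms)
print(f"first run: {first[0]:.1f} ms")                        # 600.3
print(f"steady median: {statistics.median(steady):.2f} ms")   # 2.75
print(f"steady min/max: {min(steady):.2f} / {max(steady):.2f} ms")  # 1.38 / 17.28
```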

Results on the laptop (GTX 1060 with Max-Q Design):

a@a-G3-3579:/media/a$ nvidia-smi
Thu Jul 15 10:50:50 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P0    24W /  N/A |    494MiB /  6078MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4574      G   /usr/lib/xorg/Xorg                224MiB |
|    0   N/A  N/A      4777      G   /usr/bin/gnome-shell              212MiB |
|    0   N/A  N/A      5165      G   fcitx-qimpanel                     40MiB |
|    0   N/A  N/A      6374      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6445      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6488      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      7201      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13756      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13799      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13944      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+
a@a-G3-3579:/media/a$ python3 test.py 
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-07-15 10:50:56.135547: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-15 10:50:56.229574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-15 10:50:56.230025: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2063ff0 executing computations on platform CUDA. Devices:
2021-07-15 10:50:56.230041: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1060 with Max-Q Design, Compute Capability 6.1
2021-07-15 10:50:56.231739: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-15 10:50:56.232615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x27288f0 executing computations on platform Host. Devices:
2021-07-15 10:50:56.232631: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2021-07-15 10:50:56.232716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.39GiB
2021-07-15 10:50:56.232747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-07-15 10:50:56.233196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:50:56.233207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2021-07-15 10:50:56.233234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2021-07-15 10:50:56.233302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
time cost:58.0266
time cost:0.4869
time cost:0.3860
time cost:0.3378
time cost:0.3417
time cost:0.3548
time cost:0.2599
time cost:0.2871
time cost:0.2599
time cost:0.2649
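To compare the two runs on equal terms, the sketch below takes the steady-state timings from both logs. It drops the first run of each as warm-up, which is an assumption. It then compares the medians and converts them to effective GFLOP/s using the same 2·n³ estimate for a 100×100 matmul. The numbers come straight from the logs above. Only the summary arithmetic is new.

```python
import statistics

# Timings (ms) copied from the two logs above, first run of each excluded.
rtx3060_ms = [17.2832, 3.6066, 2.5594, 1.3814, 1.4493,
              1.7078, 2.7463, 16.8326, 3.1228]
gtx1060_ms = [0.4869, 0.3860, 0.3378, 0.3417, 0.3548,
              0.2599, 0.2871, 0.2599, 0.2649]

FLOPS_PER_CALL = 2 * 100 ** 3  # usual 2*n^3 estimate for a 100x100 matmul

for name, times in (("RTX 3060", rtx3060_ms), ("GTX 1060", gtx1060_ms)):
    med = statistics.median(times)
    gflops = FLOPS_PER_CALL / (med * 1e-3) / 1e9
    print(f"{name}: median {med:.4f} ms, ~{gflops:.2f} GFLOP/s")

ratio = statistics.median(rtx3060_ms) / statistics.median(gtx1060_ms)
print(f"3060 / 1060 median time: {ratio:.1f}x")
```

Using these logs, the RTX 3060's median call takes 2.7463 ms and the GTX 1060's takes 0.3378 ms, about 8.1x longer on the 3060. Both correspond to well under 10 GFLOP/s.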

This speed is absolutely ridiculous!!!!

Maybe I have something configured wrong somewhere. If any experts know how to optimize this, I'd welcome your guidance.