Win10 Ubuntu subsystem (WSL 2): installing the GPU driver, CUDA, and cuDNN, a machine-learning performance comparison with native Win10 GPU, and the numa_node issue

Win10 Ubuntu subsystem GPU setup and machine-learning performance comparison with Win10 GPU

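WSL 2 uses the latest virtualization technology to run a Linux kernel inside a lightweight utility virtual machine (VM).
This post walks through installing the GPU driver, CUDA, and cuDNN on the Win10 Ubuntu subsystem, then compares its machine-learning performance with native Win10 on the same GPU.
Both systems use TensorFlow 2.7.0, CUDA 11.2, and cuDNN 8.1.
Hardware: CPU AMD R7 5800H, GPU RTX 3050 Ti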

Win10 Ubuntu subsystem: installing the GPU driver, CUDA, and cuDNN

For installing the Win10 subsystem itself, see
https://blog.csdn.net/qq_33371133/article/details/107955261

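Ubuntu GPU driver: this is a pitfall. Do not download and install the Linux GPU driver inside Ubuntu. Instead, download and install on the Win10 side the driver that supports the subsystem and CUDA; it replaces the existing Win10 driver. Download link: https://developer.nvidia.com/cuda/wsl/download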

CUDA and cuDNN installation: follow https://zhuanlan.zhihu.com/p/72298520, skipping the graphics driver installation steps. Choose CUDA 11.2 and cuDNN 8.1.

Check that the driver is installed:
Win10 side: run nvidia-smi in cmd.
Ubuntu subsystem side: run nvidia-smi in a terminal.

The two outputs should be consistent.
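To compare the two sides without reading screenshots, the check above can be scripted. The following is a minimal sketch, assuming nvidia-smi is on the PATH and accepts its usual --query-gpu and --format options; the helper names parse_gpu_lines and query_gpus are made up for this example. Run it once in Win10 and once in the subsystem and compare the printed lines.

```python
import shutil
import subprocess

# nvidia-smi query for GPU name and driver version, one CSV line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_lines(text):
    """Turn 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            # Split on the last comma so a comma inside the name is kept.
            name, driver = [field.strip() for field in line.rsplit(",", 1)]
            rows.append((name, driver))
    return rows


def query_gpus():
    """Return the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"{name}: driver {driver}")
```

An empty result on the subsystem side would suggest that the Windows driver with WSL support is not installed yet.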

Win10 side: installing CUDA and cuDNN

Follow https://blog.csdn.net/qq_37296487/article/details/83028394, skipping the driver installation part.
Choose CUDA 11.2 and cuDNN 8.1.

Installing TensorFlow

Both the Win10 side and the subsystem side install it with pip:

pip install tensorflow
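After installing, it is worth confirming on each side that TensorFlow actually sees the GPU. This is a minimal sketch using tf.config.list_physical_devices and tf.test.is_built_with_cuda; the wrapper name gpu_report is invented for this example, and the import is guarded so the script still runs where TensorFlow is absent.

```python
def gpu_report():
    """Summarize what TensorFlow sees; returns None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # One entry per visible GPU; an empty list means a CPU-only run.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```

If the "gpus" list is empty, the training below will silently fall back to the CPU and the comparison would be meaningless.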

Performance comparison

A CNN classification task on the CIFAR-10 dataset, trained for 10 epochs.
Both systems run the same code:

import tensorflow as tf

from tensorflow.keras import datasets, layers, models

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))
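Keras prints epoch times rounded to whole seconds, which is coarse when the gap between the two systems is about one second. As a possible refinement (not part of the original run), a small callback can record wall-clock time per epoch. EpochTimer is a hypothetical name, and the import fallback exists only so the timing logic can be tried without TensorFlow installed.

```python
import time

try:
    from tensorflow.keras.callbacks import Callback
except ImportError:  # fallback so the class can be exercised without TensorFlow
    Callback = object


class EpochTimer(Callback):
    """Records wall-clock seconds for each epoch (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.epoch_seconds = []
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        # Called by Keras at the start of every epoch.
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        # Called by Keras when the epoch finishes.
        self.epoch_seconds.append(time.perf_counter() - self._start)
```

To use it, create timer = EpochTimer(), pass callbacks=[timer] to model.fit, and read timer.epoch_seconds after training.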

Win10 results

CPU utilization: 20-30%
GPU utilization peaks at 100%

1563/1563 [==============================] - 10s 5ms/step - loss: 1.4984 - accuracy: 0.4489 - val_loss: 1.2695 - val_accuracy: 0.5335
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1277 - accuracy: 0.5993 - val_loss: 1.1031 - val_accuracy: 0.6096
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9731 - accuracy: 0.6575 - val_loss: 0.9546 - val_accuracy: 0.6614
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8774 - accuracy: 0.6927 - val_loss: 0.9079 - val_accuracy: 0.6830
Epoch 5/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8131 - accuracy: 0.7149 - val_loss: 0.8627 - val_accuracy: 0.6948
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7506 - accuracy: 0.7373 - val_loss: 0.8729 - val_accuracy: 0.6972
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7097 - accuracy: 0.7509 - val_loss: 0.8597 - val_accuracy: 0.7012
Epoch 8/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.6689 - accuracy: 0.7643 - val_loss: 0.8671 - val_accuracy: 0.7026
Epoch 9/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.6318 - accuracy: 0.7782 - val_loss: 0.8412 - val_accuracy: 0.7122
Epoch 10/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.5992 - accuracy: 0.7890 - val_loss: 0.8743 - val_accuracy: 0.7061

Ubuntu subsystem results

CPU utilization: 20-30%
GPU utilization peaks at 100%
About the same as on the Win10 side.

1563/1563 [==============================] - 11s 5ms/step - loss: 1.5182 - accuracy: 0.4468 - val_loss: 1.3254 - val_accuracy: 0.5321
Epoch 2/10
1563/1563 [==============================] - 7s 4ms/step - loss: 1.1464 - accuracy: 0.5937 - val_loss: 1.1226 - val_accuracy: 0.6122
Epoch 3/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.9849 - accuracy: 0.6550 - val_loss: 0.9455 - val_accuracy: 0.6695
Epoch 4/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8819 - accuracy: 0.6905 - val_loss: 0.9230 - val_accuracy: 0.6782
Epoch 5/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8085 - accuracy: 0.7167 - val_loss: 0.8923 - val_accuracy: 0.6935
Epoch 6/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.7483 - accuracy: 0.7376 - val_loss: 0.8511 - val_accuracy: 0.7101
Epoch 7/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.6985 - accuracy: 0.7561 - val_loss: 0.8586 - val_accuracy: 0.7066
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6536 - accuracy: 0.7702 - val_loss: 0.8609 - val_accuracy: 0.7061
Epoch 9/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.6108 - accuracy: 0.7855 - val_loss: 0.8639 - val_accuracy: 0.7188
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5790 - accuracy: 0.7963 - val_loss: 0.8540 - val_accuracy: 0.7163

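Comparing the results

At first I assumed the WSL 2 Ubuntu subsystem would support the GPU poorly, since WSL 1 cannot see the GPU at all. To my surprise, the subsystem can drive the GPU to 100%, and the performance loss is small: averaging the per-epoch times in the logs above gives about 7.4 s for the subsystem versus about 6.5 s for Win10, only about one second slower. I am still looking into the cause and will update this post later.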
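The averages can be recomputed from the two logs above. This is a minimal sketch: the epoch times are copied by hand from the logs, they are whole seconds as printed by Keras, so the result is only approximate. The helper epoch_seconds_from_log is a made-up name showing how the same numbers could be pulled from raw log text.

```python
import re
from statistics import mean

# Matches the "- 10s 5ms/step" part of a Keras progress line.
EPOCH_TIME = re.compile(r"- (\d+)s \d+ms/step")


def epoch_seconds_from_log(log_text):
    """Extract the whole-second epoch times from Keras progress output."""
    return [int(match.group(1)) for match in EPOCH_TIME.finditer(log_text)]


# Per-epoch seconds copied from the two logs in this post.
win10 = [10, 6, 6, 6, 7, 6, 6, 6, 6, 6]
wsl2 = [11, 7, 7, 7, 7, 7, 7, 7, 7, 7]

print(f"Win10 mean epoch time: {mean(win10):.1f} s")
print(f"WSL 2 mean epoch time: {mean(wsl2):.1f} s")
print(f"WSL 2 / Win10 ratio:   {mean(wsl2) / mean(win10):.2f}")
```

The first epoch on each side is noticeably slower than the rest, so dropping it before averaging would be a reasonable alternative.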

Problem: could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node

When the CNN runs in the subsystem, TensorFlow prints the following message (logged at the I, info, level):

2021-11-25 18:10:44.356599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node

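Reference: https://forums.developer.nvidia.com/t/numa-error-running-tensorflow-on-jetson-tx2/56119/4
In short, the reply there says:
Do not swear on the forum.
The NUMA message is a harmless warning.
TensorFlow runs correctly even with the warning.

So it can simply be ignored.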
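For readers who want to see why the message appears, or to hide it, here is a minimal sketch. It assumes the standard TF_CPP_MIN_LOG_LEVEL environment variable, where "1" filters out info-level messages such as the one above, and it reads the same sysfs file named in the message. The helper read_numa_node is a hypothetical name.

```python
import os
from pathlib import Path

# Must be set before TensorFlow is imported; "1" hides info-level log lines.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"


def read_numa_node(pci_address):
    """Return the NUMA node recorded for a PCI device, or None if unreadable."""
    path = Path("/sys/bus/pci/devices") / pci_address / "numa_node"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # The file is missing or unreadable, which is the case the message reports.
        return None


if __name__ == "__main__":
    # PCI address taken from the log message in this post.
    print(read_numa_node("0000:01:00.0"))
```

A result of None means the file could not be read, which matches the message; hiding it only changes what is printed, not how training runs.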

Questions are welcome.