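Let me start with the question, in the hope that someone more experienced can point me in the right direction:
When I run the MNIST demo, the GPU is much slower than the CPU. Why?
Local environment:
CPU: Intel i9, 8 cores / 16 threads
Memory: 64 GB
GPU: AMD Radeon Pro 5500M
Sample code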
import tensorflow as tf


def run():
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    print(len(x_train), len(y_train), len(x_test), len(y_test))
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test, verbose=2)


if __name__ == '__main__':
    devices = tf.config.list_physical_devices()
    print(devices)
    with tf.device("cpu:0"):
        print('start with cpu')
        run()
    with tf.device("gpu:0"):
        print('start with gpu')
        run()
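One thing worth keeping in mind about the demo: model.fit is called without a batch_size, so Keras falls back to its default of 32. The sketch below only does the arithmetic for how many steps that means per epoch. The helper name steps_per_epoch is mine, not a Keras function, and the larger batch sizes are hypothetical values I have not benchmarked.

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches Keras needs to cover num_samples once.

    batch_size=32 mirrors the Keras default used when fit() gets no batch_size.
    """
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # 60000 training images, as in the MNIST demo above.
    for batch_size in (32, 128, 512):  # 128 and 512 are illustrative only
        print(batch_size, steps_per_epoch(60000, batch_size))
```

With a model this small, each step does very little arithmetic, so any fixed per-step cost of dispatching work to the GPU could plausibly dominate. Passing a larger batch_size to model.fit is one experiment to try; I have not measured whether it closes the gap on this hardware.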
Sample output
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2022-01-07 17:49:09.013880: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5500M
systemMemory: 64.00 GB
maxCacheSize: 3.99 GB
2022-01-07 17:49:09.014762: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-07 17:49:09.015263: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
start with cpu
60000 60000 10000 10000
Epoch 1/5
1875/1875 [==============================] - 2s 836us/step - loss: 0.2982 - accuracy: 0.9135
Epoch 2/5
1875/1875 [==============================] - 1s 795us/step - loss: 0.1431 - accuracy: 0.9569
Epoch 3/5
1875/1875 [==============================] - 2s 806us/step - loss: 0.1059 - accuracy: 0.9679
Epoch 4/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0884 - accuracy: 0.9732
Epoch 5/5
1875/1875 [==============================] - 2s 848us/step - loss: 0.0744 - accuracy: 0.9764
313/313 - 0s - loss: 0.0780 - accuracy: 0.9770 - 254ms/epoch - 810us/step
start with gpu
60000 60000 10000 10000
Epoch 1/5
2022-01-07 17:49:18.964441: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
1875/1875 [==============================] - 13s 7ms/step - loss: 0.2918 - accuracy: 0.9146
Epoch 2/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.1402 - accuracy: 0.9591
Epoch 3/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.1055 - accuracy: 0.9683
Epoch 4/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.0851 - accuracy: 0.9741
Epoch 5/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.0703 - accuracy: 0.9781
2022-01-07 17:50:19.926187: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
313/313 - 1s - loss: 0.0784 - accuracy: 0.9759 - 1s/epoch - 4ms/step
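To put a number on "much slower", here is a small sketch that pulls the per-step times out of the training lines above and compares the averages. The regular expression and the helper name step_times_ms are my own; the log fragments are copied from the output above with the progress bar and metrics trimmed.

```python
import re
from statistics import mean

# Matches values such as "836us/step", "6ms/step" or "1s/step".
STEP_TIME = re.compile(r"(\d+(?:\.\d+)?)(us|ms|s)/step")
TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def step_times_ms(log_text):
    """Return every '<number><unit>/step' value in log_text, in milliseconds."""
    return [float(value) * TO_MS[unit]
            for value, unit in STEP_TIME.findall(log_text)]


# Training lines only (the evaluate line is left out on purpose).
CPU_LOG = """
1875/1875 - 2s 836us/step
1875/1875 - 1s 795us/step
1875/1875 - 2s 806us/step
1875/1875 - 2s 1ms/step
1875/1875 - 2s 848us/step
"""
GPU_LOG = """
1875/1875 - 13s 7ms/step
1875/1875 - 12s 6ms/step
1875/1875 - 12s 6ms/step
1875/1875 - 12s 6ms/step
1875/1875 - 12s 6ms/step
"""

cpu_ms = mean(step_times_ms(CPU_LOG))
gpu_ms = mean(step_times_ms(GPU_LOG))
slowdown = gpu_ms / cpu_ms

if __name__ == "__main__":
    print(f"CPU: {cpu_ms:.3f} ms/step, GPU: {gpu_ms:.3f} ms/step")
    print(f"GPU is about {slowdown:.1f}x slower per step")
```

On these numbers the GPU run is roughly 7 times slower per step. Keras rounds the displayed step time, so treat the ratio as approximate.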
Environment setup
For the original English instructions, see: Tensorflow Plugin - Metal - Apple Developer
Make sure Python is version 3.8. If it is not, install it with brew.
# Check the Python version
python3 -V
# If it is not 3.8, install it
brew install python@3.8
# Create a virtual environment
python3 -m venv ~/tensorflow-metal
source ~/tensorflow-metal/bin/activate
python -m pip install -U pip
# Install tensorflow-macos
SYSTEM_VERSION_COMPAT=0 python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
# Now you can run the demo above
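After the steps above, a quick sanity check of the virtual environment can save some head-scratching. This is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The function names installed_version and environment_report are my own, and the script only reports what is installed; it does not verify that the GPU actually works.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("tensorflow-macos", "tensorflow-metal")):
    """Collect the interpreter version and the versions of the given packages."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value or 'not installed'}")
```

Run it with the virtual environment activated; if either package shows as not installed, pip most likely installed into a different interpreter.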
Pitfall 1: I first tried installing with Anaconda, but it kept failing with the error below, which cost me a lot of time.
PSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:maximumVelocityTensor:gradientTensor:name:]: unrecognized selector sent to instance
Pitfall 2: If the dataset cannot be downloaded online, see: keras - How can I import the MNIST dataset that has been manually downloaded? - Stack Overflow
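For Pitfall 2, here is a minimal sketch of loading a manually downloaded copy with NumPy instead of letting Keras fetch it. It assumes the file is the usual Keras mnist.npz archive with the arrays x_train, y_train, x_test and y_test. The function name load_mnist_npz and the path in the usage comment are hypothetical.

```python
import numpy as np


def load_mnist_npz(path):
    """Load MNIST from a local .npz file in the Keras layout.

    Returns ((x_train, y_train), (x_test, y_test)), the same shape of result
    as tf.keras.datasets.mnist.load_data().
    """
    with np.load(path) as data:
        return ((data["x_train"], data["y_train"]),
                (data["x_test"], data["y_test"]))


# Usage (hypothetical path, adjust to wherever the file was saved):
# (x_train, y_train), (x_test, y_test) = load_mnist_npz("/path/to/mnist.npz")
```

Another option, as far as I know, is to copy the file to ~/.keras/datasets/mnist.npz, which is where load_data() looks for a cached copy before downloading.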
Python environment details
❯ pip list
Package Version
---------------------------- ---------
absl-py 1.0.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.10
flatbuffers 2.0
gast 0.4.0
google-auth 2.3.3
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.43.0
h5py 3.6.0
idna 3.3
importlib-metadata 4.10.0
keras 2.7.0
Keras-Preprocessing 1.1.2
libclang 12.0.0
Markdown 3.3.6
numpy 1.22.0
oauthlib 3.1.1
opt-einsum 3.3.0
pip 21.3.1
protobuf 3.19.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
requests 2.27.1
requests-oauthlib 1.3.0
rsa 4.8
setuptools 56.0.0
six 1.15.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow-estimator 2.7.0
tensorflow-io-gcs-filesystem 0.23.1
tensorflow-macos 2.7.0
tensorflow-metal 0.3.0
termcolor 1.1.0
typing_extensions 4.0.1
urllib3 1.26.7
Werkzeug 2.0.2
wheel 0.37.1
wrapt 1.13.3
zipp 3.7.0