Providing GPU Support for TensorFlow 2.x

青棱

Last modified: 2023-06-23 18:41:56

First published: 2022-11-14 23:44:52

Article link: https://blog.csdn.net/yunyangyy/article/details/127857640

Installing TensorFlow with GPU support

Install Docker and nvidia-docker

Reference: Installing Docker on Ubuntu and enabling GPU support in Docker

Install TensorFlow

Install the GPU driver on the host

List the NVIDIA driver versions suitable for the hardware, then install the recommended one

sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
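
Once the driver is installed, a script can confirm that nvidia-smi is reachable before moving on to the Docker steps. The following is a minimal sketch, assuming Python 3 is available on the host; `driver_installed` is a hypothetical helper name and not part of any NVIDIA tooling.

```python
import shutil
import subprocess


def driver_installed(command="nvidia-smi"):
    """Return True if `command` is on PATH and exits with status 0."""
    if shutil.which(command) is None:
        return False  # the binary is missing, so the driver tools are not installed
    try:
        result = subprocess.run(
            [command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver usable:", driver_installed())
```

A result of False after installation usually means the driver has not been loaded yet or the install did not complete.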

Use python:3.8 as the base image

Pull the image
```
sudo docker pull python:3.8
```

Write docker-compose.yml

version: '3'
services:
  tensorflow_gpu:
    container_name: tensorflow_gpu
    image: python:3.8
    user: "0"
    working_dir: /home
    volumes:
      # Mount ./src on the host at /home in the container
      - ./src:/home
    deploy:
      resources:
        reservations:
          devices:
            # Expose every GPU on the host to the container
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
    stdin_open: true
    tty: true
    command: /bin/bash -c "chown -R 1002:1002 . && /bin/bash"
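
YAML does not allow tab characters for indentation, and a compose file pasted from a web page can easily pick some up. As a rough pre-flight check, the sketch below reports which lines are indented with tabs. It uses only the standard library, and `tab_indented_lines` is a hypothetical helper written for this article.

```python
def tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Illustrative snippet: line 2 is indented with a tab, line 3 with spaces.
    sample = "services:\n\ttensorflow_gpu:\n  image: python:3.8\n"
    print(tab_indented_lines(sample))  # [2]
```

To check a real file, pass it the result of `open("docker-compose.yml").read()`. An empty list means the indentation uses spaces only.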

Create the container

sudo docker-compose up -d

Install CUDA inside the container (CUDA only; the GPU driver stays on the host)

Check the highest CUDA version the driver supports
```
nvidia-smi
```
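
The header of the nvidia-smi output contains a `CUDA Version: X.Y` field, which is the highest CUDA version the installed driver supports. The sketch below extracts that field and compares it with the version you intend to install. The sample line is illustrative rather than captured from a real machine, and both function names are hypothetical.

```python
import re


def parse_cuda_version(smi_output):
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(smi_output, wanted):
    """True if the driver's maximum CUDA version is at least `wanted` (major, minor)."""
    supported = parse_cuda_version(smi_output)
    return supported is not None and supported >= wanted


if __name__ == "__main__":
    # Illustrative header line, not taken from a real system.
    sample = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
    print(parse_cuda_version(sample))        # (11, 4)
    print(driver_supports(sample, (10, 1)))  # True
```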

Choose the CUDA version you need and select runfile (local) as the installer type

wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run

Reference: CUDA Toolkit Archive | NVIDIA Developer

Install CUDA, selecting only the cuda-toolkit component, then configure the environment variables

sh cuda_10.1.243_418.87.00_linux.run
# Single quotes keep $PATH and $CUDA_HOME unexpanded until ~/.bashrc is sourced
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
source ~/.bashrc
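
After sourcing ~/.bashrc, it is worth confirming that the CUDA directories actually ended up in the environment. The sketch below inspects an environment mapping for the two entries TensorFlow and nvcc rely on. The default `/usr/local/cuda` root matches the paths used above, and `missing_cuda_settings` is a hypothetical helper.

```python
import os


def missing_cuda_settings(env, cuda_root="/usr/local/cuda"):
    """List the expected CUDA entries that are absent from the environment mapping."""
    problems = []
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    path_dirs = env.get("PATH", "").split(":")
    if cuda_root + "/lib64" not in lib_dirs:
        problems.append("LD_LIBRARY_PATH lacks " + cuda_root + "/lib64")
    if cuda_root + "/bin" not in path_dirs:
        problems.append("PATH lacks " + cuda_root + "/bin")
    return problems


if __name__ == "__main__":
    # Prints nothing when both entries are present.
    for problem in missing_cuda_settings(os.environ):
        print(problem)
```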

Reference: Installing CUDA on Ubuntu

If the installer fails with "Failed to verify gcc version. See log at /var/log/cuda-installer.log for details.", either add the --override flag or install the gcc version that matches your CUDA release

Add the --override flag
```
sh cuda_10.1.243_418.87.00_linux.run --override
```
- Reference: CUDA installation on Ubuntu 22.10 fails with "Failed to verify gcc version. See log at /var/log/cuda-installer.log for details."

Check the Versioned Online Documentation for your CUDA release and install a suitable gcc version

# Back up the current sources, then switch to the Ubuntu focal repositories
cp /etc/apt/sources.list /etc/apt/sources.list.bak
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse" > /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse" >> /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse" >> /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse" >> /etc/apt/sources.list
# apt may print the NO_PUBKEY message on stderr, so merge stderr into the pipe
apt update 2>&1 | grep NO_PUBKEY
# Set key to the key ID printed after NO_PUBKEY, then import it
gpg --keyserver keyserver.ubuntu.com --recv-keys $key
gpg --export --armor $key | apt-key add -
apt update
apt install gcc-7 -y
apt install g++-7 -y
apt upgrade -y
apt autoremove
# Register both compiler versions; the higher priority (90) makes version 7 the default
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 90
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 90
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 50
update-alternatives --config gcc
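
After switching alternatives, it helps to confirm which gcc is now the default before re-running the CUDA installer. The sketch below parses the first line of `gcc --version` output and compares the major version against a limit. The limit of 8 in the example is only an illustrative value; take the real one from the documentation of your CUDA release. Both function names are hypothetical.

```python
import re


def gcc_major_version(version_output):
    """Parse the major version from the first line of `gcc --version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+)\.\d+\.\d+", first_line)
    return int(match.group(1)) if match else None


def gcc_acceptable(version_output, max_major):
    """True if the parsed gcc major version does not exceed `max_major`."""
    major = gcc_major_version(version_output)
    return major is not None and major <= max_major


if __name__ == "__main__":
    # Illustrative output; obtain the real text with `gcc --version`.
    sample = "gcc (Ubuntu 7.5.0-3ubuntu1~20.04) 7.5.0\nCopyright (C) 2017 ..."
    print(gcc_major_version(sample))  # 7
    print(gcc_acceptable(sample, 8))  # True
```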

Reference: CUDA Toolkit Archive
Reference: apt update fails with NO_PUBKEY

Install TensorFlow inside the container

Use pip to install the TensorFlow version that matches the installed CUDA version

pip install tensorflow==2.3.0
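
TensorFlow 2.3.0 is paired with CUDA 10.1 here because each TensorFlow release is built against specific CUDA and cuDNN versions. The lookup below is a small sketch of that pairing for a few 2.x releases, transcribed from memory of TensorFlow's tested build configurations table for Linux GPU builds. Treat the values as assumptions and confirm them against the official table before relying on them. `required_versions` is a hypothetical helper.

```python
# TensorFlow minor release -> (CUDA version, cuDNN version).
# Assumed values; verify against the official tested build configurations.
TESTED_CONFIGS = {
    "2.0": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
}


def required_versions(tf_version):
    """Return (cuda, cudnn) for a TensorFlow version string, or None if unknown."""
    key = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS.get(key)


if __name__ == "__main__":
    print(required_versions("2.3.0"))  # ('10.1', '7.6')
```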

Reference: Installing GPU support

Test TensorFlow. At this point the package can be imported, but no GPU is detected

python
import tensorflow as tf
tf.test.is_gpu_available()

Install cuDNN inside the container

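Download and extract the cuDNN version that matches your CUDA version, then copy its libraries into the CUDA directory ($path is the directory the archive was extracted to)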

 cp -r -d $path/lib64/* /usr/local/cuda/lib64/
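
To confirm the copy worked, you can list the cuDNN shared libraries that are now in the CUDA lib64 directory. The sketch below does this with a glob. The default directory matches the destination used above, and `find_cudnn_libraries` is a hypothetical helper.

```python
import glob
import os


def find_cudnn_libraries(lib_dir="/usr/local/cuda/lib64"):
    """Return sorted file names in lib_dir that look like cuDNN shared libraries."""
    pattern = os.path.join(lib_dir, "libcudnn*.so*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    found = find_cudnn_libraries()
    print(found if found else "No cuDNN libraries found")
```

An empty result suggests the archive was extracted somewhere else or the copy targeted a different CUDA directory.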

Reference: cuDNN Archive
Reference: Installing cuDNN

Test TensorFlow again. The GPU is now detected

import tensorflow as tf
tf.config.list_physical_devices('GPU')

Force CPU-only execution

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

These environment variables must be set before TensorFlow is initialized
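
The simplest way to get the ordering right is to set the variables before `import tensorflow` runs at all. The sketch below wraps the two assignments in a helper that also reports whether TensorFlow had already been imported, in which case the setting may come too late. `force_cpu` is a hypothetical name, and the TensorFlow import is left as a comment so the snippet runs without it.

```python
import os
import sys


def force_cpu():
    """Hide all GPUs from CUDA; return False if tensorflow was already imported."""
    already_imported = "tensorflow" in sys.modules
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported


if __name__ == "__main__":
    if not force_cpu():
        print("Warning: tensorflow was imported first; the setting may be ignored")
    # import tensorflow as tf  # import only after force_cpu() has run
```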

TensorFlow example

Function fitting

import os

import tensorflow as tf

 
class Linear(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(
            units=1,
            activation=None,
            kernel_initializer=tf.zeros_initializer(),
            bias_initializer=tf.zeros_initializer()
        )
 
    def call(self, inputs):
        # `inputs` avoids shadowing the built-in input()
        output = self.dense(inputs)
        return output


def demo_func():
    X = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    y = tf.constant([[7.0], [8.0]])

    model = Linear()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    for i in range(1000):
        with tf.GradientTape() as tape:
            y_pred = model(X)      
            loss = tf.reduce_mean(tf.square(y_pred - y))
        # model.variables returns all variables in the model
        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
        if i % 100 == 0:
            print(i, loss.numpy())
    print(model.variables)

if __name__ == "__main__":
    print('TensorFlow version:{}'.format(tf.__version__))
    
    use_gpu = True
    if use_gpu:
        print('Default to GPU')
        print('GPU Info:{}'.format(tf.config.list_physical_devices('GPU')))
    else:
        print('Set to use CPU')
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
        print('GPU Info:{}'.format(tf.config.list_physical_devices('GPU')))

    demo_func()
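
To make clear what the GradientTape loop computes, here is a dependency-free sketch of the same procedure: plain gradient descent on mean squared error for a linear model, starting from zero weights, with the gradients written out by hand. It is an illustration and not a replacement for the TensorFlow version. `fit_linear` is a hypothetical name, and the numbers may differ slightly from the TensorFlow run because Python floats are 64-bit.

```python
def fit_linear(X, y, learning_rate=0.01, steps=1000):
    """Fit y ~ X.w + b by gradient descent on MSE, starting from zeros."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0

    def residuals():
        # prediction minus target for every sample
        return [sum(wj * xj for wj, xj in zip(w, row)) + b - target
                for row, target in zip(X, y)]

    for _ in range(steps):
        errors = residuals()
        # d(MSE)/dw_j = (2/N) * sum_i error_i * x_ij
        grad_w = [2.0 / n_samples * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(n_features)]
        # d(MSE)/db = (2/N) * sum_i error_i
        grad_b = 2.0 / n_samples * sum(errors)
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b

    loss = sum(e * e for e in residuals()) / n_samples
    return w, b, loss


if __name__ == "__main__":
    # Same data as demo_func above.
    weights, bias, final_loss = fit_linear(
        [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [7.0, 8.0]
    )
    print(weights, bias, final_loss)
```

With two samples and four parameters the problem is underdetermined, so the loss goes to nearly zero and the weights found depend on the zero starting point.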