I. System Environment
1. Ubuntu 22.04.4 LTS
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$
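2. Install the NVIDIA driver and the CUDA Toolkit
NVIDIA driver download page: Official Drivers | NVIDIA
Download the driver version that matches your system.
CUDA Toolkit: CUDA Toolkit 12.4 Update 1 Downloads | NVIDIA Developer
After installation, check that both are working. Note that nvidia-smi reports the highest CUDA version the driver supports, while nvcc -V reports the installed toolkit version, so the two numbers can differ (12.3 and 12.4 in the output below).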
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ nvidia-smi
Wed May 1 21:14:55 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P40 Off | 00000000:02:00.0 Off | Off |
| N/A 37C P0 52W / 250W | 174MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1550 G /usr/lib/xorg/Xorg 22MiB |
| 0 N/A N/A 3907 C+G ...libexec/gnome-remote-desktop-daemon 149MiB |
+---------------------------------------------------------------------------------------+
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$
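The two version numbers above can also be compared in a script. The following is a minimal sketch, not part of the original setup: the function names and the three result labels are my own, and the comparison is only a heuristic. NVIDIA documents minor-version compatibility within a CUDA major release, so a minor mismatch like the one here is often workable, but check NVIDIA's compatibility notes for your own case.

```python
import re


def driver_cuda_version(nvidia_smi_output):
    """Highest CUDA version the driver supports, read from the nvidia-smi banner."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_cuda_version(nvcc_output):
    """Installed toolkit version, read from the 'release X.Y' field of nvcc -V."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release' field in nvcc output")
    return int(match.group(1)), int(match.group(2))


def compare_versions(driver, toolkit):
    """Hypothetical labels: 'ok', 'minor-mismatch' or 'major-mismatch'."""
    if driver >= toolkit:
        return "ok"
    if driver[0] == toolkit[0]:
        return "minor-mismatch"
    return "major-mismatch"


# Sample lines copied from the outputs shown above.
SMI_LINE = "| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |"
NVCC_LINE = "Cuda compilation tools, release 12.4, V12.4.131"

driver = driver_cuda_version(SMI_LINE)
toolkit = toolkit_cuda_version(NVCC_LINE)
print(driver, toolkit, compare_versions(driver, toolkit))
```

To run it against a live machine, pass the captured output of `nvidia-smi` and `nvcc -V` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) instead of the sample lines.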
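3. Docker and Docker Compose
Docker must be installed beforehand, ideally version 19.03 or later; older versions need the separate nvidia-docker package.
Docker 19.03 and Docker Compose 2.x support GPU access from containers. On top of that, you need to install nvidia-container-toolkit, which is what lets a Docker container use the NVIDIA GPU.
For installing Docker and Docker Compose, see my other blog post: Offline installation of Docker and Docker Compose on Linux (CentOS 7, Rocky Linux 8, Ubuntu 22.04, Ubuntu 20.04)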
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ docker -v
Docker version 26.1.0, build 9714adc
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ docker-compose -v
Docker Compose version v2.26.1
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$
4. nvidia-container-toolkit
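# Build the distribution string, e.g. ubuntu22.04
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Add NVIDIA's GPG key (apt-key is deprecated on Ubuntu 22.04 and prints a warning, but still works)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
# Register the package repository for this distribution
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install the toolkit, then restart Docker so it picks up the NVIDIA runtime
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker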
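After the restart, a quick way to confirm that containers can reach the GPU is to run nvidia-smi inside a throwaway container. The sketch below wraps that command in Python; the function names are hypothetical, and the image tag is simply the base image used later in this article (any image works, since the toolkit injects nvidia-smi from the host).

```python
import shlex
import subprocess


def gpu_smoke_test_command(image="nvidia/cuda:12.2.0-base-ubuntu20.04", gpus="all"):
    """Build the 'docker run' command that executes nvidia-smi in a container."""
    return ["docker", "run", "--rm", "--gpus", gpus, image, "nvidia-smi"]


def run_gpu_smoke_test(image="nvidia/cuda:12.2.0-base-ubuntu20.04"):
    """Return True if nvidia-smi exits successfully inside the container."""
    try:
        result = subprocess.run(
            gpu_smoke_test_command(image), capture_output=True, text=True
        )
    except FileNotFoundError:
        return False  # the docker CLI is not on PATH
    return result.returncode == 0


# Print the equivalent shell command; call run_gpu_smoke_test() to execute it.
print(shlex.join(gpu_smoke_test_command()))
```

If `run_gpu_smoke_test()` returns False on a machine where Docker itself works, the toolkit installation or the Docker restart is the first thing to recheck.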
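II. The Dockerfile
My build file targets Ubuntu 20.04 with the CUDA 12.2.0 base image and PyTorch 2.2.2 installed from the cu121 wheels (built against CUDA 12.1).
Base image
Search NVIDIA's official Docker image repository for the CUDA version and operating system you need.
I chose 12.2.0-base-ubuntu20.04: my host runs Ubuntu 22.04 and I am most familiar with the Ubuntu family.
Python Version
Pick the Python version you use from the official Python download page.
I develop with Python 3.9.18, so my Dockerfile builds and installs Python 3.9.18.
PyTorch Version
I use PyTorch 2.2.2. It was the latest release when I tested for this article; 2.3.0 has been released since.
Go to PyTorch's previous-versions page and pick the release you need.
The Dockerfile is built on these pieces: it starts from the official CUDA image nvidia/cuda:12.2.0-base-ubuntu20.04, installs Python 3.9.18, then installs the packages YOLOv5 needs at runtime along with PyTorch. The exact version combination depends on what your own project requires.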
# CUDA base image
FROM nvidia/cuda:12.2.0-base-ubuntu20.04
# Install the build dependencies needed to compile Python
RUN apt update && \
apt install -y \
wget build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev \
libreadline-dev libffi-dev libsqlite3-dev libbz2-dev liblzma-dev && \
apt clean && \
rm -rf /var/lib/apt/lists/*
WORKDIR /temp
# Download the Python source
RUN wget https://www.python.org/ftp/python/3.9.18/Python-3.9.18.tgz && \
tar -xvf Python-3.9.18.tgz
# Compile and install Python
RUN cd Python-3.9.18 && \
./configure --enable-optimizations && \
make && \
make install
WORKDIR /workspace
RUN rm -r /temp && \
ln -s /usr/local/bin/python3 /usr/local/bin/python && \
ln -s /usr/local/bin/pip3 /usr/local/bin/pip
# Install PyTorch (cu121 wheels)
RUN pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
# Install the Python packages YOLOv5 needs at runtime
RUN pip3 install absl-py==2.1.0 Brotli==1.0.9 cachetools==5.3.3 certifi==2024.2.2 \
charset-normalizer==3.3.2 colorama==0.4.6 contourpy==1.1.1 cycler==0.12.1 \
filelock==3.13.1 fonttools==4.49.0 fsspec==2023.4.0 gmpy2==2.1.2 \
google-auth==2.28.1 google-auth-oauthlib==1.0.0 grpcio==1.62.0 idna==3.6 \
importlib-metadata==7.0.1 importlib-resources==6.1.2 Jinja2==3.1.2 \
kiwisolver==1.4.5 Markdown==3.5.2 MarkupSafe==2.1.3 matplotlib==3.7.5 \
mpmath==1.3.0 opencv-python==4.9.0.80 opencv-python-headless==4.9.0.80 \
networkx==3.0 numpy==1.23.5 oauthlib==3.2.2 \
packaging==23.2 pandas==2.0.3 pillow==10.2.0 protobuf==4.25.3 pyasn1==0.5.1 \
pyasn1-modules==0.3.0 pycocotools==2.0.7 pyparsing==3.1.1 PySocks==1.7.1 \
python-dateutil==2.9.0.post0 pytz==2024.1 PyYAML==6.0.1 requests==2.31.0 \
requests-oauthlib==1.3.1 rsa==4.9 scipy==1.10.1 seaborn==0.13.2 \
setuptools==68.2.2 six==1.16.0 sympy==1.12 tensorboard==2.14.0 \
tensorboard-data-server==0.7.2 thop==0.1.1.post2209072238 tqdm==4.66.2 \
triton==2.2.0 typing-extensions==4.8.0 tzdata==2024.1 urllib3==2.1.0 \
Werkzeug==3.0.1 wheel==0.41.2 zipp==3.17.0 && \
rm -r /root/.cache/pip
# Set the time zone non-interactively, then install ffmpeg and the system libraries OpenCV needs
ENV TZ=Asia/Shanghai \
DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
WORKDIR /yoloTrain
Build the image
docker build -t image_name:image_tag -f ./dockerfile_path .
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ docker build -t lemon/pytorch-yolo5:test_2 -f ./Dockerfile-py3918-cuda121-pytorch222 .
[+] Building 0.1s (15/15) FINISHED docker:default
=> [internal] load build definition from Dockerfile-py3918-cuda121-pytorch222 0.0s
=> => transferring dockerfile: 2.40kB 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:12.2.0-base-ubuntu20.04 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [ 1/11] FROM docker.io/nvidia/cuda:12.2.0-base-ubuntu20.04 0.0s
=> CACHED [ 2/11] RUN apt update && apt install -y wget build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev libbz2 0.0s
=> CACHED [ 3/11] WORKDIR /temp 0.0s
=> CACHED [ 4/11] RUN wget https://www.python.org/ftp/python/3.9.18/Python-3.9.18.tgz && tar -xvf Python-3.9.18.tgz 0.0s
=> CACHED [ 5/11] RUN cd Python-3.9.18 && ./configure --enable-optimizations && make && make install 0.0s
=> CACHED [ 6/11] WORKDIR /workspace 0.0s
=> CACHED [ 7/11] RUN rm -r /temp && ln -s /usr/local/bin/python3 /usr/local/bin/python && ln -s /usr/local/bin/pip3 /usr/local/bin/pip 0.0s
=> CACHED [ 8/11] RUN pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121 0.0s
=> CACHED [ 9/11] RUN pip3 install absl-py==2.1.0 Brotli==1.0.9 cachetools==5.3.3 certifi==2024.2.2 charset-normalizer==3.3.2 colorama==0.4.6 contourpy==1.1.1 cycler==0.12.1 filelock==3.13.1 fon 0.0s
=> CACHED [10/11] RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y 0.0s
=> CACHED [11/11] WORKDIR /yoloTrain 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:7a558d670b90c1b8939c735a952e6c31ef51744c945406d0c6da931eb3a43399 0.0s
=> => naming to docker.io/lemon/pytorch-yolo5:test_2 0.0s
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
lemon/pytorch-yolo5 test_1 7a558d670b90 3 hours ago 10.2GB
lemon/pytorch-yolo5 test_2 7a558d670b90 3 hours ago 10.2GB
lemon/pytorch pytorch2.2.2-py3.8.10-cuda12.2.0-ubuntu20.04 0e94c70d8f33 2 days ago 51.9GB
lemon/pytorch-yolo v1 5a9259d2aead 2 days ago 9.9GB
lemon/pytorch_test pytorch2.2.2-py3.8.10-cuda12.2.0-ubuntu20.04 5a9259d2aead 2 days ago 9.9GB
lemon/pytorch 2.2.2-py3.9.10-cuda12.2.0-ubuntu20.04 6ae13bdc8b55 2 days ago 6.29GB
lemon/pytorch pytorch2.2.2-py3.9.10-cuda12.2.0-ubuntu20.04 6ae13bdc8b55 2 days ago 6.29GB
nginx latest 7383c266ef25 7 days ago 188MB
redis latest 7fc37b47acde 3 weeks ago 116MB
mysql latest 6f343283ab56 5 weeks ago 632MB
nvidia/cuda 12.2.0-base-ubuntu20.04 2664f46b9de0 5 months ago 242MB
hello-world latest d2c94e258dcb 12 months ago 13.3kB
mysql 8.0.17 b8fd9553f1f0 4 years ago 445MB
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$
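Once the build succeeds, the new image shows up in the list. Next we run a container from it.
III. Docker Compose
Preparation
In the Docker Compose file we mount the YOLOv5 code into the working directory created in the Dockerfile: /yoloTrain
Before this step, get your program running and debugged first. The runtime packages installed in the Dockerfile above can also be changed to match your own environment.
Run the container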
version: '3.8'
services:
  cuda121_yolo_5:
    image: lemon/pytorch-yolo5:test_1
    container_name: cuda121_yolo_5
    stdin_open: true
    tty: true
    shm_size: 200G
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/lemon/my_file/yolo_v5/:/yoloTrain/
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
lemon@lemonPrecisionTower:~/my_file/yolo5_docker$ docker-compose -f docker-compose-yolo5-train.yml up
WARN[0000] /home/lemon/my_file/yolo5_docker/docker-compose-yolo5-train.yml: `version` is obsolete
[+] Running 1/0
 ✔ Container cuda121_yolo_5 Created 0.0s
Attaching to cuda121_yolo_5
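A note on shm_size in the Compose file: PyTorch DataLoader worker processes hand batches to the training process through shared memory, and Docker's default /dev/shm is only 64 MB, which is why the value is raised here. The setting is an upper bound for the tmpfs mount rather than memory reserved up front. The sketch below, meant to be run inside the container, reports how large /dev/shm actually is; the function names and the 8 GiB threshold are illustrative assumptions, not values from my setup.

```python
import os
import shutil


def shm_total_gib(path="/dev/shm"):
    """Total size of the filesystem mounted at `path`, in GiB."""
    return shutil.disk_usage(path).total / 2**30


def shm_is_sufficient(required_gib, path="/dev/shm"):
    """True if the mount at `path` is at least `required_gib` GiB large."""
    return shm_total_gib(path) >= required_gib


# /dev/shm exists on Linux; skip the report elsewhere instead of failing.
if os.path.isdir("/dev/shm"):
    print(f"/dev/shm size: {shm_total_gib():.1f} GiB")
    # 8 GiB is an arbitrary example threshold, not a YOLOv5 requirement.
    print("enough for the example threshold:", shm_is_sufficient(8))
```

If the reported size is still 64 MB, the shm_size line is probably not being applied, for example because a different Compose file was used.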
Once the container is up, open a shell inside it and try running the training code:
lemon@lemonPrecisionTower:~/my_file/yolo_v5/yolov5-5.0$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f6a265b97218 lemon/pytorch-yolo5:test_1 "/bin/bash" 3 hours ago Up 2 minutes cuda121_yolo_5
e5cacb5c9f33 mysql:8.0.17 "docker-entrypoint.s…" 6 days ago Up 12 hours 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp mysql_data-mysql_8_0_17-1
lemon@lemonPrecisionTower:~/my_file/yolo_v5/yolov5-5.0$ docker exec -it cuda121_yolo_5 /bin/bash
root@f6a265b97218:/yoloTrain# cd yolov5-5.0
root@f6a265b97218:/yoloTrain/yolov5-5.0# python train.py
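Before starting a long training run, it can help to confirm from inside the container that PyTorch actually sees the GPU. The following is a minimal sketch using standard PyTorch calls; the function name `cuda_report` and the dictionary keys are my own, and the import is wrapped so the script also runs (and says so) where PyTorch is missing.

```python
def cuda_report():
    """Collect basic facts about the PyTorch build and the GPUs it can see."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

With the image built above, `built_for_cuda` should read 12.1, matching the cu121 wheels. If `cuda_available` is False, the GPU reservation in the Compose file or the nvidia-container-toolkit setup is the likely place to look.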