Creating a deep learning Docker environment (TensorFlow/PyTorch) from the python image

This post builds a TensorFlow Docker environment on top of the python:3.8.3-buster image.

My host runs Ubuntu 18.04 with the latest Nvidia driver at the time of writing, 440.64.00.

To make testing easier, I configured everything directly inside a running container instead of packaging the image with a Dockerfile.

1 Basic configuration

Pull and run the image (if you can, configure a Docker registry mirror inside China; it is much faster)

# Pull the image
docker pull python:3.8.3-buster

# Run the image and open a shell in the container
docker run -it --name=py-3.8.3 python:3.8.3-buster bash

Configure the apt sources, the time zone, etc.

# apt sources: use the Tsinghua mirror
echo -e "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-updates main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-backports main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian-security buster/updates main contrib non-free" > /etc/apt/sources.list
apt update 
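The four mirror lines are easier to read and edit as a heredoc than as one long echo -e string. The sketch below is an assumed alternative, not the command I actually ran: write_tuna_sources is a hypothetical helper that writes the same four lines to whatever path it is given.

```shell
#!/usr/bin/env bash
# Write the Tsinghua (TUNA) Debian buster mirror list to a target file.
# write_tuna_sources is a hypothetical helper name; only pass the real
# /etc/apt/sources.list path when running inside the container.
write_tuna_sources() {
    local target="$1"
    local mirror="https://mirrors.tuna.tsinghua.edu.cn"
    # Unquoted EOF so ${mirror} is expanded inside the heredoc
    cat > "$target" <<EOF
deb ${mirror}/debian/ buster main contrib non-free
deb ${mirror}/debian/ buster-updates main contrib non-free
deb ${mirror}/debian/ buster-backports main contrib non-free
deb ${mirror}/debian-security buster/updates main contrib non-free
EOF
}
```

Inside the container you would call write_tuna_sources /etc/apt/sources.list && apt update.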

# Configure the time zone
ln -fs /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
dpkg-reconfigure --frontend noninteractive tzdata

# OpenSSL: lower the minimum TLS version to 1.0, because the default TLSv1.2 causes errors in some scenarios, e.g. when pyodbc connects to MSSQL
sed -i 's/TLSv1\.[0-9]/TLSv1.0/g' /etc/ssl/openssl.cnf
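To see what the sed command changes, here is a small sketch that applies the same substitution to a given file and prints the resulting MinProtocol line. On Debian buster the value being rewritten is normally the MinProtocol = TLSv1.2 line in the [system_default_sect] section; relax_min_tls is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Lower every "TLSv1.x" value in an OpenSSL config file to TLSv1.0.
# Same substitution as above; the dot is escaped so it only matches a literal ".".
# relax_min_tls is a hypothetical helper name.
relax_min_tls() {
    local conf="${1:-/etc/ssl/openssl.cnf}"
    sed -i 's/TLSv1\.[0-9]/TLSv1.0/g' "$conf"
    # Print the MinProtocol line(s) with line numbers so the change can be checked
    grep -n 'MinProtocol' "$conf"
}
```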

# Install vim, curl and ca-certificates
apt install -y vim curl ca-certificates

Configure the default zsh environment and theme

# Install zsh
apt install -y zsh
# Install oh-my-zsh
cd /usr/share/zsh
curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh > oh-my-zsh_install.sh    # the download may fail; if so, fetch the script some other way and put it in this directory
vim oh-my-zsh_install.sh    # change the install directory: set the ZSH= line to the following
ZSH=/usr/share/zsh/oh-my-zsh

# Run the oh-my-zsh installer
bash oh-my-zsh_install.sh
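Instead of editing the installer with vim, the ohmyzsh README documents two ways to steer it from the command line: setting the ZSH variable to choose a custom install directory, and passing --unattended so that it neither changes the login shell nor starts zsh when it finishes. A minimal sketch, assuming an installer version recent enough to support both (install_ohmyzsh is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Run the oh-my-zsh installer into a custom directory without editing the script.
# Assumes the installer honours the ZSH variable and the --unattended option.
# install_ohmyzsh is a hypothetical helper name.
install_ohmyzsh() {
    local installer="$1"                          # e.g. oh-my-zsh_install.sh
    local target="${2:-/usr/share/zsh/oh-my-zsh}" # install directory
    # --unattended: keep the current login shell and do not start zsh afterwards
    ZSH="$target" sh "$installer" --unattended
}
```

In the container this would be install_ohmyzsh /usr/share/zsh/oh-my-zsh_install.sh.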

# Download the powerlevel10k theme for oh-my-zsh
git clone --depth=1 https://github.com/romkatv/powerlevel10k.git /usr/share/zsh/oh-my-zsh/themes/powerlevel10k
# Download two plugins
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git /usr/share/zsh/oh-my-zsh/plugins/zsh-syntax-highlighting
git clone https://github.com/zsh-users/zsh-autosuggestions.git /usr/share/zsh/oh-my-zsh/plugins/zsh-autosuggestions

# Copy the zsh.zshrc file into /etc/zsh. This is my default zshrc configuration, adapted from a copy on another host; its full contents are at the end of this post.

# Edit /etc/profile
vim /etc/profile    # add the following, so that the zsh configuration is loaded automatically whenever this file is sourced
if [[ "$SHELL" =~ "zsh" ]] && [ -f /etc/zsh/zshrc ]; then 
    . /etc/zsh/zshrc
fi

# Edit /etc/zsh/zshrc
vim /etc/zsh/zshrc    # add the following, mainly to load the zsh configuration automatically
if [ -f /etc/zsh/zsh.zshrc ]; then 
    . /etc/zsh/zsh.zshrc
fi

# Edit /etc/zsh/zshenv
vim /etc/zsh/zshenv    # add the following, so that zsh does not start its setup wizard the first time a user enters it
if [[ ! -f ~/.zshrc ]]; then 
    touch ~/.zshrc
fi
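The three vim edits above append small hooks to system files, and repeating the setup would append them again. A minimal sketch for appending each hook only once, assuming a hook can be treated as already present when its first line appears in the file (append_once is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Append a multi-line block to a file unless the block's first line is already there,
# so re-running the setup does not duplicate the hooks.
# append_once is a hypothetical helper name.
append_once() {
    local file="$1" block="$2"
    local first_line
    first_line=$(printf '%s\n' "$block" | head -n 1)
    # -x: whole-line match, -F: fixed string; a missing file counts as "not present"
    if ! grep -qxF -- "$first_line" "$file" 2>/dev/null; then
        printf '%s\n' "$block" >> "$file"
    fi
}
```

For example, run as root with the hook text in a variable: append_once /etc/zsh/zshrc "$hook".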

Configure ssh, so the container can be reached over ssh later

# Install ssh, supervisor and related tools
apt install -y net-tools sudo openssh-client openssh-server supervisor

# Edit the supervisor configuration to add an sshd service
vim /etc/supervisor/supervisord.conf    # change it as follows
[supervisord]
nodaemon=true
[program:sshd]
command=/usr/sbin/sshd -D
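Debian's supervisor package loads extra files from /etc/supervisor/conf.d/*.conf through the [include] section of supervisord.conf, so the sshd program can also live in its own file. A sketch under that assumption (write_sshd_program is a hypothetical helper). The nodaemon=true setting still has to come from somewhere: either keep the [supervisord] edit above, or start supervisord with its -n (--nodaemon) option.

```shell
#!/usr/bin/env bash
# Write the sshd program section as its own file in supervisor's conf.d directory
# instead of editing supervisord.conf by hand.
# write_sshd_program is a hypothetical helper name.
write_sshd_program() {
    local confdir="${1:-/etc/supervisor/conf.d}"
    mkdir -p "$confdir"
    # -D keeps sshd in the foreground so supervisor can manage it
    cat > "$confdir/sshd.conf" <<'EOF'
[program:sshd]
command=/usr/sbin/sshd -D
EOF
}
```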

# Create /run/sshd, which sshd needs at startup
mkdir -p /run/sshd

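# Edit sudoers so that members of the users group can sudo without a password. This is so colleagues can create containers from this image.
vim /etc/sudoers    # add the following line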
%users         ALL=(ALL)       NOPASSWD: ALL
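Editing /etc/sudoers directly with vim is risky, because a syntax error can break sudo entirely. A common alternative is a drop-in file in /etc/sudoers.d, which Debian's default /etc/sudoers includes through its #includedir /etc/sudoers.d line. A minimal sketch (write_users_sudoers and the file name users-nopasswd are assumptions; sudo skips drop-in files whose names contain a "." or end in "~"):

```shell
#!/usr/bin/env bash
# Grant passwordless sudo to the "users" group through a drop-in file.
# write_users_sudoers is a hypothetical helper name.
write_users_sudoers() {
    local dir="${1:-/etc/sudoers.d}"
    local file="$dir/users-nopasswd"   # no "." in the name, or sudo ignores it
    mkdir -p "$dir"
    # %% is printf's escape for a literal "%", giving "%users ..."
    printf '%%users ALL=(ALL) NOPASSWD: ALL\n' > "$file"
    chmod 0440 "$file"                 # conventional read-only mode for sudoers files
}
```

Run it as root inside the container, then check the syntax with visudo -c -f /etc/sudoers.d/users-nopasswd.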

Save the image

# Exit the container, then save it as an image
docker commit py-3.8.3 python:3.8.3-ssh

2 How to configure CUDA support

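According to the nvidia-docker documentation, once the host driver is set up and nvidia-container-toolkit is installed, adding --gpus all when starting a container is all it takes for the container to use the GPU. However, when I tested TensorFlow with the image above, it never worked and kept failing with "failed call to cuInit: CUDA_ERROR_UNKNOWN", and an UNKNOWN error like this is genuinely hard to track down. Fortunately, after some digging (which I won't describe here), the problem can be solved.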

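My fix was to use a Dockerfile to add environment variables on top of the image above. I am not yet sure exactly what effect these ENV instructions have; they are taken from the Dockerfile of NVIDIA's official cuda-base image.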

Dockerfile

FROM python:3.8.3-ssh
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=10.2 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419"
CMD [ "/usr/bin/supervisord" ]

Build the image from the Dockerfile

# Run this in the directory that contains the Dockerfile
docker build -t python:3.8.3-cuda .

3 Install CUDA

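# Start the container. The image's default entry point is supervisor, which starts sshd, so we log in over ssh afterwards (which also tests ssh).
# Mount three files from /etc so that the container's users and passwords match the host's, allowing you to ssh into the container with your host account. Remember to add the user to the users group so that sudo works inside the container (the users group was given passwordless sudo above; this does not affect the host).
# Mount the Downloads directory (下载) as software, since that is where my CUDA installers are
# Assume the host account is user and that it will be used to ssh into the container later; create a home directory for user to mount
mkdir -p /home/user/user
docker run -d \
    -p 20022:22 \
    -h cuda10 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -v /etc/shadow:/etc/shadow:ro \
    -v /home/user/下载:/home/user/software \
    -v /home/user/user:/home/user \
    --name=cuda10 \
    --gpus all \
    python:3.8.3-cuda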
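The two docker run commands in this post differ only in the container name, the image and one extra mount. A hedged sketch that factors out the shared options, using the same paths and port as above (run_gpu_container is a hypothetical helper; setting DOCKER=echo prints the command instead of running it):

```shell
#!/usr/bin/env bash
# Start a detached GPU container that shares the host's users and passwords.
# Arguments: container name (also used as hostname), image, then any extra
# docker options such as additional -v mounts.
# run_gpu_container is a hypothetical helper name.
run_gpu_container() {
    local name="$1" image="$2"
    shift 2
    "${DOCKER:-docker}" run -d \
        -p 20022:22 \
        -h "$name" \
        -v /etc/passwd:/etc/passwd:ro \
        -v /etc/group:/etc/group:ro \
        -v /etc/shadow:/etc/shadow:ro \
        -v /home/user/user:/home/user \
        "$@" \
        --name="$name" \
        --gpus all \
        "$image"
}
```

run_gpu_container cuda10 python:3.8.3-cuda -v /home/user/下载:/home/user/software reproduces the command above, and run_gpu_container tf2 python:3.8.3-cuda10.1 the one in section 4. Because both publish host port 20022, only one of these containers can run at a time.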

# Log in to the container as user
ssh -p 20022 localhost    # the login name defaults to the current account, i.e. user

Install CUDA; all of these files can be downloaded from NVIDIA's official site

# CUDA: follow the installer prompts, but do not install the driver
sudo sh ~/software/cuda_10.1.243_418.87.00_linux.run

# cuDNN
cd ~/software
tar -zxf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp -r ~/software/cuda/include/* /usr/local/cuda-10.1/include/
sudo cp -r ~/software/cuda/lib64/* /usr/local/cuda-10.1/lib64/
rm -rf ~/software/cuda
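The cuDNN step can also be wrapped in a function. This sketch is modeled on the tar-file steps in NVIDIA's cuDNN installation guide rather than on the cp -r commands above: it copies only the cuDNN headers and libraries, uses cp -P so the libcudnn.so symlinks stay symlinks, and makes the files world-readable. install_cudnn is a hypothetical helper; run it as root when the target is under /usr/local.

```shell
#!/usr/bin/env bash
# Copy an extracted cuDNN archive (the "cuda/" directory from the .tgz)
# into a CUDA toolkit directory.
# install_cudnn is a hypothetical helper name.
install_cudnn() {
    local src="$1"                              # e.g. /home/user/software/cuda
    local cuda_dir="${2:-/usr/local/cuda-10.1}" # target CUDA toolkit
    cp "$src"/include/cudnn*.h "$cuda_dir/include/"
    # -P copies symlinks (libcudnn.so -> libcudnn.so.7 ...) as symlinks
    cp -P "$src"/lib64/libcudnn* "$cuda_dir/lib64/"
    chmod a+r "$cuda_dir"/include/cudnn*.h "$cuda_dir"/lib64/libcudnn*
}
```

After sourcing the function in a root shell: install_cudnn /home/user/software/cuda /usr/local/cuda-10.1.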

# TensorRT
tar -zxf TensorRT-6.0.1.5.Ubuntu-18.04.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz
sudo cp -r ~/software/TensorRT-6.0.1.5 /usr/local/
sudo ln -sf /usr/local/TensorRT-6.0.1.5 /usr/local/TensorRT
# Install the tensorrt Python package
cd /usr/local/TensorRT/python
sudo pip install ./tensorrt-6.0.1.5-cp37-none-linux_x86_64.whl    # fails: there is no py38 (Python 3.8) wheel
# Install uff
cd ../uff/
sudo pip install uff-0.6.5-py2.py3-none-any.whl
which convert-to-uff    # check that the convert-to-uff tool is on PATH
# Install graphsurgeon
cd ../graphsurgeon
sudo pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
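The TensorRT wheel fails because pip only installs a CPython wheel whose tag matches the running interpreter: cp37 means CPython 3.7, while this image runs Python 3.8. A small sketch of that check (wheel_matches_python is a hypothetical helper; it only looks at cpXY tags and accepts pure-Python wheels such as the py2.py3 uff and graphsurgeon wheels):

```shell
#!/usr/bin/env bash
# Return success if a wheel file name is installable on the given Python version
# (major.minor, e.g. 3.8), judged only by its cpXY tag.
# wheel_matches_python is a hypothetical helper name.
wheel_matches_python() {
    local wheel="${1##*/}"              # file name without directory
    local version="$2"
    local major="${version%%.*}"
    local rest="${version#*.}"
    local minor="${rest%%.*}"
    local want="cp${major}${minor}"     # 3.8 -> cp38
    case "$wheel" in
        *-"$want"-*) return 0 ;;        # tag matches this interpreter
        *-cp[0-9]*-*) return 1 ;;       # built for another CPython version
        *) return 0 ;;                  # no cpXY tag (e.g. py2.py3): accept
    esac
}
```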

# Add environment variables: append the following three lines to both files
sudo vim /etc/profile
sudo vim /etc/zsh/zshrc
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/TensorRT/lib:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:$PATH
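Rather than pasting the three lines into two files, one option is to keep them in a single file and source it from both. A sketch, assuming /etc/profile.d/cuda.sh as the location (Debian's /etc/profile already sources /etc/profile.d/*.sh; zsh would need ". /etc/profile.d/cuda.sh" added to /etc/zsh/zshrc). write_cuda_env is a hypothetical helper, and ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} avoids leaving a trailing empty entry when LD_LIBRARY_PATH is unset.

```shell
#!/usr/bin/env bash
# Write the CUDA/TensorRT environment variables to one sourceable file.
# write_cuda_env is a hypothetical helper name; the default path is an assumption.
write_cuda_env() {
    local target="${1:-/etc/profile.d/cuda.sh}"
    # Quoted 'EOF': variables are written literally and expanded when the file is sourced
    cat > "$target" <<'EOF'
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/TensorRT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export PATH=${CUDA_HOME}/bin:$PATH
EOF
}
```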

# Clear the pip cache
rm -rf ~/.cache/pip/http/*

Exit the container and save an image that contains CUDA 10.1

# Exit and stop the container
docker stop cuda10    # the container was started with the name cuda10
# Save the container as an image
docker commit cuda10 python:3.8.3-cuda10.1

4 Install TensorFlow or PyTorch

docker run -d \
    -p 20022:22 \
    -h tf2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -v /etc/shadow:/etc/shadow:ro \
    -v /home/user/user:/home/user \
    --name=tf2 \
    --gpus all \
    python:3.8.3-cuda10.1

# Log in to the container as user
ssh -p 20022 localhost    # the login name defaults to the current account, i.e. user

# Switch pip to the Tsinghua mirror
mkdir -p ~/.pip
printf '[global]\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\n' > ~/.pip/pip.conf    # printf expands \n in every shell; bash's echo does not without -e

# Install TensorFlow
pip install tensorflow    # this installs into the user's own directory only, because we logged in as a regular user

# Check that TensorFlow can use the GPU
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"    # should print True (deprecated in TF 2.x, but still works)
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"    # should print a list of the GPUs

# Install PyTorch
pip install torch

# Check that PyTorch can use the GPU
python -c "import torch; print(torch.cuda.is_available())"     # should print True

# Both checks pass

With that, the environment is ready.

5 Appendix: contents of zsh.zshrc

# Enable Powerlevel10k instant prompt. Should stay close to the top of ~/.zshrc.
# Initialization code that may require console input (password prompts, [y/n]
# confirmations, etc.) must go above this block; everything else may go below.
if [[ -r "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh" ]]; then
  source "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh"
fi

# If you come from bash you might have to change your $PATH.
# export PATH=$HOME/bin:/usr/local/bin:$PATH

# Path to your oh-my-zsh installation.
export ZSH="/usr/share/zsh/oh-my-zsh"

# Set name of the theme to load --- if set to "random", it will
# load a random theme each time oh-my-zsh is loaded, in which case,
# to know which specific one was loaded, run: echo $RANDOM_THEME
# See https://github.com/ohmyzsh/ohmyzsh/wiki/Themes
#ZSH_THEME="robbyrussell"
ZSH_THEME="powerlevel10k/powerlevel10k"

# Set list of themes to pick from when loading at random
# Setting this variable when ZSH_THEME=random will cause zsh to load
# a theme from this variable instead of looking in $ZSH/themes/
# If set to an empty array, this variable will have no effect.
# ZSH_THEME_RANDOM_CANDIDATES=( "robbyrussell" "agnoster" )

# Uncomment the following line to use case-sensitive completion.
# CASE_SENSITIVE="true"

# Uncomment the following line to use hyphen-insensitive completion.
# Case-sensitive completion must be off. _ and - will be interchangeable.
# HYPHEN_INSENSITIVE="true"

# Uncomment the following line to disable bi-weekly auto-update checks.
# DISABLE_AUTO_UPDATE="true"

# Uncomment the following line to automatically update without prompting.
# DISABLE_UPDATE_PROMPT="true"

# Uncomment the following line to change how often to auto-update (in days).
# export UPDATE_ZSH_DAYS=13

# Uncomment the following line if pasting URLs and other text is messed up.
# DISABLE_MAGIC_FUNCTIONS=true

# Uncomment the following line to disable colors in ls.
# DISABLE_LS_COLORS="true"

# Uncomment the following line to disable auto-setting terminal title.
# DISABLE_AUTO_TITLE="true"

# Uncomment the following line to enable command auto-correction.
# ENABLE_CORRECTION="true"

# Uncomment the following line to display red dots whilst waiting for completion.
# COMPLETION_WAITING_DOTS="true"

# Uncomment the following line if you want to disable marking untracked files
# under VCS as dirty. This makes repository status check for large repositories
# much, much faster.
# DISABLE_UNTRACKED_FILES_DIRTY="true"

# Uncomment the following line if you want to change the command execution time
# stamp shown in the history command output.
# You can set one of the optional three formats:
# "mm/dd/yyyy"|"dd.mm.yyyy"|"yyyy-mm-dd"
# or set a custom format using the strftime function format specifications,
# see 'man strftime' for details.
# HIST_STAMPS="mm/dd/yyyy"

# Would you like to use another custom folder than $ZSH/custom?
# ZSH_CUSTOM=/path/to/new-custom-folder

# Which plugins would you like to load?
# Standard plugins can be found in $ZSH/plugins/
# Custom plugins may be added to $ZSH_CUSTOM/plugins/
# Example format: plugins=(rails git textmate ruby lighthouse)
# Add wisely, as too many plugins slow down shell startup.
plugins=(git)

source $ZSH/oh-my-zsh.sh

# User configuration

# export MANPATH="/usr/local/man:$MANPATH"

# You may need to manually set your language environment
# export LANG=en_US.UTF-8

# Preferred editor for local and remote sessions
# if [[ -n $SSH_CONNECTION ]]; then
#   export EDITOR='vim'
# else
#   export EDITOR='mvim'
# fi

# Compilation flags
# export ARCHFLAGS="-arch x86_64"

# Set personal aliases, overriding those provided by oh-my-zsh libs,
# plugins, and themes. Aliases can be placed here, though oh-my-zsh
# users are encouraged to define aliases within the ZSH_CUSTOM folder.
# For a full list of active aliases, run `alias`.
#
# Example aliases
# alias zshconfig="mate ~/.zshrc"
# alias ohmyzsh="mate ~/.oh-my-zsh"

# key bindings
bindkey "\e[1~" beginning-of-line
bindkey "\e[4~" end-of-line
bindkey "\e[5~" beginning-of-history
bindkey "\e[6~" end-of-history
# for rxvt
bindkey "\e[8~" end-of-line
bindkey "\e[7~" beginning-of-line
# for non RH/Debian xterm, can't hurt for RH/DEbian xterm
bindkey "\eOH" beginning-of-line
bindkey "\eOF" end-of-line
# for freebsd console
bindkey "\e[H" beginning-of-line
bindkey "\e[F" end-of-line
# completion in the middle of a line
# bindkey '^i' expand-or-complete-prefix
# Fix numeric keypad  
# # 0 . Enter  
bindkey -s "^[Op" "0"
bindkey -s "^[On" "."
bindkey -s "^[OM" "^M"
# # 1 2 3  
bindkey -s "^[Oq" "1"
bindkey -s "^[Or" "2"
bindkey -s "^[Os" "3"
# # 4 5 6  
bindkey -s "^[Ot" "4"
bindkey -s "^[Ou" "5"
bindkey -s "^[Ov" "6"
# # 7 8 9  
bindkey -s "^[Ow" "7"
bindkey -s "^[Ox" "8"
bindkey -s "^[Oy" "9"
# # + - * /  
bindkey -s "^[Ol" "+"
bindkey -s "^[Om" "-"
bindkey -s "^[Oj" "*"
bindkey -s "^[Oo" "/"

# To customize prompt, run `p10k configure` or edit ~/.p10k.zsh.
[[ ! -f ~/.p10k.zsh ]] || source ~/.p10k.zsh

. /usr/share/zsh/oh-my-zsh/plugins/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh
. /usr/share/zsh/oh-my-zsh/plugins/zsh-autosuggestions/zsh-autosuggestions.zsh

POWERLEVEL9K_DISABLE_GITSTATUS=true
typeset -g POWERLEVEL9K_INSTANT_PROMPT=quiet

echo -e "\nNote: the current shell is zsh with the powerlevel10k theme. To change how the theme looks, run: p10k configure\n"