I. Create requirements.txt
torch>=1.4.0
torchvision>=0.5.0
dominate>=2.4.0
visdom>=0.1.8.8
wandb
II. Create the Dockerfile
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
LABEL maintainer="Ma Yue"
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# Define the time zone
ENV TZ=Asia/Shanghai
# Apply the time zone (double quotes so that $TZ is expanded)
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo "$TZ" > /etc/timezone
RUN sed -i 's#http://archive.ubuntu.com/#http://mirrors.aliyun.com/#' /etc/apt/sources.list \
&& sed -i 's#http://security.ubuntu.com/#http://mirrors.aliyun.com/#' /etc/apt/sources.list
#RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
RUN apt-get update \
&& apt-get install -y software-properties-common curl libgl1-mesa-glx libglib2.0-0 libsm6 libxrender1 libxext-dev scrot vim \
&& add-apt-repository ppa:deadsnakes/ppa \
&& apt-get remove -y software-properties-common \
&& apt autoremove -y \
&& apt-get update \
&& apt-get install -y python3.8 python3.8-dev python3.8-distutils \
&& curl -o /tmp/get-pip.py "https://bootstrap.pypa.io/get-pip.py" \
&& python3.8 /tmp/get-pip.py \
&& apt-get remove -y curl \
&& apt autoremove -y \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt /requirements.txt
RUN ln -sfn /usr/bin/python3.8 /usr/bin/python3 && ln -sfn /usr/bin/python3 /usr/bin/python && ln -sfn /usr/bin/pip3 /usr/bin/pip
RUN pip install --no-cache-dir -r /requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
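A note on the /etc/timezone step: a RUN instruction is executed by /bin/sh, which expands $TZ only inside double quotes. A minimal sketch of the difference, runnable in any POSIX shell (the variable names single and double are only for illustration):

```shell
# Same value the Dockerfile sets with ENV TZ=Asia/Shanghai
TZ=Asia/Shanghai

# Single quotes keep the text literal, so the file would contain the characters $TZ
single=$(echo '$TZ')

# Double quotes let the shell substitute the variable's value
double=$(echo "$TZ")

echo "single quotes write: $single"
echo "double quotes write: $double"
```

Inside the container, cat /etc/timezone shows which of the two was written.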
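III. Configure Docker
First switch to root: sudo su
Enter the password at the prompt: [sudo] password for studentfive:
Go to the project directory:
root@admin: cd /date1/sun/docker
Listing it with ls should show: Dockerfile requirements.txt
Build the image:
root@admin:/date1/sun/docker# docker build -t cyclegan:1.0 .
A variant that uses the host's network during the build:
docker build --network host -t cyclegan:1.0 .
If the build fails, check whether the image was created: docker images
If the image is not listed, run the build again: docker build -t cyclegan:1.0 .
Once the build succeeds, confirm that the image is listed: docker images
Start a container with the project folder mounted: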
docker run --name cyclegan -v /date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master:/date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master -it -d --gpus all -e PYTHONIOENCODING=UTF-8 cyclegan:1.0 /bin/bash
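Docker prints the ID of the new container: 24607a3542cae01f14817931916127acc348995618591190e37d975cdcb9c757
List the containers: docker ps -a
If you need visdom, create the container with a port mapping instead (the name cyclegan can belong to only one container, so remove the earlier one first with docker rm -f cyclegan):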
docker run --name cyclegan -v /date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master:/date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master -p 8888:8097 -it -d -e PYTHONIOENCODING=UTF-8 --gpus all cyclegan:1.0 /bin/bash
Here 8097 is the visdom port inside the container, and 8888 is the port chosen on the Ubuntu host.
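To make the two sides of each mapping explicit, the same docker run command can be wrapped in a small function. This is only a sketch: the variable and function names are hypothetical, and the values are the ones used in these notes.

```shell
# Hypothetical variable names; the values come from the commands in these notes
PROJECT_DIR=/date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master
HOST_PORT=8888        # port on the Ubuntu host
VISDOM_PORT=8097      # port visdom listens on inside the container

run_cyclegan_container() {
  # -v mounts the project at the same path inside the container
  # -p publishes HOST_PORT on the host and forwards it to VISDOM_PORT
  docker run --name cyclegan \
    -v "$PROJECT_DIR:$PROJECT_DIR" \
    -p "$HOST_PORT:$VISDOM_PORT" \
    -it -d -e PYTHONIOENCODING=UTF-8 --gpus all \
    cyclegan:1.0 /bin/bash
}

# Usage (not executed here):
#   run_cyclegan_container
#   docker port cyclegan      # lists the published port mappings
```

With visdom running in the container, the page should then be reachable on the host at http://localhost:8888.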
IV. Work on the project inside the container
Enter the container:
docker exec -it cyclegan /bin/bash
Go to the project directory:
cd /date1/sun/cyclegan/pytorch-CycleGAN-and-pix2pix-master/
Then work on the project, for example start training:
python train.py --dataroot ./datasets/outline2micro --name outline2micro_cyclegan --model cycle_gan
Test:
python test.py --dataroot ./datasets/outline2micro --name outline2micro_cyclegan --model cycle_gan
When finished, leave the container: exit
V. Stop and delete all Docker containers and images
List all containers:
docker ps -a
Stop all containers:
docker stop $(docker ps -aq)
Delete all containers:
docker rm $(docker ps -aq)
Delete all images:
docker rmi $(docker images -q)
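When no containers exist, docker stop $(docker ps -aq) fails because docker stop receives no arguments. A small sketch that guards against that case (the function name is hypothetical):

```shell
stop_and_remove_all_containers() {
  # Collect the IDs of all containers, running or stopped
  ids=$(docker ps -aq)
  if [ -z "$ids" ]; then
    echo "no containers to remove"
    return 0
  fi
  # $ids is left unquoted on purpose so that each ID becomes its own argument
  docker stop $ids
  docker rm $ids
}

# Usage (not executed here):
#   stop_and_remove_all_containers
```

The same guard can be applied to docker rmi $(docker images -q) when no images are left.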
VI. Check PyTorch and CUDA
In a Python session inside the container:
import torch
print(torch.__version__)
print(torch.cuda.is_available())
VII. Start visdom
python -m visdom.server
1. If it fails with the following errors:
Checking for scripts.
Downloading scripts, this may take a little while
ERROR:root:Error [Errno -3] Temporary failure in name resolution while downloading https://unpkg.com/jquery@3.1.1/dist/jquery.min.js
ERROR:root:Error [Errno -3] Temporary failure in name resolution while downloading https://unpkg.com/bootstrap@3.3.7/dist/js/bootstrap.min.js
ERROR:root:Error [Errno -3] Temporary failure in name resolution while downloading https://unpkg.com/react@16.2.0/umd/react.production.min.js
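These errors mean the container could not resolve unpkg.com to download visdom's front-end scripts. Fix:
Inside the container, run
cd /usr/local/lib/python3.8/dist-packages/visdom/server
vim run_server.py
Press i to enter Insert mode, go to the bottom of the file, and comment out the download_scripts() line:
def download_scripts_and_run():
    # download_scripts()
    main()
Press Esc, then type :wq! to save and quit.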
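The same edit can be made without opening vim. A minimal sketch, assuming GNU sed (as shipped in the Ubuntu-based image) and that run_server.py calls download_scripts() on a line of its own, as in the snippet above; the function name disable_download_scripts is hypothetical:

```shell
disable_download_scripts() {
  # $1: path to visdom's run_server.py
  # Prefix the bare download_scripts() call with "# ", keeping its indentation.
  # The pattern requires "()" right after the name, so the line
  # "def download_scripts_and_run():" is left untouched.
  sed -i 's/^\([[:space:]]*\)download_scripts()/\1# download_scripts()/' "$1"
}

# Usage inside the container (not executed here):
#   disable_download_scripts /usr/local/lib/python3.8/dist-packages/visdom/server/run_server.py
```

Running it a second time changes nothing, because an already commented line no longer matches the pattern.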
2. If the visdom page shows only a blue screen and the terminal reports:
ERROR:tornado.general:Could not open static file '/usr/local/lib/python3.8/dist-packages/visdom/static/js/plotly-plotly.min.js'
ERROR:tornado.general:Could not open static file '/usr/local/lib/python3.8/dist-packages/visdom/static/js/d3.v3.min.js'
ERROR:tornado.general:Could not open static file '/usr/local/lib/python3.8/dist-packages/visdom/static/js/d3-selection-multi.v1.js'
ERROR:tornado.general:Could not open static file '/usr/local/lib/python3.8/dist-packages/visdom/static/js/saveSvgAsPng.js'
ERROR:tornado.general:Could not open static file '/usr/local/lib/python3.8/dist-packages/visdom/user/style.css'
INFO:tornado.access:200 GET / (172.17.0.1) 20.92ms
WARNING:tornado.access:404 GET /static/css/bootstrap.min.css (172.17.0.1) 2.68ms
WARNING:tornado.access:404 GET /static/js/jquery.min.js (172.17.0.1) 1.36ms
WARNING:tornado.access:404 GET /static/js/bootstrap.min.js (172.17.0.1) 1.19ms
WARNING:tornado.access:404 GET /static/css/react-resizable-styles.css (172.17.0.1) 1.31ms
WARNING:tornado.access:404 GET /static/css/react-grid-layout-styles.css (172.17.0.1) 1.12ms
WARNING:tornado.access:404 GET /static/js/react-react.min.js (172.17.0.1) 1.89ms
WARNING:tornado.access:404 GET /static/js/react-dom.min.js (172.17.0.1) 2.23ms
WARNING:tornado.access:404 GET /static/js/layout_bin_packer.js (172.17.0.1) 2.55ms
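Fix:
First delete the static folder inside the container:
cd /usr/local/lib/python3.8/dist-packages/visdom
Run ls; a static folder should be listed.
Then run:
rm -r static
Run ls again; the static folder should be gone.
Type exit to leave the container and return to the root shell on the host.
Then, at the root@admin prompt, copy the static folder stored on the host into the container: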
docker cp /date1/sun/docker/static cyclegan:/usr/local/lib/python3.8/dist-packages/visdom
Monitor GPU usage on the host, refreshing every 0.1 s:
watch -n 0.1 nvidia-smi
https://www.cnblogs.com/2205254761qq/p/11863928.html
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
After a server reboot, the Docker images and containers cannot be found. Check the service status first:
systemctl status docker
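1. If the Docker status is running
First pull an image again (not sure whether this step can be skipped), then follow the linked post:
"After a server reboot, docker ps shows no containers" (CSDN blog)
Its content:
After the server reboots, the Docker service does not start automatically.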
docker ps
Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
The service is not running, so start it manually:
systemctl start docker
After that, docker ps still shows no running containers,
so list all containers with their IDs:
docker container ls -a
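Now you can see all the containers that were stopped earlier.
Start the one you need:
docker container start 01738895328c   # 01738895328c is the container ID
Stopping works the same way with docker container stop.
Automatic restart:
docker update --restart=always NAME   # NAME is the container name
Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA; include the original link and this notice when reposting.
Original link: https://blog.csdn.net/FBX_fbx_FBX/article/details/109570694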
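Starting the stopped containers one by one can be replaced by a small helper. This is a sketch under the assumption that the containers stopped by the reboot show the status "exited"; the function name is hypothetical:

```shell
start_exited_containers() {
  # IDs of all containers whose status is "exited"
  ids=$(docker ps -aq --filter status=exited)
  if [ -z "$ids" ]; then
    echo "no exited containers"
    return 0
  fi
  # Unquoted on purpose: one argument per container ID
  docker start $ids
}

# Usage (not executed here):
#   start_exited_containers
#   docker update --restart=always cyclegan   # restart this container automatically from now on
```

Note that this starts every exited container, including ones that were stopped on purpose before the reboot.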
2. If the Docker status is failed
Run:
sudo dockerd --debug
It reports:
failed to start daemon: Error initializing network controller: error creating default "bridge" network: cannot create network ee4e7f4c5317b3dec2b6fe5db0044ba535c038e4282709f5fd4bce3f3fe4638f (docker0): conflicts with network c9e21497a9d4268189fe6a0b16279efb81d45bce8564104aa983ab1570f03659 (docker0): networks have same bridge name
Fix: delete Docker's saved network state, then start the service again:
sudo rm -rf /var/lib/docker/network
sudo systemctl start docker
If it still fails, stop firewalld and try again:
sudo systemctl stop firewalld
sudo systemctl start docker
sudo systemctl status docker
After that, proceed as in case 1: run systemctl start docker, list the stopped containers with docker container ls -a, and start the one you need with docker container start followed by its container ID.
3. Blue or black screen when connecting to the server
Method 1: restart the xrdp service (my preferred fix)
sudo /etc/init.d/xrdp restart
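After that, logging in with the account and password works.
Reference: "Fixing the xrdp remote desktop blue/black screen" (CSDN blog)
4. Enable the Docker service at boot
To make sure the Docker service starts automatically after a reboot, enable it: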
sudo systemctl enable docker