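I am writing this post because this problem cost me a lot of effort and I hit quite a few issues along the way, so I want to record them. If you are in a hurry, you can jump straight to the screenshot in Section II.
It started like this: on Windows 10 I created a Docker container that serves as an SSH server, so I can log in to it directly with Xshell. A bert serving service also runs inside the container. Because I specified /bin/bash as the startup command when I created the container, bringing the container back up after every Windows 10 reboot was tedious.
First I had to enter the container with docker exec and start ssh, then log in with Xshell and start the bert serving service. That felt like too much trouble. What I wanted was this: after docker start, I can log in through Xshell right away, and bert serving is already running.
After entering the container, I need to run the following commands:
# Start the ssh service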
service ssh start
cd /root/bert_run
# Switch to the voice_bert conda environment
conda activate voice_bert
# Start the bert serving service
bert-serving-start -model_dir /root/bert_run/chinese_L-12_H-768_A-12 -num_worker=2 -max_seq_len 64
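I. Understanding foreground and background
I have an image with the ssh service locally, so I tried creating another container with the startup command changed to service ssh start, as follows: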
docker run -itd -p 20021:22 --name try_start_ssh -d 886eca19e611 service ssh start
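The container exited as soon as it started. Later I understood why: once service ssh start finishes, the container ends with it. To keep the container from exiting, the startup command itself must not exit, which means the ssh service has to run in the foreground (see reference [4]). I also needed docker run to execute several commands (see reference [2]), so I tried this right away:
docker run -itd -p 20021:22 --name try_start_ssh -d 886eca19e611 sh -c "service ssh start && tail -f /var/log/wtmp"
It worked perfectly: the container did not exit after starting, and I could connect to it with Xshell.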
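The foreground/background point can be tried without Docker. The sketch below is only an analogy: start_daemon_like is a hypothetical stand-in for service ssh start, which launches a long-running job in the background and then returns. Docker keeps a container running only while its main process is running, so a startup command that returns this quickly lets the container stop, while tail -f at the end of the chain never returns.

```shell
#!/bin/sh
# Hypothetical stand-in for "service ssh start": it launches a long-running
# job in the background and then returns immediately.
start_daemon_like() {
    sleep 3 >/dev/null 2>&1 &
    echo "daemon launched"
}

begin=$(date +%s)
launch_output=$(start_daemon_like)
end=$(date +%s)
elapsed=$((end - begin))

# The call is over long before the 3-second background job finishes.
# If this call were the container's main process, the container would stop here.
echo "$launch_output (returned after ${elapsed}s)"
```

Appending a command that stays in the foreground, as tail -f /var/log/wtmp does in the docker run line above, is what keeps the main process alive.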
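So all I needed was to chain the commands above together at docker run time, and bert serving would start automatically. However, I had no local image for bert serving. I would first have to save the bert serving container as an image and then docker run from it, which would leave one more image on disk: redundant and cumbersome. I wondered whether I could change the container's startup command directly, and then I found article [3], which felt like a treasure. But my understanding of it was limited and I had not thought it through, so I took the Path field in the container's configuration file to mean that the whole command line goes there. That cost me a lot of time, with docker start failing over and over. Only later did another article show me that I had misunderstood. Unfortunately I lost the bookmark; my thanks to that author. Some details follow.
II. Modifying the container's startup command directly
On Windows 10, see reference [5] for how to locate Docker's configuration files.
At that point I had only a superficial understanding of article [3], so I put the entire command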
sh -c "service ssh start && tail -f /var/log/wtmp"
into the Path field. As a result, docker start reported the following error:
Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "sh -c 'service ssh start && cd /root/bert_run && conda activate voice_bert'": stat sh -c 'service ssh start && cd /root/bert_run &&
conda activate voice_bert': no such file or directory: unknown
Error: failed to start containers: 05d9022ceb23
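I tried the following:
- Shortening the command
I wanted to narrow down where the problem was, but I still could not find it.
- Changing WorkingDir, as article [3] mentions
I actually had no idea what WorkingDir meant. I set it to my bert startup directory, and it still did not work.
- Using the full path of the executable
I suspected that "no such file" meant the executable could not be found, but that did not help either.
In despair I found another article; unfortunately I can no longer find its link. It said that three places need to be changed, Path, Args, and Cmd, as follows. The screenshot makes it clear at a glance.
- Path
The executable.
- Args
The arguments passed to the executable.
- Cmd
The complete command, written as separate pieces, each in quotes and separated by commas.
I tried it right away and it worked immediately.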
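The "no such file or directory" error can be reproduced in a plain shell, which may make the Path/Args split easier to see. This is a minimal sketch, not Docker itself: quoting the whole command line as one word makes the shell look for an executable with that literal name, much as the runtime did with the Path value. Under the description above, Path would presumably hold sh, and Args would hold -c and the script string as separate elements.

```shell
#!/bin/sh
# One word: the shell looks for an executable literally named
# "sh -c 'echo hi'", which does not exist, so the lookup fails.
"sh -c 'echo hi'" 2>/dev/null
status_one_word=$?

# Executable and arguments kept separate: "sh" is found, and
# "-c" plus the script string are passed to it as arguments.
output_split=$(sh -c 'echo hi')
status_split=$?

echo "one word -> exit status $status_one_word"
echo "split    -> exit status $status_split, output: $output_split"
```

Exit status 127 is the shell's conventional "command not found" code, the counterpart of the stat failure in the Docker error message.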
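Then I thought: if I combine all of my original commands, bert serving could start automatically as well. I got to work right away, quite excited. There was no error, but the container exited, which was discouraging. Since nothing was reported, I wondered whether my bert service could not run in the foreground, so I moved tail to the end of the chain, which should certainly keep something in the foreground. It still did not work. Just when I was stuck, I happened to open a log file that contained the error output from docker start. It felt like a helping hand from above, lighting the way. It turns out that the directory holding the container's configuration files also contains a log file that records the container's startup output.
But what came next was not smooth either; several more errors followed.
- First error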
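Each line of that log file is a small JSON object with log, stream, and time fields, as the excerpts in this post show. As a rough sketch, the message can be pulled out of one line with sed. This is not a real JSON parser and it assumes the field order seen in these excerpts. Running docker logs with the container name should print the same messages without any manual parsing.

```shell
#!/bin/sh
# One log line in the format shown in this post.
line='{"log":"sh: 1: conda: not found\r\n","stream":"stdout","time":"2022-10-15T09:04:41.5999796Z"}'

# Strip the JSON wrapper and the escaped line ending.
# Assumption: "log" is the first field and "stream" follows it directly.
message=$(printf '%s\n' "$line" |
    sed -e 's/^{"log":"//' -e 's/","stream":.*$//' -e 's/\\r\\n$//')

printf '%s\n' "$message"
```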
{"log":"sh: 1: conda: not found\r\n","stream":"stdout","time":"2022-10-15T09:04:41.5999796Z"}
This one is easy to fix: the error says conda cannot be found, so using its full path is enough.
- Second error
{"log":"CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128233Z"}
{"log":"To initialize your shell, run\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128271Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128459Z"}
{"log":" $ conda init \u003cSHELL_NAME\u003e\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128485Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128515Z"}
{"log":"Currently supported shells are:\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128541Z"}
{"log":" - bash\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0128568Z"}
{"log":" - fish\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129063Z"}
{"log":" - tcsh\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129097Z"}
{"log":" - xonsh\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129123Z"}
{"log":" - zsh\r\n","stream":"stdout","time":"2022-10-15T09:27:56.012915Z"}
{"log":" - powershell\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129177Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129203Z"}
{"log":"See 'conda init --help' for more information and options.\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129229Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129256Z"}
{"log":"IMPORTANT: You may need to close and restart your shell after running 'conda init'.\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129282Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129309Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:27:56.0129335Z"}
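Roughly, my conda setup was not usable in this shell and had to be initialized with conda init. I thought: I had already configured this environment earlier. Without thinking much more about it, I added a conda init command, and it still failed, as follows.
- Third error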
{"log":" * Starting OpenBSD Secure Shell server sshd \u001b[80G \r\u001b[74G[ OK ]\r\n","stream":"stdout","time":"2022-10-15T09:42:31.970106Z"}
{"log":"no change /root/anaconda3/condabin/conda\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4860911Z"}
{"log":"no change /root/anaconda3/bin/conda\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861107Z"}
{"log":"no change /root/anaconda3/bin/conda-env\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861143Z"}
{"log":"no change /root/anaconda3/bin/activate\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861174Z"}
{"log":"no change /root/anaconda3/bin/deactivate\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861204Z"}
{"log":"no change /root/anaconda3/etc/profile.d/conda.sh\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861234Z"}
{"log":"no change /root/anaconda3/etc/fish/conf.d/conda.fish\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861263Z"}
{"log":"no change /root/anaconda3/shell/condabin/Conda.psm1\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861387Z"}
{"log":"no change /root/anaconda3/shell/condabin/conda-hook.ps1\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861419Z"}
{"log":"no change /root/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861449Z"}
{"log":"no change /root/anaconda3/etc/profile.d/conda.csh\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861478Z"}
{"log":"modified /root/.zshrc\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861508Z"}
{"log":"\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861536Z"}
{"log":"==\u003e For changes to take effect, close and re-open your current shell. \u003c==\r\n","stream":"stdout","time":"2022-10-15T09:42:32.4861565Z"}
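Because I had run conda init, the shell had to be restarted before the conda setup would take effect. I looked it up and added a reboot command, only to find that nothing after reboot was executed. I was close to giving up. Then I reasoned from experience: my conda was already configured, and for zsh at that, so would simply running zsh be enough? I tried it, and the container finally stayed up. But the client could not connect to the bert serving service, and I realized I had to get its logs out to see why, so I changed the command again.
I changed the bert command to redirect its output, as below, but the log file was never created and my client still could not connect.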
bert-serving-start -model_dir /root/bert_serving/chinese_L-12_H-768_A-12 -num_worker=2 -max_seq_len 64 -graph_tmp_dir /root/bert_serving/bert_tmp_file >bert_serving.log 2>&1
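I guessed it was a directory problem and that the log and temporary files could not be written. I then looked at our production containers and saw that their logs are all redirected into a mounted directory. So I changed the bert startup command again, as below, to write both the log and the temporary files to the mounted directory, and I also set WorkingDir in the Docker configuration file to the mounted directory /data.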
service ssh start && cd /data/bert_tmp_file && zsh && conda activate voice_bert && bert-serving-start -model_dir /root/bert_serving/chinese_L-12_H-768_A-12 -num_worker=2 -max_seq_len 64 -graph_tmp_dir /data/bert_tmp_file >/data/bert_log/bert_serving.log 2>&1
Still no luck. Not ready to give up, I suspected directory permissions and gave both bert_tmp_file and bert_log 777 permissions, as follows:
chmod 777 bert_tmp_file
chmod 777 bert_log
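Only then did I give up and shelve the problem for a while. The setup was still usable, but I had to enter the container and start the bert serving service by hand.
Inspired by article [6], I decided to put the commands into a start.sh script. I tried having the script interpreted by zsh, having the startup command run through zsh, and so on, all without success. During these attempts I noticed a line in bert_serving.log saying, roughly, that the bert-serving-start command could not be found. I wondered: conda activate voice_bert should already have switched to the voice_bert conda environment, so why can't bert-serving-start be found? Only then did it occur to me that conda activate voice_bert might not be switching environments at all. That led me to the solution, and the final script is as follows: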
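One possible explanation for why putting zsh in the middle of the && chain did not help: zsh is started as a child process, so whatever its startup files set up exists only inside that child, and the commands after && still run in the original sh once zsh returns. The sketch below illustrates this with a hypothetical variable, DEMO_ENV, standing in for the conda setup; it is an analogy, not a reproduction of the container.

```shell
#!/bin/sh
unset DEMO_ENV

# Stand-in for "zsh" in the chain: a child shell sets something up and exits.
sh -c 'DEMO_ENV=voice_bert; export DEMO_ENV'
seen_after_child=${DEMO_ENV:-unset}

# Setting the variable in the current shell is what later commands inherit.
DEMO_ENV=voice_bert
export DEMO_ENV
seen_by_next_command=$(sh -c 'printf "%s" "$DEMO_ENV"')

echo "after child shell:   $seen_after_child"
echo "after current shell: $seen_by_next_command"
```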
#!/usr/bin/bash
rm -rf /root/bert_serving/bert_tmp_file/*
rm -rf /root/bert_serving/bert_log/*
service ssh start
cd /root/bert_serving/bert_tmp_file
#conda activate voice_bert
source /root/anaconda3/bin/activate voice_bert
nohup bert-serving-start -model_dir /root/bert_serving/chinese_L-12_H-768_A-12 -num_worker=2 -max_seq_len 64 -graph_tmp_dir /root/bert_serving/bert_tmp_file -http_max_connect 100 >/root/bert_serving/bert_log/bert_serving.log 2>&1 &
tail -f /var/log/wtmp
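Why sourcing the activate script helps can be sketched with a toy environment. Everything below is hypothetical: demo-serving-start stands in for bert-serving-start, and the tiny activate file only prepends a bin directory to PATH, which is one of the things a real conda activation does. Sourcing runs the file in the current shell, so the PATH change is still in effect when the next command is looked up.

```shell
#!/bin/sh
# Hypothetical stand-in for a conda environment: a directory with one executable.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/env/bin"
printf '#!/bin/sh\necho "serving started"\n' > "$demo_root/env/bin/demo-serving-start"
chmod +x "$demo_root/env/bin/demo-serving-start"

# Hypothetical "activate" file: it only prepends the environment's bin directory to PATH.
printf 'PATH="%s/env/bin:$PATH"\nexport PATH\n' "$demo_root" > "$demo_root/activate"

# Before sourcing, the command cannot be resolved.
if command -v demo-serving-start >/dev/null 2>&1; then
    before=found
else
    before=missing
fi

# "." (source) runs the file in the current shell, so the PATH change persists.
. "$demo_root/activate"
after=$(demo-serving-start)

echo "before sourcing: $before"
echo "after sourcing:  $after"
rm -rf "$demo_root"
```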
I made the following changes to Docker's configuration file for the container:
With that, everything finally worked.
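References:
[1] Because of a browser problem, the links to several very important articles were lost and cannot be provided
[2] How to run multiple commands in a single docker run
[3] Modifying the docker run startup command directly
[4] Running the ssh service in the foreground
[5] Locating the Docker configuration files on Windows 10
[6] On putting the startup commands into a script
[7] Switching conda environments with source
[8] The need to run it with bash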