Setting up a Docker + PyTorch + Kaldi + GPU environment

Run your own Kaldi programs inside Docker on a server, using the server's GPUs.

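I have published an image in which Kaldi is not yet compiled. You can pull it directly and then, in the /opt directory, start from section 3 (Installing and compiling Kaldi):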

docker pull xinyinan/pytorch1.2-cuda10.0-kaldi:latest

A note up front: I needed to run the sad_rats recipe, which has quite a few pitfalls. They are resolved one by one below.

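1. Creating the Docker container

Image tags: https://hub.docker.com/r/pytorch/pytorch/tags
Be sure to pick the 1.2-cuda10.0-cudnn7-devel image, not 1.2-cuda10.0-cudnn7-runtime: with the runtime image, /usr/local inside the container has no CUDA or cuDNN. Choosing the right image is far less work than installing CUDA and cuDNN by hand.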

docker run -p 0000:0000 -it --gpus '"device=0,1,2,3"' -v /workspace:/workspace pytorch/pytorch:1.2-cuda10.0-cudnn7-devel /bin/bash
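-v  maps a host directory to a directory in the container (host path:container path)
-p  maps a host port to a container port (optional; replace 0000:0000 with real port numbers)
--gpus  selects the GPUs to expose; use --gpus all to expose every GPU

I chose cuda10.0-cudnn7 because the Kaldi version I use does not support anything other than CUDA 10.0. This works as long as the host's NVIDIA driver supports CUDA 10.0 or newer.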
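
To make the options above easier to adapt, here is a minimal sketch that assembles the same docker run command from a few variables and prints it for review instead of running it. The variable names GPU_IDS and HOST_DIR and the function print_run_cmd are my own illustrative choices, not part of Docker.

```shell
#!/bin/sh
# Sketch: build the `docker run` command from variables and print it.
# GPU_IDS, HOST_DIR and print_run_cmd are illustrative names (assumptions).
GPU_IDS="${GPU_IDS:-0,1,2,3}"        # comma-separated GPU indices to expose
HOST_DIR="${HOST_DIR:-/workspace}"   # host directory mounted at /workspace
IMAGE="pytorch/pytorch:1.2-cuda10.0-cudnn7-devel"

# Print the command rather than executing it, so it can be checked first.
print_run_cmd() {
    printf 'docker run -it --gpus %s -v %s:/workspace %s /bin/bash\n' \
        "'\"device=$GPU_IDS\"'" "$HOST_DIR" "$IMAGE"
}

print_run_cmd
```

Copy the printed line into the shell once it looks right. The nested quoting around device=... is kept exactly as in the command above.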

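2. Checking and updating the container environment

After entering the container, first check two things:

nvidia-smi   # check that the GPUs are visible
nvcc -V      # check that nvcc is installed

If either check fails, go back to step 1 and try again: if you still had to install CUDA and cuDNN by hand, there would be no point in using Docker.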
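
The two checks can also be scripted. The sketch below only tests whether the two commands are on PATH; it does not verify that the GPUs actually work. The function names check_tool and check_gpu_env are illustrative.

```shell
# Report whether a command is on PATH; returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check both tools; a non-zero result means the image or the --gpus
# option should be revisited before going any further.
check_gpu_env() {
    status=0
    check_tool nvidia-smi || status=1
    check_tool nvcc || status=1
    return "$status"
}

check_gpu_env || echo "GPU environment incomplete: recreate the container (step 1)"
```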

Then update the package lists:

apt-get update

Recently pulled images hit a signature verification problem at this point:

root@3e78fc56e245:/workspace# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]               
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]    
Get:3 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64  InRelease [1581 B]
Err:3 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64  InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY Axxxxxxxxxx

Ignore this for now, but note the key ID reported in the error (NO_PUBKEY Axxxxxxxxxx). The error does not prevent installing sudo, so install sudo first:

apt-get install sudo

Once sudo is installed, fix the apt-get update error as follows, where Axxxxxxxxxx is the key ID reported after NO_PUBKEY above:

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys Axxxxxxxxxx

Now apt-get update can be run again:

apt-get update
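
If several keys are missing, copying them by hand gets tedious. As a rough sketch, the helper below pulls every key ID that follows NO_PUBKEY out of the apt-get update output; extract_missing_keys is a name I made up, and it assumes the key IDs are hexadecimal strings.

```shell
# Read apt-get update output on stdin and print each key ID that follows
# NO_PUBKEY, one per line, without duplicates.
extract_missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-Fa-f][0-9A-Fa-f]*' | awk '{print $2}' | sort -u
}

# Usage inside the container (commented out: needs root and network access):
# apt-get update 2>&1 | extract_missing_keys | while read -r key; do
#     apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
# done
```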

Next, install a few tools that will be needed:

sudo apt install vim git wget

Once that succeeds, upgrade the installed packages:

sudo apt-get upgrade

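3. Installing and compiling Kaldi

Now Kaldi can be installed. I suggest installing it directly under /workspace and cloning the project from Gitee, because it is faster: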

git clone https://gitee.com/jalaliden/kaldi.git
root@3e78fc56e245:/workspace# git clone https://gitee.com/jalaliden/kaldi.git
Cloning into 'kaldi'...
remote: Enumerating objects: 113240, done.
remote: Total 113240 (delta 0), reused 0 (delta 0), pack-reused 113240
Receiving objects: 100% (113240/113240), 120.99 MiB | 9.71 MiB/s, done.
Resolving deltas: 100% (87482/87482), done.

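The clone takes a few minutes. Then start compiling. The project offers two ways to build: one uses cmake, the other goes step by step. This post uses the step-by-step approach. The following text comes from /workspace/kaldi/tools/INSTALL: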

To check the prerequisites for Kaldi, first run

extras/check_dependencies.sh

and see if there are any system-level installations you need to do.
Check the output carefully. There are some things that will make your
life a lot easier if you fix them at this stage. If your system
default C++ compiler is not supported, you can do the check with
another compiler by setting the CXX environment variable, e.g.

CXX=g++-4.8 extras/check_dependencies.sh

Then run

make

which by default will install ATLAS headers, OpenFst, SCTK and
sph2pipe. OpenFst requires a relatively recent C++ compiler with C++11
support, e.g. g++ >= 4.7, Apple clang >= 5.0 or LLVM clang >= 3.3. If
your system default compiler does not have adequate support for C++11,
you can specify a C++11 compliant compiler as a command argument, e.g.

make CXX=g++-4.8

If you have multiple CPUs and want to speed things up, you can do a
parallel build by supplying the “-j” option to make, e.g. to use 4
CPUs

make -j 4

In extras/, there are also various scripts to install extra bits and
pieces that are used by individual example scripts. If an example
script needs you to run one of those scripts, it will tell you what to
do.

First go into kaldi/tools and run the dependency check:

cd kaldi/tools
extras/check_dependencies.sh

You will then see output like this:

root@3e78fc56e245:/workspace/kaldi/tools# extras/check_dependencies.sh
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: unzip is not installed.
extras/check_dependencies.sh: sox is not installed.
extras/check_dependencies.sh: gfortran is not installed
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: python2.7 is not installed
extras/check_dependencies.sh: Intel MKL does not seem to be installed.
 ... Run extras/install_mkl.sh to install it. Some distros (e.g., Ubuntu 20.04) provide
 ... a version of MKL via the package manager, but verify that it is up-to-date.
 ... You can also use other matrix algebra libraries. For information, see:
 ...   http://kaldi-asr.org/doc/matrixwrap.html
extras/check_dependencies.sh: Some prerequisites are missing; install them using the command:
  sudo apt-get install automake autoconf unzip sox gfortran libtool subversion python2.7

Then run the following command, which is taken from the last line of the output above:

sudo apt-get install automake autoconf unzip sox gfortran libtool subversion python2.7

Once those packages are installed, install MKL:

extras/install_mkl.sh

Continue with the build. This step takes quite a while, so be patient:

make

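To build in parallel, pass -j followed by a number of jobs, typically the number of CPU threads on your machine. If the plain make above has already finished, there should be little or nothing left for this command to build: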

make -j 4
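
Rather than hard-coding 4, the job count can be read from the machine. This is a small sketch assuming the coreutils nproc command is available (it falls back to 1 otherwise); detect_jobs is an illustrative name. It only prints the make command it would run.

```shell
# Pick a parallel job count for make: use nproc when it exists, else 1.
detect_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

JOBS="$(detect_jobs)"
echo "would run: make -j $JOBS"
```

Inside a container, nproc normally reflects the CPUs the container is allowed to use, which is what matters for the build.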

When this step finishes, go into the src directory and continue compiling. The following text comes from /workspace/kaldi/src/INSTALL:

These instructions are valid for UNIX-like systems (these steps have
been run on various Linux distributions; Darwin; Cygwin). For native
Windows compilation, see ../windows/INSTALL.

You must first have completed the installation steps in
../tools/INSTALL (compiling OpenFst; getting ATLAS and CLAPACK
headers).

The installation instructions are

./configure --shared
make depend -j 8
make -j 8

Note that we added the “-j 8” to run in parallel because “make” takes
a long time. 8 jobs might be too many for a laptop or small desktop
machine with not many cores.

For more information, see documentation at http://kaldi-asr.org/doc/
and click on “The build process (how Kaldi is compiled)”.

cd ../src
./configure --shared   

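Take it one step at a time. The number 8 in the next two commands is the number of parallel jobs; set it to match your CPU count: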

make depend -j 8

Then:

make -j 8

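At this point the installation is complete. Congratulations! I did not record the output of some of the earlier steps, so I cannot paste the successful output here. Good luck with your build.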
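
Since I have no recorded output to compare against, one rough way to sanity-check the build is to confirm that a couple of Kaldi binaries now exist. The sketch below assumes the build produced src/bin/copy-matrix and src/featbin/compute-mfcc-feats; the function name check_kaldi_build and the default path are illustrative.

```shell
# Return 0 if each listed Kaldi binary exists and is executable under $1.
check_kaldi_build() {
    root="$1"
    missing=0
    for bin in src/bin/copy-matrix src/featbin/compute-mfcc-feats; do
        if [ -x "$root/$bin" ]; then
            echo "ok: $bin"
        else
            echo "missing: $bin"
            missing=1
        fi
    done
    return "$missing"
}

check_kaldi_build "${KALDI_ROOT:-/workspace/kaldi}" || echo "build looks incomplete"
```

This only checks that files exist; it says nothing about whether GPU support was compiled in.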

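4. A few Kaldi details

First, you may hit an error saying numpy cannot be found. An annoying thing about Kaldi is that numpy has to be installed for Python 2.x as well as Python 3.x. To be safe, do this: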

sudo apt install python-numpy
sudo apt install python3-numpy
pip install numpy

That way the module-not-found error goes away whichever interpreter is used. The same applies to other libraries.
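
To see which interpreters can actually import numpy before running a recipe, a quick check along these lines may help. It is only a sketch: has_module is a name I made up, and the interpreter names python2.7 and python3 are assumed from the packages installed earlier.

```shell
# Return 0 if the interpreter exists and can import the given module.
has_module() {
    command -v "$1" >/dev/null 2>&1 && "$1" -c "import $2" >/dev/null 2>&1
}

for py in python2.7 python3; do
    if has_module "$py" numpy; then
        echo "$py: numpy available"
    else
        echo "$py: numpy missing (or interpreter not installed)"
    fi
done
```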

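There is also an environment configuration issue with the example recipes.
Go into your recipe directory, for example /workspace/kaldi/egs/sad_rats/s5, and open cmd.sh: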

vim cmd.sh

Change it to:

#you can change cmd.sh depending on what type of queue you are using.
#If you have no queueing system and want to run on a local machine, you
#can change all instances ‘queue.pl’ to run.pl (but be careful and run
#commands one by one: most recipes will exhaust the memory on your
#machine). queue.pl works with GridEngine (qsub). slurm.pl works
#with slurm. Different queues are configured differently, with different
#queue names and different ways of specifying things like memory;
#to account for these differences you can create and edit the file
#conf/queue.conf to match your queue’s configuration. Search for
#conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
#or search for the string ‘default_config’ in utils/queue.pl or utils/slurm.pl.

export train_cmd="run.pl"
export decode_cmd="run.pl"
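
The same change can be made without opening an editor. The sketch below swaps queue.pl for run.pl on the export lines only and keeps a .bak copy of the original; use_run_pl is an illustrative name. Unlike the hand-edited version above, it leaves any options that followed queue.pl in place, so check the result afterwards.

```shell
# Replace queue.pl with run.pl on `export ...` lines of the given cmd.sh.
# Comment lines are left untouched; the original is saved as <file>.bak.
use_run_pl() {
    sed -i.bak '/^export/ s/queue\.pl/run.pl/g' "$1"
}

# Usage (path taken from the recipe above):
# use_run_pl /workspace/kaldi/egs/sad_rats/s5/cmd.sh
```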

Next, open path.sh:

vim path.sh

Change the path on the first line to /workspace/kaldi:

export KALDI_ROOT=/workspace/kaldi

Then source it:

. ./path.sh

At this point everything should work.
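
A quick way to confirm that path.sh took effect is to check KALDI_ROOT in the current shell. This is a sketch under the assumption that a valid Kaldi checkout always has a src directory; check_kaldi_root is an illustrative name.

```shell
# Verify that KALDI_ROOT is set and points at a directory containing src/.
check_kaldi_root() {
    if [ -z "${KALDI_ROOT:-}" ]; then
        echo "KALDI_ROOT is not set"
        return 1
    fi
    if [ ! -d "$KALDI_ROOT/src" ]; then
        echo "KALDI_ROOT=$KALDI_ROOT has no src directory"
        return 1
    fi
    echo "KALDI_ROOT ok: $KALDI_ROOT"
}

check_kaldi_root || echo "fix path.sh and source it again"
```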

5. Aside

The notes below summarize a Docker tutorial video and are for reference only.

Basics

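Get an image: download it directly from a remote registry

docker pull image_name:latest

Find images in a remote registry: Docker Hub
List images: see which images exist locally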

docker images

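Run an image as a container:

docker run -it -v host_path:container_path image_name /bin/bash

-d  runs the container in the background
-p  maps a host port to a container port (-P publishes all exposed ports to random host ports)
-v  maps a host directory into the container (for example to /workspace)
With -d, docker run prints a string, which is the ID of the running container.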

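List containers:

docker ps -a

Without -a, only running containers are listed; -a lists all containers, including stopped ones.
Typing exit inside a container stops it. To leave without stopping it, press Ctrl+P and then Ctrl+Q.
Start a stopped container:

docker start container_id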

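Enter a running container:

docker exec -it container_id /bin/bash

Note: only the first few characters of the container ID are needed, as long as they identify it uniquely.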

Remove a container:

docker rm -f container_id

Remove an image:

docker rmi image_name

Save a container as an image:

docker commit container_id image_name

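Build an image from a Dockerfile:

docker build -t image_name .

Note: the . means the build uses the current directory.

Save an image to a tar file:

docker save image_name > file_name.tar

Load an image from a tar file:

docker load < file_name.tar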
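
Image names such as xinyinan/pytorch1.2-cuda10.0-kaldi:latest contain / and :, which are awkward in file names. As a small sketch, the helper below derives a tar file name from an image name and prints the matching save and load commands without running them; image_to_tar_name is a name I made up.

```shell
# Turn an image name into a safe tar file name by replacing / and : with _.
image_to_tar_name() {
    printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

image='xinyinan/pytorch1.2-cuda10.0-kaldi:latest'
name="$(image_to_tar_name "$image")"
# Print the commands for review instead of executing them.
echo "docker save $image > $name"
echo "docker load < $name"
```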

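Summary

This toolchain took me a long time to get working, and my boss nearly had my head over it, but it finally works and the code now runs fast. I had originally planned to write everything myself in PyTorch, but after patiently reading through Kaldi I have to say it is a really well-made tool. It covers every aspect, including GPU utilization and later deployment, and all of it is mature. I am now glad I did not write my own code. Best wishes!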
