Espnet ASR Demo & Quantization Document
- This document describes how to run the Espnet (v1) ASR demo and how to quantize its model
- Test environment:
Ubuntu | CUDA | GCC |
---|---|---|
21.04 | 11.6 | 11.2 |
Installation
Note: Please follow the original installation guide provided by Espnet. Only the notes below need particular attention.
Requirements
sox | sndfile | ffmpeg | flac |
---|---|---|---|
installed | installed | not installed | not installed |
Install Kaldi
Follow the installation guide exactly
Notes:
- The Kaldi installation has two parts: 1. tools installation, 2. src installation. Make sure to complete both, in that order
- Once installed, many `.o` binary files can be found in directories such as `<kaldi-root>/src/{featbin,fgmmbin,fstbin,etc.}`
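A quick way to confirm that the src build produced output is to count the `.o` files in a few of those directories. The following is a minimal sketch, not part of Kaldi or Espnet: the function name `count_object_files` and the default directory list are assumptions taken from the note above, and `kaldi_src` is assumed to be the directory that directly contains `featbin` and its siblings.

```python
from pathlib import Path


def count_object_files(kaldi_src, subdirs=("featbin", "fgmmbin", "fstbin")):
    """Return {subdir: number of .o files} for each subdir under kaldi_src.

    Hypothetical helper for a post-build sanity check.
    A count of 0 means the directory is missing or holds no .o files.
    """
    counts = {}
    for name in subdirs:
        directory = Path(kaldi_src) / name
        if directory.is_dir():
            # Count only regular files whose name ends in ".o".
            counts[name] = sum(1 for f in directory.glob("*.o") if f.is_file())
        else:
            counts[name] = 0
    return counts
```

A result with zeros for every directory suggests that the src installation step did not run or did not finish.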
Install Espnet
Follow the installation guide exactly
Notes:
- Kaldi should be linked into `<espnet>/tools` (see the guide)
- Option "A) Setup Anaconda environment" is chosen in this document, so a virtual environment `espnet` is created with `python==3.8`
- Since the current CUDA version is 11.6, which PyTorch 1.10.1 does not support, `espnet` should be installed with `$ make TH_VERSION=1.10.1 CUDA_VERSION=11.3`, which specifies the PyTorch and CUDA versions
- Custom tools in "[Optional] Custom tool installation" are not installed
- Install chainer in the `espnet` conda environment with `pip install chainer==6.0.0` (`cupy` is not installed due to some errors)
Run ASR Demo
This demo decodes (transcribes) a `.wav` audio file into words
Notes: some
- Prepare the audio file, e.g. the `test.wav` file in `espnet/utils`. Put the `.wav` file in `espnet/egs/tedlium2/asr1`
- Perform decoding
  a. `cd espnet/egs/tedlium2/asr1` and `source ./path.sh`
  b. `recog_wav.sh --models <downloaded-model> test.wav`
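Before decoding, it can help to confirm that the audio file is a readable WAV and to look at its sample rate and channel count. The following is a minimal sketch using Python's standard `wave` module. `describe_wav` is a hypothetical helper, not an Espnet utility, and `wave` raises `wave.Error` for files it cannot parse.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from the frame count and the frame rate.
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("test.wav")` from `espnet/egs/tedlium2/asr1` returns a dictionary with these four fields for the demo file.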
Notes: By default, the model is downloaded with the `gdown` package, which can fail with a timeout error if the network connection drops. In that case, the model file, e.g. `model.streaming.v1.tar.gz`, needs to be downloaded manually from Google Drive (see the Espnet readme)
Then, modify the `download_from_google_drive.sh` file in the `espnet/utils` directory as follows:
a. Create a variable `manual_download_dir` that specifies the path of the downloaded model file, e.g. `manual_download_dir="/home/glinttsd/espnet/egs/tedlium2/asr1/model.streaming.v1.tar.gz"`
b. Replace the code in lines 46-47 with the following, which skips the download and decompresses the local model file directly when it exists:

    if [ -f "$manual_download_dir" ]
    then
        echo "File download locally"
        decompress "${manual_download_dir}" "${download_dir}"
    else
        echo "File download from url: ${share_url}"
        gdown --id "${file_id}" -O "${tmp}"
        decompress "${tmp}" "${download_dir}"
    fi
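A manual download can be interrupted as well, so it may be worth checking that the archive is readable before pointing the script at it. The following is a minimal sketch using Python's standard `tarfile` module. `archive_is_readable` is a hypothetical helper, and this is only a basic readability check, not a checksum verification.

```python
import tarfile
import zlib


def archive_is_readable(archive_path):
    """Return True if the .tar.gz file opens and all member headers can be read.

    Hypothetical helper. Listing the names forces tarfile to walk through the
    compressed stream, so a missing, non-gzip, or truncated file is expected
    to raise one of the errors caught below.
    """
    try:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.getnames()
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True
```

For example, `archive_is_readable("model.streaming.v1.tar.gz")` returning `False` suggests downloading the file again before running `recog_wav.sh`.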
Model Quantization
This section quantizes the model from FP32 to INT8
Espnet provides a dynamic quantization method through the PyTorch API.
To enable dynamic quantization, add the following lines at lines 248-249 of the `espnet/utils/recog_wav.sh` file:

    --quantize-asr-model True \
    --quantize-dtype "qint8" \

Decoding can now be performed as described in the previous section
More usage can be found here
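To make the FP32 to INT8 step concrete, the sketch below shows the arithmetic of one common scheme, symmetric per-tensor quantization, in plain Python. It is an illustration only and an assumption about the general idea: PyTorch's dynamic quantization (exposed as `torch.quantization.quantize_dynamic`) uses its own observers and kernels, and the function names here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.25, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
```

Each stored value shrinks from 4 bytes to 1 byte, at the cost of a rounding error of at most half the scale per value.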