Espnet ASR Demo & Quantization Document

  • This document describes how to run the Espnet (v1) ASR demo and how to quantize its model.
  • Test environment:

    Ubuntu    CUDA    GCC
    21.04     11.6    11.2

Installation

Note: Follow the original installation guide provided by Espnet. The notes below cover only the points that need extra attention.

Requirements

    sox          sndfile      ffmpeg           flac
    installed    installed    not installed    not installed
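
The status listed above can be checked on another machine by testing which command-line tools are on PATH. This is a minimal sketch: check_tools is a hypothetical helper name, and sndfile is left out because it is a shared library rather than a command.

```shell
# Report whether each named command-line tool is on PATH.
# check_tools is a hypothetical helper, not part of Espnet.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: installed"
    else
      echo "$tool: not installed"
    fi
  done
}

# The command-line tools from the requirements list.
check_tools sox ffmpeg flac
```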

Install Kaldi

Follow the installation guide exactly.
Notes:

  • The Kaldi installation has two parts: (1) the tools installation and (2) the src installation. Install both, in that order.
  • Once installed, compiled binaries and .o object files can be found in directories such as: <kaldi-root>/src/{featbin,fgmmbin,fstbin,etc.}
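
One way to confirm that the src build finished is to count the executables in a few of those directories. The sketch below assumes the standard Kaldi layout, where the binary directories sit under <kaldi-root>/src; count_kaldi_bins and KALDI_ROOT are hypothetical names chosen for illustration.

```shell
# Count executable files in selected Kaldi binary directories.
# Assumes the directories live under <kaldi-root>/src.
count_kaldi_bins() {
  root=$1
  for d in featbin fgmmbin fstbin; do
    dir="$root/src/$d"
    if [ -d "$dir" ]; then
      # Regular files with the owner-execute bit set, top level only.
      n=$(find "$dir" -maxdepth 1 -type f -perm -u+x | wc -l | tr -d ' ')
      echo "$d: $n executables"
    else
      echo "$d: missing"
    fi
  done
}

# Example call; KALDI_ROOT is a hypothetical variable for the checkout path.
if [ -n "${KALDI_ROOT:-}" ]; then
  count_kaldi_bins "$KALDI_ROOT"
fi
```

A directory reported as missing, or with 0 executables, suggests the src part of the build did not complete.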

Install Espnet

Follow the installation guide exactly.
Notes:

  • Kaldi should be linked into <espnet>/tools (see the guide).
  • This document uses Option A) Setup Anaconda environment, so a virtual environment named espnet is created with python==3.8.
  • The current CUDA version is 11.6, which is not compatible with pytorch 1.10.1, so Espnet should be installed with $ make TH_VERSION=1.10.1 CUDA_VERSION=11.3, which specifies the pytorch and CUDA versions.
  • The custom tools in [Optional] Custom tool installation are not installed.
  • Install chainer in the espnet conda environment with pip install chainer==6.0.0 (cupy is not installed due to some errors).
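
The notes above can be collected into one script. This is only a sketch under assumptions: the espnet conda environment from Option A is taken to be created and activated already, the two path arguments are placeholders, and run and install_espnet are hypothetical helper names. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print each command; execute it only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# install_espnet <espnet-root> <kaldi-root>
# Both arguments are placeholder paths for illustration.
install_espnet() {
  espnet_root=$1
  kaldi_root=$2
  run cd "$espnet_root/tools"
  # Link the built Kaldi checkout into <espnet>/tools.
  run ln -s "$kaldi_root" kaldi
  # Pin the pytorch version and the CUDA version it is built for.
  run make TH_VERSION=1.10.1 CUDA_VERSION=11.3
  # chainer is installed separately; cupy is skipped, as in the notes.
  run pip install chainer==6.0.0
}
```

For example, install_espnet "$HOME/espnet" "$HOME/kaldi" prints the four commands in order without changing anything.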

Run ASR Demo

This demo decodes (transcribes) a .wav audio file into text.

Notes: some

  1. Prepare the audio file
    e.g. the test.wav file in espnet/utils
    Put the .wav file in espnet/egs/tedlium2/asr1
  2. Perform decoding
    a. cd espnet/egs/tedlium2/asr1 and source ./path.sh
    b. recog_wav.sh --models <downloaded-model> test.wav
    Notes: By default the model is downloaded with the gdown package, which can fail with a timeout error if the network connection drops. In that case, download the model file, e.g. model.streaming.v1.tar.gz, manually from Google Drive (see the Espnet readme).
    Then modify the download_from_google_drive.sh file in the espnet/utils directory as follows:
    a. Create a variable manual_download_dir that holds the path of the downloaded model file, e.g. manual_download_dir="/home/glinttsd/espnet/egs/tedlium2/asr1/model.streaming.v1.tar.gz"
    b. Replace the code in lines 46-47 with
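        # Use the manually downloaded archive if it exists.
        if [ -f "$manual_download_dir" ]; then
            echo "Using locally downloaded file: ${manual_download_dir}"
            decompress "${manual_download_dir}" "${download_dir}"
        else
            # Otherwise fall back to downloading with gdown.
            echo "File download from url: ${share_url}"
            gdown --id "${file_id}" -O "${tmp}"
            decompress "${tmp}" "${download_dir}"
        fi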
    
    This skips the download and decompresses the local model file directly.
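
The local-file branch can be tried on its own before patching the Espnet script. The sketch below is a stand-in, not the Espnet code: use_local_model is a hypothetical function, and it handles only .tar.gz archives with tar instead of calling the script's decompress function.

```shell
# Unpack a model archive from a local path if it exists.
# Returns 1 when the archive is absent, so the caller can
# fall back to downloading (gdown in the Espnet script).
use_local_model() {
  archive=$1   # path to the manually downloaded .tar.gz
  dest=$2      # directory to unpack into
  if [ -f "$archive" ]; then
    echo "Using local archive: $archive"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
  else
    echo "No local archive at $archive" >&2
    return 1
  fi
}
```

Calling use_local_model with the archive path and a target directory unpacks the model without touching the network, which is the behavior the patched branch relies on.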

Model Quantization

This section describes how to quantize the model from FP32 to INT8.

Espnet provides dynamic quantization through the pytorch API.

To enable dynamic quantization, add the following options to espnet/utils/recog_wav.sh at lines 248-249:

        --quantize-asr-model True \
        --quantize-dtype "qint8" \
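
Because the edit is made by hand at specific line numbers, a quick check that both options ended up in the script can save a failed decoding run. This is a minimal sketch; has_quantize_flags is a hypothetical helper, and the relative script path assumes the current directory is the parent of the espnet checkout.

```shell
# Succeed only if both quantization options appear in the given file.
has_quantize_flags() {
  grep -q -e '--quantize-asr-model' "$1" &&
    grep -q -e '--quantize-dtype' "$1"
}

# Path is an assumption; adjust it to the local checkout.
script=espnet/utils/recog_wav.sh
if [ -f "$script" ]; then
  if has_quantize_flags "$script"; then
    echo "quantization flags present"
  else
    echo "quantization flags missing"
  fi
fi
```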

Decoding can now be performed as described in the previous section.

More usage can be found here
