Calling Kaldi from Java: Kaldi Online Audio Server (setting up the server and client, legacy online decoding)

weixin_39627455

Published 2021-02-27 04:29:34


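Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.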

Article link: https://blog.csdn.net/weixin_39627455/article/details/114782189


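The Kaldi toolkit includes several programs for online recognition. They all live in the src/onlinebin folder and are built from the sources in src/online (you can build them with the make ext command). Most of them also depend on the PortAudio library, which can be downloaded and installed with the corresponding script in the tools folder.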

# Install PortAudio (the ALSA packages are installed first)

yum -y install '*alsa*'

cd kaldi/tools

./install_portaudio.sh

# Build the online recognition tools

cd ../src

make ext
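A quick way to confirm that the build produced the tools used in this post is to look for them in src/onlinebin. The following is a minimal sketch, not part of Kaldi: the function name check_online_bins is hypothetical, and the list of binaries is simply the four programs mentioned in this article.

```shell
# Hypothetical helper: report which online tools exist and are executable
# in the given directory (assumed to be kaldi/src/onlinebin).
# The body runs in a subshell, so its variables do not leak to the caller.
check_online_bins() (
  bin_dir=$1
  status=0
  for name in online-audio-server-decode-faster online-audio-client \
              online-server-gmm-decode-faster online-net-client; do
    if [ -x "$bin_dir/$name" ]; then
      echo "ok $name"
    else
      echo "missing $name"
      status=1
    fi
  done
  # Exit status is 0 only if every tool was found.
  exit "$status"
)
```

For example, check_online_bins kaldi/src/onlinebin || echo "rerun make ext" prints one line per tool and a reminder if any of them is missing.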

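## I. Setting up the server-client recognition system

Building the complete online recognition system requires:

Two machines, both with Kaldi installed;

On the server machine: the acoustic model, the word list, the decoding graph, and the feature transform matrix (I have not used the transform matrix yet).

Start the server first; once it is running, start the client and connect.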

### 1. Command line to start the server

Start the server with online-audio-server-decode-faster as follows:

online-audio-server-decode-faster --verbose=1 --rt-min=0.5 --rt-max=3.0 --max-active=6000 \
--beam=72.0 --acoustic-scale=0.0769 final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' \
graph/word_boundary.int 5010 final.mat

#### 1.1 Arguments are as follows:

final.mdl - the acoustic model

HCLG.fst - the complete FST

words.txt - word dictionary (mapping word ids to their textual representation)

'1:2:3:4:5' - list of silence phoneme ids

5010 - port the server is listening on

word_boundary.int - a list of phoneme boundary information required for word alignment

final.mat - feature LDA matrix
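The server command above can be wrapped so that missing input files are caught before startup. This is only a sketch under stated assumptions: the function name start_audio_server and the DRY_RUN variable are hypothetical, the model and graph directory layout is assumed to match the paths in the command, the silence list '1:2:3:4:5' is copied from the example and depends on your model, and final.mat is treated as optional because the author reports not having used the transform matrix.

```shell
# Hypothetical wrapper around the server command shown in this post.
# The body runs in a subshell, so "exit" only leaves the function.
start_audio_server() (
  model_dir=$1
  graph_dir=$2
  port=$3
  # Refuse to start if a required input file is missing.
  for f in "$model_dir/final.mdl" "$graph_dir/HCLG.fst" \
           "$graph_dir/words.txt" "$graph_dir/word_boundary.int"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      exit 1
    fi
  done
  # Build the argument list in the same order as the command in the post.
  set -- online-audio-server-decode-faster --verbose=1 --rt-min=0.5 --rt-max=3.0 \
         --max-active=6000 --beam=72.0 --acoustic-scale=0.0769 \
         "$model_dir/final.mdl" "$graph_dir/HCLG.fst" "$graph_dir/words.txt" \
         '1:2:3:4:5' "$graph_dir/word_boundary.int" "$port"
  # Assumption: the LDA matrix is passed only when the file exists.
  if [ -f "$model_dir/final.mat" ]; then
    set -- "$@" "$model_dir/final.mat"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"   # print one argument per line instead of running
  else
    exec "$@"
  fi
)
```

Running DRY_RUN=1 start_audio_server exp/model exp/model/graph 5010 (paths are placeholders) prints the arguments without starting the server, which makes it easy to check the command before launching it.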

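Note: if word_boundary.int is missing, rerun prepare_lang.sh to generate it. The file is only produced when position-dependent phones are enabled (the default), so drop the --position-dependent-phones false option: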

# Original command:

utils/prepare_lang.sh --position-dependent-phones false data/local/dict "" \
data/local/lang data/lang

# Changed to:

utils/prepare_lang.sh data/local/dict "" data/local/lang data/lang

The original post showed a screenshot of the server output after startup here.

### 2. Command line to start the client

Run the following command to start the client:

online-audio-client --htk --vtt localhost 5010 scp:test.scp

#### 2.1 Arguments are as follows:

--htk - save results as an HTK label file

--vtt - save results as a WebVTT file

localhost - server to connect to

5010 - port to connect to

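scp:test.scp - list of WAV files to send

Once started, the client streams the audio continuously and the server decodes it in real time. Recognition results appear while the audio is still being transmitted (the original post showed screenshots of the output here).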
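The test.scp file passed to the client has to exist before the client is started. A minimal sketch for generating it from a directory of WAV files follows. It assumes the usual Kaldi scp layout of one "key path" pair per line, uses the file name without its extension as the key, and assumes file names contain no whitespace. The function name make_wav_scp is hypothetical.

```shell
# Hypothetical helper: print one "utterance-key path" line per .wav file
# found directly in the given directory.
make_wav_scp() (
  wav_dir=$1
  for wav in "$wav_dir"/*.wav; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    [ -f "$wav" ] || continue
    key=$(basename "$wav" .wav)
    printf '%s %s\n' "$key" "$wav"
  done
)
```

For example, make_wav_scp /path/to/wavs > test.scp writes the list that is then passed to the client as scp:test.scp.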

### * Command line to start the Java client (mobile client)

I have not tried the mobile client yet:

java -jar online-audio-client.jar

Or simply double-click the JAR file in the graphical interface.

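## II. Real-time decoding between client and server using a microphone

Kaldi provides decoding tools that read data from the client's microphone: the client captures microphone input and sends it to the server, and the server returns the decoding results in real time.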

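### 1. Start the server with online-server-gmm-decode-faster

This program decodes features received over the network. Utterance segmentation is done on the fly. If the optional (last) argument is given, feature splicing with an LDA transform is used; otherwise delta and delta-delta (second-order) features are used by default.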

Usage: online-server-gmm-decode-faster [options] model-in fst-in word-symbol-table silence-phones udp-port [lda-matrix-in]

Example: online-server-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' 1234 lda-matrix

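### 2. Start the client with online-net-client

The online-net-client tool takes microphone input (via PortAudio), extracts features, and sends them to the server over a network connection. Its usage is as follows: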

Usage: online-net-client server-address server-port

Options:

--batch-size : The number of feature vectors to be extracted and sent in one go (int, default = 27)

Standard options:

--config : Configuration file to read (this option may be repeated) (string, default = "")

--help : Print out usage message (bool, default = false)

--print-args : Print the command line arguments (to stderr) (bool, default = true)

--verbose : Verbose level (higher->more logging) (int, default = 0)

Reference: Kaldi homepage
