Extracting per-frame <transition-id, posterior> pairs from a lattice

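get_phone_post: computing phone-level posteriors and producing tacc and transform.mat. That article covers per-frame posteriors over phones, computed from the acoustic model.

Computing frame-aligned <pdf-id, posterior> pairs from a chain acoustic model (no language model). That article covers per-frame posteriors over pdf-ids, computed from the acoustic model, with only the highest-probability entries kept (for example, 125 frames x 2 [pdf-id posterior] entries).

nnet3-compute: computing the chain model's forward-pass probability matrix (the acoustic model output). That article covers the full per-frame posterior matrix over pdf-ids (the output is a matrix, for example 125 frames x 4064 pdfs).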


1 Decoding with the model to produce lat.1.gz

See the article on understanding lattices, which explains how decoding produces the lattice file and analyzes its contents.


2 Extracting <transition-id, posterior> pairs from the lattice, aligned to each frame

This is done mainly with the lattice-to-post command.

gunzip -c 20200921.lat.bin.gz | lattice-to-post ark:- ark,t:-|head -n 1

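The output for utterance id HAO0007501-000000 has 125 frames (frames at the chain model's subsampled rate). Each frame lists the transition-ids that remain in the lattice at that frame, which are the more probable ones.

Comparing with the lattice file shows that in an entry such as [ 1 0.0004011169 3486 0.9995988 ], each id is a transition-id (the model has 8692 transition-ids in total, as hmm-info reports) and the number right after it is that transition-id's posterior probability.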

hmm-info exp/chain/tdnn_1a_sp/final.mdl 
number of phones 217
number of pdfs 4064
number of transition-ids 8692
number of transition-states 4346

First line of the lattice-to-post output from the command above:
HAO0007501-000000 [ 2 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.9999999 ] [ 1 0.0004011169 3486 0.9995988 ] [ 3485 0.9995988 8046 0.0004011169 ] [ 3485 0.9995988 8045 0.0004011169 ] [ 3485 0.9995988 8045 0.0004011169 ] [ 2970 0.0005311575 3716 0.0005455278 3956 0.0005301464 3994 0.997992 7930 0.0004011169 ] [ 3955 0.0005301464 3993 0.997992 7662 0.0005455278 7668 0.0005311575 7929 0.0004011169 ] [ 3955 0.0005301464 3993 0.003060278 5866 0.9949318 7661 0.0005455278 7667 0.0005311575 7929 0.0004011169 ] [ 2072 0.001132725 3955 0.0005301464 3993 0.002623991 4432 0.0004362874 5865 0.993799 7270 0.0005311575 7304 0.0005455278 7929 0.0004011169 ] [ 2072 0.139322 3955 0.0005301464 3993 0.002623991 4464 0.001132725 5865 0.854477 6982 0.0004362874 7269 0.0005311575 7303 0.0005455278 7929 0.0004011169 ] [ 2072 0.0006666106 4504 0.0005301464 4514 0.0004362874 4534 0.1331031 4588 0.0005311575 4608 0.002623991 4638 0.0004011169 4724 0.0005455278 5062 0.0004889528 5865 0.8538104 5892 0.005729984 6982 0.001132725 ] [ 390 0.003590424 1672 0.0006666106 4514 0.001132725 4533 0.1331031 4587 0.0005311575 4637 0.0004011169 4723 0.0005455278 5061 0.0004889528 5865 0.8538104 5891 0.005729984 ] [ 36 0.0003180968 389 0.003590424 390 0.1264679 2072 0.8595404 2184 0.0004889528 2902 0.0006666106 3346 0.007563362 3858 0.001364277 ] [ 2 0.0003180968 389 0.1300583 2071 0.8595404 2183 0.0004889528 2901 0.0006666106 3345 0.007563362 3857 0.001364277 ] [ 5674 0.0003180968 5676 0.0006666106 5688 
0.001364277 5730 0.1300583 5806 0.007563362 5832 0.8595404 5838 0.0004889528 ] [ 5673 0.0003180968 5675 0.0006666106 5687 0.001364277 5729 0.1300583 5805 0.007563362 5831 0.8595404 5837 0.0004889528 ] [ 5673 0.0003180968 5675 0.0006666106 5687 0.001364277 5729 0.1300583 5805 0.007563362 5831 0.8595404 5837 0.0004889528 ] [ 3210 1 ] [ 3209 1 ] [ 3209 1 ] [ 3209 1 ] [ 4226 1 ] [ 4225 1 ] [ 3422 1 ] [ 3421 1 ] [ 3421 1 ] [ 2628 1 ] [ 2627 1 ] [ 528 0.001163401 584 0.9988366 ] [ 527 0.001163401 583 0.9988366 ] [ 4162 0.9988366 4202 0.001163401 ] [ 3762 0.9977787 4161 0.001057859 4201 0.001163401 ] [ 2918 0.001057859 3761 0.9973952 3894 0.001163401 5008 0.0003835548 ] [ 1450 0.001057859 1586 0.001163401 2968 0.0003835548 3761 0.9973952 ] [ 1449 0.001057859 1504 0.0003835548 1585 0.001163401 1642 0.9973952 ] [ 76 1 ] [ 75 1 ] [ 75 1 ] [ 1452 1 ] [ 1451 1 ] [ 3164 1 ] [ 3163 1 ] [ 2650 1 ] [ 2649 1 ] [ 2649 1 ] [ 2649 1 ] [ 6928 1 ] [ 6927 1 ] [ 6927 1 ] [ 6927 1 ] [ 236 0.04848694 6927 0.9515131 ] [ 74 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 73 0.04848694 6927 0.9515131 ] [ 2 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] [ 1 1 ] 
...
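
To work with this output programmatically, the text format shown above can be split into one list of (transition-id, posterior) pairs per frame. The following is a minimal Python sketch, not part of Kaldi; the function name `parse_post_line` is hypothetical, and the sample line is a shortened excerpt of the output above.

```python
from typing import List, Tuple

Frame = List[Tuple[int, float]]


def parse_post_line(line: str) -> Tuple[str, List[Frame]]:
    """Parse one text-format posterior line: 'utt [ id p id p ] [ id p ] ...'."""
    tokens = line.split()
    utt_id, tokens = tokens[0], tokens[1:]
    frames: List[Frame] = []
    current: List[str] = []
    inside = False
    for tok in tokens:
        if tok == "[":
            inside, current = True, []
        elif tok == "]":
            # Each frame must hold complete (id, posterior) pairs.
            if not inside or len(current) % 2 != 0:
                raise ValueError("malformed posterior frame")
            frames.append([(int(current[i]), float(current[i + 1]))
                           for i in range(0, len(current), 2)])
            inside = False
        elif inside:
            current.append(tok)
        else:
            raise ValueError("token outside of a frame: " + tok)
    if inside:
        raise ValueError("unterminated posterior frame")
    return utt_id, frames


# Shortened excerpt of the output above (three frames only).
sample = ("HAO0007501-000000 [ 2 0.9999999 ] "
          "[ 1 0.0004011169 3486 0.9995988 ] [ 3210 1 ]")
utt, frames = parse_post_line(sample)
for t, frame in enumerate(frames):
    best_id, best_p = max(frame, key=lambda pair: pair[1])
    total = sum(p for _, p in frame)
    print(t, best_id, best_p, total)
```

On the excerpt, the posteriors within each frame sum to approximately 1, which is what one would expect from per-frame posteriors over the transition-ids in the lattice.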

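Note: Dan's comment on these posteriors: "It's derived from posteriors in the lattice, so yes, it would be affected from the language model. There are different ways of getting phone posteriors, I mentioned another one that doesn't depend on the language model". In other words, posteriors produced from a lattice are influenced by the language model, whereas phone posteriors computed directly from the acoustic model do not depend on it. The other method he refers to is quoted below.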
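
Why the language model matters here can be illustrated with a toy case. The sketch below is not what lattice-to-post actually runs (that is a forward-backward pass over the lattice arcs); it only assumes two competing paths whose total score is acoustic log-probability plus language-model log-probability, and all numbers and the function name `path_posteriors` are made up for illustration.

```python
import math
from typing import List


def path_posteriors(acoustic: List[float], lm: List[float],
                    lm_scale: float = 1.0) -> List[float]:
    """Posterior of each path given per-path acoustic and LM log-probabilities."""
    scores = [a + lm_scale * l for a, l in zip(acoustic, lm)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical log-probabilities for two competing paths.
acoustic = [-10.0, -10.5]   # path 0 is slightly better acoustically
lm = [-4.0, -1.0]           # path 1 is much more likely under the LM

with_lm = path_posteriors(acoustic, lm)
without_lm = path_posteriors(acoustic, lm, lm_scale=0.0)
print("with LM:   ", with_lm)
print("without LM:", without_lm)
```

With the language-model scores included, path 1 gets the higher posterior; with them scaled to zero, path 0 does. Lattice-derived posteriors shift in the same way depending on the language-model scores on the arcs.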

The other method Dan mentioned: "There are different ways, this has been discussed on the list before. One involves lattice-to-post and then post-to-phone-post, but it will give sparse posteriors. Or do nnet3-compute (using the xent output, you may have to use nnet3-edits to rename it to 'output'), and put that into post-to-phone-post with --transition-id-counts set (so that it expects pdf-ids on the input)."
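
The first route in the quote (lattice-to-post followed by post-to-phone-post) amounts to summing, within each frame, the posteriors of all transition-ids that belong to the same phone. The Python sketch below illustrates that aggregation only; it is not Kaldi's implementation. The mapping `tid_to_phone` and its phone ids are hypothetical placeholders, since the real mapping comes from the transition model in final.mdl.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Frame = List[Tuple[int, float]]


def to_phone_post(frames: List[Frame],
                  tid_to_phone: Dict[int, int]) -> List[Frame]:
    """Sum transition-id posteriors per phone, frame by frame."""
    result: List[Frame] = []
    for frame in frames:
        acc: Dict[int, float] = defaultdict(float)
        for tid, post in frame:
            acc[tid_to_phone[tid]] += post
        result.append(sorted(acc.items()))
    return result


# Hypothetical transition-id -> phone mapping (placeholder phone ids).
tid_to_phone = {1: 1, 2: 1, 3485: 57, 3486: 57, 8045: 120, 8046: 120}

frames = [
    [(1, 0.0004011169), (3486, 0.9995988)],
    [(3485, 0.9995988), (8046, 0.0004011169)],
    [(1, 0.25), (2, 0.75)],  # made-up frame: both ids map to the same phone
]
phone_post = to_phone_post(frames, tid_to_phone)
for t, frame in enumerate(phone_post):
    print(t, frame)
```

Because only the transition-ids present in the lattice contribute, most phones get no entry on a given frame, which is the sparsity the quote warns about.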

Reference

How to get the acoustic probability from chain model?
How can we calculate the posterior probabilities of phone according to force alignment
