# Acoustic Model Study Notes (5): SDT (MMI/BMMI/MPE/sMBR)

The CE criterion used for DNN training optimizes per-frame classification, minimizing the frame error rate. Speech recognition, however, is a sequence classification problem: what matters is the accuracy of the whole sequence. Sequence-discriminative training (SDT) uses training criteria that better match the actual task and therefore tends to improve recognition accuracy. Common criteria include MMI/BMMI, MPE, and sMBR.

| Criterion | Optimization target |
|-----------|---------------------|
| CE | frame error rate |
| MMI/BMMI | sentence correctness |
| MPE | phone error rate |
| sMBR | state error rate |

## MMI

The MMI (maximum mutual information) criterion maximizes the mutual information between the distribution of the observation sequence and that of the word sequence, which reduces the sentence error rate.
Assume an observation sequence $o^m=o_1^m,\dots,o_{T_m}^m$ and a word sequence $w^m=w_1^m,\dots,w_{N_m}^m$, where $m$ indexes the utterance, $T_m$ is the number of frames, and $N_m$ the number of words. With training set $S=\{(o^m,w^m)\mid 1\le m\le M\}$, the MMI criterion can be written as:

$$J_{MMI}(\theta;S)=\sum_{m=1}^{M}J_{MMI}(\theta;o^m,w^m)=\sum_{m=1}^{M}\log P(w^m|o^m;\theta)=\sum_{m=1}^{M}\log\frac{p(o^m|s^m;\theta)^{k}P(w^m)}{\sum_{w}p(o^m|s^{w};\theta)^{k}P(w)}$$
where $k$ is the acoustic scale, $\theta$ the model parameters, and $s^m$ the state sequence corresponding to $w^m$. Intuitively, the numerator is the total score (acoustic plus language) of the path for the correct transcription, while the denominator is the total score over all paths; in practice the denominator is approximated by a lattice to keep the computation tractable.
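As a concrete illustration, the per-utterance objective can be computed from per-hypothesis scores. The sketch below (Python; the scores, probabilities, and hypothesis names are all invented, and a real system would read these from a lattice rather than a flat list) evaluates $\log P(w^m|o^m)$ as a scaled acoustic score plus LM score, log-normalized over the competing hypotheses:

```python
import math

k = 0.1  # acoustic scale

# Toy "lattice": each hypothesis has an acoustic log-likelihood and an LM prob.
hyps = {
    "hyp_ref": {"acoustic_loglik": -120.0, "lm_prob": 0.5},  # reference
    "hyp_a":   {"acoustic_loglik": -123.0, "lm_prob": 0.3},
    "hyp_b":   {"acoustic_loglik": -130.0, "lm_prob": 0.2},
}

def mmi_objective(hyps, ref, k):
    # Numerator: scaled acoustic score of the reference path plus its LM score.
    log_num = k * hyps[ref]["acoustic_loglik"] + math.log(hyps[ref]["lm_prob"])
    # Denominator: log-sum-exp over all competing hypotheses.
    scores = [k * h["acoustic_loglik"] + math.log(h["lm_prob"])
              for h in hyps.values()]
    m = max(scores)
    log_den = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_num - log_den  # log posterior of the reference, always <= 0

print(mmi_objective(hyps, "hyp_ref", k))
```

Improving the acoustic score of the reference path (or worsening a competitor) raises the objective, which is exactly what the gradient below pushes toward.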
The gradient with respect to the model parameters is:

$$\nabla_{\theta}J_{MMI}(\theta;S)=\sum_{m}\sum_{t}\nabla_{z_{mt}^{L}}J_{MMI}(\theta;o^m,w^m)\,\frac{\partial z_{mt}^{L}}{\partial\theta}=\sum_{m}\sum_{t}\ddot{e}_{mt}^{L}\,\frac{\partial z_{mt}^{L}}{\partial\theta}$$

where $z_{mt}^{L}$ is the input to the softmax layer (before the softmax is applied). The difference from the CE criterion lies entirely in the error signal $\ddot{e}_{mt}^{L}$, which expands as:

$$\ddot{e}_{mt}^{L}(i)=\nabla_{z_{mt}^{L}(i)}J_{MMI}(\theta;o^m,w^m)=\sum_{r}\frac{\partial J_{MMI}(\theta;o^m,w^m)}{\partial\log p(o_t^m|r)}\,\frac{\partial\log p(o_t^m|r)}{\partial z_{mt}^{L}(i)}$$

### Part 1

$$\frac{\partial J_{MMI}(\theta;o^m,w^m)}{\partial\log p(o_t^m|r)}=\frac{\partial}{\partial\log p(o_t^m|r)}\log\frac{p(o^m|s^m)^{k}P(w^m)}{\sum_{w}p(o^m|s^{w})^{k}P(w)}=k\frac{\partial\log p(o^m|s^m)}{\partial\log p(o_t^m|r)}-\frac{\partial\log\sum_{w}p(o^m|s^{w})^{k}P(w)}{\partial\log p(o_t^m|r)}$$

Since $p(o^m|s^m)=p(o_1^m|s_1^m)\,p(o_2^m|s_2^m)\cdots p(o_{T_m}^m|s_{T_m}^m)$, the first term simplifies to:

$$k\frac{\partial\log p(o^m|s^m)}{\partial\log p(o_t^m|r)}=k\,\delta(r=s_t^m)$$
The second term can be differentiated further:
$$\frac{\partial\log\sum_{w}p(o^m|s^{w})^{k}P(w)}{\partial\log p(o_t^m|r)}=\frac{\partial\log\sum_{w}e^{\log\left(p(o^m|s^{w})^{k}P(w)\right)}}{\partial\log p(o_t^m|r)}$$

$$=\frac{1}{\sum_{w}p(o^m|s^{w})^{k}P(w)}\sum_{w}p(o^m|s^{w})^{k}P(w)\,\frac{\partial\log\left(p(o^m|s^{w})^{k}P(w)\right)}{\partial\log p(o_t^m|r)}$$

$$=\frac{1}{\sum_{w}p(o^m|s^{w})^{k}P(w)}\sum_{w}p(o^m|s^{w})^{k}P(w)\,k\,\delta(s_t^{w}=r)=k\,\frac{\sum_{w:s_t^{w}=r}p(o^m|s^{w})^{k}P(w)}{\sum_{w}p(o^m|s^{w})^{k}P(w)}$$
Combining the two terms gives:

$$\frac{\partial J_{MMI}(\theta;o^m,w^m)}{\partial\log p(o_t^m|r)}=k\left(\delta(r=s_t^m)-\frac{\sum_{w:s_t^{w}=r}p(o^m|s^{w})^{k}P(w)}{\sum_{w}p(o^m|s^{w})^{k}P(w)}\right)$$

### Part 2

Using Bayes' rule, $p(x|y)\,p(y)=p(y|x)\,p(x)$, the second factor can be written as:

$$\frac{\partial\log p(o_t^m|r)}{\partial z_{mt}^{L}(i)}=\frac{\partial\left(\log p(r|o_t^m)-\log p(r)+\log p(o_t^m)\right)}{\partial z_{mt}^{L}(i)}=\frac{\partial\log p(r|o_t^m)}{\partial z_{mt}^{L}(i)}$$

where $p(r|o_t^m)$ is the $r$-th DNN output:

$$p(r|o_t^m)=\mathrm{softmax}_r(z_{mt}^{L})=\frac{e^{z_{mt}^{L}(r)}}{\sum_{j}e^{z_{mt}^{L}(j)}}$$

so that, following the reference:

$$\frac{\partial\log p(o_t^m|r)}{\partial z_{mt}^{L}(i)}=\delta(r=i)$$
The reference's derivation yields this result, but strictly speaking the softmax denominator also contains $z_{mt}^{L}(i)$, so the exact derivative is $\delta(r=i)-\mathrm{softmax}_i(z_{mt}^{L})$. The discrepancy turns out to be harmless: the Part 1 coefficients form a one-hot vector minus a posterior that also sums to one, so they sum to zero over $r$, the extra $-\mathrm{softmax}_i(z_{mt}^{L})$ term cancels in the sum over $r$, and the final error signal is unaffected.
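In fact the exact derivative of the log-softmax is $\delta(r=i)-\mathrm{softmax}_i(z)$, and the extra $-\mathrm{softmax}_i(z)$ piece vanishes because the Part 1 coefficients it multiplies (one-hot minus a posterior) sum to zero over $r$. Both facts can be verified numerically (Python; all values below are toy numbers, and the occupancy vector is a stand-in):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def num_grad_log_softmax(z, r, i, eps=1e-6):
    """Numerical d log softmax_r(z) / d z_i via central differences."""
    zp = list(z); zp[i] += eps
    zm = list(z); zm[i] -= eps
    return (math.log(softmax(zp)[r]) - math.log(softmax(zm)[r])) / (2 * eps)

z = [0.3, -1.2, 2.0]
p = softmax(z)
for r in range(3):
    for i in range(3):
        exact = (1.0 if r == i else 0.0) - p[i]   # delta(r=i) - softmax_i(z)
        assert abs(num_grad_log_softmax(z, r, i) - exact) < 1e-4

# Part 1 coefficients: one-hot reference minus a posterior, so they sum to 0,
# and the -softmax_i(z) piece drops out of the sum over r.
gamma = softmax([0.5, 0.1, -0.3])                  # stand-in occupancies
coeff = [(1.0 if r == 0 else 0.0) - gamma[r] for r in range(3)]  # s_t = 0
assert abs(sum(coeff)) < 1e-9
```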

Putting the two parts together gives the final expression:

$$\ddot{e}_{mt}^{L}(i)=k\left(\delta(i=s_t^m)-\frac{\sum_{w:s_t^{w}=i}p(o^m|s^{w})^{k}P(w)}{\sum_{w}p(o^m|s^{w})^{k}P(w)}\right)$$
## Boosted MMI
$$J_{BMMI}(\theta;S)=\sum_{m=1}^{M}J_{BMMI}(\theta;o^m,w^m)=\sum_{m=1}^{M}\log\frac{P(w^m|o^m)}{\sum_{w}P(w|o^m)\,e^{-bA(w,w^m)}}=\sum_{m=1}^{M}\log\frac{p(o^m|s^m)^{k}P(w^m)}{\sum_{w}p(o^m|s^{w})^{k}P(w)\,e^{-bA(w,w^m)}}$$

Compared with MMI, BMMI adds a weighting factor $e^{-bA(w,w^m)}$ to each denominator term, typically with $b=0.5$. Here $A(w,w^m)$ is a measure of the accuracy between $w$ and $w^m$, which can be computed at the word, phoneme, or state level.
Physical interpretation, quoting [3]: "We boost the likelihood of the sentences that have more errors, thus generating more confusable data. Boosted MMI can be viewed as trying to enforce a soft margin that is proportional to the number of errors in a hypothesised sentence." In terms of the parameters: the closer $w$ is to $w^m$ (the fewer word errors), the smaller the weight $e^{-bA(w,w^m)}$; conversely, hypotheses with more errors receive larger weights, making the training data more confusable.
The error signal can be derived in the same way as for MMI:

$$\ddot{e}_{mt}^{L}(i)=k\left(\delta(i=s_t^m)-\frac{\sum_{w:s_t^{w}=i}p(o^m|s^{w})^{k}P(w)\,e^{-bA(w,w^m)}}{\sum_{w}p(o^m|s^{w})^{k}P(w)\,e^{-bA(w,w^m)}}\right)$$
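The effect of the boosting factor can be seen on two toy denominator paths (all numbers below are invented): with $b=0.5$, the path with more errors gains relative weight in the denominator compared with plain MMI.

```python
import math

b = 0.5
# Combined acoustic+LM log score and raw accuracy count A(w, w_ref) per path.
paths = [
    {"log_score": -12.7, "accuracy": 5.0},   # close to the reference
    {"log_score": -13.5, "accuracy": 2.0},   # more errors
]

def denominator_shares(paths, b):
    # Each path is weighted by exp(log_score - b * A(w, w_ref)); with b = 0
    # this reduces to the plain MMI denominator.
    w = [math.exp(p["log_score"] - b * p["accuracy"]) for p in paths]
    z = sum(w)
    return [v / z for v in w]

plain = denominator_shares(paths, b=0.0)
boosted = denominator_shares(paths, b=b)
# The error-ful path's share of the denominator rises under boosting,
# so it receives a stronger "push away" during training.
assert boosted[1] > plain[1]
```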

## MPE/sMBR

The MBR (minimum Bayes risk) family of objectives minimizes the expected error at some chosen granularity: MPE minimizes phone-level error, and sMBR minimizes state-level error. The objective, written as an expected accuracy to be maximized, is:

$$J_{MBR}(\theta;S)=\sum_{m=1}^{M}J_{MBR}(\theta;o^m,w^m)=\sum_{m=1}^{M}\sum_{w}P(w|o^m)\,A(w,w^m)=\sum_{m=1}^{M}\frac{\sum_{w}p(o^m|s^{w})^{k}P(w)\,A(w,w^m)}{\sum_{w'}p(o^m|s^{w'})^{k}P(w')}$$
where $A(w,w^m)$ measures the agreement between the two sequences: for MPE it is the number of correct phones, for sMBR the number of correct states.
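For a single utterance this objective is simply a posterior-weighted average of the accuracies of the competing hypotheses, as in this sketch (Python; the scores, probabilities, and accuracy counts are all invented):

```python
import math

k = 0.1  # acoustic scale

hyps = [
    {"acoustic_loglik": -120.0, "lm_prob": 0.5, "accuracy": 6.0},  # near-reference
    {"acoustic_loglik": -123.0, "lm_prob": 0.3, "accuracy": 4.0},
    {"acoustic_loglik": -130.0, "lm_prob": 0.2, "accuracy": 1.0},
]

def mbr_objective(hyps, k):
    # Posterior of each hypothesis is proportional to p(o|s_w)^k * P(w);
    # the objective is the expectation of A(w, w^m) under that posterior.
    scores = [k * h["acoustic_loglik"] + math.log(h["lm_prob"]) for h in hyps]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return sum(wi / z * h["accuracy"] for wi, h in zip(w, hyps))

print(mbr_objective(hyps, k))  # lies between the worst and best accuracy
```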
Differentiating via the chain rule, as for MMI:

$$\ddot{e}_{mt}^{L}(i)=\nabla_{z_{mt}^{L}(i)}J_{MBR}(\theta;o^m,w^m)=\sum_{r}\frac{\partial J_{MBR}(\theta;o^m,w^m)}{\partial\log p(o_t^m|r)}\,\frac{\partial\log p(o_t^m|r)}{\partial z_{mt}^{L}(i)}$$
### Part 1
For MPE, following [4], first split the sums in the numerator and denominator of $J_{MBR}(\theta;o^m,s^m)$ into paths that pass through state $r$ at time $t$ ($r\in s$) and paths that do not ($r\notin s$):

$$J_{MBR}(\theta;o^m,s^m)=\frac{\sum_{s}p(o^m|s)^{k}P(s)\,A(s,s^m)}{\sum_{s'}p(o^m|s')^{k}P(s')}=\frac{\sum_{s:r\in s}p(o^m|s)^{k}P(s)A(s,s^m)+\sum_{s:r\notin s}p(o^m|s)^{k}P(s)A(s,s^m)}{\sum_{s':r\in s'}p(o^m|s')^{k}P(s')+\sum_{s':r\notin s'}p(o^m|s')^{k}P(s')}$$

- If $r\in s$, the derivative satisfies:
  $$\frac{\partial p(o^m|s)^{k}}{\partial\log p(o_t^m|r)}=\frac{\partial e^{k\log p(o^m|s)}}{\partial\log p(o_t^m|r)}=k\,p(o^m|s)^{k}$$
- If $r\notin s$, the derivative is zero:
  $$\frac{\partial p(o^m|s)^{k}}{\partial\log p(o_t^m|r)}=0$$

From these it follows that:

$$\frac{\partial J_{MBR}(\theta;o^m,s^m)}{\partial\log p(o_t^m|r)}=k\,\frac{\sum_{s:r\in s}p(o^m|s)^{k}P(s)A(s,s^m)}{\sum_{s'}p(o^m|s')^{k}P(s')}-k\,\frac{\sum_{s}p(o^m|s)^{k}P(s)A(s,s^m)}{\sum_{s'}p(o^m|s')^{k}P(s')}\cdot\frac{\sum_{s:r\in s}p(o^m|s)^{k}P(s)}{\sum_{s'}p(o^m|s')^{k}P(s')}$$
This can be written compactly as:

$$\frac{\partial J_{MBR}(\theta;o^m,s^m)}{\partial\log p(o_t^m|r)}=k\,\ddot{\gamma}_{mt}^{DEN}(r)\left(\bar{A}^m(r=s_t^m)-\bar{A}^m\right)$$

with the pieces defined as:

$$\ddot{\gamma}_{mt}^{DEN}(r)=\frac{\sum_{s:r\in s}p(o^m|s)^{k}P(s)}{\sum_{s'}p(o^m|s')^{k}P(s')}$$

$$\bar{A}^m=\frac{\sum_{s}p(o^m|s)^{k}P(s)A(s,s^m)}{\sum_{s'}p(o^m|s')^{k}P(s')}$$

$$\bar{A}^m(r=s_t^m)=\frac{\sum_{s:r\in s}p(o^m|s)^{k}P(s)A(s,s^m)}{\sum_{s':r\in s'}p(o^m|s')^{k}P(s')}$$

The first quantity is the (denominator-lattice) occupancy statistic for state $r$ at time $t$; the second is the average accuracy over all paths in the lattice; the third is the average accuracy over the paths that pass through $r$ at time $t$, i.e. the conditional expectation of $A(s,s^m)$. Substituting the three definitions back into the compact form recovers the full expression above and makes the averaging interpretation clear.
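The compact gradient can be sanity-checked on a tiny, fully enumerated "lattice". The sketch below (Python; two states, three frames, uniform path priors, all numbers invented) computes $\ddot{\gamma}$, $\bar{A}^m$, and $\bar{A}^m(r=s_t^m)$ by brute force and compares $k\,\ddot{\gamma}(r)\,(\bar{A}(r)-\bar{A})$ against a finite-difference derivative of the objective:

```python
import math
from itertools import product

k = 0.5
T, S = 3, 2                                   # 3 frames, 2 states
ref = (0, 1, 1)                               # reference state sequence
logp = [[-1.0, -2.0], [-1.5, -0.5], [-2.0, -1.0]]  # log p(o_t | state)

def accuracy(path):
    return sum(a == b for a, b in zip(path, ref))   # sMBR-style state count

def path_weight(path):
    # Uniform prior P(s); scaled acoustic likelihood p(o|s)^k.
    return math.exp(k * sum(logp[t][s] for t, s in enumerate(path)))

def j_mbr():
    num = sum(path_weight(p) * accuracy(p) for p in product(range(S), repeat=T))
    den = sum(path_weight(p) for p in product(range(S), repeat=T))
    return num / den

t0, r = 1, 0                                  # probe state r at frame t0
den_all = num_all = den_r = num_r = 0.0
for p in product(range(S), repeat=T):
    w = path_weight(p)
    den_all += w
    num_all += w * accuracy(p)
    if p[t0] == r:                            # path passes through r at t0
        den_r += w
        num_r += w * accuracy(p)

gamma = den_r / den_all                       # occupancy statistic
a_bar = num_all / den_all                     # lattice-average accuracy
a_bar_r = num_r / den_r                       # average accuracy through r
analytic = k * gamma * (a_bar_r - a_bar)

eps = 1e-5                                    # central finite difference
logp[t0][r] += eps; j_plus = j_mbr()
logp[t0][r] -= 2 * eps; j_minus = j_mbr()
logp[t0][r] += eps
numeric = (j_plus - j_minus) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```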

### Part 2

This factor is identical to the one derived for MMI.

## Tricks

### Lattice generation

Generating high-quality lattices is important for discriminative training: the best available model should be used both to generate the lattices and as the seed model for SDT.
### Lattice compensation
Poorly generated lattices lead to abnormal gradients, for example when the reference (numerator) path does not appear in the denominator lattice. This is especially common for silence frames: silence often appears in the numerator lattice but is easily missed by the denominator lattice. Ways to deal with this include:

- frame rejection: simply drop such frames
- correcting the lattice against the reference hypothesis, e.g. artificially adding silence arcs to the lattice
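Frame rejection can be sketched as a simple masking step (the function and data layout below are hypothetical; real toolkits operate on lattice structures rather than per-frame state sets):

```python
def reject_frames(error_signal, ref_states, den_states):
    """Zero out frames whose reference state is absent from the denominator
    lattice at that frame, so they contribute no (mis-scaled) gradient.

    error_signal: list of per-frame gradient vectors
    ref_states:   reference (numerator) state id per frame
    den_states:   set of state ids present in the denominator lattice per frame
    """
    return [e if ref_states[t] in den_states[t] else [0.0] * len(e)
            for t, e in enumerate(error_signal)]

# Frame 1's reference state (a silence state, say) is missing from the
# denominator lattice, so its gradient is dropped:
masked = reject_frames(
    error_signal=[[0.2, -0.2], [0.5, -0.5]],
    ref_states=[0, 1],
    den_states=[{0, 1}, {0}],
)
```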

### Frame smoothing

SDT is prone to overfitting, for two main reasons:

- sparse lattices
- compared with frame-level training, sequence-level training adds a modeling dimension, so the posterior distribution on the training set diverges more easily from that on the test set

Overfitting can be reduced by modifying the training criterion to combine the sequence criterion with the frame criterion:

$$J_{FS\text{-}SEQ}(\theta;S)=(1-H)J_{CE}(\theta;S)+H\,J_{SEQ}(\theta;S)$$

where $H$ is called the smoothing factor, with empirical values of $4/5$ or $10/11$.
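Since the combined objective is a fixed interpolation, its gradient is the same interpolation of the CE and sequence gradients. A minimal sketch:

```python
def fs_seq_grad(grad_ce, grad_seq, H=4/5):
    # Gradient of J = (1-H)*J_CE + H*J_SEQ, elementwise over one frame.
    return [(1 - H) * gc + H * gs for gc, gs in zip(grad_ce, grad_seq)]

g = fs_seq_grad([1.0, 0.0], [0.0, 1.0], H=4/5)
# With H = 4/5, the CE gradient contributes 1/5 and the sequence gradient 4/5.
```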
### Learning rate
The learning rate for SDT should be smaller than for CE, because:

- SDT normally starts from a CE-trained model
- SDT is prone to overfitting

### Criterion selection

sMBR tends to perform slightly better than the other criteria; MMI is easier to understand and implement.

### Noise contrastive estimation

NCE can be used to speed up training.

## References

[1] *Automatic Speech Recognition: A Deep Learning Approach*, Chapter 8
[2] Sequence-Discriminative Training of Deep Neural Networks
[3] Boosted MMI for Model and Feature-Space Discriminative Training
[4] Discriminative Training for Large Vocabulary Speech Recognition (Daniel Povey's PhD thesis, Chapter 6)

