Using the EM Algorithm with a GMM to Extract Voice Features and Identify Male and Female Voices
1. Theoretical Derivation
1.1. Deriving the EM Algorithm
For a sample $x \sim F(x,\theta)$ with observations $x=(x^{(1)},x^{(2)},...,x^{(n)})$, the MLE estimate of the unknown parameter $\theta$ is

$$\hat{\theta}=\mathop{argmax}\limits_{\theta}L(\theta)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \ln p(x^{(i)},\theta)$$

But if the samples are drawn from several distributions of the same form that differ only in their parameters, plain MLE can no longer recover them. Instead, suppose each sample $x^{(i)}$ follows one of the component distributions according to a latent variable $z^{(i)}$; $z^{(i)}$ is itself a random variable following a multinomial distribution over the $k$ components, $z=(z^{(1)},z^{(2)},...,z^{(k)})$. Write $P(z^{(i)})=Q_i(z^{(i)})$, so that $\sum_{z^{(i)}}Q_i(z^{(i)})=1$. From the point of view of estimating $\theta$, $z$ is a hidden (latent) variable. How, then, can we estimate $\theta$? We have:
$$\hat{\theta}=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \ln p(x^{(i)},\theta)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \ln\Big[\sum_{z^{(i)}}P(z^{(i)})\,p(x^{(i)},\theta|z^{(i)})\Big]=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \ln\Big[\sum_{z^{(i)}}p(x^{(i)},z^{(i)},\theta)\Big]$$

Rewriting $\hat{\theta}$ by multiplying and dividing by $Q_i(z^{(i)})$:

$$\hat{\theta}=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \ln\Big[\sum_{z^{(i)}}Q_i(z^{(i)})\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}\Big]$$

By Jensen's inequality, if $f''(x)\geq 0$ then $Ef(x)\geq f(Ex)$; since $(\ln x)''=-\frac{1}{x^2}<0$, we have $E\ln x \leq \ln(Ex)$. Observe that $\sum_{z^{(i)}}Q_i(z^{(i)})\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}$ is exactly the expectation of $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}$ over $z^{(i)}$. Therefore

$$\sum_{i=1}^n \ln\Big[\sum_{z^{(i)}}Q_i(z^{(i)})\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}\Big]=\sum_{i=1}^n \ln\Big[E\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}\Big]\geq\sum_{i=1}^n E\ln\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})} \\ =\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})} \\ =\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln p(x^{(i)},z^{(i)},\theta)-\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln Q_i(z^{(i)})$$

Hence

$$\hat{\theta}=\mathop{argmax}\limits_{\theta}\Big[\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln p(x^{(i)},z^{(i)},\theta)-\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln Q_i(z^{(i)})\Big]$$

The second term does not depend on $\theta$, so

$$\hat{\theta}=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln p(x^{(i)},z^{(i)},\theta)$$
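The direction of the Jensen inequality used above is easy to sanity-check numerically: for a concave function such as $\ln$, the expectation of the function never exceeds the function of the expectation. A minimal check (the weights and values below are arbitrary illustrative choices, not from this post's data):

```python
import math

# An arbitrary discrete distribution: values x with probabilities q
q = [0.3, 0.7]
x = [1.0, 4.0]

E_x = sum(qi * xi for qi, xi in zip(q, x))               # E[X] = 3.1
E_ln_x = sum(qi * math.log(xi) for qi, xi in zip(q, x))  # E[ln X]

# Jensen for the concave log: E[ln X] <= ln(E[X]),
# with equality only when X is (almost surely) constant
print(E_ln_x, math.log(E_x))
assert E_ln_x <= math.log(E_x)
```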
Note, however, that this holds with equality only when Jensen's inequality is tight, i.e. $P(X=EX)=1$, meaning $X$ is (almost surely) constant; here that means $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}=c$ for some constant $c$. Since $L(\theta)=\sum_{i=1}^n \ln\big[E\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}\big]\geq\sum_{i=1}^n E\ln\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}$, maximizing the right-hand side only maximizes a lower bound on $L(\theta)$, which need not reach the maximum of $L(\theta)$ itself unless equality holds, i.e. $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}=c$. Enforcing this equality is precisely the E-step of the EM algorithm.

Setting $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}=c$, we get $p(x^{(i)},z^{(i)},\theta)=c\, Q_i(z^{(i)})$; summing both sides over $z^{(i)}$:

$$\sum_{z^{(i)}}p(x^{(i)},z^{(i)},\theta)=c\sum_{z^{(i)}}Q_i(z^{(i)})=c$$

Therefore $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}=\sum_{z^{(i)}}p(x^{(i)},z^{(i)},\theta)$, giving

$$Q_i(z^{(i)})=\frac{p(x^{(i)},z^{(i)},\theta)}{\sum_{z^{(i)}}p(x^{(i)},z^{(i)},\theta)}=\frac{p(x^{(i)},z^{(i)},\theta)}{\sum_{z^{(i)}}p(z^{(i)})p(x^{(i)},\theta|z^{(i)})}=p(z^{(i)}|x^{(i)},\theta)$$

So if and only if $\frac{p(x^{(i)},z^{(i)},\theta)}{Q_i(z^{(i)})}=c$, i.e. the ratio is constant, do we have:

$$\hat{\theta}=\mathop{argmax}\limits_{\theta}\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta)\ln p(x^{(i)},z^{(i)},\theta)$$

Notice that $\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta)\ln p(x^{(i)},z^{(i)},\theta)$ is the conditional expectation of $\ln p(x^{(i)},z^{(i)},\theta)$ given $x$ and $\theta$. In other words, since we do not know the value of the latent variable $z$ once the sample and $\theta$ are given, we replace it by its expectation, conditioned on the sample $x$ and the parameters $\theta$; after taking this conditional expectation, we maximize $L(\theta)$ over $\theta$. This is the basic idea of the EM algorithm.

So the E-step is: first pick an initial value $\theta^{j}$ (the unknowns include the latent variable $z$), then use the sample $x$ to compute the conditional probabilities $Q_i(z^{(i)})=p(z^{(i)}|x^{(i)},\theta^j)$, with $j=1,2,...,N$ the iteration index. Note that $\theta^j$ is not the same as $\theta$: $\theta$ is the true optimum, known only to an oracle, while $\theta^j$ is the iterate that keeps moving toward it. After computing $Q_i(z^{(i)})$, substitute it into $L(\theta)=\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln p(x^{(i)},z^{(i)},\theta)$, take partial derivatives with respect to $\theta$, and maximize $L(\theta)$ to obtain $\theta^{j+1}$; this is the M-step. Then replace $\theta^j$ with $\theta^{j+1}$, run the E-step again, and iterate until $L(\theta)$ converges.
1.2. Proof of Convergence of the EM Algorithm
How do we know that $L(\theta)$, iterated this way, actually converges (keeps increasing toward a maximum)? It suffices to show $L(\theta,\theta^{j+1})\geq L(\theta,\theta^j)$, i.e. that $L(\theta,\theta^j)$ never decreases.

Since

$$L(\theta,\theta^{j})=\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln p(x^{(i)},z^{(i)},\theta) \qquad (1)$$

define

$$H(\theta,\theta^{j})=\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln p(z^{(i)}|x^{(i)},\theta) \qquad (2)$$

Subtracting (2) from (1):

$$L(\theta,\theta^{j})-H(\theta,\theta^{j})=\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln \frac{p(x^{(i)},z^{(i)},\theta)}{p(z^{(i)}|x^{(i)},\theta)}=\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln p(x^{(i)},\theta) \\ =\sum_{i=1}^n \ln p(x^{(i)},\theta)\sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)=\sum_{i=1}^n \ln p(x^{(i)},\theta)$$

That is, $\sum_{i=1}^n \ln p(x^{(i)},\theta)=L(\theta,\theta^{j})-H(\theta,\theta^{j})$.
Therefore

$$\sum_{i=1}^n \ln p(x^{(i)},\theta^{j+1})-\sum_{i=1}^n \ln p(x^{(i)},\theta^j)=[L(\theta^{j+1},\theta^{j})-H(\theta^{j+1},\theta^{j})]-[L(\theta^j,\theta^{j})-H(\theta^j,\theta^{j})] \\ =[L(\theta^{j+1},\theta^{j})-L(\theta^j,\theta^{j})]-[H(\theta^{j+1},\theta^{j})-H(\theta^j,\theta^{j})]$$

Call the first bracket $A=L(\theta^{j+1},\theta^{j})-L(\theta^j,\theta^{j})$. Moving from $\theta^j$ to $\theta^{j+1}$ inside $L(\theta)$ is exactly the maximization performed by the M-step, so $L(\theta^{j+1},\theta^{j})\geq L(\theta^{j},\theta^{j})$, hence $A\geq0$.

Call the second bracket $B$:

$$B=H(\theta^{j+1},\theta^{j})-H(\theta^j,\theta^{j})=\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln p(z^{(i)}|x^{(i)},\theta^{j+1})-\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln p(z^{(i)}|x^{(i)},\theta^j) \\ =\sum_{i=1}^n \sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j)\ln \frac{p(z^{(i)}|x^{(i)},\theta^{j+1})}{p(z^{(i)}|x^{(i)},\theta^{j})}=\sum_{i=1}^n E\ln \frac{p(z^{(i)}|x^{(i)},\theta^{j+1})}{p(z^{(i)}|x^{(i)},\theta^{j})}\leq\sum_{i=1}^n \ln E \frac{p(z^{(i)}|x^{(i)},\theta^{j+1})}{p(z^{(i)}|x^{(i)},\theta^{j})} \\ =\sum_{i=1}^n \ln\sum_{z^{(i)}}p(z^{(i)}|x^{(i)},\theta^j) \frac{p(z^{(i)}|x^{(i)},\theta^{j+1})}{p(z^{(i)}|x^{(i)},\theta^{j})}=\sum_{i=1}^n \ln\sum_{z^{(i)}} p(z^{(i)}|x^{(i)},\theta^{j+1})=\sum_{i=1}^n \ln 1=0$$

(using Jensen's inequality again, $E\ln x\leq \ln(Ex)$), so $B\leq0$.

Since $A\geq0$ and $B\leq0$, their difference $A-B\geq0$, so $L(\theta,\theta^{j+1})\geq L(\theta,\theta^j)$: the likelihood never decreases, and therefore converges.
1.3. Steps of the EM Algorithm
Given samples $x=(x^{(1)},x^{(2)},...,x^{(n)})$ and latent variables $z=(z^{(1)},z^{(2)},...,z^{(n)})$, set a maximum number of iterations $N$ and unknown parameters $\theta=(\theta_1,\theta_2,...,\theta_m,z)$; note that $z$ is itself treated as an unknown parameter.

1. Set $\theta_{iter}=\theta^0$ and the iteration counter $n_{iter}=0$
2. $while \quad n_{iter} \leq N:$

$$E\text{-step}=\begin{cases}last\theta_{iter}=\theta_{iter} \\ Q_i(z^{(i)})=p(z^{(i)}|x^{(i)},\theta_{iter}) \\ L(\theta,\theta_{iter})=\sum_{i=1}^n \sum_{z^{(i)}}Q_i(z^{(i)})\ln p(x^{(i)},z^{(i)},\theta_{iter})\end{cases}$$

$$M\text{-step}=\begin{cases}\theta_{iter}=\mathop{argmax}\limits_{\theta}L(\theta,\theta_{iter}) \\ if \quad |L(\theta,\theta_{iter})-L(\theta,last\theta_{iter})|<\epsilon \\ \qquad break\end{cases}$$

$\qquad n_{iter}=n_{iter}+1$

$return \quad \theta_{iter}$
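The loop above can be sketched for the simplest concrete case, a one-dimensional mixture of two Gaussians. This is a minimal illustrative implementation, not the 2-D code used later in this post; the function names, data, and initial values are my own:

```python
import math
import random

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, w, mu, var, N=200, eps=1e-8):
    """w, mu, var: length-2 lists of initial weights, means, variances."""
    last_ll = -float('inf')
    for _ in range(N):
        # E-step: posterior responsibility of component 0 for each sample
        r0 = []
        for x in xs:
            a = w[0] * norm_pdf(x, mu[0], var[0])
            b = w[1] * norm_pdf(x, mu[1], var[1])
            r0.append(a / (a + b))
        # M-step: responsibility-weighted re-estimates
        n0 = sum(r0)
        n1 = len(xs) - n0
        w = [n0 / len(xs), n1 / len(xs)]
        mu = [sum(r * x for r, x in zip(r0, xs)) / n0,
              sum((1 - r) * x for r, x in zip(r0, xs)) / n1]
        var = [sum(r * (x - mu[0]) ** 2 for r, x in zip(r0, xs)) / n0,
               sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(r0, xs)) / n1]
        # stop when the log-likelihood has converged
        ll = sum(math.log(w[0] * norm_pdf(x, mu[0], var[0]) +
                          w[1] * norm_pdf(x, mu[1], var[1])) for x in xs)
        if abs(ll - last_ll) < eps:
            break
        last_ll = ll
    return w, mu, var

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(6, 1) for _ in range(300)]
w, mu, var = em_two_gaussians(xs, [0.5, 0.5], [-1.0, 5.0], [1.0, 1.0])
print(w, mu, var)  # the two recovered means should land near 0 and 6
```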
1.4. Solving a Gaussian Mixture (GMM) with the EM Algorithm
Suppose we have samples $x=(x_1,x_2,...,x_n)$, where each individual sample is a two-dimensional vector $x_i=(x_i^{(1)},x_i^{(2)})$.

The samples are a mixture of two bivariate Gaussians whose means, variances, and correlation coefficients are all unknown, i.e. the unknown parameters are $\theta=(u,\sigma^2,\rho)$.

Each sample follows one of the bivariate Gaussians with probability $Q_i(z^{(i)})$; the bivariate Gaussian density is:

$$\varphi(x^{(1)},x^{(2)})=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\Big[\frac{(x^{(1)}-u^{(1)})^2}{\sigma_1^2}-2\rho\frac{(x^{(1)}-u^{(1)})(x^{(2)}-u^{(2)})}{\sigma_1\sigma_2}+\frac{(x^{(2)}-u^{(2)})^2}{\sigma_2^2}\Big]\right\}$$

The maximum-likelihood problem with the latent variable included is therefore:

$$\hat{\theta}=\mathop{argmax}\limits_{\theta} \sum_{i=1}^n \sum_{j=1}^2 p(z_{ji}|x_i,\theta)\ln p(x_i,z_{ji},\theta),\quad x_i=(x_i^{(1)},x_i^{(2)})$$

where

$$p(z_{ji}|x_i,\theta)=\frac{p(z_{ji})p(x_i,\theta|z_{ji})}{p(z_{1i})p(x_i,\theta|z_{1i})+p(z_{2i})p(x_i,\theta|z_{2i})},\quad j=1,2;\ i=1,2,...,n$$
The matrix of latent-variable conditional probabilities is:

 | $x_1$ | $x_2$ | … | $x_n$ |
---|---|---|---|---|
$p(z_1)$ | $p(z_{11}\|x_1,\theta)$ | $p(z_{12}\|x_2,\theta)$ | … | $p(z_{1n}\|x_n,\theta)$ |
$p(z_2)$ | $p(z_{21}\|x_1,\theta)$ | $p(z_{22}\|x_2,\theta)$ | … | $p(z_{2n}\|x_n,\theta)$ |
$sum$ | 1 | 1 | … | 1 |
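Each column of this matrix is a posterior computed by Bayes' rule, which is why it sums to 1. A toy numerical check with made-up mixture weights and component densities (all numbers below are hypothetical):

```python
# Hypothetical mixture weights and per-sample component densities p(x_i | z_j)
weights = [0.6, 0.4]
densities = [[0.02, 0.30],   # p(x_i | z_1) for samples x_1, x_2
             [0.25, 0.01]]   # p(x_i | z_2) for samples x_1, x_2

posteriors = []
for i in range(2):  # for each sample x_i
    joint = [weights[j] * densities[j][i] for j in range(2)]
    total = sum(joint)
    posteriors.append([v / total for v in joint])  # p(z_j | x_i, theta)

for col in posteriors:
    print(col, sum(col))  # each column of posteriors sums to 1
```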
Each $x_i=(x_i^{(1)},x_i^{(2)})\sim N(u^{(1)},u^{(2)};\sigma^{2(1)},\sigma^{2(2)};\rho)$. We first choose initial values:

$$u^{(0)}=(u_{11},u_{12};u_{21},u_{22}),\quad \sigma^{2(0)}=(\sigma_{11}^2,\sigma_{12}^2;\sigma_{21}^2,\sigma_{22}^2),\quad \rho^{(0)}=(\rho_1,\rho_2),\quad p(z)^{(0)}=(p(z_1),p(z_2))$$
Taking partial derivatives of the likelihood yields the updates below. The posteriors $p(z_j|x_i,\theta)$ that appear in them are computed from the current parameters (the E-step); the parameter updates themselves make up the M-step.

Mixture weights:

$$p(z_1)^{(1)}=\frac{\sum_{i=1}^n p(z_1|x_i,\theta)}{n},\qquad p(z_2)^{(1)}=\frac{\sum_{i=1}^n p(z_2|x_i,\theta)}{n}$$

Means:

$$u_{11}^{(1)}=\frac{\sum_{i=1}^n p(z_1|x_i,\theta)x_{i}^{(1)}}{\sum_{i=1}^n p(z_1|x_i,\theta)},\quad u_{12}^{(1)}=\frac{\sum_{i=1}^n p(z_1|x_i,\theta)x_{i}^{(2)}}{\sum_{i=1}^n p(z_1|x_i,\theta)},\quad u_{21}^{(1)}=\frac{\sum_{i=1}^n p(z_2|x_i,\theta)x_{i}^{(1)}}{\sum_{i=1}^n p(z_2|x_i,\theta)},\quad u_{22}^{(1)}=\frac{\sum_{i=1}^n p(z_2|x_i,\theta)x_{i}^{(2)}}{\sum_{i=1}^n p(z_2|x_i,\theta)}$$

Variances:

$$\sigma_{11}^{2(1)}=\frac{\sum_{i=1}^n p(z_1|x_i,\theta)(x_{i}^{(1)}-u_{11}^{(1)})^2}{\sum_{i=1}^n p(z_1|x_i,\theta)},\quad \sigma_{12}^{2(1)}=\frac{\sum_{i=1}^n p(z_1|x_i,\theta)(x_{i}^{(2)}-u_{12}^{(1)})^2}{\sum_{i=1}^n p(z_1|x_i,\theta)},\quad \sigma_{21}^{2(1)}=\frac{\sum_{i=1}^n p(z_2|x_i,\theta)(x_{i}^{(1)}-u_{21}^{(1)})^2}{\sum_{i=1}^n p(z_2|x_i,\theta)},\quad \sigma_{22}^{2(1)}=\frac{\sum_{i=1}^n p(z_2|x_i,\theta)(x_{i}^{(2)}-u_{22}^{(1)})^2}{\sum_{i=1}^n p(z_2|x_i,\theta)}$$

Correlation coefficients (here $r$ is the sample Pearson correlation of the two coordinates; scaling it by the mixture weights is a simplification used in this post rather than the exact MLE update):

$$\rho_1^{(1)}=p(z_1)^{(1)}r,\quad \rho_2^{(1)}=p(z_2)^{(1)}r$$

Stopping criterion: $|L(\theta^{j+1})-L(\theta^{j})|<\epsilon$
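The updates above are all responsibility-weighted averages, which makes them easy to express in vectorized NumPy. A sketch for one component, assuming a responsibility vector `r` of posteriors $p(z_1|x_i,\theta)$ and a data matrix `X` with the two coordinates as columns (the names and random data are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))   # n samples, 2 coordinates
r = rng.uniform(size=100)       # posterior p(z_1 | x_i, theta) for each sample

# Mixture weight: average responsibility
pz1 = r.mean()

# Means: responsibility-weighted average of each coordinate
u1 = (r[:, None] * X).sum(axis=0) / r.sum()

# Variances: responsibility-weighted squared deviations from the new means
sigma2_1 = (r[:, None] * (X - u1) ** 2).sum(axis=0) / r.sum()

print(pz1, u1, sigma2_1)
```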
2. Application: Identifying Male and Female Voices
2.1. Extracting Voice Features
First, download the open-source Chinese speech dataset ST-CMDS-20170001_1-OS. ST-CMDS, released by an AI data company, contains more than 100,000 speech clips, roughly 100+ hours of audio. The content is mostly everyday online voice chat and smart voice-control phrases, spoken by 855 different speakers, both male and female, and is suitable for many scenarios. Mirror: https://link.ailemon.net/?target=http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz

The dataset contains 102,600 clips; each clip starts at roughly the 1-second mark and lasts about 3 seconds, mostly everyday chat. These 100,000+ clips are unlabeled. Our working assumption is that the voice features useful for distinguishing gender form a random variable following a multivariate Gaussian distribution; in this problem two multivariate Gaussians are mixed together, and our goal is to separate them and estimate each one's mean vector and covariance matrix.

Then, given any clip, we extract its features and compute the probability of it belonging to each distribution; whichever probability is larger decides which distribution, and hence which gender, the voice belongs to.
But first we must solve two problems:

1. Traversing the audio files
2. Extracting voice features that are useful for distinguishing gender

The first is handled with Python's `os` library. For the second, there is a gender-labeled voice-feature dataset on Kaggle whose features have the following meanings:
feature | interpretation |
---|---|
meanfreq | mean frequency (in kHz) |
sd | standard deviation of frequency |
median | median frequency (in kHz) |
Q25 | first quantile of frequency (in kHz) |
Q75 | third quantile of frequency (in kHz) |
IQR | interquartile range of frequency |
skew | skewness of frequency |
kurt | kurtosis of frequency |
sp.ent | spectral entropy |
sfm | spectral flatness |
mode | mode frequency |
centroid | frequency centroid |
peakf | peak frequency |
meanfun | mean fundamental frequency measured across the acoustic signal |
minfun | minimum fundamental frequency measured across the acoustic signal |
maxfun | maximum fundamental frequency measured across the acoustic signal |
meandom | mean of dominant frequency measured across the acoustic signal |
mindom | minimum of dominant frequency measured across the acoustic signal |
maxdom | maximum of dominant frequency measured across the acoustic signal |
dfrange | range of dominant frequency measured across the acoustic signal |
modindx | modulation index: accumulated absolute difference between adjacent measurements of the fundamental frequency, divided by the frequency range |
Training XGBoost on this labeled dataset yields a ranking of the features by how well they separate male and female voices.

The features meanfun and IQR contribute the most to distinguishing gender; for the remaining variables we need to check for multicollinearity.

Weighing feature importance against the correlations, we choose meanfun and IQR as the voice-feature variables. meanfun is the fundamental frequency (F0): male voices run roughly $0\sim150$ Hz and female voices roughly $100\sim500$ Hz, so F0 separates male and female voices well. IQR is the interquartile range of the voice's frequencies, which describes the frequency distribution with the extreme values trimmed away.
The figure above is the spectrogram of a randomly chosen clip, produced with the short-time Fourier transform (STFT); it combines frequency, time, and amplitude in one image. The horizontal axis is time and the vertical axis is frequency, with the color intensity of each frequency line encoding energy (reflecting amplitude); a logarithmic frequency scale is used here.

There are also linear and mel scales. Different scales exist because the human ear resolves different frequency bands differently: we can easily tell apart low frequencies such as $500\sim1000$ Hz, but hear no difference between, say, $20000\sim20500$ Hz. The linear scale simply shows frequency changes as-is, the log scale is less steep, and the mel scale warps every frequency band onto a common scale on which equal steps sound equally different to the ear.

The fundamental frequency we want to extract is the lowest curve in the figure: from about 0.6 s to 3.3 s the F0 sits around 128 Hz, from which we can already guess this is a male voice.
Now we traverse the 102,600 clips and use the librosa library to extract each clip's meanfun and IQR. The logic is:

1. The clips' default sampling rate is 22050 Hz; for more precise feature extraction we resample to 44100 Hz when loading, convert to mono, and keep only the audio between the 1-second and 4-second marks
2. meanfun (fundamental frequency): each clip's 3 seconds of audio are split (at the 44100 Hz sampling rate) into 336 frames, each with its own F0 value; we output the mean F0
3. IQR (interquartile range): from each clip's mel spectrogram matrix we take the mean of each column, giving an array of 336 values; sorted in ascending order, its interquartile range is the output
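The IQR computation itself is just two percentiles. A minimal example with a known array (NumPy's default linear interpolation):

```python
import numpy as np

vals = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
q25, q75 = np.percentile(vals, 25), np.percentile(vals, 75)
iqr = q75 - q25
print(q25, q75, iqr)  # 3.0 7.0 4.0
```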
Sampling rate: a continuous signal varies over time (or space); sampling measures the signal's value at regular intervals of $T$ units, where $T$ is the sampling interval. In practice, when the signal is a function of time, the sampling interval is very small, typically on the order of milliseconds or microseconds. Sampling produces a sequence of numbers, the samples, which represent the original signal; each sample corresponds to the specific instant at which it was measured. The reciprocal of the sampling interval, $\frac{1}{T}$, is the sampling rate, measured in hertz. The smaller the sampling interval, the larger the sampling rate, and the finer the sampling.

Mel-frequency cepstral coefficients (MFCC) take human auditory characteristics into account: the linear spectrum is first mapped onto the perceptually motivated mel scale, then transformed into the cepstrum. The cepstrum is the spectrum obtained by taking the Fourier transform of a signal, applying a logarithm, and then taking the inverse Fourier transform; MFCCs are widely used in speech recognition.
But 102,600 clips is a lot of data: extraction takes about 1 second per clip, so a full pass would take roughly 31 hours. We therefore randomly sample 2,000 clips instead. The code for extracting meanfun and IQR follows:
"""
# encoding: utf-8
#!/usr/bin/env python3
@Author : ZDZ
@Time : 2022/10/12 20:22
"""
import os
import csv
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from alive_progress import alive_bar
import random
# 输入文件地址,返回其目录下所有语音文件地址列表
def getFileName(file_path):
file_list = []
f_list = os.listdir(file_path)
for i in f_list:
wave_path = file_path + '\\' + i
file_list.append(wave_path)
return file_list
# 提取基频和IQR特征,实时写入到一个文件
def voice_feature(path_list):
with open("transient1.csv", "w", encoding="gbk", newline="") as f:
csv_writer = csv.writer(f)
csv_writer.writerow(["F0", "IQR"])
random.seed(1)
path_list_random = random.sample(path_list, 2000)
with alive_bar(2000, force_tty=True) as bar:
for i in range(2000):
y, sr = librosa.load(path_list_random[i], sr=44100, mono=True, offset=1, duration=2)
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=librosa.note_to_hz('C0'),
fmax=librosa.note_to_hz('C7'))
f0 = [round(k, 3) for k in np.nan_to_num(f0) if k != 0]
f0_average = round(np.average(f0), 3)
mel = np.round(pd.DataFrame(librosa.feature.melspectrogram(y=y, sr=sr)), 3)
mel_average = [np.average(mel.iloc[:, j]) for j in range(len(mel.columns.values))]
IQR = round(np.percentile(mel_average, 75) - np.percentile(mel_average, 25), 3)
csv_writer.writerow([f0_average, IQR])
f.flush()
print(f"F0:{f0_average},IQR:{IQR}")
f0.clear()
mel_average.clear()
bar()
f.close()
return
def spec_show(path_list):
value, catch = librosa.load(path_list, sr=44100)
A = librosa.stft(value)
Adb = librosa.amplitude_to_db(abs(A))
plt.figure(figsize=(14, 11))
librosa.display.specshow(Adb, sr=catch, x_axis='time', y_axis='log')
plt.title('Spectrogram')
plt.colorbar()
plt.show()
return
if __name__ == '__main__':
wave_file_path = 'E:\\Study\\Training_Dataset\\voice_chinese\\ST-CMDS-20170001_1-OS'
wave_path_list = getFileName(wave_file_path)
voice_feature(wave_path_list)
# spec_show('20170001P00001A0001.wav')
Results are flushed to disk as they are computed, so an unexpected interruption mid-run does not throw away all the work; a progress bar also shows how far the computation has gotten at any time.
2.2. The EM Algorithm in Practice
"""
# encoding: utf-8
#!/usr/bin/env python3
@Author : ZDZ
@Time : 2022/10/14 10:11
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D
import random
global pp1, pp2
# 二维高斯分布密度
def PDF(u, sigma2, p, x):
part1 = 1 / (2 * np.pi * np.power(sigma2[0], 0.5) * np.power(sigma2[1], 0.5) * np.power(1 - p ** 2, 0.5))
part2 = - 1 / (2 * (1 - p ** 2))
part3 = (np.power(x[0] - u[0], 2)) / sigma2[0] - (2 * p * (x[0] - u[0]) * (x[1] - u[1])) / (
np.power(sigma2[0], 0.5) * np.power(sigma2[1], 0.5)) + (np.power(x[1] - u[1], 2)) / sigma2[1]
if part1 * np.exp(part2 * part3) < 2.2250738585072014e-308:
pdf = 2.2250738585072014e-300
else:
pdf = part1 * np.exp(part2 * part3)
return pdf
# 带有隐变量的似然函数值
def Likelihood(data, z, u, sigma2, p):
data = pd.DataFrame(data)
number = len(data.index.values)
likelihood_list = []
for i in range(number):
pp_1, pp_2 = E(z, u, sigma2, p, data.iloc[i, :])
likelihood_single = pp_1 * PDF(u[:2], sigma2[:2], p[0], data.iloc[i, :]) + pp_2 * PDF(u[2:], sigma2[2:], p[1],
data.iloc[i, :])
likelihood_list.append(likelihood_single)
likelihood = sum(likelihood_list)
likelihood_list.clear()
return likelihood
# E步
def E(z, u, sigma2, p, x):
"""
:param x: 某个样本向量,在这里是二维的,因为是二维正态分布
:param z: 隐变量概率值向量
:param u: 均值向量,前两个是第一个分布的,后两个是第二个分布的
:param sigma2: 方差向量,前两个是第一个分布的,后两个是第二个分布的
:param p: 相关系数向量,前一个是第一个分布的,后一个是第二个分布的
:return: 在样本和未知参数给定的条件下,属于该隐变量的条件概率
"""
prior_probability1 = z[0] * PDF(u[:2], sigma2[:2], p[0], x)
prior_probability2 = z[1] * PDF(u[2:], sigma2[2:], p[1], x)
posterior_probability1 = prior_probability1 / (prior_probability1 + prior_probability2)
return posterior_probability1, 1 - posterior_probability1
# M步
def M(data, z, u, sigma2, p, N=10000, epsilon=1e-100):
global pp1, pp2
data = pd.DataFrame(data)
number = len(data.index.values)
n_iter = 0
while n_iter <= N:
print(f"迭代第{n_iter + 1}次")
last_z, last_u, last_sigma2, last_p = z, u, sigma2, p
pp1_sum, pp2_sum = [], []
pp1_1_sum, pp1_2_sum, pp2_1_sum, pp2_2_sum = [], [], [], []
pp1_1_s_sum, pp1_2_s_sum, pp2_1_s_sum, pp2_2_s_sum = [], [], [], []
for j in range(number):
pp1, pp2 = E(z, u, sigma2, p, data.iloc[j, :])
pp1_1, pp1_2 = pp1 * data.iloc[j, 0], pp1 * data.iloc[j, 1]
pp2_1, pp2_2 = pp2 * data.iloc[j, 0], pp2 * data.iloc[j, 1]
pp1_sum.append(pp1)
pp2_sum.append(pp2)
pp1_1_sum.append(pp1_1)
pp1_2_sum.append(pp1_2)
pp2_1_sum.append(pp2_1)
pp2_2_sum.append(pp2_2)
z1, z2 = sum(pp1_sum) / number, sum(pp2_sum) / number
u11, u12, u21, u22 = sum(pp1_1_sum) / sum(pp1_sum), sum(pp1_2_sum) / sum(pp1_sum), sum(pp2_1_sum) / sum(
pp2_sum), sum(pp2_2_sum) / sum(pp2_sum)
for k in range(number):
pp1, pp2 = E(z, u, sigma2, p, data.iloc[k, :])
pp1_1_s, pp1_2_s = pp1 * np.power(data.iloc[k, 0] - u11, 2), pp1 * np.power(data.iloc[k, 1] - u12, 2)
pp2_1_s, pp2_2_s = pp2 * np.power(data.iloc[k, 0] - u21, 2), pp2 * np.power(data.iloc[k, 1] - u22, 2)
pp1_1_s_sum.append(pp1_1_s)
pp1_2_s_sum.append(pp1_2_s)
pp2_1_s_sum.append(pp2_1_s)
pp2_2_s_sum.append(pp2_2_s)
sigma2_11 = sum(pp1_1_s_sum) / sum(pp1_sum)
sigma2_12 = sum(pp1_2_s_sum) / sum(pp1_sum)
sigma2_21 = sum(pp2_1_s_sum) / sum(pp2_sum)
sigma2_22 = sum(pp2_2_s_sum) / sum(pp2_sum)
z = [z1, z2]
u = [u11, u12, u21, u22]
sigma2 = [sigma2_11, sigma2_12, sigma2_21, sigma2_22]
p = [z1 * pd.Series(data.iloc[:, 1]).corr(pd.Series(data.iloc[:, 0]), method='pearson'),
z2 * pd.Series(data.iloc[:, 1]).corr(pd.Series(data.iloc[:, 0]), method='pearson')]
print(f"z的值为:{z}")
print(f"u的值为:{u}")
print(f"sigma2的值为:{sigma2}")
print(f"p的值为:{p}")
pp1_sum.clear()
pp2_sum.clear()
pp1_1_sum.clear()
pp1_2_sum.clear()
pp2_1_sum.clear()
pp2_2_sum.clear()
pp1_1_s_sum.clear()
pp1_2_s_sum.clear()
pp2_1_s_sum.clear()
pp2_2_s_sum.clear()
last_likelihood = Likelihood(data, last_z, last_u, last_sigma2, last_p)
new_likelihood = Likelihood(data, z, u, sigma2, p)
print(f"两次似然函数差值:{np.abs(new_likelihood - last_likelihood)}\n")
if np.abs(new_likelihood - last_likelihood) < epsilon:
break
n_iter += 1
pd.DataFrame(data={'u': u, 'sigma2': sigma2}).to_csv('feature_ult.csv')
pd.DataFrame(data={'p': p}).to_csv('p_ult.csv')
return z, u, sigma2, p
# 画图函数
def plot_gaussian(mean_sigma, p):
reference_data = pd.DataFrame(pd.read_csv(mean_sigma))
p_data = pd.DataFrame(pd.read_csv(p))
u_list = list(reference_data.iloc[:, 0])
sigma_list = list(reference_data.iloc[:, 1])
p_list = list(p_data.iloc[:, 0])
mean1 = [u_list[0], u_list[1]]
mean2 = [u_list[2], u_list[3]]
cov1 = [[sigma_list[0], p_list[0] * np.power(sigma_list[0], 0.5) * np.power(sigma_list[1], 0.5)],
[p_list[0] * np.power(sigma_list[0], 0.5) * np.power(sigma_list[1], 0.5), sigma_list[1]]]
cov2 = [[sigma_list[2], p_list[1] * np.power(sigma_list[2], 0.5) * np.power(sigma_list[3], 0.5)],
[p_list[1] * np.power(sigma_list[2], 0.5) * np.power(sigma_list[3], 0.5), sigma_list[3]]]
Gaussian1 = multivariate_normal(mean=mean1, cov=cov1)
Gaussian2 = multivariate_normal(mean=mean2, cov=cov2)
X, Y = np.meshgrid(np.linspace(25, 170, 50), np.linspace(-5, 15, 50))
d = np.dstack([X, Y])
Z1 = Gaussian1.pdf(d).reshape(50, 50)
Z2 = Gaussian2.pdf(d).reshape(50, 50)
fig = plt.figure()
ax = Axes3D(fig, auto_add_to_figure=False)
fig.add_axes(ax)
ax.plot_surface(X, Y, Z1, rstride=1, cstride=1, cmap='rainbow', alpha=0.6)
ax.plot_surface(X, Y, Z2, rstride=1, cstride=1, cmap='rainbow', alpha=0.6)
ax.set_xlabel('F0')
ax.set_ylabel('IQR')
ax.set_zlabel('pdf')
plt.show()
return
if __name__ == '__main__':
voice_data = pd.read_csv('transient1.csv')
z_init = [0.6, 0.4]
u_init = [129, 0.11, 69, 5.7]
sigma2_init = [25, 0.0035, 2.45, 3.78]
p_init = [0.5, 0.5]
# z_ult, u_ult, sigma2_ult, p_ult = M(voice_data, z_init, u_init, sigma2_init, p_init)
plot_gaussian('feature_ult.csv', 'p_ult.csv')
The code above is longer than it needs to be because it is not vectorized. One practical problem also came up while writing it: for the real data (the meanfun and IQR values) and the chosen initial parameters, a single bivariate Gaussian density value can be extremely small, even below the smallest positive normal float Python can represent (about $2.2\times10^{-308}$). When that happens the result silently becomes 0, and when such a zero density later appears in the denominator of a posterior computation, the posterior blows up to $\infty$ and the iteration breaks down.

So the code clamps any density that would underflow to a value slightly above the lower bound. One might object here: for a discrete variable this would be understandable, but for a continuous Gaussian, why plug a single sample into the density at all? A density value is not a probability, and it also depends on the variance. My understanding is that once the latent variable is introduced, the posterior of $z$ is no longer a single Gaussian: each Gaussian carries a weight and passes through Bayes' rule, so we are dealing with a mixture. The density values here serve only to update $Q_i(z^{(i)})=p(z^{(i)}|x^{(i)},\theta)$, i.e. the conditional probability weights of $z$ given the sample $x$ and the unknown parameters $\theta$.
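A standard alternative to clamping, not used in the code above, is to work with log-densities and the log-sum-exp trick, which keeps the posterior computation finite even when the raw densities all underflow to zero:

```python
import math

def posterior_from_logs(log_joint):
    """Normalize joint log-probabilities into a posterior without leaving log space."""
    m = max(log_joint)
    # exp(l - m) is at most 1, so the terms cannot all underflow to zero
    exps = [math.exp(l - m) for l in log_joint]
    s = sum(exps)
    return [e / s for e in exps]

# Two components whose joint densities (~e^-1000 and e^-1001) both underflow
# to 0.0 as raw floats, yet the posterior is perfectly well defined
post = posterior_from_logs([-1000.0, -1001.0])
print(post)
```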
After 296 iterations, we obtain the result: in the sample, the weight of female voices is 0.38 and the weight of male voices is 0.62.

Female voices: $x_{female}=(x_{female}^{(1)},x_{female}^{(2)})\sim N(114.04,0.63;141.45,0.26;-0.01)$

Male voices: $x_{male}=(x_{male}^{(1)},x_{male}^{(2)})\sim N(85.57,3.10;822.38,9.90;-0.02)$

We can also plot the two separated bivariate Gaussians in 3D:
2.3. Recording from the Microphone and Identifying Gender in Real Time
Reading the computer's microphone requires the `pyaudio` library. The recorded audio is fed through the same meanfun (F0) and IQR extraction as before, yielding a real-time feature vector. We then need to decide which of the two fitted distributions this (two-dimensional) feature more likely belongs to; but with only a single sample, how do we get a probability for each distribution?

The approach I use here is to turn the sample into a small neighborhood: subtract 0.05 from and add 0.05 to each coordinate of the recorded feature vector, then integrate each distribution from the left endpoint to the right endpoint of this neighborhood; the resulting integrals stand in for probabilities.
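For intuition, in the special case $\rho=0$ the two coordinates are independent, so the probability of the 0.1-wide square neighborhood factorizes into two one-dimensional normal CDF differences (the means, variances, and measurement below are placeholders, not the fitted values). Note that the exact rectangle probability for general $\rho$ needs all four corners of the joint CDF, $F(r_1,r_2)-F(l_1,r_2)-F(r_1,l_2)+F(l_1,l_2)$; differencing the joint CDF at only two opposite corners, as the script below does, is a simplification of that:

```python
import math

def phi(x, mu, sigma):
    """One-dimensional normal CDF."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Placeholder component parameters per coordinate, with rho = 0
mu = [114.0, 0.6]
sigma = [11.9, 0.5]
center = [110.0, 0.7]  # a hypothetical (meanfun, IQR) measurement
h = 0.05               # half-width of the neighborhood

# P(rectangle) = product of per-coordinate interval probabilities when rho = 0
prob = 1.0
for c, m, s in zip(center, mu, sigma):
    prob *= phi(c + h, m, s) - phi(c - h, m, s)
print(prob)
```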
"""
# encoding: utf-8
#!/usr/bin/env python3
@Author : ZDZ
@Time : 2022/10/16 14:51
"""
import pyaudio
import wave
from tqdm import tqdm
import librosa.display
import pandas as pd
import numpy as np
from scipy.stats import multivariate_normal
def record_audio(wave_out_path, record_second):
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
wf = wave.open(wave_out_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
print("* recording")
for i in tqdm(range(0, int(RATE / CHUNK * record_second))):
data = stream.read(CHUNK)
wf.writeframes(data)
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
wf.close()
def play_audio(wave_path):
CHUNK = 1024
wf = wave.open(wave_path, 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
data = wf.readframes(CHUNK)
datas = []
while len(data) > 0:
data = wf.readframes(CHUNK)
datas.append(data)
for d in tqdm(datas):
stream.write(d)
stream.stop_stream()
stream.close()
p.terminate()
def predict_gender(voice_path):
y, sr = librosa.load(voice_path, sr=44100, mono=True, offset=1, duration=5)
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=librosa.note_to_hz('C0'),
fmax=librosa.note_to_hz('C7'))
f0 = [round(k, 3) for k in np.nan_to_num(f0) if k != 0]
f0_average = round(np.average(f0), 3)
mel = np.round(pd.DataFrame(librosa.feature.melspectrogram(y=y, sr=sr)), 3)
mel_average = [np.average(mel.iloc[:, j]) for j in range(len(mel.columns.values))]
IQR = round(np.percentile(mel_average, 75) - np.percentile(mel_average, 25), 3)
print(f"你的声音基频为:{f0_average}")
print(f"你的声音频率四分位数为:{IQR}")
reference_data = pd.DataFrame(pd.read_csv('feature_ult.csv'))
p_data = pd.DataFrame(pd.read_csv('p_ult.csv'))
u_list = list(reference_data.iloc[:, 0])
sigma_list = list(reference_data.iloc[:, 1])
p_list = list(p_data.iloc[:, 0])
mean1 = [u_list[0], u_list[1]]
mean2 = [u_list[2], u_list[3]]
cov1 = [[sigma_list[0], p_list[0] * np.power(sigma_list[0], 0.5) * np.power(sigma_list[1], 0.5)],
[p_list[0] * np.power(sigma_list[0], 0.5) * np.power(sigma_list[1], 0.5), sigma_list[1]]]
cov2 = [[sigma_list[2], p_list[1] * np.power(sigma_list[2], 0.5) * np.power(sigma_list[3], 0.5)],
[p_list[1] * np.power(sigma_list[2], 0.5) * np.power(sigma_list[3], 0.5), sigma_list[3]]]
goal_data_l = [f0_average - 0.05, IQR - 0.05]
goal_data_r = [f0_average + 0.05, IQR + 0.05]
pro1_l = multivariate_normal.cdf(goal_data_l, mean=mean1, cov=cov1)
pro1_r = multivariate_normal.cdf(goal_data_r, mean=mean1, cov=cov1)
pro2_l = multivariate_normal.cdf(goal_data_l, mean=mean2, cov=cov2)
pro2_r = multivariate_normal.cdf(goal_data_r, mean=mean2, cov=cov2)
pro1 = abs(pro1_r - pro1_l)
pro2 = abs(pro2_r - pro2_l)
pro_female = pro1 / (pro1 + pro2)
pro_male = pro2 / (pro1 + pro2)
if pro_female >= pro_male:
play_audio('female_beep.wav')
print(f"你是女生的概率为:{pro_female}")
print(f"你是男生的概率为:{pro_male}")
print('你是女生')
else:
play_audio('male_beep.wav')
print(f"你是女生的概率为:{pro_female}")
print(f"你是男生的概率为:{pro_male}")
print('你是男生')
return
if __name__ == '__main__':
record_audio("output.wav", record_second=6)
play_audio("output.wav")
predict_gender('output.wav')
Running this program opens the microphone, records 6 seconds of audio, and outputs the decision.

This is a simple application of the EM algorithm. It does not account for environmental noise, for speakers deliberately imitating the other gender's voice, or for the many other signal characteristics that differ between genders, all of which can affect recognition. Under normal, noise-free conditions, however, it identifies male versus female voices with an accuracy as high as 99%.