Python Implementation of Time-Frequency Statistical Features for Common Sounds

Latest recommended article published on 2023-12-03 21:02:00

风雪夜回

Category column: Acoustic Feature Summary

Please credit the source when reposting. Discussion is welcome!

Article link: https://blog.csdn.net/qq_30229253/article/details/95941119

Contents

Abstract
Feature descriptions and Python code
Summary

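Abstract

A conventional sound recognition pipeline has two key stages: feature construction and classifier selection. This post covers the traditional time-domain and frequency-domain statistical features of sound signals and their Python implementations.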

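Feature descriptions and Python code

Environment: Python 3.5/3.6, librosa 0.7
Most of the programs below were verified by re-implementing the formulas in code. If you find a problem, please point it out. Thanks!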

1. Spectral centroid and spectral spread

In digital signal processing, the spectral centroid (SC) and the spectral spread (SS) are measures that characterise the distribution of the frequency components of a signal. The spectral centroid is defined as the "center of mass" of the spectrum and is computed as follows:
$SC = \frac{\sum^{L_F}_{i=1} i\frac{F_s}{L_F}\lvert X(i)\rvert}{\sum^{L_F}_{i=1}\lvert X(i)\rvert},$
while the spectral spread is computed as the dispersion of the frequency components of the signal around the centroid:
$SS = \sqrt{\frac{\sum^{L_F}_{i=1}\lbrack i\frac{F_s}{L_F}-SC\rbrack^2 \lvert X(i)\rvert}{\sum^{L_F}_{i=1}\lvert X(i)\rvert}}$
where $L_F$ and $\lvert X(i) \rvert$ are the length and the magnitude of the FFT of the input signal $x(n)$, respectively, and $F_s$ is the sampling frequency.

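# Code below; verified against the formula
import librosa
import numpy as np

path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
# Magnitude spectrogram, shape (1 + n_fft/2, n_frames)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=512, center=False))
# Center frequency in Hz of each FFT bin
fre = librosa.fft_frequencies(sr=sr, n_fft=1024)
# Spectral centroid, shape (1, n_frames)
sc = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=1024, hop_length=512, center=False)
# Spectral spread: magnitude-weighted deviation of the bin frequencies from the centroid
ss = np.zeros((1, S.shape[1]))
for i in range(S.shape[1]):
    ss[:, i] = np.sqrt(np.sum((fre - sc[:, i])**2 * S[:, i]) / np.sum(S[:, i]))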
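
The two formulas can also be checked without an audio file. The following is a minimal NumPy-only sketch, not the author's code: the helper name `centroid_and_spread`, the synthetic 1 kHz tone, and the convention of returning zeros for a silent frame are all assumptions made for illustration.

```python
import numpy as np

def centroid_and_spread(mag, freqs):
    """Spectral centroid and spread of a single frame.

    mag   : magnitude spectrum |X(i)| of the frame
    freqs : center frequency of each bin in Hz (same length as mag)
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = mag.sum()
    if total == 0:
        # Silent frame: the ratio is undefined; returning zeros is an assumed convention
        return 0.0, 0.0
    sc = float((freqs * mag).sum() / total)
    ss = float(np.sqrt((((freqs - sc) ** 2) * mag).sum() / total))
    return sc, ss

# Synthetic check: the centroid of a windowed 1 kHz sine should lie close to 1 kHz
sr, n_fft = 44100, 1024
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 1000.0 * t) * np.hanning(n_fft)
mag = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
sc, ss = centroid_and_spread(mag, freqs)
print(f"centroid = {sc:.1f} Hz, spread = {ss:.1f} Hz")
```

A narrow-band signal gives a small spread, while broadband noise pushes both the centroid and the spread upward.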

2. Spectral rolloff

The spectral rolloff is a measure of the skewness of the spectrum and is defined as the frequency $f_{ro}$ below which $P\%$ of the spectral components of the signal lie. In our case, we consider $P = 90$ and determine the value $f_{ro}$ from the following relation:
$\sum^{f_{ro}}_{i=1}\lvert X(i) \rvert = \frac{P}{100}\sum^{F_{max}}_{i=1}\lvert X(i) \rvert$

# Feature extracted with librosa; this program has not been verified
import librosa
path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=1024, hop_length=512, center=False, roll_percent=0.9)
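
Since the librosa call above is marked as unverified, the relation can be written out directly as a cross-check. This is a hedged sketch for a single frame: the function name `spectral_rolloff_frame` and the toy spectrum are hypothetical, and the rolloff is taken as the first bin whose cumulative magnitude reaches the threshold.

```python
import numpy as np

def spectral_rolloff_frame(mag, freqs, percent=0.9):
    """Lowest bin frequency at which the cumulative magnitude reaches `percent` of the total."""
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    cumulative = np.cumsum(mag)
    threshold = percent * cumulative[-1]
    # First index whose cumulative sum is >= threshold
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    idx = min(idx, len(freqs) - 1)
    return float(freqs[idx])

# Toy spectrum: cumulative sums are 4, 7, 9, 10, so 85% of the total (8.5) is reached at 200 Hz
print(spectral_rolloff_frame([4, 3, 2, 1], [0, 100, 200, 300], percent=0.85))  # 200.0
```

Applying the function to each column of a magnitude spectrogram gives one rolloff value per frame.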

3. Spectral flux

The spectral flux (SF) indicates how quickly the spectral information of a signal is changing. It is computed as the squared difference between the spectra of two consecutive audio frames, as given in the following equation:
$SF_n = \sum^{L_F}_{i=1}\lbrack X_n(i) - X_{n-1}(i) \rbrack^2$

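import librosa
import numpy as np
path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
# Magnitude spectrogram, shape (1 + n_fft/2, n_frames)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=512, center=False))
# One flux value per pair of consecutive frames
sf = np.zeros((1, S.shape[1] - 1))
for i in range(S.shape[1] - 1):
    sf[:, i] = np.sum((S[:, i+1] - S[:, i])**2)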
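
The per-frame loop can also be expressed with `np.diff`. The sketch below is an illustrative alternative rather than the author's code; the helper name `spectral_flux` and the tiny demo spectrogram are assumptions.

```python
import numpy as np

def spectral_flux(S):
    """Squared difference between consecutive columns of a magnitude spectrogram.

    S has shape (n_bins, n_frames); the result has shape (n_frames - 1,).
    """
    S = np.asarray(S, dtype=float)
    return np.sum(np.diff(S, axis=1) ** 2, axis=0)

# Two bins, three frames: frame differences are [1, 1] and [2, 0]
S_demo = np.array([[1.0, 2.0, 4.0],
                   [0.0, 1.0, 1.0]])
flux = spectral_flux(S_demo)
print(flux)
```

The result has one element fewer than the number of frames, because the first frame has no predecessor.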

4. Energy ratios in sub-bands

The energy ratios in sub-bands (ERSB) give a rough approximation of the energy distribution of the spectrum. We divided the spectrum of the signal into four sub-bands, and for each sub-band we computed the ratio between the energy contained in that sub-band and the overall energy of the audio frame:
$ERSB_n = \frac{\sum^{k_{n2}}_{i=k_{n1}} \lvert X(i)\rvert ^2}{\sum^{F_{max}}_{i=1} \lvert X(i)\rvert ^2},$
where $k_{n1}$ and $k_{n2}$ are the first and last FFT bins of sub-band $n$.
[Image in the original post: table of the four sub-band boundaries]

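import librosa
import numpy as np
path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
# Power spectrogram |X(i)|^2, as required by the ERSB formula
P = np.abs(librosa.stft(y, n_fft=1024, hop_length=512, center=False))**2
ersb = np.zeros((4, P.shape[1]))
# Upper bin index (exclusive) of each of the four sub-bands
sub_band = [14, 39, 102, 512]
for i in range(P.shape[1]):
    total = np.sum(P[:, i])
    ersb[0, i] = np.sum(P[:sub_band[0], i]) / total
    ersb[1, i] = np.sum(P[sub_band[0]:sub_band[1], i]) / total
    ersb[2, i] = np.sum(P[sub_band[1]:sub_band[2], i]) / total
    ersb[3, i] = np.sum(P[sub_band[2]:sub_band[3], i]) / total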
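
The bin indices in `sub_band` correspond to frequencies through the bin width `sr / n_fft`. The following sketch shows that conversion together with a vectorized form of the ratio. The helper name `energy_ratios`, the all-ones demo spectrogram, and the guard for silent frames are assumptions for illustration; the band edges are the ones used in the listing above.

```python
import numpy as np

def energy_ratios(P, edges):
    """Energy ratio per sub-band for a power spectrogram P of shape (n_bins, n_frames).

    edges holds the upper bin index (exclusive) of each band; the first band starts at bin 0.
    """
    P = np.asarray(P, dtype=float)
    total = P.sum(axis=0)
    total = np.where(total == 0, 1.0, total)  # assumed guard: avoid 0/0 on silent frames
    starts = [0] + list(edges[:-1])
    return np.stack([P[lo:hi].sum(axis=0) / total for lo, hi in zip(starts, edges)])

sr, n_fft = 44100, 1024
edges = [14, 39, 102, 512]
# Frequency in Hz of each band edge: bin index times the bin width sr / n_fft
edges_hz = [k * sr / n_fft for k in edges]
print(edges_hz)

# Flat power spectrum: each ratio is proportional to the number of bins in the band
P_demo = np.ones((n_fft // 2 + 1, 3))
ersb_demo = energy_ratios(P_demo, edges)
print(ersb_demo[:, 0])
```

With these edges the Nyquist bin (index 512) falls outside the last band, so the four ratios sum to slightly less than 1.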

5. Volume and energy

We calculate the volume feature (V) as the root mean square (RMS) of the amplitude values of the samples in an audio frame:
$V = \sqrt {\frac{1}{L}\sum^{L}_{i=1}x(i)^2}$
while the energy (E) is the sum of the squared amplitude values of the audio samples:
$E = \sum^{L}_{i=1}x(i)^2$

import librosa
import numpy as np
path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
# Split the signal into overlapping frames, shape (frame_length, n_frames)
frame = librosa.util.frame(y, frame_length=1024, hop_length=512)
# Volume (RMS) and energy of each frame
V = librosa.feature.rms(y=y, frame_length=1024, hop_length=512, center=False)
Energy = np.sum(frame**2, axis=0)
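
Volume and energy are linked by $V = \sqrt{E / L}$, which is easy to confirm on a toy signal. The sketch below uses only NumPy; `frame_signal` is a hypothetical helper that is assumed to produce the same column-per-frame layout as the listing above, and the eight-sample ramp is made-up test data.

```python
import numpy as np

def frame_signal(x, frame_length, hop_length):
    """Slice x into overlapping frames, one frame per column: shape (frame_length, n_frames)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    idx = starts[None, :] + np.arange(frame_length)[:, None]
    return x[idx]

# Toy signal 0..7 with frames of 4 samples and a hop of 2 gives 3 frames
frames = frame_signal(np.arange(8), frame_length=4, hop_length=2)
energy = np.sum(frames ** 2, axis=0)           # E: sum of squared samples per frame
volume = np.sqrt(np.mean(frames ** 2, axis=0))  # V: RMS per frame
print(energy, volume)
```

Because the two features differ only by a fixed scaling and a square root, they carry the same information when the frame length is constant.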

6. Zero crossing rate

The zero crossing rate (ZCR) is the rate of sign changes along a frame and is especially used to characterise percussive sounds and environmental noise. For a frame $x(i)$ of $L$ samples, the ZCR is computed as follows:
$ZCR = \frac{1}{2L}\sum^L_{i=1}\lvert \mathrm{sgn}(x(i+1)) - \mathrm{sgn}(x(i))\rvert$

import librosa
path = 'scream.wav'
y, sr = librosa.load(path, sr=44100)
# Zero crossing rate of each frame, shape (1, n_frames)
zcr = librosa.feature.zero_crossing_rate(y=y, frame_length=1024, hop_length=512, center=False)
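
The ZCR formula can be written out directly for one frame. This is a minimal sketch, assuming the sum runs over the adjacent sample pairs inside the frame; the function name `zero_crossing_rate_frame` is hypothetical, and librosa's own handling of exact zeros and frame edges may give slightly different values.

```python
import numpy as np

def zero_crossing_rate_frame(frame):
    """ZCR of one frame: sum of |sgn(x(i+1)) - sgn(x(i))| over adjacent samples, divided by 2L."""
    frame = np.asarray(frame, dtype=float)
    signs = np.sign(frame)  # np.sign maps exact zeros to 0, so they count as half a crossing
    return float(np.abs(np.diff(signs)).sum() / (2 * len(frame)))

# An alternating frame changes sign between every pair of samples: 3 crossings over L = 4
print(zero_crossing_rate_frame([1.0, -1.0, 1.0, -1.0]))  # 0.75
# A frame that never changes sign has a rate of zero
print(zero_crossing_rate_frame([1.0, 2.0, 3.0]))         # 0.0
```

Noisy or percussive frames alternate sign often and score high, while low-pitched tonal frames score low.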