1 Converting between amplitude and dB (link)
dB = 20*log10(x) = 10*log10(Y)
x: amplitude
Y: energy (power)
Y = x*x = x^2
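A minimal Python sketch of the two formulas above. The function names are illustrative and do not come from any library. Because Y = x^2, 10*log10(Y) = 10*log10(x^2) = 20*log10(x), so the amplitude route and the energy route give the same dB value:

```python
import math


def amplitude_to_db(x: float) -> float:
    """Amplitude -> dB: 20*log10(x)."""
    return 20.0 * math.log10(x)


def power_to_db(y: float) -> float:
    """Energy (power) -> dB: 10*log10(Y)."""
    return 10.0 * math.log10(y)


def db_to_amplitude(db: float) -> float:
    """Inverse of amplitude_to_db: x = 10**(dB/20)."""
    return 10.0 ** (db / 20.0)


def db_to_power(db: float) -> float:
    """Inverse of power_to_db: Y = 10**(dB/10)."""
    return 10.0 ** (db / 10.0)


# Y = x**2, so both routes give the same dB value
x = 0.5
print(amplitude_to_db(x))   # about -6.02
print(power_to_db(x ** 2))  # same value
```

Halving the amplitude therefore costs about 6 dB, while halving the energy costs about 3 dB.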
2 How to extract a Mel spectrogram (link)
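- Compute the STFT to obtain a complex matrix
- Convert the magnitude to decibels (dB)
- Map the frequency axis to the mel scale
Q: If the magnitude is already converted to dB here as part of the mel spectrogram, how is Log_mel_spectrogram different from it?
A: In Whisper's log_mel_spectrogram below, the log is actually the last step: the power spectrum goes through the mel filterbank first, and the log is taken afterwards. The mel spectrogram proper is the filterbank output on a linear power scale; the log-Mel spectrogram is that output after the log (dB-like) step.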
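As a rough illustration of the order in which the steps are actually applied (STFT, power, mel filterbank, then log), here is a plain NumPy sketch. It is not Whisper's implementation: the helper names are hypothetical, the triangular filters use the HTK mel formula 2595*log10(1 + f/700) instead of the Slaney-style filters Whisper takes from librosa, frames are not center-padded the way torch.stft does by default, and the log step returns plain dB (10*log10) without Whisper's normalization. The parameters (16 kHz, n_fft=400, hop=160, 80 mel bands) are chosen to match Whisper's defaults.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (assumption; Whisper uses librosa's Slaney-style filters)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    n_freqs = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_freqs)
    # n_mels + 2 points evenly spaced on the mel scale, converted back to Hz
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_freqs))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # 1 STFT: frame the signal, apply a periodic Hann window, FFT each frame
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann, like torch.hann_window
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1).T  # complex, shape (n_freqs, n_frames)
    # 2 power spectrum Y = |X|^2
    power = np.abs(stft) ** 2
    # 3 mel filterbank -> mel spectrogram, still on a linear power scale
    mel_spec = mel_filterbank(sr, n_fft, n_mels) @ power
    # 4 log step -> log-mel spectrogram, here in dB = 10*log10(Y)
    return 10.0 * np.log10(np.maximum(mel_spec, 1e-10))
```

Only step 4 turns the mel spectrogram into a log-mel spectrogram; dropping it leaves the linear-power mel spectrogram.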
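from typing import Union

import numpy as np
import torch
from whisper.audio import load_audio, mel_filters  # helpers from openai-whisper's whisper/audio.py

# Constants with the values Whisper uses (16 kHz audio, 25 ms window, 10 ms hop)
N_FFT = 400
HOP_LENGTH = 160
N_MELS = 80


def log_mel_spectrogram(audio: Union[str, np.ndarray, torch.Tensor], n_mels: int = N_MELS):
    """
    Compute the log-Mel spectrogram of an audio waveform

    Parameters
    ----------
    audio: Union[str, np.ndarray, torch.Tensor], shape = (*)
        The path to an audio file, or a NumPy array or Tensor containing the audio waveform sampled at 16 kHz

    n_mels: int
        The number of Mel-frequency filters, only 80 is supported

    Returns
    -------
    torch.Tensor, shape = (80, n_frames)
        A Tensor that contains the log-Mel spectrogram
    """
    if not torch.is_tensor(audio):
        if isinstance(audio, str):
            audio = load_audio(audio)
        audio = torch.from_numpy(audio)

    window = torch.hann_window(N_FFT).to(audio.device)
    # Steps to extract the log-Mel spectrogram:
    # 1 STFT (short-time Fourier transform) -> complex matrix
    # 2 squared magnitude -> power spectrum
    # 3 mel filterbank -> frequency axis mapped to the mel scale
    # 4 log10 -> dB-like scale, then normalize
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
    # Power spectrum |X|^2; [:, :-1] drops the last frame
    magnitudes = stft[:, :-1].abs() ** 2

    # Mel filterbank: the mel scale is roughly linear below 1 kHz and logarithmic above it,
    # mimicking the human ear, which resolves low frequencies more finely than high ones
    filters = mel_filters(audio.device, n_mels)
    # @ is matrix multiplication: (n_mels, n_freqs) @ (n_freqs, n_frames) -> (n_mels, n_frames)
    mel_spec = filters @ magnitudes

    # log10 of power, i.e. dB / 10 (dB = 10*log10(Y)); the clamp avoids log10(0)
    log_spec = torch.clamp(mel_spec, min=1e-10).log10()
    # Keep at most 8 log10 units (80 dB) below the peak
    log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
    # Shift and scale into a small range around [-1, 1]
    log_spec = (log_spec + 4.0) / 4.0
    return log_spec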
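The last three lines of log_mel_spectrogram can be read through the dB formula from section 1. mel_spec holds power, so log10(mel_spec) is dB/10: subtracting 8.0 from the maximum keeps an 80 dB dynamic range below the peak, and (x + 4) / 4 shifts and scales the result. A NumPy restatement on a toy 2x2 input (the function name is hypothetical, not part of Whisper):

```python
import numpy as np


def whisper_style_normalize(mel_spec: np.ndarray) -> np.ndarray:
    """NumPy restatement of the last three lines of log_mel_spectrogram."""
    log_spec = np.log10(np.maximum(mel_spec, 1e-10))       # log10(power) = dB / 10
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # keep the top 80 dB
    return (log_spec + 4.0) / 4.0                          # shift and scale


mel_spec = np.array([[1.0, 1e-2], [1e-6, 1e-12]])
print(whisper_style_normalize(mel_spec))  # approximately [[1.0, 0.5], [-0.5, -1.0]]
```

In dB terms the first three inputs sit at 0, -20 and -60 dB. The last input (1e-12, i.e. -120 dB) is first clamped to 1e-10 (-100 dB) and then raised to the -80 dB floor, which maps to -1.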
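3 Installing pyaudio on macOS: fatal error: 'portaudio.h' file not found (link)
The compiler cannot find the PortAudio headers, so pass Homebrew's include and lib directories to the build explicitly. The paths below assume PortAudio 19.7.0 installed with Homebrew under /opt/homebrew (Apple Silicon); adjust the version to match your install: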
pip install --global-option='build_ext' --global-option='-I/opt/homebrew/Cellar/portaudio/19.7.0/include' --global-option='-L/opt/homebrew/Cellar/portaudio/19.7.0/lib' pyaudio
4 Installing kenlm
git clone https://github.com/kpu/kenlm.git
cd kenlm
brew install cmake boost eigen
mkdir -p build
cd build
cmake ..
make -j 4
5 Fix for "No module named sklearn" when running Python
pip install -U scikit-learn
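The sklearn error comes from the difference between the import name (sklearn) and the pip package name (scikit-learn). Below is a small sketch that checks the modules from sections 3 and 5 without importing them and prints the matching pip command. The mapping dict and function name are hypothetical, and kenlm is left out because section 4 builds the C++ toolkit rather than a Python module.

```python
import importlib.util

# Import name -> pip package name. They differ for scikit-learn, which is why
# "No module named sklearn" is fixed by installing scikit-learn.
PACKAGES = {
    "sklearn": "scikit-learn",
    "pyaudio": "pyaudio",
}


def missing_packages(packages=PACKAGES):
    """Return pip package names for modules that cannot be found."""
    return [pip_name for module, pip_name in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install -U " + " ".join(missing))
    else:
        print("all modules found")
```

importlib.util.find_spec only locates the module, so the check does not trigger heavy imports or audio device initialization.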