Plotting spectrograms in Python for audio analysis

I am working on speech recognition using a neural network. To do so, I need the spectrograms of the training audio files (.wav). How can I get those spectrograms in Python?

Solution

There are numerous ways to do this. The easiest is to look at the approaches proposed in the Kernels of the Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). This one is particularly clear and simple and contains the following function. Its inputs are a numeric vector of samples read from the wav file, the sample rate, the frame size in milliseconds, the step (stride or skip) size in milliseconds, and a small offset that keeps the logarithm finite when a spectrogram value is zero.

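    from scipy.io import wavfile
    from scipy import signal
    import numpy as np

    # path_to_wav_file is a placeholder for the path to your own .wav file.
    sample_rate, audio = wavfile.read(path_to_wav_file)

    def log_specgram(audio, sample_rate, window_size=20,
                     step_size=10, eps=1e-10):
        # Convert the window and step sizes from milliseconds to samples.
        nperseg = int(round(window_size * sample_rate / 1e3))
        step = int(round(step_size * sample_rate / 1e3))
        # SciPy expects the overlap between segments, not the step itself.
        # With the defaults (20 ms window, 10 ms step) the two coincide.
        noverlap = nperseg - step
        freqs, times, spec = signal.spectrogram(audio,
                                                fs=sample_rate,
                                                window='hann',
                                                nperseg=nperseg,
                                                noverlap=noverlap,
                                                detrend=False)
        # Transpose to (time, frequency); eps avoids taking log(0).
        return freqs, times, np.log(spec.T.astype(np.float32) + eps)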
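To make the frame arithmetic concrete, here is a minimal sketch on a synthetic signal. The 16 kHz sample rate, the 1-second duration and the 440 Hz tone are assumptions chosen for illustration; they do not come from the original answer. The sketch calls `signal.spectrogram` directly with the same 20 ms window and 10 ms step and checks the resulting shape against the expected number of frequency bins and frames.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                      # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate # 1 second of sample times
audio = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz tone (illustrative)

window_size, step_size = 20, 10          # milliseconds, as in the defaults above
nperseg = int(round(window_size * sample_rate / 1e3))  # 320 samples per frame
step = int(round(step_size * sample_rate / 1e3))       # 160 samples between frames
noverlap = nperseg - step                              # 160 overlapping samples

freqs, times, spec = signal.spectrogram(audio,
                                        fs=sample_rate,
                                        window='hann',
                                        nperseg=nperseg,
                                        noverlap=noverlap,
                                        detrend=False)

# One-sided spectrum of a real signal: nperseg // 2 + 1 frequency bins.
expected_bins = nperseg // 2 + 1                    # 161
# Frames that fit without padding: (N - noverlap) // step.
expected_frames = (len(audio) - noverlap) // step   # 99

# The strongest bin should sit next to the tone frequency (bins are 50 Hz apart).
peak_freq = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, expected_bins, expected_frames, peak_freq)
```

Note that SciPy returns `spec` with shape (frequency bins, frames); the function above transposes it so that time comes first.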

The outputs are as defined in the SciPy manual for scipy.signal.spectrogram, except that the spectrogram is rescaled with a monotonic function (the logarithm), which compresses large values much more than small ones while preserving their order. This way no extreme value in spec dominates the computation. Alternatively, you can cap the values at some quantile, but log (or even square root) is preferred. There are many other ways to normalize the heights of the spectrogram, that is, to prevent extreme values from "bullying" the output :)
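The three rescaling options mentioned here (log, square root, quantile cap) can be compared side by side. This is only a sketch under stated assumptions: the input is random data standing in for a power spectrogram, with one artificially large value, and the 0.99 quantile is an arbitrary illustrative threshold.

```python
import numpy as np

# Stand-in for a power spectrogram (frames x frequency bins); random, not real audio.
rng = np.random.default_rng(0)
spec = rng.exponential(scale=1.0, size=(99, 161))
spec[10, 20] = 1e6                      # one extreme value that would dominate

eps = 1e-10
log_spec = np.log(spec + eps)           # strong compression of large values
sqrt_spec = np.sqrt(spec)               # milder compression
cap = np.quantile(spec, 0.99)           # illustrative threshold: 99th percentile
capped_spec = np.minimum(spec, cap)     # everything above the cap is flattened

for name, arr in [("raw", spec), ("sqrt", sqrt_spec),
                  ("log", log_spec), ("capped", capped_spec)]:
    print(f"{name:>6}: max = {arr.max():.3f}")
```

Log and square root are monotonic, so the largest entry stays the largest; capping instead makes all values above the threshold indistinguishable, which is one reason to prefer the first two.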

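freqs (f): ndarray, array of sample frequencies.

times (t): ndarray, array of segment times.

spec (Sxx): ndarray, spectrogram of x. By default, the last axis of Sxx corresponds to the segment times. Because log_specgram returns the transpose, the first axis of its result is time and the last axis is frequency.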
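Since the question is ultimately about drawing the spectrogram, here is a hedged sketch of how these three outputs could be plotted with matplotlib. It is not part of the original answer. The chirp test signal, the `Agg` backend and the output file name `spectrogram.png` are assumptions made so the example runs without a wav file or a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                   # headless backend (assumption for this sketch)
import matplotlib.pyplot as plt
from scipy import signal

sample_rate = 16000                     # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
# Synthetic test signal: linear sweep from 200 Hz to 4 kHz over one second.
audio = signal.chirp(t, f0=200.0, t1=1.0, f1=4000.0, method='linear')

freqs, times, spec = signal.spectrogram(audio, fs=sample_rate, window='hann',
                                        nperseg=320, noverlap=160, detrend=False)
# Same rescaling as log_specgram: transpose to (time, frequency), then log.
log_spec = np.log(spec.T.astype(np.float32) + 1e-10)

fig, ax = plt.subplots(figsize=(8, 4))
# pcolormesh expects rows = y axis (frequency), so transpose back for plotting.
mesh = ax.pcolormesh(times, freqs, log_spec.T, shading='gouraud')
ax.set_xlabel('Time [s]')
ax.set_ylabel('Frequency [Hz]')
fig.colorbar(mesh, ax=ax, label='log power')
fig.savefig('spectrogram.png', dpi=100)  # hypothetical output file name
```

For a real recording, replace the chirp with the samples returned by `wavfile.read`.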

Alternatively, you can check the train.py and models.py code in the GitHub repo of the TensorFlow example on audio recognition.

Here is another thread that explains how to build spectrograms in Python and gives code for it.
