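For sound classification, I personally prefer spectrograms as the input to the neural network. The raw audio is converted into an image-like representation, so you can treat the problem as a basic image classification task.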
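To make the "audio as image" idea concrete, here is a minimal sketch that turns a signal into a 2-D array you could feed to an image classifier. It uses `scipy.signal.spectrogram` on a synthetic 440 Hz tone; the tone, the 25 ms / 10 ms framing, and the names `log_spec` and `image` are assumptions chosen for illustration, not part of the approach described in this answer.

```python
import numpy as np
from scipy import signal

# Synthetic one-second 440 Hz tone standing in for real audio (assumption).
rate = 44100
t = np.arange(rate) / rate
sig = np.sin(2 * np.pi * 440.0 * t)

# Frame the signal: roughly 25 ms windows with a 10 ms hop (illustrative values).
nperseg = int(round(0.025 * rate))
hop = int(round(0.010 * rate))
freqs, times, power = signal.spectrogram(
    sig, fs=rate, nperseg=nperseg, noverlap=nperseg - hop
)

# Log-compress the power so quiet components stay visible,
# then rescale to [0, 1] like a grayscale image.
log_spec = np.log(power + 1e-10)
image = (log_spec - log_spec.min()) / (log_spec.max() - log_spec.min())

# Rows are frequency bins, columns are time frames.
print(image.shape)
```

The resulting `image` array has one row per frequency bin and one column per time frame, which is the shape a convolutional network for images would expect after adding a channel axis.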
There are several ways to do this; here is what I usually do with scipy, python_speech_features and pydub:
import numpy as np
import scipy.io.wavfile as wave
import python_speech_features as psf
from pydub import AudioSegment

#your sound file
filepath = 'my-sound.wav'

def convert(path):
    #open file (supports all ffmpeg supported filetypes)
    audio = AudioSegment.from_file(path, path.split('.')[-1].lower())
    #set to mono
    audio = audio.set_channels(1)
    #set to 44.1 KHz
    audio = audio.set_frame_rate(44100)
    #save as wav
    audio.export(path, format="wav")

def getSpectrogram(path, winlen=0.025, winstep=0.01, NFFT=512):
    #open wav file
    (rate,sig) = wave.read(path)
    #get frames