Python audio signal classification: MFCC features and a neural network

This post discusses classifying speech by emotion using MFCC features and a neural network. The author runs into poor training results, most notably predictions that are consistently wrong for certain classes. The post covers the dataset split, feature extraction, network structure, and training procedure, and provides example code. Despite trying different network structures and learning rates, the error rate remains high, especially on certain classes.

I am trying to classify speech signals by emotion. To do this, I extract the MFCC features of the audio signal and feed them into a simple neural network (a FeedForwardNetwork trained with PyBrain's BackpropTrainer). Unfortunately the results are very bad: out of the 5 classes, the network almost always predicts the same class.

I have 5 emotion classes and roughly 7000 labeled audio files, which I split per class into 80% for training the network and 20% for testing it.

The idea is to use small windows and extract the MFCC features from each of them, which yields many training examples. For evaluation, all windows from one file are classified and a majority vote decides the predicted label.
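For illustration, here is a minimal sketch of this window-plus-majority-vote scheme (the `extract_mfcc` helper, the window sizes, and `net` are placeholders, not the original code):

import numpy

def classify_file(signal, sr, net, win_len=512, hop=256):
    '''Classify one audio file by majority vote over its windows.'''
    votes = []
    for start in xrange(0, len(signal) - win_len, hop):
        window = signal[start:start + win_len]
        feats = extract_mfcc(window, sr)    # placeholder for the MFCC step
        activation = net.activate(feats)    # PyBrain net, one output per class
        votes.append(numpy.argmax(activation))
    # the most frequent window label wins
    return numpy.bincount(numpy.array(votes)).argmax()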

Training examples per class:

{0: 81310, 1: 60809, 2: 58262, 3: 105907, 4: 73182}

Example of scaled MFCC features:

[ -6.03465056e-01   8.28665733e-01  -7.25728303e-01   2.88611116e-05
   1.18677218e-02  -1.65316583e-01   5.67322809e-01  -4.92335095e-01
   3.29816126e-01  -2.52946780e-01  -2.26147779e-01   5.27210979e-01
  -7.36851560e-01]

Layers________________________: 13 20 5 (also tried 13 50 5 and 13 100 5)

Learning Rate_________________: 0.01 (also tried 0.1 and 0.3)

Training epochs_______________: 10 (error rate does not improve at all during training)
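For reference, the configuration listed above corresponds roughly to this PyBrain setup (a sketch only; `trndata` is assumed to be a prepared ClassificationDataSet like the one built in the answer code further down):

from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer

# 13 MFCC inputs, 20 hidden units, 5 emotion classes
net = buildNetwork(13, 20, 5, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, dataset=trndata, learningrate=0.01)
trainer.trainEpochs(10)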

Confusion matrix on the test set (rows are the true class, columns the predicted class):

[[   0.    4.    0.  239.   99.]
 [   0.   41.    0.  157.   23.]
 [   0.   18.    0.  173.   18.]
 [   0.   12.    0.  299.   59.]
 [   0.    0.    0.   85.  132.]]

Success rate overall [%]: 34.7314201619

Success rate Class 0 [%]: 0.0

Success rate Class 1 [%]: 18.5520361991

Success rate Class 2 [%]: 0.0

Success rate Class 3 [%]: 80.8108108108

Success rate Class 4 [%]: 60.8294930876
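(These rates are just the diagonal of the confusion matrix divided by the row sums; a quick numpy check, with the matrix copied from above:)

import numpy

conf = numpy.array([[  0.,   4.,  0., 239.,  99.],
                    [  0.,  41.,  0., 157.,  23.],
                    [  0.,  18.,  0., 173.,  18.],
                    [  0.,  12.,  0., 299.,  59.],
                    [  0.,   0.,  0.,  85., 132.]])

print conf.trace() / conf.sum() * 100           # overall: 34.73...
print conf.diagonal() / conf.sum(axis=1) * 100  # per class: [0, 18.55, 0, 80.81, 60.83]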

OK, as you can see, the distribution of the results over the classes is very bad. Classes 0 and 2 are never predicted. I assume this hints at a problem with my network, or more likely with my data.

I could post a lot of code here, but I think it makes more sense to show all the steps I take to obtain the MFCC features in the figure below (the image is not reproduced in this repost). Note that I use the whole signal without windowing there, just for illustration. Does this look OK? The MFCC values are very large; shouldn't they be much smaller? (I scale them down with a MinMaxScaler over all the data to [-2, 2] before feeding them into the network; I also tried [0, 1].)
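The scaling mentioned in the parenthesis can be done with scikit-learn's MinMaxScaler; a minimal sketch, where `all_feats` stands for a hypothetical (n_windows, 13) array of raw MFCC rows:

from sklearn.preprocessing import MinMaxScaler

# map every MFCC coefficient into [-2, 2] (the post also tried [0, 1])
scaler = MinMaxScaler(feature_range=(-2, 2))
scaled_feats = scaler.fit_transform(all_feats)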

This is the code I use for the mel filterbank, which I apply directly before the discrete cosine transform to extract the MFCC features (I got it from here: stackoverflow):

import math
import numpy

# Assumed module-level configuration (not shown in the original post):
mfccFeatures = 13   # number of mel bands / MFCC coefficients
minHz = 0           # lower edge of the filterbank
maxHz = 22050       # upper edge of the filterbank (Nyquist for 44.1 kHz)

def freqToMel(freq):
    '''Calculate the mel frequency for a given frequency'''
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    '''Calculate the frequency for a given mel frequency'''
    return 700 * (math.exp(mel / 1127.01048) - 1)

def melFilterBank(blockSize):
    numBands = int(mfccFeatures)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))
    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))
    melRange = numpy.array(xrange(numBands + 2))
    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel
    # each array index represents the center of a triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get int values
    for i in xrange(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(xrange(start, centre)) - start) / k1
        down = (end - numpy.array(xrange(centre, end))) / k2
        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down
    return filterMatrix.transpose()
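For context, a hedged sketch of where this filterbank sits in the MFCC pipeline (power spectrum, log filterbank energies, then DCT); the windowing and normalization of the original pipeline are not shown in the post, so treat the details as assumptions:

import numpy
from scipy.fftpack import dct

def mfcc_from_window(window):
    # power spectrum of one (already windowed) block of samples
    spectrum = numpy.abs(numpy.fft.rfft(window)) ** 2
    # triangular mel filters, log compression, then the DCT
    mel_energies = numpy.dot(spectrum, melFilterBank(len(spectrum)))
    log_energies = numpy.log(mel_energies + 1e-10)  # guard against log(0)
    return dct(log_energies, type=2, norm='ortho')[:mfccFeatures]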

What can I do to get better prediction results?

Answer:

Here I made an example of identifying gender from speech. I used the Hyke dataset¹ for this example. It is just a quickly made example; if one wanted to do gender identification seriously, one could probably do much better. But in general the error rate does decrease:

Build up data...
Train network...
Number of training patterns:  94956
Number of test patterns:  31651
Input and output dimensions:  13 2
Train network...
epoch:    0  train error: 62.24%  test error: 61.84%
epoch:    1  train error: 34.11%  test error: 34.25%
epoch:    2  train error: 31.11%  test error: 31.20%
epoch:    3  train error: 30.34%  test error: 30.22%
epoch:    4  train error: 30.76%  test error: 30.75%
epoch:    5  train error: 30.65%  test error: 30.72%
epoch:    6  train error: 30.81%  test error: 30.79%
epoch:    7  train error: 29.38%  test error: 29.45%
epoch:    8  train error: 31.92%  test error: 31.92%
epoch:    9  train error: 29.14%  test error: 29.23%

I used the MFCC implementation from scikits.talkbox. Maybe the code below helps you. (Gender identification is surely much easier than emotion detection; you probably need more and different features.)

import glob

from scipy.io.wavfile import read as wavread
from scikits.talkbox.features import mfcc
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer

def report_error(trainer, trndata, tstdata):
    trnresult = percentError(trainer.testOnClassData(), trndata['class'])
    tstresult = percentError(trainer.testOnClassData(dataset=tstdata), tstdata['class'])
    print "epoch: %4d" % trainer.totalepochs, " train error: %5.2f%%" % trnresult, " test error: %5.2f%%" % tstresult

def main(audio_path, coeffs=13):
    dataset = ClassificationDataSet(coeffs, 1, nb_classes=2, class_labels=['male', 'female'])
    male_files = glob.glob("%s/male_audio/*/*_1.wav" % audio_path)
    female_files = glob.glob("%s/female_audio/*/*_1.wav" % audio_path)
    print "Build up data..."
    for sex, files in enumerate([male_files, female_files]):
        for f in files:
            sr, signal = wavread(f)
            # 13 MFCCs per 2048-sample window, one training pattern per window
            ceps, mspec, spec = mfcc(signal, nwin=2048, nfft=2048, fs=sr, nceps=coeffs)
            for i in range(ceps.shape[0]):
                dataset.appendLinked(ceps[i], [sex])
    tstdata, trndata = dataset.splitWithProportion(0.25)
    # one-hot encode the class labels for the softmax output layer
    trndata._convertToOneOfMany()
    tstdata._convertToOneOfMany()
    print "Number of training patterns: ", len(trndata)
    print "Number of test patterns: ", len(tstdata)
    print "Input and output dimensions: ", trndata.indim, trndata.outdim
    print "Train network..."
    # 13 inputs -> 19 hidden units -> 2-way softmax output
    fnn = buildNetwork(coeffs, int(coeffs * 1.5), 2, outclass=SoftmaxLayer, fast=True)
    trainer = BackpropTrainer(fnn, dataset=trndata, learningrate=0.005)
    report_error(trainer, trndata, tstdata)
    for i in range(100):
        trainer.trainEpochs(1)
        report_error(trainer, trndata, tstdata)

if __name__ == '__main__':
    main("/path/to/hyke/audio_data")

1 Azarias Reda, Saurabh Panjwani and Edward Cutrell: Hyke: A Low-cost Remote Attendance Tracking System for Developing Regions, 5th ACM Workshop on Networked Systems for Developing Regions (NSDR).

Tags: classification, neural-network, mfcc, audio, python

Source: https://codeday.me/bug/20191119/2038935.html
