Speech Emotion Recognition with librosa & SVM

Task: speech emotion classification

Audio processing library: librosa

Installing librosa

The librosa website documents several installation methods; the two most common are below.

The simplest is to install with pip, which pulls in all of the dependencies:

pip install librosa

If you use Anaconda, you can install through conda instead:

conda install -c conda-forge librosa

Dataset: CASIA

Classification method: sklearn SVM (sklearn.svm.SVC)

svm is sklearn's support-vector-machine package and is widely used. If you do not know what the individual parameters mean and always train with the defaults, you cannot get the most out of SVM, which would be a pity. To deepen my own understanding and make the class easier to call, I have combined my current understanding with the official documentation to record what the main parameters do, both as a refresher for myself and as a brief introduction for readers; if I have misunderstood anything, please point it out.

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)

C: penalty parameter of the error term. Larger C tolerates fewer misclassified training samples (a harder margin), at the risk of overfitting.

kernel: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', or a callable you supply yourself.

degree: degree of the polynomial kernel ('poly'); ignored by all other kernels.

gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. Larger gamma makes each training sample's influence more local, so the decision boundary bends more.

class_weight: per-class weights applied to C (pass 'balanced' to weight inversely to class frequency); useful when the classes are imbalanced.

decision_function_shape: 'ovo' (one-vs-one) or 'ovr' (one-vs-rest), default='ovr'; only matters for multi-class problems.
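A minimal sketch of these parameters in use, on a toy two-class problem (the C and gamma values here are illustrative, not tuned for any real dataset):

```python
import numpy as np
from sklearn import svm

# Toy, linearly separable two-class data: class 0 near the origin,
# class 1 shifted to around (5, 5).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# An rbf kernel with explicit C and gamma; decision_function_shape
# only changes behaviour when there are more than two classes.
clf = svm.SVC(kernel='rbf', C=1.0, gamma=0.5, decision_function_shape='ovo')
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # → [0 1]
```

Raising C penalizes training errors more heavily, while raising gamma narrows each support vector's influence; together they set the bias/variance trade-off that the grid search later in this post explores.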

Code:

import librosa
import os
from random import shuffle
import numpy as np
from sklearn import svm
import sklearn


path = r'I:\CFL\cfl_python_speech_emotion\casia'
EMOTION_LABEL = {'angry': '1', 'fear': '2', 'happy': '3', 'neutral': '4', 'sad': '5', 'surprise': '6'}


def getData(mfcc_feature_num=16):
    wav_file_path = []
    person_dirs = os.listdir(path)
    for person in person_dirs:
        if person.endswith('.txt'):
            continue
        emotion_dir_path = os.path.join(path, person)
        emotion_dirs = os.listdir(emotion_dir_path)
        for emotion_dir in emotion_dirs:
            if emotion_dir.endswith('ini'):
                continue
            emotion_file_path = os.path.join(emotion_dir_path, emotion_dir)
            emotion_files = os.listdir(emotion_file_path)
            for file in emotion_files:
                if not file.endswith('wav'):
                    continue
                wav_path = os.path.join(emotion_file_path, file)
                wav_file_path.append(wav_path)
    shuffle(wav_file_path)  # shuffle the wav files into random order
    data_feature = []
    data_labels = []

    for wav_file in wav_file_path:
        y, sr = librosa.load(wav_file)

        # Recent librosa requires keyword arguments here, and
        # feature.rmse was renamed to feature.rms in librosa 0.8.
        mfcc_feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16)
        zcr_feature = librosa.feature.zero_crossing_rate(y)
        energy_feature = librosa.feature.rms(y=y)
        rms_feature = librosa.feature.rms(y=y)  # note: duplicates energy_feature

        mfcc_feature = mfcc_feature.T.flatten()[:mfcc_feature_num]
        zcr_feature = zcr_feature.flatten()
        energy_feature = energy_feature.flatten()
        rms_feature = rms_feature.flatten()

        zcr_feature = np.array([np.mean(zcr_feature)])
        energy_feature = np.array([np.mean(energy_feature)])
        rms_feature = np.array([np.mean(rms_feature)])

        data_feature.append(np.concatenate((mfcc_feature, zcr_feature, energy_feature, rms_feature)))
        # The emotion name is the parent directory of the wav file;
        # os.path handles both '/' and '\\' separators portably.
        data_labels.append(int(EMOTION_LABEL[os.path.basename(os.path.dirname(wav_file))]))
    return np.array(data_feature), np.array(data_labels)
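For intuition, the scalar features appended after the MFCCs can be computed with numpy alone. This is a conceptual sketch: librosa's zero_crossing_rate and rms differ in framing, padding, and centering details, so the numbers will not match librosa's output exactly:

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length: i * hop_length + frame_length]
                     for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Fraction of consecutive-sample sign changes within each frame."""
    signs = np.sign(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def rms_energy(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Sanity check on a 440 Hz sine at 22050 Hz: a sine crosses zero twice
# per period, so ZCR ≈ 2 * 440 / 22050 ≈ 0.04, and the RMS of a
# unit-amplitude sine is 1/sqrt(2) ≈ 0.707.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(y)
print(np.mean(zero_crossing_rate(frames)), np.mean(rms_energy(frames)))
```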


def test():
    best_acc = 0
    best_mfcc_feature_num = 0
    gamma = 0.0001  # the gamma passed to SVC below; defined here so the print works
    for i in range(100, 200):
        data_feature, data_labels = getData(i)
        split_num = 1100
        train_data = data_feature[:split_num, :]
        train_label = data_labels[:split_num]
        test_data = data_feature[split_num:, :]
        test_label = data_labels[split_num:]
        clf = svm.SVC(decision_function_shape='ovo', kernel='rbf', C=19, gamma=0.0001)
        clf.fit(train_data, train_label)
        acc = sklearn.metrics.accuracy_score(test_label, clf.predict(test_data))
        print(split_num, gamma, 'acc ', acc)
        if acc > best_acc:
            best_acc = acc
            best_mfcc_feature_num = i
            print()
            print('best_acc', best_acc)
            print('best_mfcc_feature_num', best_mfcc_feature_num)
            print()
    print('best_acc', best_acc)
    print('best_mfcc_feature_num', best_mfcc_feature_num)
    print()


best_acc = 0
best_mfcc_feature_num = 0
best_C = 0
for C in range(1, 20):
    for i in range(10, 50):
        data_feature, data_labels = getData(i)
        split_num = 500
        train_data = data_feature[:split_num, :]
        train_label = data_labels[:split_num]
        test_data = data_feature[split_num:, :]
        test_label = data_labels[split_num:]
        clf = svm.SVC(decision_function_shape='ovo', kernel='rbf', C=C, gamma=0.0001)
        clf.fit(train_data, train_label)
        print('Train Over')
        print(C, i)
        acc = sklearn.metrics.accuracy_score(test_label, clf.predict(test_data))
        if acc > best_acc:
            best_acc = acc
            best_C = C
            best_mfcc_feature_num = i
            print('best_acc', best_acc)
            print('best_C', best_C)
            print('best_mfcc_feature_num', best_mfcc_feature_num)
            print()


print('best_acc', best_acc)
print('best_C', best_C)
print('best_mfcc_feature_num', best_mfcc_feature_num)
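The nested loops above are a hand-rolled grid search over C and the MFCC count, scored on a single fixed split. scikit-learn's GridSearchCV runs the same kind of search with cross-validation instead; a sketch on synthetic stand-in features (random data for illustration only, not the CASIA features):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for the 19-dim feature matrix: 120 samples, 6 emotion labels.
X = rng.normal(size=(120, 19))
y = rng.integers(1, 7, size=120)
X = X + y[:, None]  # shift each class so the search has something to find

param_grid = {'C': [1, 5, 10, 19], 'gamma': [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Cross-validated scores are less sensitive to one lucky split than the fixed split_num used above, which matters when comparing many (C, gamma) pairs.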

Extracted features:

Input: the raw waveform of each wav file, loaded with librosa.load.

Composition: the first mfcc_feature_num MFCC values, plus the mean zero-crossing rate, mean energy, and mean RMS.

Understanding the feature-extraction step:

        mfcc_feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16)
        zcr_feature = librosa.feature.zero_crossing_rate(y)
        energy_feature = librosa.feature.rms(y=y)
        rms_feature = librosa.feature.rms(y=y)

        mfcc_feature = mfcc_feature.T.flatten()[:mfcc_feature_num]
        zcr_feature = zcr_feature.flatten()
        energy_feature = energy_feature.flatten()
        rms_feature = rms_feature.flatten()

        zcr_feature = np.array([np.mean(zcr_feature)])
        energy_feature = np.array([np.mean(energy_feature)])
        rms_feature = np.array([np.mean(rms_feature)])

        data_feature.append(np.concatenate((mfcc_feature, zcr_feature, energy_feature, rms_feature)))
        # The original label extraction was wrong: splitting on '\\' only
        # works for Windows-style paths, so take the parent directory
        # name portably instead.
        # data_labels.append(int(EMOTION_LABEL[wav_file.split('\\')[-2]]))
        data_labels.append(int(EMOTION_LABEL[os.path.basename(os.path.dirname(wav_file))]))
    # print(data_feature)
    # print(data_labels)
    return np.array(data_feature), np.array(data_labels)
getData()
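Putting the pieces together, each utterance becomes one fixed-length vector: the first 16 MFCC values (with the default mfcc_feature_num=16) plus three scalar means (ZCR, energy, RMS), i.e. 19 dimensions. A numpy sketch of the concatenation with placeholder values:

```python
import numpy as np

mfcc_feature = np.arange(16, dtype=float)  # stand-in for the first 16 MFCC values
zcr_feature = np.array([0.04])             # mean zero-crossing rate
energy_feature = np.array([0.7])           # mean energy
rms_feature = np.array([0.7])              # mean RMS (same source as energy here)

vec = np.concatenate((mfcc_feature, zcr_feature, energy_feature, rms_feature))
print(vec.shape)  # → (19,)
```

Because energy_feature and rms_feature both come from librosa.feature.rms, the last two components are redundant; dropping one would give an 18-dim vector without losing information.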

 

 
