Deep-Learning-Based Speech Emotion Recognition (SER)

1. Introduction

Speech is the most natural way for us as humans to express ourselves, so it is natural to extend this communication medium to computer applications. We define a Speech Emotion Recognition (SER) system as a collection of methods that process and classify speech signals in order to detect the emotions embedded in them. SER is not a new field: it has existed for more than two decades and has regained attention thanks to recent advances. These newer studies draw on progress across computing and technology, so it is worth reviewing the current methods and techniques that make SER possible.

2. Datasets

Here are the four most popular English datasets: Crema, Ravdess, Savee and Tess. Each of them contains audio in .wav format together with the main labels.

Ravdess:

Here are the filename identifiers as per the official RAVDESS website:

  • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
  • Vocal channel (01 = speech, 02 = song).
  • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
  • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
  • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
  • Repetition (01 = 1st repetition, 02 = 2nd repetition).
  • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

So, here's an example of an audio filename: 02-01-06-01-02-01-12.wav. This means the metadata for the audio file is:

  • Video-only (02)
  • Speech (01)
  • Fearful (06)
  • Normal intensity (01)
  • Statement "dogs" (02)
  • 1st Repetition (01)
  • 12th Actor (12) - Female (as the actor ID number is even)
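As a quick illustration (a hypothetical helper, not part of the notebook; the actual parsing used in this project appears in Step 3), the filename fields can be decoded like this:

# Hypothetical helper: decode a RAVDESS filename into its seven metadata fields.
RAVDESS_EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
                    5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}

def parse_ravdess_name(filename):
    """Split e.g. '02-01-06-01-02-01-12.wav' into its metadata fields."""
    parts = [int(p) for p in filename.split(".")[0].split("-")]
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "modality": modality,                      # 01 full-AV, 02 video-only, 03 audio-only
        "vocal_channel": "speech" if channel == 1 else "song",
        "emotion": RAVDESS_EMOTIONS[emotion],
        "intensity": "strong" if intensity == 2 else "normal",
        "statement": statement,
        "repetition": repetition,
        "actor": actor,
        "sex": "female" if actor % 2 == 0 else "male",
    }

print(parse_ravdess_name("02-01-06-01-02-01-12.wav"))
# -> emotion 'fearful', normal intensity, actor 12 (female)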

Crema:

The third component of the filename encodes the emotion label:

  • SAD - sadness;
  • ANG - angry;
  • DIS - disgust;
  • FEA - fear;
  • HAP - happy;
  • NEU - neutral.

Tess:

Very similar to Crema: the emotion label is contained in the file name.

Savee:

The audio files in this dataset are named in such a way that the prefix letters describe the emotion classes as follows:

  • 'a' = 'anger'
  • 'd' = 'disgust'
  • 'f' = 'fear'
  • 'h' = 'happiness'
  • 'n' = 'neutral'
  • 'sa' = 'sadness'
  • 'su' = 'surprise'


Speech Emotion Recognition with a 1D Convolutional Neural Network

In this experiment I try to recognize the emotion in short voice messages (< 3 seconds). I use four datasets of short English phrases voiced by professional actors: Ravdess, Crema, Savee and Tess.

First, let's define SER. Speech Emotion Recognition (SER) is the attempt to recognize a person's emotions and affective state from their speech. It exploits the fact that the voice often reflects the underlying emotion through tone and pitch. A similar mechanism exists in animals such as dogs and horses, which use it to understand human emotions.

The datasets used in this project cover seven main emotions: happy, fear, angry, disgust, surprise, sad and neutral.

Importing libraries

import os
import re

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import Audio
# from entropy import spectral_entropy
from keras import layers
from keras import models
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
import keras
import itertools

Step 2: Dataset paths

# Paths to the four datasets
Ravdess = "../input/speech-emotion-recognition-en/Ravdess/audio_speech_actors_01-24"
Crema = "../input/speech-emotion-recognition-en/Crema"
Savee = "../input/speech-emotion-recognition-en/Savee"
Tess = "../input/speech-emotion-recognition-en/Tess"

Data preparation

Ravdess dataset

The filename identifiers follow the official RAVDESS convention described in the dataset overview above. For example, 02-01-06-01-02-01-12.wav decodes to: video-only (02), speech (01), fearful (06), normal intensity (01), statement "dogs" (02), 1st repetition (01), actor 12, female (the actor ID is even).

Step 3: Ravdess dataframe

ravdess_directory_list = os.listdir(Ravdess)

emotion_df = []

# Each actor has their own sub-directory; the emotion code is the third field of the filename.
for dir in ravdess_directory_list:
    actor = os.listdir(os.path.join(Ravdess, dir))

    for wav in actor:
        info = wav.partition(".wav")[0].split("-")
        emotion = int(info[2])
        emotion_df.append((emotion, os.path.join(Ravdess, dir, wav)))

Ravdess_df = pd.DataFrame.from_dict(emotion_df)
Ravdess_df.rename(columns={1 : "Path", 0 : "Emotion"}, inplace=True)
Ravdess_df.head()

# Map the numeric codes to emotion names; 'calm' (02) is merged into 'neutral'.
Ravdess_df.Emotion.replace({1:'neutral', 2:'neutral', 3:'happy', 4:'sad', 5:'angry', 6:'fear', 7:'disgust', 8:'surprise'}, inplace=True)
Ravdess_df.head()

Step 4: Crema dataset

emotion_df = []

for wav in os.listdir(Crema):
    info = wav.partition(".wav")[0].split("_")
    if info[2] == 'SAD':
        emotion_df.append(("sad", Crema + "/" + wav))
    elif info[2] == 'ANG':
        emotion_df.append(("angry", Crema + "/" + wav))
    elif info[2] == 'DIS':
        emotion_df.append(("disgust", Crema + "/" + wav))
    elif info[2] == 'FEA':
        emotion_df.append(("fear", Crema + "/" + wav))
    elif info[2] == 'HAP':
        emotion_df.append(("happy", Crema + "/" + wav))
    elif info[2] == 'NEU':
        emotion_df.append(("neutral", Crema + "/" + wav))
    else:
        emotion_df.append(("unknown", Crema + "/" + wav))


Crema_df = pd.DataFrame.from_dict(emotion_df)
Crema_df.rename(columns={1 : "Path", 0 : "Emotion"}, inplace=True)

Crema_df.head()

This reads the audio files in the directory, maps the emotion tag in each file name to an emotion class, stores (emotion, path) pairs in a list, and finally converts the list into a dataframe for later analysis and processing.

Step 5: Tess dataset

tess_directory_list = os.listdir(Tess)

emotion_df = []

for dir in tess_directory_list:
    for wav in os.listdir(os.path.join(Tess, dir)):
        info = wav.partition(".wav")[0].split("_")
        emo = info[2]
        if emo == "ps":
            # "ps" stands for "pleasant surprise" in Tess.
            emotion_df.append(("surprise", os.path.join(Tess, dir, wav)))
        else:
            emotion_df.append((emo, os.path.join(Tess, dir, wav)))


Tess_df = pd.DataFrame.from_dict(emotion_df)
Tess_df.rename(columns={1 : "Path", 0 : "Emotion"}, inplace=True)

Tess_df.head()

Savee dataset

The emotion class is encoded in the prefix letters of each file name (see the list in the dataset overview above).
savee_directory_list = os.listdir(Savee)

emotion_df = []

for wav in savee_directory_list:
    # Keep only the letter prefix of the emotion code (e.g. "sa01" -> "sa").
    info = wav.partition(".wav")[0].split("_")[1]
    emotion = re.split(r"[0-9]", info)[0]
    if emotion=='a':
        emotion_df.append(("angry", Savee + "/" + wav))
    elif emotion=='d':
        emotion_df.append(("disgust", Savee + "/" + wav))
    elif emotion=='f':
        emotion_df.append(("fear", Savee + "/" + wav))
    elif emotion=='h':
        emotion_df.append(("happy", Savee + "/" + wav))
    elif emotion=='n':
        emotion_df.append(("neutral", Savee + "/" + wav))
    elif emotion=='sa':
        emotion_df.append(("sad", Savee + "/" + wav))
    else:
        emotion_df.append(("surprise", Savee + "/" + wav))


Savee_df = pd.DataFrame.from_dict(emotion_df)
Savee_df.rename(columns={1 : "Path", 0 : "Emotion"}, inplace=True)

Savee_df.head()
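The plotting cell below and the feature-extraction loop in Step 8 refer to a combined dataframe df, which is not built in the cells shown here. A minimal sketch of the presumably intended concatenation (an assumption, not part of the original cells):

# Assumed step: combine the four per-dataset tables into the single `df` used below.
df = pd.concat([Ravdess_df, Crema_df, Tess_df, Savee_df], axis=0, ignore_index=True)
df.head()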

# Split the augmented feature table (built in Step 8, four rows per original clip)
# back into per-dataset blocks; these are used for the per-dataset evaluations below.
ravdess_final_data = augmented_data.iloc[0:5760,]
crema_final_data = augmented_data.iloc[5760:35528,]
tess_final_data = augmented_data.iloc[35528:46728,]
savee_final_data = augmented_data.iloc[46728:,]

# Class balance of the combined dataframe.
%matplotlib inline
plt.style.use("ggplot")
plt.title("Count of emotions:")
sns.countplot(x=df["Emotion"])
sns.despine(top=True, right=True, left=False, bottom=False)

 

def create_waveplot(data, sr, e):
    plt.figure(figsize=(10, 3))
    plt.title(f'Waveplot for audio with {e} emotion', size=15)
    librosa.display.waveplot(data, sr=sr)
    plt.show()

def create_spectrogram(data, sr, e):
    # stft function converts the data into short term fourier transform
    X = librosa.stft(data)
    Xdb = librosa.amplitude_to_db(abs(X))
    plt.figure(figsize=(12, 3))
    plt.title('Spectrogram for audio with {} emotion'.format(e), size=15)
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    #librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar()
emotion='fear'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

emotion='angry'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

 

emotion='sad'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

 

 

Step 6: Data augmentation

There are several common ways to augment audio data:

  1. Noise injection
  2. Stretching
  3. Shifting
  4. Pitching
def noise(data, random=False, rate=0.035, threshold=0.075):
    """Add white noise to a sound sample. With random=True the noise rate is drawn
    uniformly below `threshold`; otherwise the fixed `rate` is used."""
    if random:
        rate = np.random.random() * threshold
    noise_amp = rate*np.random.uniform()*np.amax(data)
    data = data + noise_amp*np.random.normal(0, 1, size=data.size)
    return data

def stretch(data, rate=0.8):
    """Time-stretch the sample by the given rate."""
    return librosa.effects.time_stretch(data, rate=rate)

def shift(data, rate=1000):
    """Shift the sample by a random offset of up to +/-5*rate samples."""
    shift_range = int(np.random.uniform(low=-5, high=5)*rate)
    return np.roll(data, shift_range)

def pitch(data, sampling_rate, pitch_factor=0.7, random=False):
    """Pitch-shift the sample. With random=True the shift is drawn uniformly below
    `pitch_factor`; otherwise the fixed `pitch_factor` is used."""
    if random:
        pitch_factor = np.random.random() * pitch_factor
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)
df.head()

Step 7: Listening to augmented samples

path = df[df["Emotion"] == "happy"]["Path"].iloc[0]
data, sampling_rate = librosa.load(path)

Adding white noise

white_noised_data = noise(data, rate=0.1)
plt.figure(figsize=(14,4))
librosa.display.waveplot(y=white_noised_data, sr=sampling_rate)
Audio(white_noised_data, rate=sampling_rate)

plt.figure(figsize=(14,4))
librosa.display.waveplot(data, sampling_rate)
Audio(path)
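A similar quick check can be made for the pitch-shifted variant (a short sketch using the pitch() helper defined above; only the noise example is shown in the original):

# Listen to and plot a pitch-shifted version of the same clip.
pitched_data = pitch(data, sampling_rate, random=True)
plt.figure(figsize=(14,4))
librosa.display.waveplot(y=pitched_data, sr=sampling_rate)
Audio(pitched_data, rate=sampling_rate)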

 

For our data augmentation we will use noise injection, pitch shifting, and a combination of the two.

Step 8: Feature extraction

Here are some features that may be useful:

1. **Zero Crossing Rate (ZCR)**
   The rate of sign changes of the signal within a given frame.

2. **Energy**
   The sum of squares of the signal values, normalized by the frame length.

3. **Entropy of Energy**
   The entropy of the sub-frames' normalized energies; it can be used as a measure of abrupt changes.

4. **Spectral Centroid**
   The center of gravity of the spectrum.

5. **Spectral Spread**
   The second central moment of the spectrum.

6. **Spectral Entropy**
   The entropy of the normalized spectral energies of a set of sub-frames.

7. **Spectral Flux**
   The squared difference between the spectral magnitudes of two successive frames.

8. **Spectral Rolloff**
   The frequency below which 90% of the magnitude distribution of the spectrum is concentrated.

9. **Mel-Frequency Cepstral Coefficients (MFCCs)**
   A cepstral representation in which the frequency bands are not linear but distributed according to the mel scale.

n_fft = 2048
hop_length = 512
def chunks(data, frame_length, hop_length):
    for i in range(0, len(data), hop_length):
        yield data[i:i+frame_length]

# Zero Crossing Rate
def zcr(data, frame_length=2048, hop_length=512):
    zcr = librosa.feature.zero_crossing_rate(y=data, frame_length=frame_length, hop_length=hop_length)
    return np.squeeze(zcr)


def energy(data, frame_length=2048, hop_length=512):
    en = np.array([np.sum(np.power(np.abs(data[hop:hop+frame_length]), 2)) for hop in range(0, data.shape[0], hop_length)])
    return en / frame_length


def rmse(data, frame_length=2048, hop_length=512):
    rmse = librosa.feature.rms(y=data, frame_length=frame_length, hop_length=hop_length)
    return np.squeeze(rmse)


def entropy_of_energy(data, frame_length=2048, hop_length=512):
    energies = energy(data, frame_length, hop_length)
    energies /= np.sum(energies)

    # Shannon entropy of the per-frame energy distribution (epsilon avoids log(0)).
    entropy = -np.sum(energies * np.log2(energies + 1e-12))
    return entropy


def spc(data, sr, frame_length=2048, hop_length=512):
    spectral_centroid = librosa.feature.spectral_centroid(y=data, sr=sr, n_fft=frame_length, hop_length=hop_length)
    return np.squeeze(spectral_centroid)


# def spc_entropy(data, sr):
#     spc_en = spectral_entropy(data, sf=sr, method="fft")
#     return spc_en

def spc_flux(data):
    isSpectrum = data.ndim == 1
    if isSpectrum:
        data = np.expand_dims(data, axis=1)

    X = np.c_[data[:, 0], data]
    af_Delta_X = np.diff(X, 1, axis=1)
    vsf = np.sqrt((np.power(af_Delta_X, 2).sum(axis=0))) / X.shape[0]

    return np.squeeze(vsf) if isSpectrum else vsf


def spc_rollof(data, sr, frame_length=2048, hop_length=512):
    spcrollof = librosa.feature.spectral_rolloff(y=data, sr=sr, n_fft=frame_length, hop_length=hop_length)
    return np.squeeze(spcrollof)


def chroma_stft(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    stft = np.abs(librosa.stft(data))
    chroma_stft = librosa.feature.chroma_stft(S=stft, sr=sr)
    return np.squeeze(chroma_stft.T) if not flatten else np.ravel(chroma_stft.T)


def mel_spc(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    mel = librosa.feature.melspectrogram(y=data, sr=sr)
    return np.squeeze(mel.T) if not flatten else np.ravel(mel.T)

def mfcc(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    mfcc_feature = librosa.feature.mfcc(y=data, sr=sr)
    return np.squeeze(mfcc_feature.T) if not flatten else np.ravel(mfcc_feature.T)

print("ZCR: ", zcr(data).shape)
print("Energy: ", energy(data).shape)
print("Entropy of Energy :", entropy_of_energy(data).shape)
print("RMS :", rmse(data).shape)
print("Spectral Centroid :", spc(data, sampling_rate).shape)
# print("Spectral Entropy: ", spc_entropy(data, sampling_rate).shape)
print("Spectral Flux: ", spc_flux(data).shape)
print("Spectral Rollof: ", spc_rollof(data, sampling_rate).shape)
print("Chroma STFT: ", chroma_stft(data, sampling_rate).shape)
print("MelSpectrogram: ", mel_spc(data, sampling_rate).shape)
print("MFCC: ", mfcc(data, sampling_rate).shape)

For this task it was decided to use only three main features: zero crossing rate (ZCR), root mean square energy (RMS) and mel-frequency cepstral coefficients (MFCC).

In addition, only 2.5 seconds of each clip are used, with an offset of 0.6 seconds: in these datasets the first 0.6 seconds carry no information about the emotion, and most samples are shorter than 3 seconds.
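As a rough sanity check on the resulting feature length (assuming librosa's default sampling rate of 22,050 Hz and the frame length of 2048 / hop length of 512 used above): a clip that spans the full 2.5 s has about 2.5 × 22050 ≈ 55,125 samples, i.e. roughly 1 + floor(55125 / 512) = 108 frames, so the concatenated vector of ZCR (108 values) + RMS (108 values) + 20 flattened MFCC coefficients (20 × 108 = 2160 values) comes to about 2376 values per clip; shorter clips yield shorter vectors.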

The core code is as follows:

def extract_features(data, sr, frame_length=2048, hop_length=512):
    result = np.array([])
    result = np.hstack((result,
                        zcr(data, frame_length, hop_length),
                        # np.mean(energy(data, frame_length, hop_length),axis=0),
                        # np.mean(entropy_of_energy(data, frame_length, hop_length), axis=0),
                        rmse(data, frame_length, hop_length),
                        # spc(data, sr, frame_length, hop_length),
                        # spc_entropy(data, sr),
                        # spc_flux(data),
                        # spc_rollof(data, sr, frame_length, hop_length),
                        # chroma_stft(data, sr, frame_length, hop_length),
                        # mel_spc(data, sr, frame_length, hop_length, flatten=True)
                        mfcc(data, sr, frame_length, hop_length)
                                    ))
    return result
def get_features_with_augmentation(path, duration=2.5, offset=0.6):
    # duration and offset skip the silent start of each clip and cap its length at 2.5 s, as discussed above.
    data, sample_rate = librosa.load(path, duration=duration, offset=offset)

     # without augmentation
    res1 = extract_features(data, sample_rate)
    result = np.array(res1)

    # data with noise
    noise_data = noise(data, random=True)
    res2 = extract_features(noise_data, sample_rate)
    result = np.vstack((result, res2)) # stacking vertically

    # data with pitching
    pitched_data = pitch(data, sample_rate, random=True)
    res3 = extract_features(pitched_data, sample_rate)
    result = np.vstack((result, res3)) # stacking vertically

    # data with pitching and white noise
    new_data = pitch(data, sample_rate, random=True)
    data_noise_pitch = noise(new_data, random=True)
    res4 = extract_features(data_noise_pitch, sample_rate)
    result = np.vstack((result, res4)) # stacking vertically

    return result

def get_features_without_augmentation(path, duration=2.5, offset=0.6):
    # duration and offset skip the silent start of each clip and cap its length at 2.5 s, as discussed above.
    data, sample_rate = librosa.load(path, duration=duration, offset=offset)

     # without augmentation
    res1 = extract_features(data, sample_rate)
    
    
    return res1
X, Y = [], []
print("Feature processing...")
for path, emotion, ind in zip(df.Path, df.Emotion, range(df.Path.shape[0])):
    features = get_features_with_augmentation(path)
    if ind % 100 == 0:
        print(f"{ind} samples have been processed...")
    for ele in features:
        X.append(ele)
        # append the emotion once per row: the original clip plus 3 augmented versions give 4 rows per file.
        Y.append(emotion)
print("Done.")
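Step 9 below works with a dataframe called extracted_df, which is not constructed in the cells shown here. A minimal sketch of the presumably intended step (an assumption; shorter clips produce shorter feature rows, which pandas pads with NaN, hence the fillna):

# Assumed step: turn the feature rows and labels into one table with a "labels" column.
extracted_df = pd.DataFrame(X)
extracted_df["labels"] = Y
extracted_df = extracted_df.fillna(0)  # zero-pad the missing columns of shorter clips
extracted_df.head()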

 

Step 9: Data preparation

Now that the features have been extracted, the data needs to be normalized and split into training and test sets.

X = extracted_df.drop(labels="labels", axis=1)
Y = extracted_df["labels"]

# One-hot encode the emotion labels.
lb = LabelEncoder()
Y = np_utils.to_categorical(lb.fit_transform(Y))
print(lb.classes_)
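The model in Step 10 expects X_train, X_val, X_test and the matching one-hot labels, but the split and scaling are not shown in the original cells. A minimal sketch of the presumably intended preparation with the imported train_test_split and StandardScaler (the variable names and split sizes are assumptions):

# Assumed step: split, standardize, and add the channel axis expected by Conv1D.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42, shuffle=True)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.1, random_state=42, shuffle=True)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
x_test = scaler.transform(x_test)

# Conv1D expects input of shape (samples, timesteps, channels).
X_train = np.expand_dims(x_train, axis=2)
X_val = np.expand_dims(x_val, axis=2)
X_test = np.expand_dims(x_test, axis=2)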

 

Step 10: Let's define our model:

earlystopping = EarlyStopping(monitor ="val_acc",
                              mode = 'auto', patience = 5,
                              restore_best_weights = True)
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)
model = models.Sequential()
model.add(layers.Conv1D(512, kernel_size=5, strides=1,
                        padding="same", activation="relu",
                        input_shape=(X_train.shape[1], 1)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))

model.add(layers.Conv1D(512, kernel_size=5, strides=1,
                        padding="same", activation="relu"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))

model.add(layers.Conv1D(256, kernel_size=5, strides=1,
                        padding="same", activation="relu"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))

model.add(layers.Conv1D(256, kernel_size=3, strides=1, padding='same', activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))

model.add(layers.Conv1D(128, kernel_size=3, strides=1, padding='same', activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling1D(pool_size=3, strides = 2, padding = 'same'))

model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(7, activation="softmax"))

model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc",keras.metrics.Recall(),keras.metrics.Precision()])
model.summary()

EPOCHS = 50
batch_size = 64
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=EPOCHS, batch_size=batch_size,
                    callbacks=[earlystopping, learning_rate_reduction])

print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")

fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']

fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")

ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
77/77 [==============================] - 2s 23ms/step - loss: 3.4681 - acc: 0.5783 - recall_1: 0.5734 - precision_1: 0.5827
Accuracy of our model on test data :  57.82983899116516 %

Step 11: Crema accuracy

print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")

fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']

fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")

ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
187/187 [==============================] - 4s 23ms/step - loss: 0.2776 - acc: 0.9483 - f1_m: 0.9489 - recall_6: 0.9476 - precision_6: 0.9498
Accuracy of our model on test data :  94.82700824737549 %

Step 12: Savee accuracy

print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")

fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']

fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")

ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
12/12 [==============================] - 0s 23ms/step - loss: 0.3354 - acc: 0.9245 - f1_m: 0.9279 - recall_1: 0.9193 - precision_1: 0.9363
Accuracy of our model on test data :  92.44791865348816 %

Step 13: Tess accuracy

print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")

fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']

fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")

ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
70/70 [==============================] - 2s 23ms/step - loss: 5.8600e-04 - acc: 1.0000 - f1_m: 1.0000 - recall_4: 1.0000 - precision_4: 1.0000
Accuracy of our model on test data :  100.0 %

Step 14: Crema confusion matrix
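plot_confusion_matrix and cm are not defined in the cells shown here. A minimal sketch of how they are presumably obtained, using sklearn's confusion_matrix and the classic itertools-based plotting helper (the function body is an assumption; only its name and call match the cell below). Note that the Crema subset has six emotion classes (no 'surprise'), which matches the 6×6 matrix printed below.

from sklearn.metrics import confusion_matrix

# Assumed step: predicted vs. true classes on the (Crema) test split.
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_true, y_pred)

def plot_confusion_matrix(cm, classes, title="Confusion Matrix", cmap=plt.cm.Blues):
    """Print and plot a (non-normalized) confusion matrix."""
    print("Confusion matrix, without normalization")
    print(cm)
    plt.imshow(cm, interpolation="nearest", cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    thresh = cm.max() / 2.0
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], "d"),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
    plt.tight_layout()
    plt.show()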

cm_plot_labels = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
Confusion matrix, without normalization
[[953   9   7  13   4   0]
 [ 12 979   9  15  13  19]
 [ 11  10 930  11   6  22]
 [ 21  13  15 946   6   4]
 [  0   5  11  10 863  18]
 [  1   8   8   2  25 975]]
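As a quick check, the diagonal of this matrix sums to 5646 correct predictions out of 5954 test rows, i.e. about 94.8%, which is consistent with the Crema accuracy reported in Step 11.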

 

Savee confusion matrix

Tess confusion matrix

3. Contact

Due to limited space and time, I will continue the study and update this post next time (feel free to add me as a friend to discuss and learn together).

QQ / WeChat

 
