1. Introduction
Speech is the most natural way we express ourselves as humans, so it is natural to extend this communication medium to computer applications. We define a speech emotion recognition (SER) system as a collection of methods that process and classify speech signals to detect the embedded emotions. SER is not a new field; it has existed for more than two decades and has regained attention thanks to recent advances. This newer research draws on progress across computing and technology, which makes it worth reviewing the current methods and techniques that make SER possible.
2. Datasets
Here are the four most popular English-language datasets: Crema, Ravdess, Savee, and Tess. Each contains audio in .wav format, with the main labels encoded in the filenames.
Ravdess:
Here are the filename identifiers as described on the official RAVDESS website:
- Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
- Vocal channel (01 = speech, 02 = song).
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
- Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
- Repetition (01 = 1st repetition, 02 = 2nd repetition).
- Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
So, here is an example of an audio filename: 02-01-06-01-02-01-12.wav. The metadata for this file (decoded programmatically in the sketch after this list) is:
- Video-only (02)
- Speech (01)
- Fearful (06)
- Normal intensity (01)
- Statement "dogs" (02)
- 1st Repetition (01)
- 12th Actor (12) - Female (as the actor ID number is even)
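As a quick illustration, the identifiers can be decoded with a few lines of Python (a minimal sketch using only the mapping listed above):

# Decode the example RAVDESS filename from above
fname = "02-01-06-01-02-01-12.wav"
modality, vocal, emotion, intensity, statement, repetition, actor = fname.split(".")[0].split("-")
emotions = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
print(emotions[emotion], "female" if int(actor) % 2 == 0 else "male")  # fearful female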
Crema:
The third underscore-separated component of the filename encodes the emotion label (e.g. for a file like 1001_DFA_ANG_XX.wav the label is ANG):
- SAD - sadness;
- ANG - angry;
- DIS - disgust;
- FEA - fear;
- HAP - happy;
- NEU - neutral.
Tess:
Very similar to Crema: the emotion label is contained in the file name.
Savee:
The audio files in this dataset are named so that the prefix letters describe the emotion classes as follows:
- 'a' = 'anger'
- 'd' = 'disgust'
- 'f' = 'fear'
- 'h' = 'happiness'
- 'n' = 'neutral'
- 'sa' = 'sadness'
- 'su' = 'surprise'
Speech Emotion Recognition Using a 1D Convolutional Neural Network
In this experiment I try to recognize emotions in short voice messages (< 3 seconds). I will use four datasets containing English phrases voiced by professional actors: Ravdess, Crema, Savee, and Tess.
First, let's define SER, speech emotion recognition.
Speech emotion recognition (SER) is the task of recognizing human emotions and affective states from speech. It exploits the fact that the voice often reflects underlying emotion through tone and pitch. Similar phenomena exist in animals such as dogs and horses, which use this mechanism to understand human emotions.
The datasets used in this project cover seven main emotions: happy, fear, angry, disgust, surprise, sad, and neutral.
Importing libraries
import os
import re
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import Audio
# from entropy import spectral_entropy
from keras import layers
from keras import models
from keras.utils import np_utils  # removed in recent Keras; use keras.utils.to_categorical there
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau  # public API path, not tensorflow.python
import keras
import itertools
Step 2: Dataset paths
# Paths to the four datasets
Ravdess = "../input/speech-emotion-recognition-en/Ravdess/audio_speech_actors_01-24"
Crema = "../input/speech-emotion-recognition-en/Crema"
Savee = "../input/speech-emotion-recognition-en/Savee"
Tess = "../input/speech-emotion-recognition-en/Tess"
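An optional sanity check that the paths resolve and roughly how many entries each directory holds:

# Optional: verify each dataset directory exists and count its entries
for name, p in {"Ravdess": Ravdess, "Crema": Crema, "Savee": Savee, "Tess": Tess}.items():
    print(name, len(os.listdir(p)), "entries")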
Data preparation
Ravdess dataset
The filename convention was described in the Datasets section above; we parse the third identifier (the emotion code) out of each filename.
Step 3: Parsing the Ravdess files
ravdess_directory_list = os.listdir(Ravdess)
emotion_df = []
for directory in ravdess_directory_list:
    actor = os.listdir(os.path.join(Ravdess, directory))
    for wav in actor:
        info = wav.partition(".wav")[0].split("-")
        emotion = int(info[2])
        emotion_df.append((emotion, os.path.join(Ravdess, directory, wav)))

Ravdess_df = pd.DataFrame.from_dict(emotion_df)
Ravdess_df.rename(columns={1: "Path", 0: "Emotion"}, inplace=True)
Ravdess_df.head()

# Map the numeric emotion codes to names ('calm' is folded into 'neutral')
Ravdess_df.Emotion.replace({1: 'neutral', 2: 'neutral', 3: 'happy', 4: 'sad', 5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}, inplace=True)
Ravdess_df.head()
Step 4: Crema dataset
emotion_df = []
for wav in os.listdir(Crema):
    info = wav.partition(".wav")[0].split("_")
    if info[2] == 'SAD':
        emotion_df.append(("sad", Crema + "/" + wav))
    elif info[2] == 'ANG':
        emotion_df.append(("angry", Crema + "/" + wav))
    elif info[2] == 'DIS':
        emotion_df.append(("disgust", Crema + "/" + wav))
    elif info[2] == 'FEA':
        emotion_df.append(("fear", Crema + "/" + wav))
    elif info[2] == 'HAP':
        emotion_df.append(("happy", Crema + "/" + wav))
    elif info[2] == 'NEU':
        emotion_df.append(("neutral", Crema + "/" + wav))
    else:
        emotion_df.append(("unknown", Crema + "/" + wav))

Crema_df = pd.DataFrame.from_dict(emotion_df)
Crema_df.rename(columns={1: "Path", 0: "Emotion"}, inplace=True)
Crema_df.head()
This cell reads the audio files in the Crema directory, maps the emotion code in each filename to a label, collects (emotion, path) pairs, and converts them into a DataFrame for later analysis. See the more compact mapping sketch below.
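As a design note, the if/elif chain above can be written as a dictionary lookup; a minimal equivalent sketch (crema_map is a name introduced here, not from the original):

# Equivalent, more compact mapping from filename codes to labels
crema_map = {"SAD": "sad", "ANG": "angry", "DIS": "disgust",
             "FEA": "fear", "HAP": "happy", "NEU": "neutral"}
emotion_df = [(crema_map.get(wav.partition(".wav")[0].split("_")[2], "unknown"),
               os.path.join(Crema, wav))
              for wav in os.listdir(Crema)]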
Step 5: Tess dataset
tess_directory_list = os.listdir(Tess)
emotion_df = []
for directory in tess_directory_list:
    for wav in os.listdir(os.path.join(Tess, directory)):
        info = wav.partition(".wav")[0].split("_")
        emo = info[2]
        if emo == "ps":
            emotion_df.append(("surprise", os.path.join(Tess, directory, wav)))
        else:
            emotion_df.append((emo, os.path.join(Tess, directory, wav)))

Tess_df = pd.DataFrame.from_dict(emotion_df)
Tess_df.rename(columns={1: "Path", 0: "Emotion"}, inplace=True)
Tess_df.head()
Savee dataset
The prefix-letter convention was listed in the Datasets section above; we strip the trailing digits from the prefix to recover the emotion class.
savee_directiory_list = os.listdir(Savee)
emotion_df = []
for wav in savee_directiory_list:
    # str.replace does not accept a regex, so strip the digits with re.split instead
    info = wav.partition(".wav")[0].split("_")[1]
    emotion = re.split(r"[0-9]", info)[0]
    if emotion == 'a':
        emotion_df.append(("angry", Savee + "/" + wav))
    elif emotion == 'd':
        emotion_df.append(("disgust", Savee + "/" + wav))
    elif emotion == 'f':
        emotion_df.append(("fear", Savee + "/" + wav))
    elif emotion == 'h':
        emotion_df.append(("happy", Savee + "/" + wav))
    elif emotion == 'n':
        emotion_df.append(("neutral", Savee + "/" + wav))
    elif emotion == 'sa':
        emotion_df.append(("sad", Savee + "/" + wav))
    else:
        emotion_df.append(("surprise", Savee + "/" + wav))

Savee_df = pd.DataFrame.from_dict(emotion_df)
Savee_df.rename(columns={1: "Path", 0: "Emotion"}, inplace=True)
Savee_df.head()
# Combine the four datasets into one DataFrame (row order: Ravdess, Crema, Tess, Savee)
df = pd.concat([Ravdess_df, Crema_df, Tess_df, Savee_df], axis=0, ignore_index=True)

# Used later, after augmentation and feature extraction: each source file yields
# 4 feature rows (original + 3 augmented versions) and row order is preserved, so
# the per-dataset portions of the augmented feature table can be sliced by position.
ravdess_final_data = augmented_data.iloc[0:5760, ]
crema_final_data = augmented_data.iloc[5760:35528, ]
tess_final_data = augmented_data.iloc[35528:46728, ]
savee_final_data = augmented_data.iloc[46728:, ]
%matplotlib inline
plt.style.use("ggplot")
plt.title("Count of emotions:")
sns.countplot(x=df["Emotion"])
sns.despine(top=True, right=True, left=False, bottom=False)
def create_waveplot(data, sr, e):
    plt.figure(figsize=(10, 3))
    plt.title(f'Waveplot for audio with {e} emotion', size=15)
    # librosa.display.waveplot was removed in librosa 0.10; use waveshow on newer versions
    librosa.display.waveshow(data, sr=sr)
    plt.show()
def create_spectrogram(data, sr, e):
    # stft computes the short-time Fourier transform of the signal
    X = librosa.stft(data)
    Xdb = librosa.amplitude_to_db(abs(X))
    plt.figure(figsize=(12, 3))
    plt.title('Spectrogram for audio with {} emotion'.format(e), size=15)
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    # librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar()
emotion='fear'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)
emotion='angry'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)
emotion='sad'
path = np.array(df.Path[df.Emotion==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)
Step 6: Data augmentation
Common ways to augment audio data include:
- Noise injection
- Stretching
- Shifting
- Pitching
def noise(data, random=False, rate=0.035, threshold=0.075):
    """Add noise to a sound sample. With random=True, the noise rate is drawn
    uniformly below `threshold`; otherwise the fixed `rate` is used."""
    if random:
        rate = np.random.random() * threshold
    noise_amp = rate * np.random.uniform() * np.amax(data)
    data = data + noise_amp * np.random.normal(0, 1, size=data.size)
    return data
def stretch(data, rate=0.8):
    """Time-stretch the sample by the given rate."""
    # librosa >= 0.10 requires rate as a keyword argument
    return librosa.effects.time_stretch(data, rate=rate)

def shift(data, rate=1000):
    """Shift the sample by a random offset scaled by `rate`."""
    shift_range = int(np.random.uniform(low=-5, high=5) * rate)
    return np.roll(data, shift_range)

def pitch(data, sampling_rate, pitch_factor=0.7, random=False):
    """Pitch-shift a sound sample. With random=True, the factor is drawn uniformly
    below `pitch_factor`; otherwise the fixed `pitch_factor` is used."""
    if random:
        pitch_factor = np.random.random() * pitch_factor
    # librosa >= 0.10 requires sr and n_steps as keyword arguments
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)
df.head()
Step 7: Listening to an augmented sample
path = df[df["Emotion"] == "happy"]["Path"].iloc[0]
data, sampling_rate = librosa.load(path)
Adding white noise
white_noised_data = noise(data, rate=0.1)  # `noise` is the injection helper defined in Step 6
plt.figure(figsize=(14, 4))
librosa.display.waveshow(y=white_noised_data, sr=sampling_rate)
Audio(white_noised_data, rate=sampling_rate)

plt.figure(figsize=(14, 4))
librosa.display.waveshow(data, sr=sampling_rate)
Audio(path)
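For comparison, the same clip can be pitch-shifted with the helper defined in Step 6 (a short sketch mirroring the white-noise demo above):

# Pitch-shifted version of the same clip
pitched_data = pitch(data, sampling_rate, pitch_factor=0.7)
plt.figure(figsize=(14, 4))
librosa.display.waveshow(pitched_data, sr=sampling_rate)
Audio(pitched_data, rate=sampling_rate)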
For augmentation we will use noise injection, pitch shifting, and the combination of the two.
Step 8: Feature extraction
Here are some features that may be useful (the first two are written out as formulas after this list):
1. **Zero Crossing Rate**: the rate of sign changes of the signal within a given frame.
2. **Energy**: the sum of squared signal values, normalized by the frame length.
3. **Entropy of Energy**: the entropy of the normalized energies of sub-frames; a measure of abrupt changes.
4. **Spectral Centroid**: the center of gravity of the spectrum.
5. **Spectral Spread**: the second central moment of the spectrum.
6. **Spectral Entropy**: the entropy of the normalized spectral energies of a set of sub-frames.
7. **Spectral Flux**: the squared difference between the spectral magnitudes of two successive frames.
8. **Spectral Rolloff**: the frequency below which 90% of the magnitude distribution of the spectrum is concentrated.
9. **MFCCs** (Mel-frequency cepstral coefficients): a cepstral representation in which the frequency bands are distributed according to the mel scale rather than linearly.
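For reference, the first two can be written out explicitly (standard definitions, with $x_t[n]$ the $n$-th sample of frame $t$ of length $N$):

$$\mathrm{ZCR}_t = \frac{1}{2(N-1)} \sum_{n=1}^{N-1} \left| \operatorname{sgn}(x_t[n]) - \operatorname{sgn}(x_t[n-1]) \right|, \qquad E_t = \frac{1}{N} \sum_{n=0}^{N-1} x_t[n]^2$$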
n_fft = 2048
hop_length = 512
def chunks(data, frame_length, hop_length):
    for i in range(0, len(data), hop_length):
        yield data[i:i + frame_length]
# Zero Crossing Rate
def zcr(data, frame_length=2048, hop_length=512):
    zcr = librosa.feature.zero_crossing_rate(y=data, frame_length=frame_length, hop_length=hop_length)
    return np.squeeze(zcr)

def energy(data, frame_length=2048, hop_length=512):
    en = np.array([np.sum(np.power(np.abs(data[hop:hop + frame_length]), 2)) for hop in range(0, data.shape[0], hop_length)])
    return en / frame_length

def rmse(data, frame_length=2048, hop_length=512):
    rmse = librosa.feature.rms(y=data, frame_length=frame_length, hop_length=hop_length)
    return np.squeeze(rmse)
def entropy_of_energy(data, frame_length=2048, hop_length=512):
    energies = energy(data, frame_length, hop_length)
    energies /= np.sum(energies)
    # Sum over frames (with an epsilon to avoid log2(0)) to get a single entropy value
    entropy = -np.sum(energies * np.log2(energies + np.finfo(float).eps))
    return entropy
def spc(data, sr, frame_length=2048, hop_length=512):
    spectral_centroid = librosa.feature.spectral_centroid(y=data, sr=sr, n_fft=frame_length, hop_length=hop_length)
    return np.squeeze(spectral_centroid)
# def spc_entropy(data, sr):
# spc_en = spectral_entropy(data, sf=sr, method="fft")
# return spc_en
def spc_flux(data):
    isSpectrum = data.ndim == 1
    if isSpectrum:
        data = np.expand_dims(data, axis=1)
    X = np.c_[data[:, 0], data]
    af_Delta_X = np.diff(X, 1, axis=1)
    vsf = np.sqrt((np.power(af_Delta_X, 2).sum(axis=0))) / X.shape[0]
    return np.squeeze(vsf) if isSpectrum else vsf

def spc_rollof(data, sr, frame_length=2048, hop_length=512):
    spcrollof = librosa.feature.spectral_rolloff(y=data, sr=sr, n_fft=frame_length, hop_length=hop_length)
    return np.squeeze(spcrollof)

def chroma_stft(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    stft = np.abs(librosa.stft(data))
    chroma_stft = librosa.feature.chroma_stft(S=stft, sr=sr)
    return np.squeeze(chroma_stft.T) if not flatten else np.ravel(chroma_stft.T)

def mel_spc(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    mel = librosa.feature.melspectrogram(y=data, sr=sr)
    return np.squeeze(mel.T) if not flatten else np.ravel(mel.T)

def mfcc(data, sr, frame_length=2048, hop_length=512, flatten: bool = True):
    mfcc_feature = librosa.feature.mfcc(y=data, sr=sr)
    return np.squeeze(mfcc_feature.T) if not flatten else np.ravel(mfcc_feature.T)
print("ZCR: ", zcr(data).shape)
print("Energy: ", energy(data).shape)
print("Entropy of Energy :", entropy_of_energy(data).shape)
print("RMS :", rmse(data).shape)
print("Spectral Centroid :", spc(data, sampling_rate).shape)
# print("Spectral Entropy: ", spc_entropy(data, sampling_rate).shape)
print("Spectral Flux: ", spc_flux(data).shape)
print("Spectral Rollof: ", spc_rollof(data, sampling_rate).shape)
print("Chroma STFT: ", chroma_stft(data, sampling_rate).shape)
print("MelSpectrogram: ", mel_spc(data, sampling_rate).shape)
print("MFCC: ", mfcc(data, sampling_rate).shape)
For this task it was decided to use only three main features: zero crossing rate (ZCR), root mean square energy (RMS), and the MFCCs.
In addition, only 2.5 seconds of each clip are used, with a 0.6-second offset: in these datasets the first 0.6 s carries no information about the emotion, and most samples are shorter than 3 seconds. A quick estimate of the resulting feature-vector length is sketched below.
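As a rough estimate (assuming librosa's default 22,050 Hz sampling rate and 20 MFCCs), a full 2.5 s clip yields 108 centered frames at hop_length = 512, so each feature row has 108 (ZCR) + 108 (RMS) + 108 × 20 (MFCC) = 2,376 values; shorter clips yield fewer:

# Expected per-row feature length for a full 2.5 s clip (an estimate)
sr, dur, hop = 22050, 2.5, 512
frames = 1 + int(sr * dur) // hop   # 108 centered frames
print(frames * (1 + 1 + 20))        # ZCR + RMS + 20 MFCCs = 2376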
The core code is as follows:
def extract_features(data, sr, frame_length=2048, hop_length=512):
    result = np.array([])
    result = np.hstack((result,
                        zcr(data, frame_length, hop_length),
                        # np.mean(energy(data, frame_length, hop_length), axis=0),
                        # np.mean(entropy_of_energy(data, frame_length, hop_length), axis=0),
                        rmse(data, frame_length, hop_length),
                        # spc(data, sr, frame_length, hop_length),
                        # spc_entropy(data, sr),
                        # spc_flux(data),
                        # spc_rollof(data, sr, frame_length, hop_length),
                        # chroma_stft(data, sr, frame_length, hop_length),
                        # mel_spc(data, sr, frame_length, hop_length, flatten=True)
                        mfcc(data, sr, frame_length, hop_length)
                        ))
    return result
def get_features_with_augmentation(path, duration=2.5, offset=0.6):
    # duration and offset take care of the silence at the start and end of each audio file, as seen above
    data, sample_rate = librosa.load(path, duration=duration, offset=offset)

    # without augmentation
    res1 = extract_features(data, sample_rate)
    result = np.array(res1)

    # data with noise
    noise_data = noise(data, random=True)
    res2 = extract_features(noise_data, sample_rate)
    result = np.vstack((result, res2))  # stacking vertically

    # data with pitching
    pitched_data = pitch(data, sample_rate, random=True)
    res3 = extract_features(pitched_data, sample_rate)
    result = np.vstack((result, res3))  # stacking vertically

    # data with pitching and white noise
    new_data = pitch(data, sample_rate, random=True)
    data_noise_pitch = noise(new_data, random=True)
    res4 = extract_features(data_noise_pitch, sample_rate)
    result = np.vstack((result, res4))  # stacking vertically

    return result
def get_features_without_augmentation(path, duration=2.5, offset=0.6):
    # duration and offset take care of the silence at the start and end of each audio file
    data, sample_rate = librosa.load(path, duration=duration, offset=offset)
    return extract_features(data, sample_rate)
X, Y = [], []
print("Feature processing...")
for path, emotion, ind in zip(df.Path, df.Emotion, range(df.Path.shape[0])):
    features = get_features_with_augmentation(path)
    if ind % 100 == 0:
        print(f"{ind} samples have been processed...")
    for ele in features:
        X.append(ele)
        # append the emotion once per feature row; each file yields 4 rows
        # (original + 3 augmented versions)
        Y.append(emotion)
print("Done.")
Step 9: Data preparation
Now that the features are extracted, we normalize the data and split it into training and test sets.
X = extracted_df.drop(labels="labels", axis=1)
Y = extracted_df["labels"]

# One-hot encode the emotion labels
lb = LabelEncoder()
Y = np_utils.to_categorical(lb.fit_transform(Y))
print(lb.classes_)
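The split, scaling, and reshaping cell is not shown in the original; here is a minimal sketch under the assumption of an 80/20 train/test split with a further validation split (the names `X_train`, `X_val`, `X_test`, and `scaler` are what the later cells use):

# Split into train / validation / test sets
x_tr, X_test, y_tr, y_test = train_test_split(X, Y, test_size=0.2, random_state=42, shuffle=True)
X_train, X_val, y_train, y_val = train_test_split(x_tr, y_tr, test_size=0.1, random_state=42)

# Standardize features (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Conv1D expects input of shape (samples, timesteps, channels)
X_train = np.expand_dims(X_train, axis=2)
X_val = np.expand_dims(X_val, axis=2)
X_test = np.expand_dims(X_test, axis=2)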
Step 10: Let's define our model:
earlystopping = EarlyStopping(monitor="val_acc", mode='auto', patience=5,
                              restore_best_weights=True)
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', patience=3,
                                            verbose=1, factor=0.5, min_lr=0.00001)
model = models.Sequential()
model.add(layers.Conv1D(512, kernel_size=5, strides=1,
padding="same", activation="relu",
input_shape=(X_train.shape[1], 1)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))
model.add(layers.Conv1D(512, kernel_size=5, strides=1,
padding="same", activation="relu"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))
model.add(layers.Conv1D(256, kernel_size=5, strides=1,
padding="same", activation="relu"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool1D(pool_size=5, strides=2, padding="same"))
model.add(layers.Conv1D(256, kernel_size=3, strides=1, padding='same', activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
model.add(layers.Conv1D(128, kernel_size=3, strides=1, padding='same', activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling1D(pool_size=3, strides = 2, padding = 'same'))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(7, activation="softmax"))
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc",keras.metrics.Recall(),keras.metrics.Precision()])
model.summary()
EPOCHS = 50
batch_size = 64
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
epochs=EPOCHS, batch_size=batch_size,
callbacks=[earlystopping, learning_rate_reduction])
print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")
fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']
fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")
ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
77/77 [==============================] - 2s 23ms/step - loss: 3.4681 - acc: 0.5783 - recall_1: 0.5734 - precision_1: 0.5827
Accuracy of our model on test data : 57.82983899116516 %
Step 11: CREMA accuracy
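The cells that rebuild the test split for each individual dataset are not shown; a hypothetical helper consistent with the per-dataset slices defined earlier (`evaluate_subset` is a name introduced here) might look like:

def evaluate_subset(subset_df, name):
    """Evaluate the trained model on one dataset's rows of the feature table."""
    X_sub = np.expand_dims(scaler.transform(subset_df.drop(labels="labels", axis=1)), axis=2)
    y_sub = np_utils.to_categorical(lb.transform(subset_df["labels"]),
                                    num_classes=len(lb.classes_))
    loss, acc = model.evaluate(X_sub, y_sub)[:2]
    print(f"{name} accuracy: {acc * 100:.2f} %")

evaluate_subset(crema_final_data, "Crema")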
print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")
fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']
fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")
ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
187/187 [==============================] - 4s 23ms/step - loss: 0.2776 - acc: 0.9483 - f1_m: 0.9489 - recall_6: 0.9476 - precision_6: 0.9498
Accuracy of our model on test data : 94.82700824737549 %
Step 12: SAVEE accuracy
print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")
fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']
fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")
ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
12/12 [==============================] - 0s 23ms/step - loss: 0.3354 - acc: 0.9245 - f1_m: 0.9279 - recall_1: 0.9193 - precision_1: 0.9363
Accuracy of our model on test data : 92.44791865348816 %
Step 13: TESS accuracy
print("Accuracy of our model on test data : " , model.evaluate(X_test,y_test)[1]*100 , "%")
fig , ax = plt.subplots(1,2)
train_acc = history.history['acc']
train_loss = history.history['loss']
test_acc = history.history['val_acc']
test_loss = history.history['val_loss']
fig.set_size_inches(20,6)
ax[0].plot(train_loss, label = 'Training Loss')
ax[0].plot(test_loss , label = 'Testing Loss')
ax[0].set_title('Training & Testing Loss')
ax[0].legend()
ax[0].set_xlabel("Epochs")
ax[1].plot(train_acc, label = 'Training Accuracy')
ax[1].plot(test_acc , label = 'Testing Accuracy')
ax[1].set_title('Training & Testing Accuracy')
ax[1].legend()
ax[1].set_xlabel("Epochs")
plt.show()
70/70 [==============================] - 2s 23ms/step - loss: 5.8600e-04 - acc: 1.0000 - f1_m: 1.0000 - recall_4: 1.0000 - precision_4: 1.0000
Accuracy of our model on test data : 100.0 %
Step 14: CREMA confusion matrix
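Neither `cm` nor `plot_confusion_matrix` is defined in the cells shown. A sketch that fits the call below, assuming predictions on the Crema test split and the classic itertools-based plotting helper (which would explain the `itertools` import at the top):

from sklearn.metrics import confusion_matrix

# Predictions on the (Crema) test split
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_true, y_pred)

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion Matrix', cmap=plt.cm.Blues):
    """Print and plot a confusion matrix (adapted from the classic scikit-learn docs example)."""
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print("Confusion matrix, without normalization")
    print(cm)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], '.2f' if normalize else 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()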
# Note: Crema contains no 'surprise' samples, so its confusion matrix below is 6x6
cm_plot_labels = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
Confusion matrix, without normalization
[[953   9   7  13   4   0]
 [ 12 979   9  15  13  19]
 [ 11  10 930  11   6  22]
 [ 21  13  15 946   6   4]
 [  0   5  11  10 863  18]
 [  1   8   8   2  25 975]]
The SAVEE and TESS confusion matrices are generated in the same way.
3. Contact
Due to space and time constraints, I'll continue learning and updating this in a future post; feel free to reach out to exchange ideas.