Keras Multilayer Perceptron Demo: Noise Classification Training and Recognition (Part 1)

Author: Okay6

Tags: deep learning, neural networks, machine learning

Article link: https://blog.csdn.net/weixin_41862148/article/details/105740297

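Why write this demo
Introduction
- About the NoiseX-92 dataset
- About Keras
Basic approach
- Data processing
- Model construction
Environment setup
Workflow
Summary
- Problems
- Lessons learned
Next steps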

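Why write this demo

1. The usual Keras starter demo is the well-known MNIST handwritten digit recognition example. So many introductory machine learning tutorials use it that it no longer offers anything new.
2. MNIST ships as already-processed data (cleaned, normalized, labeled), so beginners are often left puzzled about how a dataset gets prepared in the first place.
3. To share some of the problems I ran into and the experience I gained while learning.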

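Introduction

About the NoiseX-92 dataset

The NoiseX-92 dataset (GitHub: NoiseX-92) contains 15 categories of noise. The files are in wav format and each recording is 3 min 25 s long. The categories are listed in the table below.

Noise categories: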

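Name	Description
babble	Babble (many people talking at once)
buccaneer1	Buccaneer attack aircraft noise (1)
buccaneer2	Buccaneer attack aircraft noise (2)
destroyerengine	Destroyer engine room noise
destroyerops	Destroyer operations room noise
f16	F-16 fighter noise
factory1	Factory noise (1)
factory2	Factory noise (2)
hfchannel	HF (high-frequency) radio channel noise
leopard	Leopard tank noise
m190	M109 tank noise (NOISEX-92 lists this recording as m109)
machinegun	Machine gun noise
volvo	Volvo car noise
pink	Pink noise (power falls off as frequency rises)
white	White noise (equal power at all frequencies)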

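About Keras

Keras is a high-level framework for quickly building deep neural networks. It was created by a Google engineer, can run on top of TensorFlow as its backend (the lower layer), and has a very concise API. Chinese-language site: Keras中文站.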

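Basic approach

Data processing

Data splitting: the raw data consists of only 15 wav files, far too few samples to train a model, and a 3 min 25 s recording cannot be fed to the program as a single training sample either, so the recordings have to be cut into short clips.
Feature extraction: the goal is to train a classifier and recognize noise categories, so noise features must be extracted. Here the MFCC parameters of each clip serve as its main features.
MFCC: Mel-frequency cepstral coefficients, an important set of parameters for representing audio features. For background, see => MFCC简介 (an introduction to MFCC).
Data-label dictionary generation: after extracting the features, we generate a label for every sample and make sure each sample maps to the correct label.
Train/test split: an essential step in deep learning. The training set is used to fit the model and the test set to evaluate it.
Normalization: all tensor values are scaled into the same range, [0, 1] here. Putting the features on a common scale helps gradient descent converge faster and more stably, which usually helps accuracy as well.
Persistence: the processed data is saved to local disk so that it loads quickly and the cost of disk I/O and preprocessing is paid only once. We do not want to re-split the audio and regenerate the dataset on every debugging run.
Note: the final dataset should consist of four parts: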

x_train	y_train	x_test	y_test
training features	training labels	test features	test labels
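
The normalization step can be illustrated without any library. The sketch below is a minimal pure-Python version of min-max scaling. The helper names `fit_min_max` and `apply_min_max` and the sample values are made up for this illustration; the project code itself relies on scikit-learn's MinMaxScaler.

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale each column with the given (min, max) pairs; constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Illustrative values only: two features, three training rows, one test row
train = [[-200.0, 10.0], [-100.0, 30.0], [0.0, 20.0]]
test = [[-50.0, 25.0]]

stats = fit_min_max(train)                  # statistics come from the training rows
train_scaled = apply_min_max(train, stats)  # every training column now spans [0, 1]
test_scaled = apply_min_max(test, stats)    # test rows reuse the training statistics
print(train_scaled)
print(test_scaled)
```

Reusing the training statistics for the test rows keeps both sets on the same scale, which is one common design choice for this step.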

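Model construction

The model for this noise recognition task uses the simplest kind of architecture, an MLP (multilayer perceptron), with 5 layers in total.
Layer 1: a Dense (fully connected) layer as the input layer;
Layer 2: a Dropout layer that randomly drops neurons to prevent overfitting;
Layer 3: a Dense (fully connected) layer as the hidden layer;
Layer 4: a Dropout layer that randomly drops neurons to prevent overfitting;
Layer 5: a Dense (fully connected) layer as the output layer.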

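Environment setup

1. System: a machine with an NVIDIA GPU is preferable, since GPU acceleration matters, but our dataset is small enough to run on a CPU.
2. Python environment: Python 3.6. An Anaconda virtual environment is recommended.
3. Python modules:

tensorflow-gpu (or tensorflow-cpu)
keras, latest version
pydub, audio processing library
librosa, MFCC audio feature extraction

4. Code editor: PyCharm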

Workflow

Data processing

1. Download the NoiseX-92 dataset to a local directory.
2. Write the wav file splitting program:


def data_pre_process(data_dir_path):
    """
    Preprocess the data: split every wav file into short clips
    :param data_dir_path: directory containing the original wav files
    :return: path of the directory holding the split clips
    """
    if not os.path.isdir(data_dir_path):
        sys.stderr.write('Invalid data directory argument\n')
        sys.exit(-1)
    processed_data_dir = os.path.join(data_dir_path, 'tmp')
    # Start from an empty tmp directory on every run
    if os.path.exists(processed_data_dir):
        shutil.rmtree(processed_data_dir)
    os.mkdir(processed_data_dir)
    for wav_file in os.listdir(data_dir_path):
        if wav_file != 'tmp':
            spit_wav_file(os.path.join(data_dir_path, wav_file), processed_data_dir, 1)
    print('====== wav file splitting finished ======')
    return processed_data_dir


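def spit_wav_file(wav_file_path, save_path, step):
    """
    Split an audio file into clips of `step` seconds
    :param wav_file_path: path of the audio file
    :param save_path: directory in which to save the clips
    :param step: clip length in seconds
    :return:
    """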
    AudioSegment.converter = r'ffmpeg.exe'
    label_dir = os.path.join(save_path, os.path.basename(wav_file_path).split('.')[0])
    os.mkdir(label_dir)
    wav_file = AudioSegment.from_wav(wav_file_path)
    for i in range(0, int(wav_file.duration_seconds), step):
        wav_file[i * 1000:(i + step) * 1000].export(os.path.join(label_dir, str(uuid.uuid4()) + '.wav'), format='wav')

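– Code explanation:

First, a tmp directory is created to hold the split files.
The 15 NoiseX-92 noise files are cut into one-second clips, which are placed in folders named after their categories. You can change the step argument to split with a different clip length.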
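
To make the effect of the step argument concrete, here is a small sketch of the slice boundaries the splitting loop produces. The helper `segment_bounds_ms` is a hypothetical name introduced for illustration; it only mirrors the `range(0, int(duration), step)` loop above and does not touch any audio.

```python
def segment_bounds_ms(duration_seconds, step):
    """List the (start_ms, end_ms) slice bounds for clips of `step` seconds."""
    bounds = []
    for start in range(0, int(duration_seconds), step):
        bounds.append((start * 1000, (start + step) * 1000))
    return bounds


# A 3 min 25 s recording (205 s) cut into one-second clips
clips = segment_bounds_ms(205.0, 1)
print(len(clips), clips[0], clips[-1])

# With a step that does not divide the duration, the last pair reaches past the end
print(segment_bounds_ms(5.0, 2))
```

When the step does not divide the duration evenly, the final slice extends beyond the end of the recording, so that clip would come out shorter than the others.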

3. Write the audio feature extraction program:

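def feature_exact(wav_file):
    """
    Extract MFCC features from a wav file
    :param wav_file: path of the wav file
    :return: MFCC array of shape (20, 40), or None if the clip yields a different shape
    """
    # sr=None keeps the file's native sample rate
    y, sr = librosa.load(wav_file, sr=None)
    # By default librosa returns 20 MFCC coefficients per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr)
    # Keep only clips with exactly 40 frames so that all samples share one shape
    if mfcc.shape[1] == 40:
        return mfcc
    return None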

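– Code explanation:

Takes a wav file path and returns the corresponding MFCC features. A small number of clips turned out to have a different shape, so only arrays of shape (20, 40), that is 20 coefficients by 40 frames, are returned and the rest are discarded.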
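
Where the 40 frames come from can be sketched with a little arithmetic. This is only an illustration under two assumptions: that the MFCCs are computed with a hop length of 512 samples and centered frames (librosa's defaults), and that the clips are sampled at about 19.98 kHz, a rate that is not stated in this article. `mfcc_frame_count` is a hypothetical helper, not a librosa function.

```python
def mfcc_frame_count(n_samples, hop_length=512):
    """Number of frames for a centered short-time analysis: 1 + n_samples // hop_length."""
    return 1 + n_samples // hop_length


sample_rate = 19980  # assumed sample rate in Hz, not taken from the article

print(mfcc_frame_count(sample_rate))       # frames for a full one-second clip
print(mfcc_frame_count(sample_rate // 2))  # frames for a half-second clip
```

Under these assumptions a full one-second clip gives 40 frames and a shorter clip gives fewer, which would be consistent with a few clips being rejected by the shape check.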

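4. Write the main processing program, which generates the data-label dictionary, splits the training and test sets, normalizes the data, and persists it: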

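def gen_data(data_dir_path):
    """
    Build the training and test data and save them to disk
    :param data_dir_path: directory containing one sub-folder of clips per label
    :return: None (the arrays are saved to an .npz file)
    """
    print('===== converting wav data to matrices =====')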
    data_dict = {}
    label_dict = {}
    counter = 0
    for label in os.listdir(data_dir_path):
        label_dict[label] = counter
        counter += 1
    for label in os.listdir(data_dir_path):
        counter = 0
        for wav_file in os.listdir(os.path.join(data_dir_path, label)):
            wav_matrix = feature_exact(os.path.join(data_dir_path, label, wav_file))
            if wav_matrix is not None:
                if counter % 5 == 0:
                    data_dict[str(uuid.uuid4())] = {'data': wav_matrix, 'label': label_dict.get(label), 'type': 'test'}
                else:
                    data_dict[str(uuid.uuid4())] = {'data': wav_matrix, 'label': label_dict.get(label), 'type': 'train'}
                counter += 1
    print('===== wav data converted to matrices =====')
    print('===== number of samples {} ====='.format(len(data_dict)))
    print('===== splitting the dataset ======')
    x_train_list = []
    x_test_list = []
    y_train_list = []
    y_test_list = []
    scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
    for k, v in data_dict.items():
        if v.get('type') == 'train':
            x_train_list.append(scaler.fit_transform(v.get('data')))
            y_train_list.append(v.get('label'))
        else:
            x_test_list.append(scaler.fit_transform(v.get('data')))
            y_test_list.append(v.get('label'))

    x_train = np.array(x_train_list)
    x_test = np.array(x_test_list)
    y_train = np.array(y_train_list)
    y_test = np.array(y_test_list)
    print('===== normalizing ======')
    # Flatten each (20, 40) matrix into an 800-dimensional vector for the MLP
    x_train = x_train.reshape(x_train.shape[0], -1)
    x_test = x_test.reshape(x_test.shape[0], -1)
    # Fit the scaler on the training set only, then apply the same scaling to the test set
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)
    print('===== writing the label dictionary ======')
    # Mode 'w' truncates any existing file, and the with block closes it
    with open('label_dict.txt', 'w') as label_dict_file:
        label_dict_file.write(str(label_dict))
    print('===== saving the dataset ======')
    os.makedirs('data', exist_ok=True)
    np.savez('data/noise.npz', x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)

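– Code explanation:

This program takes the path of the tmp directory produced by the first program. It walks through the folders of all noise categories, builds the data-label dictionary for the clips, and assigns every fifth clip to the test set. It also writes the label dictionary file label_dict.txt for later lookup: the labels are stored as integer class indices from 0 to 14 (they are one-hot encoded only later, at training time), so finding the category name for an index means consulting this file, although this article does not cover that.
Finally, the program persists the generated x_train, y_train, x_test, and y_test arrays to the file data/noise.npz with numpy's savez method.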
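
The every-fifth-clip rule can be shown in isolation. The sketch below reproduces only the `counter % 5 == 0` logic on a list of placeholder names; `split_every_fifth` is a hypothetical helper and the clip names are made up.

```python
def split_every_fifth(items):
    """Send items at positions 0, 5, 10, ... to the test set and the rest to the training set."""
    train, test = [], []
    for counter, item in enumerate(items):
        if counter % 5 == 0:
            test.append(item)
        else:
            train.append(item)
    return train, test


clips = ['clip_{}'.format(i) for i in range(10)]  # placeholder clip names
train, test = split_every_fifth(clips)
print(len(train), len(test))
print(test)
```

This gives roughly an 80/20 split per category, with the test clips spread evenly through each recording.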

5. Write a simple helper for loading the data:

def load_data_set(data_path):
    """
    Load the dataset from local disk
    :param data_path: path of the .npz file
    :return: (x_train, y_train), (x_test, y_test)
    """
    data = np.load(data_path)
    return (data['x_train'], data['y_train']), (data['x_test'], data['y_test'])

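– Code explanation:

Takes the path of noise.npz, loads the data with numpy.load(), and returns the x_train, y_train, x_test, and y_test arrays.

6. Write the code that runs the data processing, generating the dataset and persisting it to disk: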

data_dir_path = 'D:/experiment/Noises/NoiseX-92'
gen_data(data_pre_process(data_dir_path))

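– Code explanation:

Reads the wav noise files, generates the dataset, and persists it to disk.

The complete code of the data processing utility module follows: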

# -*- coding: UTF-8 -*-
"""
Keras noise recognition training utilities
"""
import os
import shutil
import sys
import uuid

import librosa
import numpy as np
from pydub import AudioSegment
from sklearn import preprocessing


def data_pre_process(data_dir_path):
    """
    Preprocess the data: split every wav file into short clips
    :param data_dir_path: directory containing the original wav files
    :return: path of the directory holding the split clips
    """
    if not os.path.isdir(data_dir_path):
        sys.stderr.write('Invalid data directory argument\n')
        sys.exit(-1)
    processed_data_dir = os.path.join(data_dir_path, 'tmp')
    # Start from an empty tmp directory on every run
    if os.path.exists(processed_data_dir):
        shutil.rmtree(processed_data_dir)
    os.mkdir(processed_data_dir)
    for wav_file in os.listdir(data_dir_path):
        if wav_file != 'tmp':
            spit_wav_file(os.path.join(data_dir_path, wav_file), processed_data_dir, 1)
    print('====== wav file splitting finished ======')
    return processed_data_dir


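def spit_wav_file(wav_file_path, save_path, step):
    """
    Split an audio file into clips of `step` seconds
    :param wav_file_path: path of the audio file
    :param save_path: directory in which to save the clips
    :param step: clip length in seconds
    :return:
    """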
    AudioSegment.converter = r'ffmpeg.exe'
    label_dir = os.path.join(save_path, os.path.basename(wav_file_path).split('.')[0])
    os.mkdir(label_dir)
    wav_file = AudioSegment.from_wav(wav_file_path)
    for i in range(0, int(wav_file.duration_seconds), step):
        wav_file[i * 1000:(i + step) * 1000].export(os.path.join(label_dir, str(uuid.uuid4()) + '.wav'), format='wav')


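def gen_data(data_dir_path):
    """
    Build the training and test data and save them to disk
    :param data_dir_path: directory containing one sub-folder of clips per label
    :return: None (the arrays are saved to an .npz file)
    """
    print('===== converting wav data to matrices =====')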
    data_dict = {}
    label_dict = {}
    counter = 0
    for label in os.listdir(data_dir_path):
        label_dict[label] = counter
        counter += 1
    for label in os.listdir(data_dir_path):
        counter = 0
        for wav_file in os.listdir(os.path.join(data_dir_path, label)):
            wav_matrix = feature_exact(os.path.join(data_dir_path, label, wav_file))
            if wav_matrix is not None:
                if counter % 5 == 0:
                    data_dict[str(uuid.uuid4())] = {'data': wav_matrix, 'label': label_dict.get(label), 'type': 'test'}
                else:
                    data_dict[str(uuid.uuid4())] = {'data': wav_matrix, 'label': label_dict.get(label), 'type': 'train'}
                counter += 1
    print('===== wav data converted to matrices =====')
    print('===== number of samples {} ====='.format(len(data_dict)))
    print('===== splitting the dataset ======')
    x_train_list = []
    x_test_list = []
    y_train_list = []
    y_test_list = []
    scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
    for k, v in data_dict.items():
        if v.get('type') == 'train':
            x_train_list.append(scaler.fit_transform(v.get('data')))
            y_train_list.append(v.get('label'))
        else:
            x_test_list.append(scaler.fit_transform(v.get('data')))
            y_test_list.append(v.get('label'))

    x_train = np.array(x_train_list)
    x_test = np.array(x_test_list)
    y_train = np.array(y_train_list)
    y_test = np.array(y_test_list)
    print('===== normalizing ======')
    # Flatten each (20, 40) matrix into an 800-dimensional vector for the MLP
    x_train = x_train.reshape(x_train.shape[0], -1)
    x_test = x_test.reshape(x_test.shape[0], -1)
    # Fit the scaler on the training set only, then apply the same scaling to the test set
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)
    print('===== writing the label dictionary ======')
    # Mode 'w' truncates any existing file, and the with block closes it
    with open('label_dict.txt', 'w') as label_dict_file:
        label_dict_file.write(str(label_dict))
    print('===== saving the dataset ======')
    os.makedirs('data', exist_ok=True)
    # Same file name that the training script loads
    np.savez('data/noise.npz', x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)


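def feature_exact(wav_file):
    """
    Extract MFCC features from a wav file
    :param wav_file: path of the wav file
    :return: MFCC array of shape (20, 40), or None if the clip yields a different shape
    """
    # sr=None keeps the file's native sample rate
    y, sr = librosa.load(wav_file, sr=None)
    # By default librosa returns 20 MFCC coefficients per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr)
    # Keep only clips with exactly 40 frames so that all samples share one shape
    if mfcc.shape[1] == 40:
        return mfcc
    return None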


def load_data_set(data_path):
    """
    Load the dataset from local disk
    :param data_path: path of the .npz file
    :return: (x_train, y_train), (x_test, y_test)
    """
    data = np.load(data_path)
    return (data['x_train'], data['y_train']), (data['x_test'], data['y_test'])

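# Run the processing only when this file is executed directly, so that
# importing load_data_set from the training script does not regenerate the dataset
if __name__ == '__main__':
    # Generate the dataset and persist it to disk
    data_dir_path = 'D:/experiment/Noises/NoiseX-92'
    gen_data(data_pre_process(data_dir_path))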

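– Code notes

pydub depends on ffmpeg.exe (an audio processing program), which you need to download yourself from the FFmpeg website.

Building the model

1. Keras's Sequential API makes it quick to build a neural network. Here is the model construction code first: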

# Build a sequential model
model = Sequential()
# First fully connected layer (input layer)
model.add(Dense(units=40, activation='relu', input_shape=(800,)))
model.add(Dropout(0.25))
# Second fully connected layer (hidden layer)
model.add(Dense(units=40, activation='relu'))
model.add(Dropout(0.25))
# Third fully connected layer (output layer)
model.add(Dense(num_classes, activation='softmax'))

model.summary()

# learning_rate replaces the deprecated lr argument in recent Keras versions
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])

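– Code explanation:

Layer 1: 40 neurons, relu activation, input shape (800,), i.e. the 20 x 40 MFCC matrix flattened;
the Dropout layer after it randomly drops neurons at a rate of 0.25 to prevent overfitting;
Layer 2: 40 neurons, relu activation;
the Dropout layer after it randomly drops neurons at a rate of 0.25 to prevent overfitting;
Layer 3: the output layer, with as many neurons as there are classes (15) and the multi-class softmax activation.
model.summary() prints a summary of the model.
Model compilation parameters:
– the loss is the cross-entropy function used for classification tasks
– the optimizer is RMSprop with the learning rate set to 0.001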
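
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard fully connected layer with one weight per input-unit pair plus one bias per unit; `dense_params` is a hypothetical helper, and Dropout layers contribute no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a fully connected layer."""
    return n_inputs * n_units + n_units


# Input size, two Dense layers of 40 units, and the 15-class output layer
layer_sizes = [800, 40, 40, 15]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))
```

Almost all of the parameters sit in the first layer, because it connects the 800 input values to the 40 hidden units.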

2. Before training a classifier, the label set must be one-hot encoded:

# one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

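– Code explanation:

Keras's built-in utility one-hot encodes the label sets y_train and y_test. One-hot encoding turns each integer class label into a vector of length num_classes that is 1 at the label's index and 0 everywhere else, which is the target format the categorical_crossentropy loss expects.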
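
What one-hot encoding does, and how an index maps back to a category name, can be sketched in plain Python. `to_one_hot` and `from_one_hot` are hypothetical stand-ins written for this illustration, not Keras functions, and the three-class label dictionary is a made-up example in the same format as label_dict.txt.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def from_one_hot(rows):
    """Recover the class index of each row as the position of its largest value."""
    return [row.index(max(row)) for row in rows]


# Hypothetical label dictionary with three classes
label_dict = {'babble': 0, 'f16': 1, 'white': 2}
index_to_name = {index: name for name, index in label_dict.items()}

encoded = to_one_hot([2, 0, 1], num_classes=3)
names = [index_to_name[i] for i in from_one_hot(encoded)]
print(encoded)
print(names)
```

Inverting the dictionary as shown is one simple way to turn a predicted class index back into a readable category name.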

3. The complete code of the model building program follows:

# -*- coding: UTF-8 -*-
"""
Keras noise recognition training program
"""
from copy import deepcopy

import keras
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

from practice.noise_recognize_util import load_data_set

batch_size = 80
num_classes = 15
epochs = 50

# Load the preprocessed data
(x_train, y_train), (x_test, y_test) = load_data_set('data/noise.npz')
y_test_backup = deepcopy(y_test)
print(x_train.shape[0], 'training samples')
print(x_test.shape[0], 'test samples')
# one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print(y_train, 'training labels')
print(y_test, 'test labels')

# Build a sequential model
model = Sequential()
# First fully connected layer (input layer)
model.add(Dense(units=40, activation='relu', input_shape=(800,)))
model.add(Dropout(0.25))
# Second fully connected layer (hidden layer)
model.add(Dense(units=40, activation='relu'))
model.add(Dropout(0.25))
# Third fully connected layer (output layer)
model.add(Dense(num_classes, activation='softmax'))

model.summary()

# learning_rate replaces the deprecated lr argument in recent Keras versions
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])

print('Test accuracy:', score[1])

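– Code explanation:

batch_size = 80, num_classes = 15, epochs = 50: the preset training hyperparameters.
(x_train, y_train), (x_test, y_test) = load_data_set('data/noise.npz'): thanks to the earlier processing steps, loading the data here is easy.
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test)): calls the model's fit method with the training parameters to train the model. The test set is also passed as validation_data, so it is evaluated after every epoch.
score = model.evaluate(x_test, y_test, verbose=0): calls the model's evaluate method to evaluate the model. score[0] is the loss and score[1] is the accuracy.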
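
The accuracy figure can be read as the fraction of samples whose most probable predicted class matches the true class. The sketch below computes that by hand for a few made-up softmax outputs; `accuracy` is a hypothetical helper for illustration and is not how Keras implements the metric internally.

```python
def accuracy(predicted_probs, one_hot_labels):
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    correct = 0
    for probs, label in zip(predicted_probs, one_hot_labels):
        if probs.index(max(probs)) == label.index(max(label)):
            correct += 1
    return correct / len(one_hot_labels)


# Made-up softmax outputs for four samples and three classes
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
labels = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
result = accuracy(predictions, labels)
print(result)
```

In this toy example the third sample is misclassified, so three of the four predictions count as correct.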

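Training the model

Run the main program to train the model. After 20 epochs of training (note that the listing above sets epochs = 50), the final evaluation result is as follows:
[Figure: model training results]
After 20 epochs of iterative training, the model's prediction accuracy reaches 0.97, which is a fairly good result.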

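Summary

Problems

The training and test sets are too similar in distribution, so the model is probably overfitting. Because the training and test clips are cut from the same recordings, their features differ very little between the two sets, which makes the measured accuracy higher than it would be on genuinely new audio.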
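
One possible way to reduce this overlap, not used in this article, is to hold out a contiguous stretch of each recording for testing instead of every fifth clip, so that test clips are not direct neighbors of training clips. The sketch below is only an illustration of that idea. It assumes the clips can be ordered by start time, which the UUID file names used above do not preserve, and `split_hold_out_tail` is a hypothetical helper.

```python
def split_hold_out_tail(items, test_every=5):
    """Keep the first part of a recording for training and its final 1/test_every for testing."""
    n_test = max(1, len(items) // test_every)
    cut = len(items) - n_test
    return items[:cut], items[cut:]


clips_in_time_order = list(range(10))  # stand-in for clips sorted by start time
train, test = split_hold_out_tail(clips_in_time_order)
print(train, test)
```

Even with such a split the test clips still come from the same recordings, so a fully independent evaluation would need separately recorded audio.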

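Lessons learned

The mapping between each training sample and its label must be exact. A dictionary is a good way to store that correspondence.
Learn to use numpy's data persistence methods, because turning raw data into tensors takes a long time. Saving the dataset to disk makes loading it much faster.

Next steps

This Keras noise recognition training did not include a validation set, and it did not demonstrate saving, reloading, or predicting with the model. A follow-up article will improve on this version.
If you find any problems in this article, please point them out.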
