[Speech Recognition] Chinese/English Language Identification with MFCC + LPC Features and SVM [Matlab source code included, Issue 612]

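✅About the author: a Matlab simulation developer who loves research and works on character and technical skill in equal measure. Message me privately about Matlab project collaboration.
🍎Homepage: 海神之光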


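⛄I. Overview of Audio Processing for Language Identification

1 Basic principle
Language identification decides, from a piece of audio, which language is being spoken, for example English, Chinese, or French. The overall idea is to convert the speech data into a spectrogram or into MFCC features, and then analyze those features to determine the language class of the recording.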

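2 Public dataset
Topcoder competition data: 44.1 kHz mp3 recordings, 10 seconds each, covering 176 languages with 376 clips per language (176 × 376 = 66,176 clips in total), many of them low-resource languages.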

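3 Basic audio processing pipeline
Speech input, then audio feature extraction, then feature analysis, and finally the result. Feature extraction mostly relies on spectrograms or MFCC features.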
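
The last stage of this pipeline, feature analysis, can be any classifier. As a rough sketch of that stage, the snippet below trains a two-class SVM with `fitcsvm` and classifies two new points with `predict`. It assumes the Statistics and Machine Learning Toolbox is installed. The 2-D "features" are made-up toy values, not real MFCC or LPC vectors, and the 0 = English, 1 = Chinese label convention is chosen only to mirror the script in Part II.

```matlab
% Toy 2-D "features": 30 samples per class, deterministic and well separated.
% These values are invented for illustration; real inputs would be MFCC/LPC vectors.
n = 30;
offsets = linspace(-1, 1, n).';
featEnglish = [offsets,     -offsets    ];   % class 0, centred at (0, 0)
featChinese = [offsets + 5, -offsets + 5];   % class 1, centred at (5, 5)

trainData = [featEnglish; featChinese];      % 60-by-2, one observation per row
labels    = [zeros(n, 1); ones(n, 1)];       % 0 = English, 1 = Chinese (assumed convention)

% Train an SVM; for two-class problems fitcsvm uses a linear kernel by default.
model = fitcsvm(trainData, labels);

% Classify two unseen points, one close to each class centre.
testData  = [0.2 -0.1; 4.8 5.3];
predicted = predict(model, testData);
disp(predicted.');
```

Part II uses the older `svmtrain`/`svmclassify` pair, which fits the Matlab 2014a setup stated in this post. On releases that no longer ship those functions, a `fitcsvm`/`predict` pair like the one above is the usual replacement.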

4 Details
4.1 Speech input
The input is an audio signal taken from a wav (waveform audio) file, an mp3 file, or a microphone.
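
As a small illustration of the file-input case, the sketch below writes a synthetic stereo tone to a temporary wav file, reads it back with `audioread`, and averages the channels to get a mono signal. The file name `demo_input.wav`, the 16 kHz rate, and the 440 Hz tone are assumptions made only so the example runs without any external recording.

```matlab
% Create a short synthetic stereo tone so the example needs no external file.
fsOut = 16000;                                    % sampling rate (Hz), assumed
t     = (0:fsOut-1).' / fsOut;                    % one second of time stamps
tone  = 0.5 * sin(2*pi*440*t);                    % 440 Hz sine
demoFile = fullfile(tempdir, 'demo_input.wav');   % hypothetical file name
audiowrite(demoFile, [tone, tone], fsOut);

% Read the file back: samples come back column-wise, one column per channel.
[speech, fs] = audioread(demoFile);

% Average the channels to obtain a mono signal.
if size(speech, 2) > 1
    speech = mean(speech, 2);
end

fprintf('Read %d samples at %d Hz (%.2f s)\n', numel(speech), fs, numel(speech)/fs);
```

The same `audioread` call handles mp3 and flac files on platforms where Matlab supports those formats.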

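4.2 Audio feature extraction
The goal of speech signal processing here is to find out how the frequency components of the speech are distributed. The usual mathematical tool is the Fourier transform, which assumes a stationary input signal, so the speech signal has to be split into frames. Each short segment that is cut out (typically 20-30 ms) is called a frame. (Within such a short window the signal is assumed to be stationary.)
Framing → FFT of each frame (the fast Fourier transform, an efficient algorithm for the discrete Fourier transform) → magnitude or energy of the FFT output. These values are all non-negative, much like image pixels, and displaying them gives the spectrogram.
In the spectrogram the x axis is time and the y axis is frequency. It shows how energy is distributed in any given frequency band.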
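
The framing → FFT → energy steps can be sketched in a few lines. This is a minimal version under stated assumptions: a synthetic 1 kHz tone stands in for speech, the frame is 25 ms with a 10 ms shift, the FFT size is 512, and all variable names are chosen for this example only.

```matlab
fs = 16000;                          % sampling rate (Hz), assumed
t  = (0:fs-1).' / fs;                % one second of samples
x  = sin(2*pi*1000*t);               % synthetic 1 kHz tone standing in for speech

frameLen   = round(0.025 * fs);      % 25 ms frame -> 400 samples
frameShift = round(0.010 * fs);      % 10 ms shift -> 160 samples
nfft       = 512;                    % FFT size (>= frameLen)
win = 0.54 - 0.46*cos(2*pi*(0:frameLen-1).'/(frameLen-1));   % Hamming window

numFrames = 1 + floor((numel(x) - frameLen) / frameShift);
S = zeros(nfft/2 + 1, numFrames);    % rows: frequency bins, columns: frames

for k = 1:numFrames
    idx   = (k-1)*frameShift + (1:frameLen).';   % sample indices of frame k
    frame = x(idx) .* win;                       % windowed frame
    X     = fft(frame, nfft);                    % zero-padded FFT
    S(:, k) = abs(X(1:nfft/2 + 1)).^2;           % one-sided power spectrum
end

freqAxis = (0:nfft/2).' * fs / nfft;                          % bin frequencies (Hz)
timeAxis = ((0:numFrames-1) * frameShift + frameLen/2) / fs;  % frame centres (s)

% Display in dB: x axis is time, y axis is frequency.
imagesc(timeAxis, freqAxis, 10*log10(S + eps));
axis xy; xlabel('Time (s)'); ylabel('Frequency (Hz)'); title('Spectrogram');
```

For the pure tone used here the plot is a single horizontal band around 1 kHz. Real speech shows formant tracks that change over time.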

⛄II. Partial Source Code

clc;
clear;

% Train the SVM. Myfeature holds the training feature vectors, one per row:
% the first 30 are labelled 0 (English), the last 30 are labelled 1 (Chinese).
load traindata Myfeature
A1 = zeros(1,30);
A2 = ones(1,30);
Group = [A1,A2];
TrainData = Myfeature;
SVMStruct = svmtrain(TrainData,Group);

N = 5.3;              % length of speech to analyse (s)
Tw = 25;              % analysis frame duration (ms)
Ts = 10;              % analysis frame shift (ms)
alpha = 0.97;         % preemphasis coefficient
R = [ 300 3700 ];     % frequency range to consider (Hz)
M = 20;               % number of filterbank channels
C = 13;               % number of cepstral coefficients
L = 22;               % cepstral sine lifter parameter
fs = 16000;           % sampling frequency (Hz)
hamming = @(N)(0.54-0.46*cos(2*pi*[0:N-1].'/(N-1)));

[filename, pathname] = uigetfile({'*.*'; '*.flac'; '*.wav'; '*.mp3'}, 'Select a speech file');
% no file selected
if isequal(filename, 0)
    return;
end

% extractvoice_simple, mfcc and lpces are helper functions that are not part of this excerpt
[speech,fs] = audioread([pathname, filename]);
[voice,fs] = extractvoice_simple(speech,-30,-20,0.2);
voicex = voice(1:round(N*16000));   % keep the first N seconds (assumes 16 kHz audio)
[ mfccs, FBEs, frames ] = ...
    mfcc( voicex, fs, Tw, Ts, alpha, hamming, R, M, C, L );
ceps_mfccx = mfccs(:);
[cep,ER] = lpces(voicex,17,256,256);
ceps_lpc = cep(2:17,:);   % LPC

%[lpc,ER]=lpces(voice,12,256,256);
%ceps_lpcc=lpc2lpcc(cep);%LPCC
ceps_lpcx = ceps_lpc(:);
ceps = [ceps_mfccx(1000:2000);ceps_lpcx(1:2000)];   % MFCC and LPC features stacked
TestData = ceps';

languagex = svmclassify(SVMStruct,TestData);
if languagex == 1
    language = 'Chinese'
else
    language = 'English'
end
% t=[1:2000];
% figure
% scatter(t,ceps_lpcx(1:2000),50,'r');
% xlabel('sample point');
% ylabel('LPC');
% title('LPC features');
% hold on
% [filename, pathname] = uigetfile({'*.*'; '*.flac'; '*.wav'; '*.mp3'}, 'Select a speech file');
% % no file selected
% if filename == 0
% return;
% end
% [speech,fs] = audioread([pathname, filename]);
% [voice,fs]=extractvoice_simple(speech,-30, -20,0.2);
% voicex=voice(1:N*16000);
% [ mfccs, FBEs, frames ] = ...
% mfcc( voicex, fs, Tw, Ts, alpha, hamming, R, M, C, L );
% ceps_mfccx=mfccs(:);
% [cep,ER]=lpces(voicex,17,256,256); ceps_lpc=cep(2:17,:);%LPC
%
function [ H, f, c ] = trifbank( M, K, R, fs, h2w, w2h )
% TRIFBANK Triangular filterbank.
%
% [H,F,C]=TRIFBANK(M,K,R,FS,H2W,W2H) returns matrix of M triangular filters
% (one per row), each K coefficients long along with a K coefficient long
% frequency vector F and M+2 coefficient long cutoff frequency vector C.
% The triangular filters are between limits given in R (Hz) and are
% uniformly spaced on a warped scale defined by forward (H2W) and backward
% (W2H) warping functions.
%
% Inputs
% M is the number of filters, i.e., number of rows of H
%
% K is the length of frequency response of each filter
% i.e., number of columns of H
%
% R is a two element vector that specifies frequency limits (Hz),
% i.e., R = [ low_frequency high_frequency ];
%
% FS is the sampling frequency (Hz)
%
% H2W is a Hertz scale to warped scale function handle
%
% W2H is a warped scale to Hertz scale function handle
%
% Outputs
% H is a M by K triangular filterbank matrix (one filter per row)
%
% F is a frequency vector (Hz) of 1xK dimension
%
% C is a vector of filter cutoff frequencies (Hz),
% note that C(2:end) also represents filter center frequencies,
% and the dimension of C is 1x(M+2)
%
% Example
% fs = 16000; % sampling frequency (Hz)
% nfft = 2^12; % fft size (number of frequency bins)
% K = nfft/2+1; % length of each filter
% M = 23; % number of filters
%
% hz2mel = @(hz)(1127*log(1+hz/700)); % Hertz to mel warping function
% mel2hz = @(mel)(700*exp(mel/1127)-700); % mel to Hertz warping function
%
% % Design mel filterbank of M filters each K coefficients long,
% % filters are uniformly spaced on the mel scale between 0 and Fs/2 Hz
% [ H1, freq ] = trifbank( M, K, [0 fs/2], fs, hz2mel, mel2hz );
%
% % Design mel filterbank of M filters each K coefficients long,
% % filters are uniformly spaced on the mel scale between 300 and 3750 Hz
% [ H2, freq ] = trifbank( M, K, [300 3750], fs, hz2mel, mel2hz );
%
% % Design mel filterbank of 18 filters each K coefficients long,
% % filters are uniformly spaced on the Hertz scale between 4 and 6 kHz
% [ H3, freq ] = trifbank( 18, K, [4 6]*1E3, fs, @(h)(h), @(h)(h) );
%
% hfig = figure('Position', [25 100 800 600], 'PaperPositionMode', ...
% 'auto', 'Visible', 'on', 'color', 'w'); hold on;
% subplot( 3,1,1 );
% plot( freq, H1 );
% xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' );
%
% subplot( 3,1,2 );
% plot( freq, H2 );
% xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' );
%
% subplot( 3,1,3 );
% plot( freq, H3 );
% xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' );
%
% Reference
% [1] Huang, X., Acero, A., Hon, H., 2001. Spoken Language Processing:
% A guide to theory, algorithm, and system development.
% Prentice Hall, Upper Saddle River, NJ, USA (pp. 314-315).

% Author Kamil Wojcicki, UTD, June 2011

if( nargin~= 6 ), help trifbank; return; end; % very lite input validation

f_min = 0;          % filter coefficients start at this frequency (Hz)
f_low = R(1);       % lower cutoff frequency (Hz) for the filterbank 
f_high = R(2);      % upper cutoff frequency (Hz) for the filterbank 
f_max = 0.5*fs;     % filter coefficients end at this frequency (Hz)
f = linspace( f_min, f_max, K ); % frequency range (Hz), size 1xK
fw = h2w( f );

% filter cutoff frequencies (Hz) for all filters, size 1x(M+2)
c = w2h( h2w(f_low)+[0:M+1]*((h2w(f_high)-h2w(f_low))/(M+1)) );
cw = h2w( c );

H = zeros( M, K );                  % zero otherwise
for m = 1:M 

    % implements Eq. (6.140) on page 314 of [1] 
    % k = f>=c(m)&f<=c(m+1); % up-slope
    % H(m,k) = 2*(f(k)-c(m)) / ((c(m+2)-c(m))*(c(m+1)-c(m)));
    % k = f>=c(m+1)&f<=c(m+2); % down-slope
    % H(m,k) = 2*(c(m+2)-f(k)) / ((c(m+2)-c(m))*(c(m+2)-c(m+1)));

    % implements Eq. (6.141) on page 315 of [1]
    k = f>=c(m)&f<=c(m+1); % up-slope
    H(m,k) = (f(k)-c(m))/(c(m+1)-c(m));
    k = f>=c(m+1)&f<=c(m+2); % down-slope
    H(m,k) = (c(m+2)-f(k))/(c(m+2)-c(m+1));

end
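
To show where a triangular filterbank sits in the MFCC computation, here is a self-contained sketch: it rebuilds a small mel filterbank inline (same unit-peak triangle shape as in `trifbank`), applies it to the power spectrum of one frame, and then takes the log and a DCT. The 512-point FFT, the synthetic 1 kHz frame, the unnormalised DCT-II matrix, and names such as `mfccFrame` are assumptions for illustration. This is not the `mfcc` helper called in the script above, which is not part of this excerpt.

```matlab
fs = 16000;  nfft = 512;  K = nfft/2 + 1;    % assumed analysis settings
M  = 20;     C = 13;                         % filters and cepstral coefficients
R  = [300 3700];                             % band limits (Hz)

hz2mel = @(hz)  1127*log(1 + hz/700);        % Hertz -> mel
mel2hz = @(mel) 700*exp(mel/1127) - 700;     % mel -> Hertz

% M+2 cutoff frequencies, uniformly spaced on the mel scale.
f = linspace(0, fs/2, K);
c = mel2hz(hz2mel(R(1)) + (0:M+1) * (hz2mel(R(2)) - hz2mel(R(1))) / (M+1));

% Triangular filters with unit peak, one per row.
H = zeros(M, K);
for m = 1:M
    up   = f >= c(m)   & f <= c(m+1);        % rising edge
    down = f >= c(m+1) & f <= c(m+2);        % falling edge
    H(m, up)   = (f(up) - c(m))     / (c(m+1) - c(m));
    H(m, down) = (c(m+2) - f(down)) / (c(m+2) - c(m+1));
end

% Power spectrum of one synthetic 25 ms frame (a 1 kHz tone stands in for speech).
n     = (0:399).';
frame = sin(2*pi*1000*n/fs) .* (0.54 - 0.46*cos(2*pi*n/399));
X     = fft(frame, nfft);
P     = abs(X(1:K)).^2;

% Filterbank energies, then log and DCT-II to decorrelate them.
FBE       = H * P;                                    % M-by-1
dctm      = cos((0:C-1).' * pi * ((1:M) - 0.5) / M);  % C-by-M unnormalised DCT-II basis
mfccFrame = dctm * log(FBE + eps);                    % C-by-1 cepstral vector

[~, mBest] = max(FBE);
fprintf('Strongest filter: %d (%.0f-%.0f Hz)\n', mBest, c(mBest), c(mBest+2));
```

The strongest filter should be one whose passband contains 1 kHz, which is a quick sanity check that the cutoff frequencies and the triangle slopes line up.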

⛄III. Results

[Two result screenshots; images not available]

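⛄IV. Matlab Version and References

1 Matlab version
2014a

2 References
[1] Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing (3rd ed.) [M]. Tsinghua University Press, 2019.
[2] Liu Ruobian. Deep Learning: Speech Recognition Technology in Practice [M]. Tsinghua University Press, 2019.

3 Note
The introductory part of this post is excerpted from the Internet and is for reference only. If it infringes any rights, contact the author to have it removed.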
