Learning Matlab2013a: Male and Female Voice Recognition

Author: 无落

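This blog is for learning purposes only. For any other use, please contact the author. When reposting, please cite the link.

Article link: https://blog.csdn.net/qq_43433255/article/details/89342923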

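People can easily tell a speaker's gender by ear. Can we make a machine do the same? The answer is yes: as AI algorithms have advanced, recognition performance has steadily improved, and telling male from female voices has become relatively easy.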

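The human pitch (fundamental frequency) range is roughly 70 to 350 Hz. Because of physiological differences, male and female voices have different auditory characteristics: male pitch mostly lies between 100 and 200 Hz, while female pitch lies between 200 and 350 Hz. Figure 1 shows statistics of how the pitch of a single speaker varies during conversation; for female voices, both the mean and the standard deviation are about twice those of male voices. Figure 2 shows the pitch distribution across different speakers: on a logarithmic frequency axis, male and female voices each follow a normal distribution. The mean and standard deviation of male pitch are 125 Hz and 20 Hz respectively, and the female values are about twice the male ones. Given this clear difference, pitch frequency can serve as the basis for distinguishing male from female voices.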

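The code is divided into several parts, each implementing a different function.
Record a clip of audio. File name: luru.m (note that UI.m sets the button callback to 'test_record', so the script has to be saved as test_record.m for the button to find it).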

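fs = 16000;                      % sampling rate in Hz
fprintf('testing...\n');

y = audiorecorder(fs, 16, 1);    % 16000 Hz, 16-bit, mono
recordblocking(y, 5);            % record for 5 seconds

rbd = get(con_rbd, 'value');     % 1 if the "Retest" radio button is selected
if ~exist('test_record', 'dir'), mkdir('test_record'); end
if rbd
    delete(fullfile('test_record', '*.wav'));
end
% UI.m calls clear, so m does not exist on the first run; initialise it here
if rbd || ~exist('m', 'var')
    m = 1;                       % start numbering from 1.wav
end

name = fullfile('test_record', [num2str(m), '.wav']);
y1 = getaudiodata(y, 'int16');
audiowrite(name, y1, fs);        % write the recording, e.g. 1.wav
cut(name);                       % strip the silent segments in place
result = PitchDetect(name);
disp(result);
m = m + 1;
set(con_text, 'string', result);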

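Trim the silent segments. File name: jiandiao.m (MATLAB resolves a function by its file name, so for the call cut(name) in the recording script to work, the file has to be saved as cut.m).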

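function y1 = cut(s_address)
% CUT  Remove silent segments from a mono WAV file and overwrite it.

[y, fs] = audioread(s_address);   % read the samples and the file's own rate
h = hamming(320);                 % 320-point window (20 ms at 16 kHz)

% Short-time average energy (SAE): the squared signal convolved with the window,
% E(n) = sum over m of x(m)^2 * h(n-m), where h is the Hamming window
e = conv(y.*y, h);                % y.*y squares each sample; conv(u,v) convolves u and v

% Cut the signal: where the SAE falls below 1/100 of its maximum,
% the sample is treated as silence (a start or end point)
mx = max(e);
n = length(e);
y(n) = 0;                         % zero-pad the signal to the length of e
keep = e >= mx*0.01;              % 1 where speech is present, 0 elsewhere
y1 = y.*keep;
y1(y1 == 0) = [];                 % drop the zeroed samples
audiowrite(s_address, y1, fs);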
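
The same thresholding idea can be tried on a synthetic signal without a microphone. The sketch below is an illustration, not the post's code: the test signal, the variable names, and the use of conv(..., 'same') to keep the energy curve aligned with the samples are all assumptions, and the Hamming window is written out by hand so that no toolbox is needed.

```matlab
fs = 16000;                                   % sampling rate in Hz (same as the post)
t  = (0:fs/2-1)'/fs;                          % 0.5 s time axis
tone = sin(2*pi*200*t);                       % hypothetical 200 Hz "speech" segment
y  = [zeros(fs/2,1); tone; zeros(fs/2,1)];    % silence, tone, silence

N = 320;                                      % 20 ms window at 16 kHz
h = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));     % Hamming window, written out by hand

e = conv(y.^2, h, 'same');                    % short-time energy, aligned with y
keep = e >= 0.01*max(e);                      % speech where energy >= 1% of the peak
trimmed = y(keep);                            % logical indexing keeps flagged samples

fprintf('kept %d of %d samples\n', numel(trimmed), numel(y));
```

The kept region is slightly longer than the tone itself, because the window smears energy by up to half a window length into the neighbouring silence.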

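Male/female recognition by pitch frequency. File name: shibie.m (as above, the file has to be saved as PitchDetect.m for the call PitchDetect(name) to resolve).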

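function pd = PitchDetect(s_address)
% PITCHDETECT  Estimate pitch from the cepstrum of a WAV file and label the speaker.

waveFile = s_address;
[y, fs] = audioread(waveFile);
time = (1:length(y))/fs;
frameSize = floor(40*fs/1000);        % 40 ms frame: 640 samples at 16 kHz
% The original hard-codes sample 7000 as the frame start; clamp it so that
% short (heavily trimmed) recordings do not index past the end of y
startIndex = max(1, min(7000, length(y)-frameSize+1));
endIndex = startIndex+frameSize-1;    % last sample of the frame
frame = y(startIndex:endIndex);       % extract the frame
frame2 = frame.*hamming(length(frame));  % apply a Hamming window
rwy = rceps(frame2);                  % real cepstrum
ylen = length(rwy);
cepstrum = rwy(1:floor(ylen/2));      % keep the first half for pitch detection
LF = floor(fs/500);                   % shortest pitch period searched (500 Hz), in samples
HF = floor(fs/70);                    % longest pitch period searched (70 Hz), in samples
cn = cepstrum(LF:HF);                 % restrict the cepstrum to the search range
[mx_cep, ind] = max(cn);              % locate the cepstral peak
% cn(ind) is cepstrum(LF+ind-1), i.e. a lag of LF+ind-2 samples (index 1 is lag 0).
% Note: the ind > LF test also rejects peaks at lags shorter than about 2*LF samples.
if mx_cep > 0.08 && ind > LF          % threshold on the peak height
    a = fs/(LF+ind-2);
else
    a = 0;
end
figure(2);
plot(time, y); axis tight
yl = get(gca, 'ylim');                % named yl to avoid shadowing the ylim function
line([time(startIndex), time(startIndex)], yl, 'color', 'r');
line([time(endIndex), time(endIndex)], yl, 'color', 'r');
title('Speech waveform');
figure(3);
subplot(2,1,1);
plot(frame);
title('Waveform of the extracted frame');
subplot(2,1,2);
plot(cepstrum);
title('Cepstrum');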

[x, sr] = audioread(s_address);
meen = mean(x);
x = x - meen;                         % remove the DC offset
updRate = floor(20*sr/1000);          % hop size: a new frame every 20 ms
fRate = floor(40*sr/1000);            % frame length: 40 ms
n_samples = length(x);
nFrames = floor(n_samples/updRate)-1; % number of frames
k = 1;
pitch = zeros(1, nFrames);
f0 = zeros(1, nFrames);
LF = floor(sr/500);
HF = floor(sr/70);
m = 1;
avgF0 = 0;                            % avgF0 and m are accumulated but not used by the final decision
for t = 1:nFrames
    yin = x(k:k+fRate-1);
    cn1 = rceps(yin);
    cn = cn1(LF:HF);
    [mx_cep, ind] = max(cn);
    if mx_cep > 0.08 && ind > LF
        a = sr/(LF+ind-2);            % cn(ind) corresponds to a lag of LF+ind-2 samples
    else
        a = 0;                        % treat the frame as unvoiced
    end
    f0(t) = a;
    if t > 2 && nFrames > 3           % smooth the pitch track with a 3-point median filter
        z = f0(t-2:t);
        md = median(z);
        pitch(t-2) = md;
        if md > 0
            avgF0 = avgF0+md;
            m = m+1;
        end
    elseif nFrames <= 3
        pitch(t) = a;
        avgF0 = avgF0+a;
        m = m+1;
    end
    k = k+updRate;
end
figure(4)
subplot(211);
plot((1:length(x))/sr, x);
ylabel('Amplitude');
xlabel('Time (s)');
subplot(212);
xt = 1:nFrames;
xt = 20*xt;                           % frame index to time in ms (20 ms hop)
plot(xt, pitch)
axis([xt(1) xt(nFrames) 0 max(pitch)+50]);
ylabel('Pitch frequency / Hz');
xlabel('Time (ms)');

Mypitch = max(pitch);                 % decide from the highest smoothed pitch value
if Mypitch > 220
    pd = ['Woman  ', num2str(Mypitch)];
elseif Mypitch < 200
    pd = ['Man  ', num2str(Mypitch)];
else
    pd = ['Sorry  ', num2str(Mypitch)];   % 200 to 220 Hz: too close to call
end
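
To see how a cepstral peak maps to a pitch value, the following sketch applies the same kind of search to a synthetic impulse train. It is a simplified illustration under stated assumptions rather than the post's implementation: the test signal and variable names are hypothetical, no window is applied, and the real cepstrum is computed directly as real(ifft(log(abs(fft(x))))) (the formula that rceps evaluates), with eps added as a guard, so the Signal Processing Toolbox is not needed.

```matlab
fs = 16000;                         % sampling rate in Hz
frameLen = floor(40*fs/1000);       % 40 ms frame = 640 samples
period = 100;                       % hypothetical pitch period in samples (160 Hz)

x = zeros(frameLen, 1);
x(1:period:end) = 1;                % impulse train: one pulse every 100 samples

% Real cepstrum; eps guards against log(0)
c = real(ifft(log(abs(fft(x)) + eps)));

lowLag  = floor(fs/500);            % shortest period searched (500 Hz)
highLag = floor(fs/70);             % longest period searched (70 Hz)

% c(k) holds lag k-1, so lags lowLag..highLag sit at indices lowLag+1..highLag+1
[peakVal, ind] = max(c(lowLag+1:highLag+1));
lag = lowLag + ind - 1;             % convert the index within the slice back to a lag
f0  = fs/lag;                       % estimated pitch in Hz

fprintf('peak %.2f at lag %d -> %.1f Hz\n', peakVal, lag, f0);
```

The index bookkeeping is the part that is easy to get wrong: MATLAB arrays start at 1 while lag 0 is the first cepstral coefficient, so an index inside the sliced range has to be shifted back before dividing the sampling rate by it.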

A very bare-bones interface; it has to be said that MATLAB is quite capable. File name: UI.m

clear;clc;close all;
global n;          % not used by the recording script, which keeps its counter in m
n=1;

set(0,'defaultfigurecolor','w');
% Main figure; uicontrol positions are given in normalized units
con_car=figure('position',[400 200 680 380],...
               'numbertitle','off',...
               'name','Man or Woman');
set(con_car,'defaultuicontrolunits','normalized');

rbd=0;
con_rbd=uicontrol('Style','radiobutton',...
               'Position',[0.15  0.62  0.15  0.05],...
               'Value',rbd,...   rbd is 0 or 1: 1 when selected, 0 when not
               'String','Retest','backgroundcolor',get(gcf,'color'));

% Close button
con_close=uicontrol('style','pushbutton','position',[0.5 0.6 0.2 0.1],...
            'string','Close','callback','close');


% Test button
con_test=uicontrol('style','pushbutton','position',[0.3 0.6 0.2 0.1],...
           'string','Test');       % [left bottom width height]

% Shows the prompt 'Keep talking' and then the test result
con_text=uicontrol('style','text','position',[0.3 0.1 0.4 0.4],...
         'FontSize',30,'string','Keep talking','backgroundcolor',get(gcf,'color'));

% Run the recording/test script when the Test button is pressed
set(con_test,'callback','test_record');

Final result:
[Figure: screenshot of the running program; image not available]

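During testing, misclassifications do sometimes occur, mostly because of the way the speaker talks. Saying the digits 0-9 is recommended, as it gives a fairly high accuracy.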
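
One possible contributor to such errors is an isolated frame in which the cepstral peak lands on the wrong lag, which matters because the final decision uses max(pitch). The 3-point median filter inside PitchDetect is aimed at this kind of outlier. A small sketch with made-up per-frame values (the numbers are hypothetical, not measurements) shows the effect:

```matlab
% Hypothetical raw per-frame pitch estimates in Hz; 0 marks an unvoiced frame,
% and 240 stands in for a single-frame outlier
f0 = [0 150 152 240 151 149 0];

nFrames = numel(f0);
pitch = zeros(1, nFrames);
for t = 3:nFrames
    pitch(t-2) = median(f0(t-2:t));   % same 3-point median as in PitchDetect
end

disp(pitch);                          % the isolated 240 Hz value no longer appears
rawMax = max(f0);                     % 240: above the 220 Hz 'Woman' threshold
smoothMax = max(pitch);               % 152: below the 200 Hz 'Man' threshold
```

A median over three frames only removes outliers that last a single frame; two consecutive wrong frames would pass through unchanged.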
