短时傅里叶变换的原理与应用：电话拨号声分析（3）

最新推荐文章于 2023-04-25 17:09:50 发布

qcyfred

最新推荐文章于 2023-04-25 17:09:50 发布

阅读量2.9k

点赞数 2

分类专栏：音乐音频文章标签： matlab 短时傅里叶变换特征提取短时能量

本文链接：https://blog.csdn.net/qcyfred/article/details/53063594

版权

音乐音频专栏收录该内容

22 篇文章 42 订阅

订阅专栏

上一篇关于电话号码分析的文章，提供了一种实现思路，还放了一些PPT上来。

Review：电话拨号声分析之大体思路

剩下的就是实际动手实现了。

我把动手实践的内容拆成3个部分来写。特征提取占2/3，模式匹配占1/3。

今天先写第1个部分。也就是特征提取里的第一步。

大致思路是这样的。

1. 读入音频x，长这样。

2. 对x整体做傅里叶变换肯定是不行了。只能看见x中所有的频率成分的含量，不能看见什么时候出现的。

就会像这样。

3. 于是时频分析，语谱图（Spectrogram）。可以调用matlab自带的spectrogram函数。参数可自行百度。

但为了更深刻地理解STFT，这里自己写代码来计算语谱图，然后把结果存到mySpeechSpectrum中。

mySpeechSpectrum的第m列是第m帧，第n行是FFT以后频率的第n个序号。

如何把横坐标由帧换成时间（second）？又如何把纵坐标的频率序号换成真实的物理频率（Hz）？

这就涉及到一个坐标对应的关系。

前面有信号分帧的文章，讲了帧与时间的对应关系。

至于频率序号与真实的物理频率的关系，留作一个思考，以后揭晓答案。

今天就先写这么多。特征提取的后一部分，请见后文。谢谢观赏！

附上这一部分的代码。

clear;
close all;
clc

filename = '我的手机号录音.wav';
[x,fs] = audioread(filename);
len = length(x);
T = (len-1)/fs;
t = linspace(0,T,len)';
% sound(x,fs);
f = -fs/2 : 1/T : fs/2;

X = fft(x);
X_shift = fftshift(X);

figure(1);
subplot(311);
plot(t,x./max(abs(x)));
axis([0,7.5,-1.1,1.1]);
grid on;
xlabel('时间(s)');
ylabel('归一化幅度(a.u.)');
title('手机号录音');

subplot(312);
plot(f,abs(X_shift)./max(abs(X_shift)));
grid on;
title('频谱密度');
xlabel('频率(Hz)');
ylabel('归一化幅度(a.u.)');
axis([0,2000,-0.1,1.1]);

figure(2);
spectrogram(x);
title('语谱图');

%% STFT计算语谱图

% 每一帧:0.125秒，步进:0.125秒
win = 0.2; sStep = 0.1;
signal = x;
% convert window length and step from seconds to samples:
windowLength = round(win * fs);
step = round(sStep * fs);

% 当前游标的位置
curPos = 1;
L = length(signal);
T_length = L/fs;
% compute the total number of frames:
numOfFrames = floor((L-windowLength)/step) + 1;
% number of features to be computed:
Ham = window(@hamming, windowLength);

%% 逐帧处理
mySpeechSpectrum = [];
freq_show = [];
for k_frame =1:numOfFrames % for each frame
    % get current frame:
    frame  = signal(curPos:curPos+windowLength-1);
    frame_windowed  = frame .* Ham;
    [frameFFT,Freq] = getDFT(frame_windowed, fs);
    freq_show = Freq.';
    
    mySpeechSpectrum(:,k_frame) = frameFFT;

    curPos = curPos + step;
    frameFFTPrev = frameFFT;
end

mySpeechSpectrum = mySpeechSpectrum ./ max(max(mySpeechSpectrum));

% 最原始的STFT图
figure(3);
imagesc(mySpeechSpectrum);
xlabel('时间(帧)');
ylabel('频率(Hz)');
colorbar

% 对应坐标的STFT图
x_axis = 0 : sStep : T_length-sStep;
y_axis = freq_show;
figure(4);
imagesc(x_axis,y_axis,mySpeechSpectrum);
xlabel('时间（s）');
ylabel('频率（Hz）');
set(gca,'ylim',[y_axis(1),2e3]);
colorbar

其中，getDFT是获取信号的DFT幅度。

function [FFT, Freq] = getDFT(signal, Fs, PLOT)

%
% function [FFT, Freq] = getDFT(signal, Fs, PLOT)
%
% This function returns the DFT of a discrete signal and the 
% respective frequency range.
% 
% ARGUMENTS:
% - signal: vector containing the samples of the signal
% - Fs:     the sampling frequency
% - PLOT:   use this argument if the FFT (and the respective 
%           frequency values) need to be returned in the 
%           [-fs/2..fs/2] range. Otherwise, only half of 
%           the spectrum is returned.
%
% RETURNS:
% - FFT:    the magnitude of the DFT coefficients
% - Freq:   the corresponding frequencies (in Hz)
%

N = length(signal);  % length of signal
% compute the magnitude of the spectrum
% (and normalize by the number of samples):
FFT = abs(fft(signal)) / N;

if nargin==2 % return the first half of the spectrum:
    FFT = FFT(1:ceil(N/2));    
    Freq = (Fs/2) * (1:ceil(N/2)) / ceil(N/2);  % define the frequency axis
else
    if (nargin==3) 
        % ... or return the whole spectrum 
        %     (in the range -fs/2 to fs/2)
        FFT = fftshift(FFT);        
        if mod(N,2)==0                      % define the frequency axis:
            Freq = -N/2:N/2-1;              % if N is even
        else
            Freq = -(N-1)/2:(N-1)/2;        % if N is odd
        end
        Freq = (Fs/2) * Freq ./ ceil(N/2);
    end
end