相位声码器

1. 原理

短时傅立叶变换中的每一个小块 X ( m , k ) X(m,k) X(m,k) m m m代表帧索引, k k k代表频带索引),都表征了第 m m m 帧中,一个模长为 ∣ X ( m , k ) ∣ |X(m,k)| X(m,k), 相位为 φ ( m , k ) φ(m,k) φ(m,k) ,数字频率为 F p r e d = k N ∗ F s Fpred=kN∗Fs Fpred=kNFs 的复指数序列成分。然而受到短时傅立叶变换中频谱分辨率的影响, s t f t stft stft 无法分析出原始信号中数字角频率位于频带范围中的频率成分。举个例子,有一个520hz,16k采样率的正弦信号。使用帧长为64(频率分辨率250hz),帧偏移32个采样点的 s t f t stft stft来分析,频段为 [ 0   250 ] [ 250   500 ] [ 500   750 ] . . . . . . . [0~250][250~500][500~750]....... [0 250][250 500][500 750].......;那么520hz的频率分量体现在语谱图上,仅仅在[500~750]这个频段区间。
图1. 520hz正弦信号语谱图

图1. 520hz正弦信号语谱图
从图1中可以看出,受分析帧长等参数影响, stft 仅仅能给出一个粗略的频率[500~750]hz区间,而不能给出信号音频成分的精确值。除了模长信息, stft 还提供了每一帧中各个频率成分的初始相位信息。我们可以利用 stft 的帧偏移、相位信息,来对stft给出的大致频率进行优化,从而得到信号中精确的瞬时分辨率;这种方法称为相位声码器。

在这里插入图片描述

图2. 相位声码器的大致原理

图2给出stft变换中的一个示意图来理解相位声码器的大致原理。X(m,k)是第m帧,第k个频带对应的正弦分量;频带对应频率的计算方式:

F p r e d ( k ) = k F s N F_{pred}(k)=\frac{kF_s}{N} Fpred(k)=NkFs
其中, k k k代表频带索引, F s F_s Fs代表音频的采样频率, N N N代表 s t f t stft stft的分析帧长。对应的预估角频率 ω p r e d \omega_{pred} ωpred F p r e d ∗ 2 π F_{pred}*2\pi Fpred2π;体现在图2中,即就是两帧中红色和绿色对应的角频率,这个频率是不精确有误差的,我们需要对它进行精确化处理。 X ( m + 1 , k ) X(m+1,k) X(m+1,k)对应的是第 m + 1 m+1 m+1帧,第 k k k个频带对应的正弦分量。两帧起始点的采样点间隔即就是stft中的帧偏移 H a H_a Ha,该间隔对应的时间间隔即就是 H a / F s ( s ) H_a / F_s(s) Ha/Fs(s)。通过stft,还可以得到m和m+1帧中,k频段正弦分量对应的初始相位,分别表示为 φ 1 = φ ( m , k ) \varphi_1=\varphi(m,k) φ1=φ(m,k) φ 2 = φ ( m + 1 , k ) \varphi_2=\varphi(m+1,k) φ2=φ(m+1,k);通过预估的角频率 ω \omega ω以及m帧的初始相位 φ 1 \varphi_1 φ1,我们可以估计出m+1帧的起始时刻( t 2 t_2 t2)的相位,即就是图2中蓝色椭圆内红色线在 t 2 t_2 t2时刻的相位。可以发现如果预估的角频率和真实的角频率一致的话,t2时刻预估的相位 φ p r e d \varphi_{pred} φpred应该和m+1帧的初始相位 φ 2 \varphi_2 φ2保持一致(差值是 2 × π 2×\pi 2×π的正整数倍),即就是蓝色椭圆内,红色点和绿色点应该重合。定义t2时刻的预测相位和真实相位之间的差值为 φ e r r = φ 2 − φ p r e d \varphi_{err}=\varphi_2 - \varphi_{pred} φerr=φ2φpred,该差值由误差角频率(真实角频率-预估角频率)在帧偏移这段时间的转动相位。
φ e r r = ( ω r e a l − ω p r e d ) ∗ Δ t \varphi_{err} = (\omega_{real} - \omega_{pred}) * \Delta_t φerr=(ωrealωpred)Δt
因此,真实角频率即就是预估角频率加上误差角频率
ω r e a l = ω p r e d + φ e r r Δ t \omega_{real} = \omega_{pred} + \frac{\varphi_{err}}{\Delta_t} ωreal=ωpred+Δtφerr

2. 仿真

2.1.参数

参数
帧长64
帧偏移32
真实频率(hz)520
分析音频文件
kmd_inout_imag ### 2.2.代码 代码对原始信号进行短时傅立叶变换处理,对每两帧间隔进行真实频率估计,并对所有帧得到的预估频率求均值;
# -*- coding: utf-8 -*-
# @Time    : 2021-05-13
# @Author  : Zhangjieyang
# @Email   : jieyangzhang@outlook.com
# @File    : phase_vocoder_demo.py
# @Software: vscode

import numpy as np
import matplotlib.pyplot as plt
from nara_wpe.utils import stft
import soundfile as sf

def main():
    N = 64 # 分帧长度
    shift = 32 # 偏移
    stft_options = dict(size=N, shift=shift)
    audio_name = '520hz_16k.wav'
    y, freq = sf.read(audio_name)
    Y = stft(y, **stft_options, window=np.ones)
    magnitude = np.abs(Y) # 模长
    phase = np.arctan2(Y.imag, Y.real) # 相位
    k = 16
    predict_freq_list = []
    for k in range(Y.shape[0]):
        base = np.argmax(magnitude[k])
        base_freq = base / N * freq
        previous_phase = phase[k - 1][base]
        current_phase = phase[k][base]
        pred_phase = previous_phase + shift / freq * (2 * np.pi * base_freq)
        err_phase = (current_phase - pred_phase) % (2 * np.pi)
        # if err_phase >  np.pi:
        #     err_phase = 2 * np.pi - err_phase
        estimate_freq = base_freq + err_phase / (shift / freq) / (2 * np.pi)
        predict_freq_list.append(estimate_freq)
        print('frame:%d -- base freq:%d predict phase:%f, current phase:%f, previous phase:%f, err phase:%f, predict freq:%f'
            % (k, base_freq, pred_phase, current_phase, previous_phase, err_phase, estimate_freq))

    # 计算预估频率均值
    average_predict_freq = np.average(predict_freq_list)
    print(average_predict_freq)
    # 绘制语谱图
    half = N / 2 + 1
    ylocs = np.linspace(0, half, int(half))
    freq_solution = freq / 2 / half
    plt.yticks(ylocs, ["%.02f" % l for l in (ylocs * freq_solution)])
    plt.ylabel('freq (hz)')
    plt.xlabel("frame")
    plt.imshow(np.transpose(np.abs(Y)), origin="lower", cmap="jet", aspect="auto", interpolation="none")
    plt.title(audio_name)
    plt.rcParams['ytick.direction'] = 'in'
    plt.show()


if __name__ == "__main__":
    main()

2.3.结果总结

截取一段帧区域的预测结果,有:

frame:29987 -- base freq:500 predict phase:7.456728, current phase:1.432367, previous phase:1.173543, err phase:0.258823, predict freq:520.596510
frame:29988 -- base freq:500 predict phase:7.715552, current phase:1.693486, previous phase:1.432367, err phase:0.261119, predict freq:520.779217
frame:29989 -- base freq:500 predict phase:7.976671, current phase:1.954352, previous phase:1.693486, err phase:0.260866, predict freq:520.759074
frame:29990 -- base freq:500 predict phase:8.237537, current phase:2.212500, previous phase:1.954352, err phase:0.258147, predict freq:520.542723
frame:29991 -- base freq:500 predict phase:8.495685, current phase:2.466256, previous phase:2.212500, err phase:0.253757, predict freq:520.193311
frame:29992 -- base freq:500 predict phase:8.749441, current phase:2.715126, previous phase:2.466256, err phase:0.248870, predict freq:519.804421
frame:29993 -- base freq:500 predict phase:8.998311, current phase:2.959832, previous phase:2.715126, err phase:0.244706, predict freq:519.473091
frame:29994 -- base freq:500 predict phase:9.243017, current phase:-3.081169, previous phase:2.959832, err phase:0.242184, predict freq:519.272386
frame:29995 -- base freq:500 predict phase:3.202016, current phase:-2.839341, previous phase:-3.081169, err phase:0.241828, predict freq:519.244056
frame:29996 -- base freq:500 predict phase:3.443844, current phase:-2.595632, previous phase:-2.839341, err phase:0.243710, predict freq:519.393789
frame:29997 -- base freq:500 predict phase:3.687553, current phase:-2.348190, previous phase:-2.595632, err phase:0.247442, predict freq:519.690841
frame:29998 -- base freq:500 predict phase:3.934996, current phase:-2.095976, previous phase:-2.348190, err phase:0.252213, predict freq:520.070504
frame:29999 -- base freq:500 predict phase:4.187209, current phase:-1.839067, previous phase:-2.095976, err phase:0.256909, predict freq:520.444147

预估的均值为:519.9993333555548hz和原始信号基本一致。

3.参考文献

A Review of Time-Scale Modification of Music Signals—5.3

评论 6
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值