TD-PSOLA算法

1. 图解说明

  • 对原始信号进行分析,拿到基音周期
  • 对基音周期进行修改(重叠想加,放缩)
  • 对修改基音周期后的参数进行语音重建
    在这里插入图片描述
2. 参考代码实现

//代码对于rate<1的实现有问题

"""
Author: Sanna Wager
Created on: 9/18/19
This script provides an implementation of pitch shifting using the "time-domain pitch synchronous
overlap and add (TD-PSOLA)" algorithm. The original PSOLA algorithm was introduced in [1].
Description
The main script td_psola.py takes raw audio as input and applies steps similar to those described in [2].
First, it locates the time-domain peaks using auto-correlation. It then shifts windows centered at the
peaks closer or further apart in time to change the periodicity of the signal, which shifts the pitch
without affecting the formant. It applies linear cross-fading as introduced in [3] and implemented in
[4], the algorithm used for Audacity's simple pitch shifter.
Notes:
- Some parameters in the program related to frequency are hardcoded for singing voice. They can be
    adjusted for other usages.
- The program is designed to process sounds whose pitch does not vary too much, as this could result
    in glitches in peak detection (e.g., octave errors). Processing audio in short segment (e.g.,
    notes or words) is recommended. Another option would be to use a more robust peak detection
    algorithm, for example, pYIN [5]
- Small pitch shifts (e.g., up to 700 cents) should not produce many artifacts. Sound quality degrades
    if the shift is too large.
- The signal is expected to be voiced. Unexpected results may occur in the case of unvoiced signals
References:
Overlap and add algorithm exercise from UIUC
[1] F. Charpentier and M. Stella. "Diphone synthesis using an overlap-add technique for speech waveforms
    concatenation." In Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP). Vol. 11. IEEE, 1986.
[2] https://courses.engr.illinois.edu/ece420/lab5/lab/#overlap-add-algorithm
[3] https://www.surina.net/article/time-and-pitch-scaling.html
[4] https://gitlab.com/soundtouch
[5] https://code.soundsoftware.ac.uk/projects/pyin
"""

import numpy as np
from numpy.fft import fft, ifft
import matplotlib.pyplot as plt
import librosa


def shift_pitch(signal, fs, f_ratio):
    """
    Calls psola pitch shifting algorithm
    :param signal: original signal in the time-domain
    :param fs: sample rate
    :param f_ratio: ratio by which the frequency will be shifted
    :return: pitch-shifted signal
    """
    peaks = find_peaks(signal, fs)
    new_signal = psola(signal, peaks, f_ratio)
    return new_signal


def find_peaks(signal, fs, max_hz=950, min_hz=75, analysis_win_ms=40, max_change=1.005, min_change=0.995):
    """
    Find sample indices of peaks in time-domain signal
    :param max_hz: maximum measured fundamental frequency
    :param min_hz: minimum measured fundamental frequency
    :param analysis_win_ms: window size used for autocorrelation analysis
    :param max_change: restrict periodicity to not increase by more than this ratio from the mean
    :param min_change: restrict periodicity to not decrease by more than this ratio from the mean
    :return: peak indices
    """
    N = len(signal)
    min_period = fs // max_hz
    max_period = fs // min_hz

    # compute pitch periodicity
    sequence = int(analysis_win_ms / 1000 * fs)  # analysis sequence length in samples
    periods = compute_periods_per_sequence(signal, sequence, min_period, max_period)

    # simple hack to avoid octave error: assume that the pitch should not vary much, restrict range
    mean_period = np.mean(periods)
    max_period = int(mean_period * 1.1)
    min_period = int(mean_period * 0.9)
    periods = compute_periods_per_sequence(signal, sequence, min_period, max_period)

    # find the peaks
    peaks = [np.argmax(signal[:int(periods[0]*1.1)])]
    while True:
        prev = peaks[-1]
        idx = prev // sequence  # current autocorrelation analysis window
        if prev + int(periods[idx] * max_change) >= N:
            break
        # find maximum near expected location
        peaks.append(prev + int(periods[idx] * min_change) +
                np.argmax(signal[prev + int(periods[idx] * min_change): prev + int(periods[idx] * max_change)]))
    return np.array(peaks)


def compute_periods_per_sequence(signal, sequence, min_period, max_period):
    """
    Computes periodicity of a time-domain signal using autocorrelation
    :param sequence: analysis window length in samples. Computes one periodicity value per window
    :param min_period: smallest allowed periodicity
    :param max_period: largest allowed periodicity
    :return: list of measured periods in windows across the signal
    """
    offset = 0  # current sample offset
    periods = []  # period length of each analysis sequence

    while offset < N:
        fourier = fft(signal[offset: offset + sequence])
        fourier[0] = 0  # remove DC component
        autoc = ifft(fourier * np.conj(fourier)).real
        autoc_peak = min_period + np.argmax(autoc[min_period: max_period])
        periods.append(autoc_peak)
        offset += sequence
    return periods


def psola(signal, peaks, f_ratio):
    """
    Time-Domain Pitch Synchronous Overlap and Add
    :param signal: original time-domain signal
    :param peaks: time-domain signal peak indices
    :param f_ratio: pitch shift ratio
    :return: pitch-shifted signal
    """
    N = len(signal)
    # Interpolate
    new_signal = np.zeros(N)
    new_peaks_ref = np.linspace(0, len(peaks) - 1, len(peaks) * f_ratio)
    new_peaks = np.zeros(len(new_peaks_ref)).astype(int)

    for i in range(len(new_peaks)):
        weight = new_peaks_ref[i] % 1
        left = np.floor(new_peaks_ref[i]).astype(int)
        right = np.ceil(new_peaks_ref[i]).astype(int)
        new_peaks[i] = int(peaks[left] * (1 - weight) + peaks[right] * weight)

    # PSOLA
    for j in range(len(new_peaks)):
        # find the corresponding old peak index
        i = np.argmin(np.abs(peaks - new_peaks[j]))
        # get the distances to adjacent peaks
        P1 = [new_peaks[j] if j == 0 else new_peaks[j] - new_peaks[j-1],
              N - 1 - new_peaks[j] if j == len(new_peaks) - 1 else new_peaks[j+1] - new_peaks[j]]
        # edge case truncation
        if peaks[i] - P1[0] < 0:
            P1[0] = peaks[i]
        if peaks[i] + P1[1] > N - 1:
            P1[1] = N - 1 - peaks[i]
        # linear OLA window
        window = list(np.linspace(0, 1, P1[0] + 1)[1:]) + list(np.linspace(1, 0, P1[1] + 1)[1:])
        # center window from original signal at the new peak
        new_signal[new_peaks[j] - P1[0]: new_peaks[j] + P1[1]] += window * signal[peaks[i] - P1[0]: peaks[i] + P1[1]]
    return new_signal


if __name__=="__main__":
    # Load audio
    orig_signal, fs = librosa.load("female_scale.wav", sr=44100)
    N = len(orig_signal)

    # Pitch shift amount as a ratio
    f_ratio = 2 ** (-2 / 12)

    # Shift pitch
    new_signal = shift_pitch(orig_signal, fs, f_ratio)

    # Plot
    plt.style.use('ggplot')
    plt.plot(orig_signal[:-1])
    plt.show()
    plt.plot(new_signal[:-1])
    plt.show()

    # Write to disk
    librosa.output.write_wav("female_scale_transposed_{:01.2f}.wav".format(f_ratio), new_signal, fs)

  • 1
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值