The TD-PSOLA Algorithm

林林宋

Article link: https://blog.csdn.net/qq_40168949/article/details/111666131

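This post walks through how the original signal is processed: first analyze it to obtain the pitch periods, then modify the pitch periods by overlap-add and scaling, and finally reconstruct the speech from the modified parameters. It also gives reference code and notes that the implementation has problems when rate<1.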

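1. Illustrated overview

Analyze the original signal to obtain the pitch periods
Modify the pitch periods (overlap-add, scaling)
Reconstruct the speech from the modified pitch-period parameters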
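
The first step, obtaining the pitch period, can be illustrated in isolation. The following is a minimal sketch, assuming a synthetic two-harmonic tone in place of real speech. It estimates the period of one analysis frame from the peak of an FFT-based circular autocorrelation. The function name `estimate_period`, the 8 kHz sample rate and the 120 to 500 Hz search range are assumptions made for this example only.

```python
import numpy as np


def estimate_period(frame, min_period, max_period):
    """Return the lag (in samples) of the autocorrelation peak in [min_period, max_period)."""
    spectrum = np.fft.fft(frame)
    spectrum[0] = 0  # drop the DC component
    # Inverse FFT of the power spectrum gives the circular autocorrelation
    autoc = np.fft.ifft(spectrum * np.conj(spectrum)).real
    return min_period + int(np.argmax(autoc[min_period:max_period]))


fs = 8000          # sample rate in Hz (assumed for this example)
f0 = 200.0         # true fundamental frequency of the test tone
frame_len = 320    # 40 ms at 8 kHz
t = np.arange(frame_len) / fs
frame = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

# Restrict the lag search to a plausible pitch range (assumed: 120 Hz to 500 Hz)
min_period = fs // 500
max_period = fs // 120

period = estimate_period(frame, min_period, max_period)
print("period in samples:", period)
print("estimated f0 in Hz:", fs / period)
```

Keeping the lag range narrow matters here: a circular autocorrelation of a periodic frame is about as large at twice the true period as at the true period, so a wide range invites octave errors.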

2. Reference implementation

// The code's handling of rate<1 is problematic

"""
Author: Sanna Wager
Created on: 9/18/19
This script provides an implementation of pitch shifting using the "time-domain pitch synchronous
overlap and add (TD-PSOLA)" algorithm. The original PSOLA algorithm was introduced in [1].
Description
The main script td_psola.py takes raw audio as input and applies steps similar to those described in [2].
First, it locates the time-domain peaks using auto-correlation. It then shifts windows centered at the
peaks closer or further apart in time to change the periodicity of the signal, which shifts the pitch
without affecting the formant. It applies linear cross-fading as introduced in [3] and implemented in
[4], the algorithm used for Audacity's simple pitch shifter.
Notes:
- Some parameters in the program related to frequency are hardcoded for singing voice. They can be
    adjusted for other usages.
- The program is designed to process sounds whose pitch does not vary too much, as larger variation
    could result in glitches in peak detection (e.g., octave errors). Processing audio in short
    segments (e.g., notes or words) is recommended. Another option would be to use a more robust
    peak detection algorithm, for example, pYIN [5].
- Small pitch shifts (e.g., up to 700 cents) should not produce many artifacts. Sound quality degrades
    if the shift is too large.
- The signal is expected to be voiced. Unexpected results may occur in the case of unvoiced signals
References:
[1] F. Charpentier and M. Stella. "Diphone synthesis using an overlap-add technique for speech waveforms
    concatenation." In Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP). Vol. 11. IEEE, 1986.
[2] Overlap and add algorithm exercise from UIUC:
    https://courses.engr.illinois.edu/ece420/lab5/lab/#overlap-add-algorithm
[3] https://www.surina.net/article/time-and-pitch-scaling.html
[4] https://gitlab.com/soundtouch
[5] https://code.soundsoftware.ac.uk/projects/pyin
"""

import numpy as np
from numpy.fft import fft, ifft
import matplotlib.pyplot as plt
import librosa


def shift_pitch(signal, fs, f_ratio):
    """
    Calls psola pitch shifting algorithm
    :param signal: original signal in the time-domain
    :param fs: sample rate
    :param f_ratio: ratio by which the frequency will be shifted
    :return: pitch-shifted signal
    """
    peaks = find_peaks(signal, fs)
    new_signal = psola(signal, peaks, f_ratio)
    return new_signal


def find_peaks(signal, fs, max_hz=950, min_hz=75, analysis_win_ms=40, max_change=1.005, min_change=0.995):
    """
    Find sample indices of peaks in time-domain signal
    :param max_hz: maximum measured fundamental frequency
    :param min_hz: minimum measured fundamental frequency
    :param analysis_win_ms: window size used for autocorrelation analysis
    :param max_change: restrict periodicity to not increase by more than this ratio from the mean
    :param min_change: restrict periodicity to not decrease by more than this ratio from the mean
    :return: peak indices
    """
    N = len(signal)
    min_period = fs // max_hz
    max_period = fs // min_hz

    # compute pitch periodicity
    sequence = int(analysis_win_ms / 1000 * fs)  # analysis sequence length in samples
    periods = compute_periods_per_sequence(signal, sequence, min_period, max_period)

    # simple hack to avoid octave error: assume that the pitch should not vary much, restrict range
    mean_period = np.mean(periods)
    max_period = int(mean_period * 1.1)
    min_period = int(mean_period * 0.9)
    periods = compute_periods_per_sequence(signal, sequence, min_period, max_period)

    # find the peaks
    peaks = [np.argmax(signal[:int(periods[0]*1.1)])]
    while True:
        prev = peaks[-1]
        idx = prev // sequence  # current autocorrelation analysis window
        if prev + int(periods[idx] * max_change) >= N:
            break
        # find maximum near expected location
        peaks.append(prev + int(periods[idx] * min_change) +
                np.argmax(signal[prev + int(periods[idx] * min_change): prev + int(periods[idx] * max_change)]))
    return np.array(peaks)


def compute_periods_per_sequence(signal, sequence, min_period, max_period):
    """
    Computes periodicity of a time-domain signal using autocorrelation
    :param sequence: analysis window length in samples. Computes one periodicity value per window
    :param min_period: smallest allowed periodicity
    :param max_period: largest allowed periodicity
    :return: list of measured periods in windows across the signal
    """
    N = len(signal)  # total number of samples
    offset = 0  # current sample offset
    periods = []  # period length of each analysis sequence

    while offset < N:
        fourier = fft(signal[offset: offset + sequence])
        fourier[0] = 0  # remove DC component
        autoc = ifft(fourier * np.conj(fourier)).real
        autoc_peak = min_period + np.argmax(autoc[min_period: max_period])
        periods.append(autoc_peak)
        offset += sequence
    return periods


def psola(signal, peaks, f_ratio):
    """
    Time-Domain Pitch Synchronous Overlap and Add
    :param signal: original time-domain signal
    :param peaks: time-domain signal peak indices
    :param f_ratio: pitch shift ratio
    :return: pitch-shifted signal
    """
    N = len(signal)
    # Interpolate
    new_signal = np.zeros(N)
    # np.linspace needs an integer number of points, so truncate the scaled peak count
    new_peaks_ref = np.linspace(0, len(peaks) - 1, int(len(peaks) * f_ratio))
    new_peaks = np.zeros(len(new_peaks_ref)).astype(int)

    for i in range(len(new_peaks)):
        weight = new_peaks_ref[i] % 1
        left = np.floor(new_peaks_ref[i]).astype(int)
        right = np.ceil(new_peaks_ref[i]).astype(int)
        new_peaks[i] = int(peaks[left] * (1 - weight) + peaks[right] * weight)

    # PSOLA
    for j in range(len(new_peaks)):
        # find the corresponding old peak index
        i = np.argmin(np.abs(peaks - new_peaks[j]))
        # get the distances to adjacent peaks
        P1 = [new_peaks[j] if j == 0 else new_peaks[j] - new_peaks[j-1],
              N - 1 - new_peaks[j] if j == len(new_peaks) - 1 else new_peaks[j+1] - new_peaks[j]]
        # edge case truncation
        if peaks[i] - P1[0] < 0:
            P1[0] = peaks[i]
        if peaks[i] + P1[1] > N - 1:
            P1[1] = N - 1 - peaks[i]
        # linear OLA window
        window = list(np.linspace(0, 1, P1[0] + 1)[1:]) + list(np.linspace(1, 0, P1[1] + 1)[1:])
        # center window from original signal at the new peak
        new_signal[new_peaks[j] - P1[0]: new_peaks[j] + P1[1]] += window * signal[peaks[i] - P1[0]: peaks[i] + P1[1]]
    return new_signal


if __name__=="__main__":
    # Load audio
    orig_signal, fs = librosa.load("female_scale.wav", sr=44100)
    N = len(orig_signal)

    # Pitch shift amount as a ratio
    f_ratio = 2 ** (-2 / 12)

    # Shift pitch
    new_signal = shift_pitch(orig_signal, fs, f_ratio)

    # Plot
    plt.style.use('ggplot')
    plt.plot(orig_signal[:-1])
    plt.show()
    plt.plot(new_signal[:-1])
    plt.show()

    # Write to disk (librosa.output.write_wav was removed in librosa 0.8.0, so soundfile is used)
    import soundfile as sf
    sf.write("female_scale_transposed_{:01.2f}.wav".format(f_ratio), new_signal, fs)
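
The relationship between the shift ratio and the spacing of the pitch marks can be checked without any audio file. The sketch below assumes an idealized, evenly spaced set of peaks. It converts a shift in cents to a frequency ratio, then resamples the peak positions in the same spirit as the interpolation step in `psola`. The helper names `cents_to_ratio` and `respace_peaks` are hypothetical, and `np.interp` stands in for the explicit left/right weighting used in the script.

```python
import numpy as np


def cents_to_ratio(cents):
    """Convert a pitch shift in cents to a frequency ratio (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def respace_peaks(peaks, f_ratio):
    """Place about f_ratio times as many pitch marks over the same time span."""
    peaks = np.asarray(peaks, dtype=float)
    n_new = max(2, int(len(peaks) * f_ratio))  # integer count, as np.linspace requires
    ref = np.linspace(0, len(peaks) - 1, n_new)  # fractional indices into the old peaks
    # Linear interpolation between neighbouring old peak positions
    return np.interp(ref, np.arange(len(peaks)), peaks).astype(int)


# Idealized pitch marks: 41 peaks, one every 100 samples (assumed test data)
peaks = np.arange(0, 4100, 100)

for cents in (-200, 700):
    ratio = cents_to_ratio(cents)
    new_peaks = respace_peaks(peaks, ratio)
    spacing = np.mean(np.diff(new_peaks))
    print("cents:", cents, "ratio:", round(ratio, 3),
          "marks:", len(new_peaks), "mean spacing:", round(float(spacing), 1))
```

A ratio above 1 packs the marks closer together (shorter period, higher pitch), and a ratio below 1 spreads them apart. The mean spacing is only approximately the old period divided by the ratio, because the first and last marks stay fixed.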