The parameters of librosa.load() are listed below (the returned audio sequence is a NumPy array):
"""Audio path"""
path : string, int, pathlib.Path or file-like object
path to the input file.
Any codec supported by `soundfile` or `audioread` will work.
Any string file paths, or any object implementing Python's
file interface (e.g. `pathlib.Path`) are supported as `path`.
If the codec is supported by `soundfile`, then `path` can also be
an open file descriptor (int).
"""Sampling rate"""
sr : number > 0 [scalar]
target sampling rate
'None' uses the native sampling rate
"""Whether to convert to mono or keep multiple channels"""
mono : bool
convert signal to mono
"""Start point of the audio (in seconds)"""
offset : float
start reading after this time (in seconds)
"""Duration of audio to read (in seconds)"""
duration : float
only load up to this much audio (in seconds)
"""Data type of the audio sequence"""
dtype : numeric type
data type of `y`
"""Resampling type"""
res_type : str
resample type (see note)
.. note::
By default, this uses `resampy`'s high-quality mode ('kaiser_best').
For alternative resampling modes, see `resample`
.. note::
`audioread` may truncate the precision of the audio data to 16 bits.
See https://librosa.github.io/librosa/ioformats.html for alternate
loading methods.
"""Returns two values (the audio sequence first, then the sampling rate)"""
Returns
-------
y : np.ndarray [shape=(n,) or (2, n)]
audio time series
sr : number > 0 [scalar]
sampling rate of `y`
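The parameters above can be combined as in the following minimal sketch. The helper names `load_segment` and `approx_num_samples` are hypothetical and not part of librosa; only `librosa.load` and its `sr`, `mono`, `offset` and `duration` arguments come from the docstring. The sample count is an estimate, since the exact length can differ slightly after resampling.

```python
def approx_num_samples(duration_s, sr):
    """Rough number of samples expected for `duration_s` seconds at rate `sr`."""
    return int(duration_s * sr)


def load_segment(path, offset_s=0.0, duration_s=None, sr=None):
    """Load part of an audio file as a mono NumPy array.

    Hypothetical helper: sr=None keeps the file's native sampling rate,
    offset_s and duration_s are both given in seconds.
    """
    import librosa  # imported lazily so approx_num_samples works without librosa

    y, sr_out = librosa.load(
        path, sr=sr, mono=True, offset=offset_s, duration=duration_s
    )
    return y, sr_out
```

For example, `y, sr = load_segment("speech.wav", offset_s=1.0, duration_s=2.5)` would skip the first second and read about 2.5 seconds of audio, where "speech.wav" is only a placeholder file name.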
torchaudio.load(): the loaded audio sequence is a tensor
Common usage:
clean_s, fs = torchaudio.load(filepath, frame_offset, num_frames)
filepath is the path to the audio file;
frame_offset is the start point of the audio; unlike librosa, it is given in samples (sample frames), not in seconds;
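num_frames is the maximum number of sample frames to read (one frame is one sample per channel); the default -1 reads everything from frame_offset to the end. It is not the number of windowed analysis frames. That count, for a signal of wav_length samples split with window length frame_len and hop frame_hop, is (wav_length - frame_len) / frame_hop + 1.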
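Because torchaudio counts in samples while librosa counts in seconds, a small conversion step helps when switching between the two. The sketch below is an illustration only. The helper names are hypothetical, and the file's sample rate is assumed to be known in advance. It also keeps two quantities separate: the `num_frames` argument of `torchaudio.load` (samples per channel to read) and the windowed frame count given by the formula above.

```python
def seconds_to_frames(offset_s, duration_s, sample_rate):
    """Convert a (start, length) pair in seconds to sample-frame counts."""
    frame_offset = int(round(offset_s * sample_rate))
    num_frames = int(round(duration_s * sample_rate))
    return frame_offset, num_frames


def count_windows(wav_length, frame_len, frame_hop):
    """Number of full analysis windows: (wav_length - frame_len) // frame_hop + 1."""
    if wav_length < frame_len:
        return 0  # signal is shorter than a single window
    return (wav_length - frame_len) // frame_hop + 1


def load_segment_torch(path, offset_s, duration_s, sample_rate):
    """Hypothetical helper: load a segment given in seconds as a tensor."""
    import torchaudio  # imported lazily so the helpers above work without it

    frame_offset, num_frames = seconds_to_frames(offset_s, duration_s, sample_rate)
    # The result has shape (channels, num_frames) by default.
    waveform, fs = torchaudio.load(
        path, frame_offset=frame_offset, num_frames=num_frames
    )
    return waveform, fs
```

With a 16 kHz file, `seconds_to_frames(0.5, 2.0, 16000)` gives a start at sample 8000 and a length of 32000 samples, and one second of audio (16000 samples) split with a 400-sample window and 160-sample hop yields 98 windows.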