These notes record several ways to read and write WAV files.
torchaudio
- Reading audio
import torchaudio
wav_file = "music/jamendo/music-jamendo-0080.wav"
waveform, sample_rate = torchaudio.load(wav_file)
- type(waveform): <class 'torch.Tensor'>
- waveform.shape: torch.Size([1, 3354112])
- waveform.size(): torch.Size([1, 3354112])
- waveform.dtype: torch.float32
- sample_rate: 16000
- tensor([[ 3.0518e-05, 3.0518e-05, 3.0518e-05, …, -3.0518e-05,
0.0000e+00, 0.0000e+00]])
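- Note: music/jamendo/music-jamendo-0080.wav is a mono, 16-bit audio file sampled at 16000 Hz.
Summary
- It can read a variety of audio formats and returns the audio data together with the sample rate. The data is 2-D: the first dimension is the channel and the second is the samples, so multi-channel audio can be read as well.
- The loaded data is float32 in the range -1 to 1, obtained by dividing each sample value by 32768.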
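The scaling described above can be checked with plain NumPy. This is a minimal sketch that assumes 16-bit PCM input; the helper name `pcm16_to_float32` is illustrative and is not part of torchaudio.

```python
import numpy as np

def pcm16_to_float32(samples):
    """Scale int16 PCM samples into [-1, 1) by dividing by 32768 (hypothetical helper)."""
    samples = np.asarray(samples, dtype=np.int16)
    return (samples / 32768).astype(np.float32)

# An int16 sample of 1 maps to 1/32768, about 3.0518e-05,
# which matches the first values of the tensor printed above.
scaled = pcm16_to_float32([1, -1, 0, 32767, -32768])
```

Because the divisor is 32768, the most negative sample maps to exactly -1.0 while the most positive one stays just below 1.0.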
- Writing audio
Omitted.
librosa
Installing librosa: conda install -c conda-forge librosa
- Reading audio
import librosa
wav_file="music/jamendo/music-jamendo-0080.wav"
sample_rate=16000
data = librosa.core.load(wav_file, sr=sample_rate)
import librosa
wav_file="立体声_缩混.wav"
sample_rate=16000
data = librosa.core.load(wav_file, sr=sample_rate, mono=False)[0]
data[0]
data[1]
- type(data): <class 'tuple'>
- data: (array([ 3.0517578e-05, 3.0517578e-05, 3.0517578e-05, …, -3.0517578e-05, 0.0000000e+00, 0.0000000e+00],
dtype=float32), 16000)
- type(data[0]): <class 'numpy.ndarray'>
- data[0].dtype: dtype('float32')
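Summary
- A convenient feature is that, whatever the original sample rate, the file can be loaded at a specified sample rate; the resampling is done internally.
- The loaded data is float32 in the range -1 to 1, the same as torchaudio. The default sample rate is 22050. The output is mono by default (multi-channel audio is downmixed to a single channel); set mono=False to load multi-channel data.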
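The default mono downmix amounts to averaging the channels. The following is a minimal NumPy sketch of that averaging on a channels-first array, assuming the shape (channels, samples) returned with mono=False. It uses synthetic data and does not call librosa itself.

```python
import numpy as np

# Synthetic stereo signal, channels-first: shape (2, 4).
stereo = np.array([[0.5, 0.25, -0.5, 0.0],
                   [0.1, 0.25,  0.5, 1.0]], dtype=np.float32)

# Average across the channel axis to obtain a mono signal of shape (4,).
mono = stereo.mean(axis=0)
```

Averaging rather than summing keeps the result inside the -1 to 1 range.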
- Writing audio
Omitted.
soundfile
- Reading audio
import soundfile as sf
import numpy as np
wav_path='music/jamendo/music-jamendo-0080.wav'
data, sr = sf.read(wav_path)
data = data.astype(np.float32)
data
data.dtype
sr
import soundfile as sf
import numpy as np
wav_path='立体声_缩混.wav'
data, sr = sf.read(wav_path)
data = data.transpose()
data = data.astype(np.float32)
data[0]
data[1]
data.dtype
sr
- type(data): <class 'numpy.ndarray'>
- sr: 16000
- Writing audio
import soundfile as sf
write_wav_path=''
# sf.write expects data shaped (frames, channels); sr is the sample rate returned by sf.read
sf.write(write_wav_path, data, sr, 'PCM_16')
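Summary
- Similar to librosa; I rarely use it.
- Reading mono audio gives 1-D data, while reading multi-channel audio gives 2-D data that must be transposed before use. When the inputs vary and the code has to check whether each file is mono or multi-channel, this is less convenient than the other interfaces.
- The values are in the range -1 to 1, already divided by 32768, but the default dtype is float64, so it is usually converted to float32 before use.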
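The mono versus multi-channel check mentioned above can be wrapped in a small helper. This is a sketch under the assumption that the input is laid out as described here: (frames,) for mono and (frames, channels) otherwise. The name `to_channels_first` is hypothetical.

```python
import numpy as np

def to_channels_first(data):
    """Return float32 audio shaped (channels, frames) for mono or multi-channel input.

    Assumes `data` is (frames,) for mono and (frames, channels) otherwise.
    Hypothetical helper, not part of soundfile.
    """
    data = np.asarray(data, dtype=np.float32)
    if data.ndim == 1:
        return data[np.newaxis, :]   # mono: add a channel axis
    return data.T                    # multi-channel: swap to (channels, frames)

mono = to_channels_first(np.zeros(8))          # shape (1, 8)
stereo = to_channels_first(np.zeros((8, 2)))   # shape (2, 8)
```

soundfile's read function also accepts dtype='float32' and always_2d=True, which may remove the need for part of this helper.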
wave
- Reading data
import wave
import numpy as np
wav_file = "立体声_缩混.wav"
with wave.open(wav_file, 'rb') as fr:
    params = fr.getparams()
    nchannel, sampwidth, samplerate, nframs = params[:4]
    # readframes returns the raw binary data (bytes)
    strdata = fr.readframes(nframs)
# interpret the raw bytes as int16 samples
data = np.frombuffer(strdata, dtype=np.int16)
# scale to the range -1 to 1 and convert to float32
data = data/32768
data = data.astype(np.float32)
# de-interleave: reshape to (frames, channels), then transpose to (channels, frames)
new_data = data.reshape(-1, nchannel).transpose()
left_data = new_data[0]
right_data = new_data[1]
- type(strdata): <class 'bytes'>
- type(data): <class 'numpy.ndarray'>
- data.size : 6708224
- left_data.size: 3354112
- Writing audio
write_wav_file="wave_out_left.wav"
nchannel = 1
sampwidth = 2
framerate=16000
nframes=len(left_data)
comptype="NONE"
compname="no compressed"
write_params = (nchannel, sampwidth, framerate, nframes, comptype, compname)
with wave.open(write_wav_file,'wb') as fw:
    fw.setparams(write_params)
    # convert back to int16 before writing the bytes
    left_data = (left_data*32768).astype(np.int16)
    fw.writeframes(left_data.tobytes())
write_wav_file="wave_out_right.wav"
with wave.open(write_wav_file,'wb') as fw:
    fw.setparams(write_params)
    right_data = (right_data*32768).astype(np.int16)
    fw.writeframes(right_data.tobytes())
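Summary
- wave reads the raw binary data, which is then converted to int16.
- The values are not float32 scaled to the range -1 to 1. Dividing by 32768 and casting gives float32 data equivalent to what the other readers return.
- Multi-channel data read with wave is one-dimensional. Reshape it first, with the number of channels as the second dimension, and then transpose.
- Writing with wave likewise requires converting to int16 and then writing the bytes.
- wave can read many kinds of WAV files, leaving the interpretation and processing up to you; this is the one I use most.
- Plotting a WAV: https://blog.csdn.net/qq_44109982/article/details/111560494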
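The reshape-then-transpose step and its inverse can be verified with a round trip on synthetic data. This sketch assumes 16-bit stereo audio; the file name roundtrip.wav and the 440 Hz test tone are illustrative choices, not from the original notes.

```python
import os
import tempfile
import wave
import numpy as np

# Build one second of synthetic stereo int16 audio, shape (channels, frames).
framerate = 16000
t = np.arange(framerate) / framerate
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
right = np.zeros_like(left)
channels = np.stack([left, right])

# Interleave as L R L R ... before writing: transpose, then flatten.
interleaved = channels.T.reshape(-1)

path = os.path.join(tempfile.mkdtemp(), "roundtrip.wav")  # hypothetical file name
with wave.open(path, "wb") as fw:
    fw.setnchannels(2)
    fw.setsampwidth(2)          # 2 bytes per sample = 16-bit
    fw.setframerate(framerate)
    fw.writeframes(interleaved.tobytes())

# Read back and undo the interleaving: reshape to (frames, channels), then transpose.
with wave.open(path, "rb") as fr:
    nchannels = fr.getnchannels()
    raw = fr.readframes(fr.getnframes())
restored = np.frombuffer(raw, dtype=np.int16).reshape(-1, nchannels).T
```

Here restored[0] is the left channel and restored[1] is the right channel, matching left_data and right_data in the reading example.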
wavfile
- Reading audio
from scipy.io import wavfile
wav_path="music/jamendo/music-jamendo-0080.wav"
sr, data = wavfile.read(wav_path) # note: the return order differs from the other libraries
# data.dtype: int16
data = data/32768
- type(data): <class 'numpy.ndarray'>
- data.dtype: dtype('float64')
- data.shape: (3354112,)
from scipy.io import wavfile
wav_path="立体声_缩混.wav"
sr, data = wavfile.read(wav_path)
data = data/32768
- type(data): <class 'numpy.ndarray'>
- data.dtype: dtype('float64')
- data.shape : (3354112, 2)
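Summary
- Stereo files are read as stereo, giving a 2-D array of shape (frames, channels).
- What is read is also the raw int16 data.
- The return order differs from the others: sr comes first, then data.
- After dividing by 32768 the data is float64 by default; it is usually converted to float32 before use.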
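The divide-then-cast steps can be generalized so the divisor follows the integer dtype instead of being fixed at 32768. This is a sketch assuming signed-integer PCM such as int16 or int32; `int_pcm_to_float32` is a hypothetical helper, and unsigned 8-bit data is deliberately not handled.

```python
import numpy as np

def int_pcm_to_float32(data):
    """Scale signed-integer PCM to float32 in [-1, 1) (hypothetical helper).

    Assumes a signed integer dtype such as int16 or int32; 8-bit WAV data
    is unsigned and would need an offset, which this sketch does not handle.
    """
    if not np.issubdtype(data.dtype, np.signedinteger):
        raise TypeError("expected a signed integer array")
    full_scale = -float(np.iinfo(data.dtype).min)   # 32768 for int16
    return (data / full_scale).astype(np.float32)

stereo_int16 = np.array([[16384, -32768], [0, 8192]], dtype=np.int16)
stereo_float = int_pcm_to_float32(stereo_int16)
```

The array shape is left unchanged, so a (frames, channels) input stays (frames, channels).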
- Writing audio
import numpy as np
from scipy.io import wavfile
write_wav_path=''
# scale back to the int16 range; sr is the sample rate returned by wavfile.read
data *= 32768
wavfile.write(write_wav_path, sr, data.astype(np.int16))
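Summary
- After multiplying by 32768, the data is converted to int16 and written.
- Unlike wave, there is no need to convert to bytes before writing.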
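One caveat with the multiply-by-32768 step: a sample of exactly 1.0 scales to 32768, which is outside the int16 range of -32768 to 32767. Data that came from dividing int16 values by 32768 never reaches 1.0, but processed audio might. The following sketch clips before casting; `float_to_pcm16` is a hypothetical helper.

```python
import numpy as np

def float_to_pcm16(data):
    """Convert float audio in [-1, 1] to int16, clipping to the valid range (hypothetical helper)."""
    scaled = np.asarray(data, dtype=np.float64) * 32768
    return np.clip(scaled, -32768, 32767).astype(np.int16)

pcm = float_to_pcm16([1.0, -1.0, 0.5, 0.0])
```

The result can be passed to wavfile.write directly, or converted with tobytes() for the wave module.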