The FFT step in MFCC
What we should know about sound: sound is produced when an object vibrates. The vibrations make the surrounding air molecules oscillate, which creates alternating regions of high and low air pressure, and this alternation of high and low pressure propagates as a wave.
Some key terms in audio processing.
- Amplitude: perceived as loudness.
- Frequency: perceived as pitch.
- Sample rate: the number of samples taken from the sound per second. A sample rate of 22,050 Hz means 22,050 samples are taken each second.
- Bit depth: the number of bits used to store each sample. It sets how finely each sample's amplitude can be represented, much like color depth in an image, so 24-bit audio represents amplitude more precisely (with less quantization noise) than 16-bit audio.
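To make the sample-rate and bit-depth definitions concrete, here is a small NumPy sketch. The helper names `sine_tone` and `quantization_levels` are hypothetical illustrations, not part of any library. It synthesizes a pure tone with a chosen amplitude and frequency, shows that the number of samples is duration times sample rate, and counts how many distinct values a sample of a given bit depth can take.

```python
import numpy as np

def sine_tone(freq_hz, amplitude, duration_s, sample_rate):
    """Return `duration_s` seconds of a pure tone sampled at `sample_rate` Hz."""
    n_samples = int(round(duration_s * sample_rate))  # samples = seconds x samples per second
    t = np.arange(n_samples) / sample_rate            # time stamp of each sample, in seconds
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

def quantization_levels(bit_depth):
    """Number of distinct amplitude values a sample with `bit_depth` bits can take."""
    return 2 ** bit_depth

tone = sine_tone(440.0, 0.5, 2.0, 22050)
print(len(tone))                # 44100 samples for 2 s at 22,050 Hz
print(quantization_levels(16))  # 65536
print(quantization_levels(24))  # 16777216
```

Raising `amplitude` makes the tone louder and raising `freq_hz` raises its pitch, matching the first two definitions.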
Here I use a recording of a single piano key from freesound.org.
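```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

FILE = "piano_c.wav"  # hypothetical path to the downloaded piano sample
FIG_SIZE = (15, 10)   # figure size in inches (illustrative value)
# Load the audio, resampling it to 22,050 Hz
signal, sample_rate = librosa.load(FILE, sr=22050)
plt.figure(figsize=FIG_SIZE)
# waveplot() was removed in librosa 0.10; waveshow() is its replacement
librosa.display.waveshow(signal, sr=sample_rate, alpha=0.4)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Waveform")
plt.savefig("waveform.png", dpi=100)
plt.show()
```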
![Waveform of the piano key](https://miro.medium.com/max/9999/1*GwDgTkcfl7-NPadPDVvD-w.png)
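The paragraph that follows introduces the Fourier transform. As a minimal sketch of what it does, assuming only NumPy and a synthetic signal (the piano recording is not reproduced here), the snippet below mixes a 440 Hz tone with a quieter 880 Hz overtone and uses `np.fft.rfft`, which computes the discrete Fourier transform of a real signal with an FFT algorithm, to recover both component frequencies and their amplitudes.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz, same rate as the librosa.load call above
DURATION = 1.0       # seconds; 1 s at 22,050 Hz gives 1 Hz frequency resolution

# A synthetic periodic signal: a 440 Hz tone plus a quieter 880 Hz overtone
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
mix = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Real-input FFT: one complex coefficient per non-negative frequency bin
spectrum = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), d=1 / SAMPLE_RATE)  # bin frequencies in Hz
magnitude = np.abs(spectrum)

# The two strongest bins sit at the two component frequencies
top_two = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(top_two)  # [440. 880.]

# For a sine that completes a whole number of cycles in the window,
# its bin magnitude is amplitude * N / 2, so dividing recovers the amplitude
N = len(mix)
for f in (440, 880):
    k = np.argmin(np.abs(freqs - f))  # index of the bin closest to f
    print(f, round(float(magnitude[k] / (N / 2)), 3))  # 440 1.0, then 880 0.5
```

The one-second window is chosen so both tones complete an integer number of cycles; with real recordings that rarely holds, and energy leaks into neighbouring bins, which is one reason MFCC pipelines apply a window function before the FFT.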
To move a signal from the time domain to the frequency domain, we apply the Fourier transform to the data, computed efficiently with the Fast Fourier Transform (FFT) algorithm. The Fourier transform decomposes a periodic sound into a sum of sine waves, each oscillating at a different frequency. Remarkably, this means we can describe even a very complex sound, as long as it is periodic, as a sum as the su