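An Intuitive Explanation of the Sparse Fourier Transform

This article gives a simple explanation of what the Sparse Fourier Transform is.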

1. Let's play an ideal piano.

On an ideal piano, each key corresponds to a single, specific frequency of sound. For example, one of the best-known frequencies is the A above middle C (440 Hz), the standard concert pitch. When the A key is pressed, the sound you hear is a perfect sine wave oscillating at 440 Hz. Similarly, middle C corresponds to a sound wave of about 261 Hz.

However, playing one note at a time gets boring, so let's try pressing both keys at the same time. The interesting thing is that the two sounds combine to create a completely new and unique sound. It is no longer a single frequency but a combination of the two: when the keys are pressed together, their sine waves are simply added together.
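To make the "adding together" concrete, here is a minimal Python sketch of the ideal piano. The sample rate of 1024 samples per second, the helper names `tone` and `press_together`, and rounding middle C to 261 Hz are assumptions made for illustration; pressing two keys is modeled as adding the two sampled sine waves point by point.

```python
import math

SAMPLE_RATE = 1024  # samples per second; an assumed value chosen for illustration


def tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """One key of the ideal piano: a pure sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]


def press_together(*waves):
    """Pressing keys together: the waveforms add sample by sample."""
    return [sum(samples) for samples in zip(*waves)]


a4 = tone(440, SAMPLE_RATE)  # the A, one second long
c4 = tone(261, SAMPLE_RATE)  # middle C, rounded from about 261.6 Hz
chord = press_together(a4, c4)
print(max(chord))  # more than 1, so the chord is no longer a single unit-amplitude sine
```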

Figure: the final sound signal formed by combining three notes.

The Fast Fourier Transform (FFT) lets us take this new sound and decompose it back into its original frequencies, which essentially tells us which keys made up the chord. Let's take a step back to playing just one note and look at an example graph of the original signal and its FFT.

The exact numbers in this graph matter less than what the shapes represent. The blue graph at the top is the audio wave: amplitude as a function of time. It contains a single frequency and represents playing just the note A. After taking the FFT we get a very interesting-looking graph of amplitude as a function of frequency. A single spike on this graph means our original signal contains a single frequency, while almost every other frequency is absent. Moving ahead a little: in our chord example with the two notes C and A, the FFT would have two spikes! One would appear in the same spot as before (A, 440 Hz), and another at a lower frequency (C, about 261 Hz). Overall, the FFT of a signal tells us how much of each "pure" frequency was added up to produce the final signal.
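The sketch below shows the FFT picking the keys back out of the chord. It is a textbook recursive radix-2 Cooley-Tukey FFT in plain Python (not an optimized library routine), applied to one second of a hypothetical A + C chord sampled at an assumed 1024 samples per second. With that choice each FFT bin corresponds to exactly 1 Hz, so the two largest spikes land on bins 261 and 440.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])  # FFTs of the two half-length sequences
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


N = 1024  # one second at an assumed 1024 samples/s, so bin k is exactly k Hz
chord = [math.sin(2 * math.pi * 440 * t / N) + math.sin(2 * math.pi * 261 * t / N)
         for t in range(N)]

spectrum = fft(chord)
# A real signal's spectrum is mirrored, so only bins 0 .. N/2 - 1 are inspected.
magnitudes = [abs(v) for v in spectrum[:N // 2]]
peaks = sorted(range(N // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
print(sorted(peaks))  # [261, 440]: the two keys of the chord
```

Each spike has magnitude N/2 = 512 and every other bin is essentially zero. Because the spectrum of a real-valued signal is mirrored, each tone also appears at bin N - f; the sketch only looks at the first half. In practice you would call a library routine such as `numpy.fft.rfft` rather than a hand-written FFT.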

2. Let's add a singer to accompany the piano.

The human voice covers a wide range of frequencies, and most of the sounds (words) we make are combinations of many of them. As seen in the picture below, an audio signal can get very complicated. The corresponding FFT may have thousands of non-zero frequencies in varying proportions (the red FFT graph above would have thousands of peaks of different heights). Even a singer trying to sing an F, for example, would end up producing many different frequencies, because the human voice is not an ideal musical instrument.

Figure: audio signals of different spoken words. Clearly not the smooth oscillation of the standard A above!

Now that we have some understanding of what the FFT does, let's look at MIT's Sparse FFT. After adding the singer to the piano, we have a chord of C and A plus a singer trying to hold an F, which produces a very jagged audio signal and FFT. The ordinary FFT computes the amplitude of every single frequency, but perhaps we can exploit the fact that most of the significant frequencies will be clustered around C, A and F! If we compute only how much these few frequencies contribute to the final audio signal, we may be able to reproduce a sound close enough to the original musical score. This, in essence, is the idea behind the Sparse FFT: it computes only the few largest frequency components instead of all of them. Note that it does not know in advance where those peaks are, so it has to locate them as well as measure them.
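One reason a sparse FFT can skip most of the work is that a signal containing only a few frequencies reveals them through very few samples. The toy below handles the simplest case, a single complex tone with no noise (an assumption: a real sine is really two mirrored spikes, and real audio is noisy). Since x[1] / x[0] = exp(2πi·f/n), the phase of that ratio gives the frequency bin f from just two samples instead of all n. Roughly speaking, sparse FFT algorithms combine estimates of this kind with hashing the spectrum into buckets so that each bucket holds about one large frequency; this sketch is only that single building block, not MIT's algorithm, and `one_tone_signal` and `locate_single_tone` are hypothetical helpers.

```python
import cmath
import math


def one_tone_signal(freq_bin, amplitude, n):
    """Hypothetical 1-sparse test signal: one complex exponential at an integer bin."""
    return [amplitude * cmath.exp(2j * math.pi * freq_bin * t / n) for t in range(n)]


def locate_single_tone(x, n):
    """Recover (frequency bin, amplitude) of a noise-free 1-sparse signal from x[0], x[1] only.

    For x[t] = a * exp(2*pi*i*f*t/n), the ratio x[1] / x[0] equals exp(2*pi*i*f/n),
    so its phase encodes f; the amplitude a is simply x[0].
    """
    phase = cmath.phase(x[1] / x[0]) % (2 * math.pi)  # map (-pi, pi] onto [0, 2*pi)
    freq_bin = round(phase * n / (2 * math.pi)) % n  # % n folds a rounded n back to 0
    return freq_bin, x[0]


n = 1024
x = one_tone_signal(440, 3.0, n)
print(locate_single_tone(x, n))  # frequency bin 440, amplitude 3, from 2 of the 1024 samples
```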

The MIT paper observed that for video signals, 89% of the frequencies present are not needed: computing the Sparse FFT with only the remaining 11% of the frequencies does not degrade signal quality much. What a "frequency" and a "signal" mean for video is more technical, but the theory is the same as for the piano and the singer.
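The sketch below illustrates the "keep only a few frequencies" idea on a hypothetical piano-plus-singer signal: C, F and A (rounded to 261, 349 and 440 Hz) plus a little seeded random noise. Note that it cheats: it computes the full FFT and then keeps the k largest coefficients, so it shows why a sparse approximation is good enough, not how the Sparse FFT avoids computing the other coefficients. The sample rate, noise level and k = 6 are assumptions.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def keep_top_k(spectrum, k):
    """Zero every coefficient except the k with the largest magnitude."""
    keep = set(sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)[:k])
    return [v if i in keep else 0j for i, v in enumerate(spectrum)]


N = 1024  # one second at an assumed 1024 samples/s, so bin k is k Hz
rng = random.Random(0)  # seeded so the run is reproducible
# Hypothetical "piano + singer": C, F and A (rounded to whole Hz) plus small noise.
signal = [sum(math.sin(2 * math.pi * f * t / N) for f in (261, 349, 440)) + rng.gauss(0, 0.05)
          for t in range(N)]

# Each real tone shows up as two mirrored spikes, so three tones need k = 6.
approx = [v.real for v in ifft(keep_top_k(fft(signal), 6))]
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, approx)) / sum(a * a for a in signal))
print(f"kept 6 of {N} coefficients, relative error = {error:.3f}")
```

The printed relative error stays small, because what was discarded is essentially the added noise, even though 1018 of the 1024 coefficients were thrown away.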

