Practical Cython - Music Retrieval: Short-time Fourier Transform


I love Cython because it takes the best of two major programming worlds: C and Python. The two languages combine in a straightforward way, giving you more computationally efficient APIs and scripts. Furthermore, coding in Cython and C helps you understand what lies underneath common Python packages such as sklearn, taking data scientists a step beyond a simple import torch and the use of predefined algorithms.

In this series I am going to show you how to implement algorithms for music analysis in C and Cython, using the corresponding sklearn and scipy packages as a reference.

This lesson deals with the Short-time Fourier Transform, or STFT. This algorithm is widely used to analyse the frequencies of a signal and their evolution in time. The code is stored in this repository:

The code is also useful for understanding how to structure a Cython project, organise the code into folders, and install the final package.

This post contains external links to the Amazon affiliate program.

Layman theory

Feel free to skip this section if you want to get your hands dirty with the code straight away. It is just a light introduction to the STFT and its theory.

The main purpose of the Fourier Transform is to map (or, let's say, to sketch) a signal into the frequency domain, pointing out the most important frequencies that constitute the signal. This mapping has wide implications in many fields: biomedical engineering (e.g. studying frequency contributions in an electrocardiogram (ECG) to detect possible diseases or heart malfunctions¹ ² ³), computational science (e.g. compression algorithms such as mp3 and jpeg) and finance (e.g. studying the behaviour of stock and bond prices). The mapping is beneficial for studying music signals as well, since the main frequency content can be retrieved and analysed, for example to build a genre classifier or an app like Shazam (e.g. check my post). However, it is often interesting and helpful to follow how the frequencies and their amplitudes evolve in time, in order to find specific noises, to equalise frequencies in a recording session, or to build neural network algorithms that convert a speech signal to text (e.g. DeepPavlov).
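
To make the idea of "pointing out the most important frequencies" concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the function names, the toy 50 Hz tone and the 400 Hz sample rate are all assumptions chosen for illustration, and the naive O(n²) transform is written out only to show the mechanics.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest bin below the Nyquist limit."""
    spectrum = dft(signal)
    half = len(signal) // 2  # for real input, higher bins mirror the lower half
    magnitudes = [abs(c) for c in spectrum[:half]]
    peak_bin = max(range(half), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(signal)


# Toy signal (illustrative values): a 50 Hz sine sampled at 400 Hz for 0.5 s.
sample_rate = 400
tone = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(200)]
peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)
```

A transform over the whole signal like this one says which frequencies are present, but not when they occur. That is the gap the STFT fills.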

In practice, the STFT divides a time signal into short segments of equal length (window_length) and then computes the Fourier transform of each segment. The resulting per-segment frequency content can be plotted against time; this plot is called a spectrogram.

The STFT can be summarised in these steps:

  • Take an input signal (e.g. an mp3 file)

  • Multiply the signal by a window function (e.g. a Hamming function). This tapers the extremes of each segment towards zero, avoiding the discontinuities at the segment edges that would otherwise distort the Fourier transform (spectral leakage)

  • Slide a window of fixed size along the signal in time, advancing by a hop size at each step, and compute the Fourier transform of each windowed segment
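
The steps above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the C/Cython implementation from the repository: the names hamming, dft_magnitudes and stft are hypothetical, the window is a standard symmetric Hamming window, only windows that fit entirely inside the signal are used (no padding), and a naive DFT stands in for a proper FFT.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def dft_magnitudes(segment):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(segment)
    return [
        abs(sum(segment[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def stft(signal, window_length, hop_size):
    """Return one list of bin magnitudes per window position."""
    window = hamming(window_length)
    frames = []
    # Slide the window along the signal, moving hop_size samples each step.
    for start in range(0, len(signal) - window_length + 1, hop_size):
        # Taper the segment with the window, then transform it.
        segment = [signal[start + i] * window[i] for i in range(window_length)]
        frames.append(dft_magnitudes(segment))
    return frames


# Toy signal (illustrative values): 50 Hz for the first half, 100 Hz for the
# second half, sampled at 400 Hz.
sample_rate = 400
signal = [math.sin(2 * math.pi * (50 if t < 256 else 100) * t / sample_rate)
          for t in range(512)]
frames = stft(signal, window_length=64, hop_size=32)

# Strongest bin in each frame; bin k corresponds to k * sample_rate / 64 Hz.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(peak_bins[0], peak_bins[-1])
```

With these toy values each bin is 6.25 Hz wide, so the strongest bin moves from 8 (50 Hz) in the first frame to 16 (100 Hz) in the last one. That change over time is exactly what a spectrogram shows.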

Fig. 1 helps to better understand what the STFT does. An input signal with a defined amplitude, in decibels, and time, in seconds, is covered by N windows of size windowSize. Every HopSize, the window defines a signal segment, which is Fourier transformed. The output frequencies, in Hertz, can be plotted as a function of time.
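
As a small worked example of the quantities in this description, the sketch below computes how many windows N fit in a signal, and which times (in seconds) and frequencies (in Hertz) label the two axes of the resulting plot. The function name frame_layout and the numeric values (1 s of audio at 44100 Hz, a 1024-sample window, a 512-sample hop) are assumptions for illustration. The formula for N also assumes that only windows lying fully inside the signal are counted, with no padding.

```python
def frame_layout(num_samples, sample_rate, window_size, hop_size):
    """Return (n_windows, window start times in s, bin frequencies in Hz)."""
    if num_samples < window_size:
        n_windows = 0
    else:
        # One window at sample 0, then one more for every full hop that fits.
        n_windows = 1 + (num_samples - window_size) // hop_size
    times = [i * hop_size / sample_rate for i in range(n_windows)]
    # A real-valued segment of window_size samples yields window_size // 2 + 1
    # distinct frequency bins, from 0 Hz up to half the sample rate.
    freqs = [k * sample_rate / window_size for k in range(window_size // 2 + 1)]
    return n_windows, times, freqs


n_windows, times, freqs = frame_layout(
    num_samples=44100, sample_rate=44100, window_size=1024, hop_size=512
)
print(n_windows, times[1], freqs[1], freqs[-1])
```

A larger windowSize gives finer frequency resolution (smaller spacing between entries of freqs) at the cost of coarser localisation in time, while a smaller HopSize simply produces more, increasingly overlapping, windows.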
