openSMILE Feature Extraction Categories


For a detailed introduction, see "openSMILE 简介" on qq_22237367's CSDN blog: https://blog.csdn.net/qq_22237367/article/details/80897271
In short, openSMILE is a command-line tool that is configured through config files and is mainly used to extract audio features.

openSMILE input and output formats
① Supported file input formats:
• RIFF-WAVE (PCM) (for MP3, MP4, OGG, etc., a converter needs to be used)
• Comma Separated Value (CSV)
• HTK parameter files (as produced by the HTK toolkit)
• WEKA’s ARFF format
• Video streams via openCV
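Because audio input has to be uncompressed RIFF-WAVE (PCM), it can help to check a file before passing it to openSMILE. The following is a minimal sketch using only Python's standard `wave` module. The helper names `write_pcm_wav` and `describe_wav`, the 16 kHz sample rate, and the 16-bit sample width are illustrative assumptions, not openSMILE requirements.

```python
import math
import struct
import wave


def write_pcm_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM RIFF-WAVE file."""
    frames = b"".join(
        # Clip to [-1, 1], scale to the signed 16-bit range, pack little-endian.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_wav(path):
    """Return the basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "num_frames": wav.getnframes(),
        }


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at 16 kHz (hypothetical test signal).
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    write_pcm_wav("tone.wav", tone)
    print(describe_wav("tone.wav"))
```

Compressed formats such as MP3 or OGG cannot be read this way and still need an external converter, as noted in the list above.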

② Supported file output formats:
• RIFF-WAVE (PCM uncompressed audio)
• Comma Separated Value (CSV)
• HTK parameter file
• WEKA ARFF file
• LibSVM feature file format
• Binary float matrix format

③ Classifiers and other components
Speech processing tasks often require the speech stream to be segmented. For this purpose, openSMILE provides voice activity detection and turn detection, along with classifiers that can be applied to the extracted features:
• Voice Activity Detection based on Fuzzy Logic
• Voice Activity Detection based on LSTM-RNN with pre-trained models
• Turn-/Speech-segment detector
• LibSVM (on-line) (the SVM library for pattern recognition and regression developed by Prof. Chih-Jen Lin et al.; simple, easy to use, and fast)
• LSTM-RNN (Neural Network) classifier which can load RNNLIB and CURRENNT nets
• GMM (experimental implementation from eNTERFACE’12 project, to be released soon)
• SVM sink (for loading linear kernel WEKA SMO models)
• Speech Emotion recognition pre-trained models (openEAR)

Signal Processing: The following functionality is provided for general signal processing or signal pre-processing (prior to feature extraction):
• Windowing functions (Rectangular, Hamming, Hann (raised cosine), Gauss, Sine, Triangular, Bartlett, Bartlett-Hann, Blackman, Blackman-Harris, Lanczos)
• Pre-/De-emphasis (i.e. 1st order high/low-pass)
• Re-sampling (spectral domain algorithm)
• FFT (magnitude, phase, complex) and inverse
• Scaling of spectral axis via spline interpolation (open-source version only)
• dBA weighting of magnitude spectrum
• Autocorrelation function (ACF) (via IFFT of power spectrum)
• Average magnitude difference function (AMDF)
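To make a few of these pre-processing steps concrete, here is a minimal pure-Python sketch of first-order pre-emphasis, a Hamming window, the ACF, and the AMDF. It uses textbook definitions. The function names, the pre-emphasis coefficient 0.97, and the symmetric window form are assumptions and may differ from openSMILE's components. Note that the ACF here is computed directly in the time domain for readability, not via the IFFT of the power spectrum mentioned in the list.

```python
import math


def pre_emphasis(signal, k=0.97):
    """First-order high-pass filter: y[n] = x[n] - k * x[n-1]."""
    return [signal[0]] + [
        signal[n] - k * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(n_points):
    """Symmetric Hamming window of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def autocorrelation(frame, max_lag):
    """Time-domain ACF: r[lag] = sum over n of x[n] * x[n + lag]."""
    return [
        sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        for lag in range(max_lag + 1)
    ]


def amdf(frame, max_lag):
    """Average magnitude difference: mean of |x[n] - x[n + lag]| per lag."""
    return [
        sum(abs(frame[n] - frame[n + lag]) for n in range(len(frame) - lag))
        / (len(frame) - lag)
        for lag in range(max_lag + 1)
    ]
```

For a periodic signal, the ACF peaks and the AMDF dips at lags equal to multiples of the period, which is why both are used as a basis for pitch estimation.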

Data Processing: openSMILE can perform a number of operations for feature normalisation, modification, and differentiation:
• Range normalisation (off-line and on-line)
• Mean-Variance normalisation (off-line and on-line)
• Delta-Regression coefficients (and simple differential)
• Weighted Differential as in [SER07]
• Various vector operations: length, element-wise addition, multiplication, logarithm, and power
• Moving average filter for smoothing of contour over time
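The sketch below illustrates off-line versions of several of these operations on a single feature contour (a list of per-frame values). The delta computation uses the common regression formula d[t] = Σ i·(c[t+i] − c[t−i]) / (2·Σ i²) with edge values repeated. Whether openSMILE handles the edges the same way is an assumption, as are the function names and the default window sizes.

```python
import math


def range_normalise(values):
    """Map values linearly onto [0, 1]; a constant contour maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_variance_normalise(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def delta_regression(values, window=2):
    """Delta coefficients from a regression over +/- window frames."""
    denom = 2 * sum(i * i for i in range(1, window + 1))
    last = len(values) - 1
    deltas = []
    for t in range(len(values)):
        num = 0.0
        for i in range(1, window + 1):
            ahead = values[min(t + i, last)]   # repeat the last frame at the end
            behind = values[max(t - i, 0)]     # repeat the first frame at the start
            num += i * (ahead - behind)
        deltas.append(num / denom)
    return deltas


def moving_average(values, width=3):
    """Centred moving average; the window shrinks at the contour edges."""
    half = width // 2
    smoothed = []
    for t in range(len(values)):
        segment = values[max(0, t - half): t + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

On-line normalisation would instead update the running minimum, maximum, mean, and variance incrementally as frames arrive; that variant is not shown here.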

Audio features (low-level): The following (audio-specific) low-level descriptors can be computed by openSMILE:
• Frame Energy
• Frame Intensity / Loudness (approximation)
• Critical Band spectra (Mel/Bark/Octave, triangular masking filters)
• Mel-/Bark-Frequency-Cepstral Coefficients (MFCC)
• Auditory Spectra
• Loudness approximated from auditory spectra
• Perceptual Linear Predictive (PLP) Coefficients
• Perceptual Linear Predictive Cepstral Coefficients (PLP-CC)
• Linear Predictive Coefficients (LPC)
• Line Spectral Pairs (LSP, aka LSF)
• Fundamental Frequency (via ACF/Cepstrum method and via Subharmonic-Summation (SHS))
• Probability of Voicing from ACF and SHS spectrum peak
• Voice Quality: Jitter and Shimmer (cycle-to-cycle variation of the pitch period and of the amplitude, respectively)
• Formant frequencies and bandwidths
• Zero- and Mean-Crossing rate
• Spectral features (arbitrary band energies, roll-off points, centroid, entropy, maxpos, minpos, variance (=spread), skewness, kurtosis, slope)
• Psychoacoustic sharpness, spectral harmonicity
• CHROMA (octave warped semitone spectra) and CENS features (energy normalised and smoothed CHROMA)
• CHROMA-derived Features for Chord and Key recognition
• F0 Harmonics ratios
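As an illustration of how a low-level descriptor is computed frame by frame, the following sketch splits a signal into overlapping frames and computes frame energy plus the zero- and mean-crossing rates. Energy is taken here as the mean squared amplitude, which is one common definition; the exact definitions and scaling used by openSMILE, as well as the function names, are assumptions.

```python
def split_frames(signal, frame_size, hop):
    """Cut a signal into overlapping frames; an incomplete tail is dropped."""
    return [
        signal[start:start + frame_size]
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def crossing_rate(frame, level=0.0):
    """Fraction of adjacent sample pairs that lie on opposite sides of level."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a - level) * (b - level) < 0
    )
    return crossings / (len(frame) - 1)


def mean_crossing_rate(frame):
    """Crossing rate measured against the frame mean instead of zero."""
    return crossing_rate(frame, level=sum(frame) / len(frame))


def low_level_descriptors(signal, frame_size=400, hop=160):
    """One dict of descriptors per frame (25 ms / 10 ms at 16 kHz, assumed)."""
    return [
        {
            "energy": frame_energy(frame),
            "zcr": crossing_rate(frame),
            "mcr": mean_crossing_rate(frame),
        }
        for frame in split_frames(signal, frame_size, hop)
    ]
```

The mean-crossing rate is useful when the signal carries a DC offset: a frame such as `[2, 0, 2, 0]` never changes sign, but it crosses its own mean between every pair of samples.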

Video features (low-level): The following video low-level descriptors can currently be computed by openSMILE, based on the openCV library:
• HSV colour histograms
• Local binary patterns (LBP)
• LBP histograms
• Face detection: all these features can be extracted from an automatically detected facial region, or from the full image.
• Optical flow and optical flow histograms

Functionals: In order to map contours of audio and video low-level descriptors onto a vector of fixed dimensionality, the following functionals can be applied:
• Extreme values and positions
• Means (arithmetic, quadratic, geometric)
• Moments (standard deviation, variance, kurtosis, skewness)
• Percentiles and percentile ranges
• Regression (linear and quadratic approximation, regression error)
• Peaks
• Centroid
• Segments
• Sample values
• Times/durations
• Onsets/Offsets
• Discrete Cosine Transformation (DCT)
• Linear Predictive Coding (LPC) coefficients and gain
• Zero-Crossings
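The key point about functionals is that a contour of any length is reduced to the same fixed set of numbers. The sketch below applies a small, hand-picked subset (extremes and their positions, arithmetic mean, standard deviation, quartiles, and a linear regression slope and offset) to one contour. The output names and the linearly interpolated percentile are illustrative assumptions and do not reproduce openSMILE's own feature names or exact definitions.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def linear_regression(contour):
    """Least-squares line y = slope * t + offset over frame index t."""
    n = len(contour)
    mean_t = (n - 1) / 2
    mean_y = sum(contour) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(contour))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_t


def functionals(contour):
    """Map a variable-length contour onto a fixed set of summary values."""
    ordered = sorted(contour)
    n = len(contour)
    mean = sum(contour) / n
    variance = sum((y - mean) ** 2 for y in contour) / n
    slope, offset = linear_regression(contour)
    q1, q3 = percentile(ordered, 0.25), percentile(ordered, 0.75)
    return {
        "max": ordered[-1],
        "max_pos": contour.index(ordered[-1]),  # frame index of first maximum
        "min": ordered[0],
        "min_pos": contour.index(ordered[0]),   # frame index of first minimum
        "mean": mean,
        "stddev": math.sqrt(variance),
        "quartile1": q1,
        "median": percentile(ordered, 0.5),
        "quartile3": q3,
        "iqr": q3 - q1,
        "slope": slope,
        "offset": offset,
    }
```

Applying `functionals` to a 50-frame contour and to a 500-frame contour yields dictionaries with identical keys, which is what allows utterances of different durations to be fed to a classifier that expects fixed-size input.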
