Getting Started with Speech Recognition

Latest recommended article published on 2024-06-21 08:20:00

BarbaraChow

Category: Speech Recognition. Tags: python, speech recognition

Original link: https://librosa.org/doc/latest/tutorial.html

Librosa

Librosa is a Python package for audio and music analysis and processing. It covers common time-frequency processing, feature extraction, and audio visualization.

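The librosa library contains the following submodules:
1. librosa.beat: Functions for estimating tempo and detecting beat events.
2. librosa.core: Core functionality, including loading audio from disk, computing various spectrogram representations, and a variety of commonly used tools for music analysis. For convenience, all functionality in this submodule is directly accessible from the top-level librosa.* namespace.
3. librosa.decompose: Harmonic-percussive source separation (HPSS) and generic spectrogram decomposition, using matrix decomposition methods from scikit-learn.
4. librosa.display: Visualization and display routines built on matplotlib.
5. librosa.effects: Time-domain audio processing, such as pitch shifting and time stretching. This submodule also provides time-domain wrappers for the decompose submodule.
6. librosa.feature: Feature extraction and manipulation, for example chromagrams, Mel spectrograms, MFCC, and various other spectral and rhythmic features. It also provides feature manipulation methods such as delta features and memory embedding.
7. librosa.filters: Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are primarily internal functions used by other parts of librosa.
8. librosa.onset: Onset detection and onset strength computation.
9. librosa.sequence: Functions for sequential modeling: various forms of Viterbi decoding, plus helper functions for constructing transition matrices.
10. librosa.util: Helper utilities (normalization, padding, centering, etc.).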

Quickstart

Let's start with a simple example:

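# Beat tracking example
import numpy as np
import librosa

# 1. Get the file path to an audio example included with librosa
filename = librosa.example('nutcracker')


# 2. Load the audio as a waveform `y`
#    Store the sampling rate as `sr`
# By default, all audio is mixed to mono and resampled to 22050 Hz. Pass additional arguments to librosa.load to override this behavior.
y, sr = librosa.load(filename)

# 3. Run the default beat tracker
# tempo: estimated tempo in beats per minute; beat_frames: array of frame indices of the detected beats
# Frames are spaced hop_length=512 samples apart. librosa uses centered frames, so frame k is centered around sample k * hop_length
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Depending on the librosa version, tempo may be a scalar or a one-element array, so take the first value before formatting
print('Estimated tempo: {:.2f} beats per minute'.format(float(np.ravel(tempo)[0])))

# 4. Convert the frame indices of beat events into timestamps
# in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)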
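
The frame-to-time conversion in step 4 follows from the centered-frame convention: frame k is centered on sample k * hop_length, so its timestamp is k * hop_length / sr seconds. Below is a minimal pure-Python sketch of that arithmetic, assuming the defaults sr=22050 and hop_length=512. The helper name `frames_to_seconds` and the example frame indices are hypothetical and are not part of librosa.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to timestamps in seconds.

    Illustrative helper only (not librosa's API): frame k is assumed to be
    centered on sample k * hop_length.
    """
    return [k * hop_length / sr for k in frames]


# Hypothetical beat frame indices, chosen only for illustration
example_frames = [0, 43, 86]
example_times = frames_to_seconds(example_frames)

# Duration of one hop in milliseconds (roughly 23 ms at the defaults)
hop_ms = 1000 * 512 / 22050

print(example_times)
print(round(hop_ms, 1))
```

At these defaults, 43 hops correspond to just under one second of audio, which is why a beat frame index in the hundreds maps to a timestamp of only a few seconds.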

Advanced usage:

# Feature extraction example
import numpy as np
import librosa

# Load the example clip
y, sr = librosa.load(librosa.ex('nutcracker'))

# Set the hop length; at 22050 Hz, 512 samples ~= 23ms
hop_length = 512

# Separate harmonics (tonal content) and percussives (transients) into two waveforms
y_harmonic, y_percussive = librosa.effects.hpss(y)

# Beat track on the percussive signal
tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                             sr=sr)

# Compute MFCC features from the raw signal
# mfcc is a numpy.ndarray of shape (n_mfcc, T), where T is the track duration in frames; the detected beat_frames values correspond to columns of mfcc
mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

# First kind of feature manipulation: compute smoothed first-order differences (delta features)
mfcc_delta = librosa.feature.delta(mfcc)

# Second kind of feature manipulation: aggregate columns of the input between sample indices (e.g., beat frames)
# Stack and synchronize between beat events
# This time, we'll use the mean value (default) instead of median
beat_mfcc_delta = librosa.util.sync(np.vstack([mfcc, mfcc_delta]),
                                    beat_frames)

# Compute chroma features from the harmonic signal
# chromagram will be a numpy.ndarray of shape (12, T)
# By default, each column of chromagram is normalized by its peak value; set the norm parameter to override this
chromagram = librosa.feature.chroma_cqt(y=y_harmonic,
                                        sr=sr)

# Aggregate chroma features between beat events
# We'll use the median value of each feature between beat frames
beat_chroma = librosa.util.sync(chromagram,
                                beat_frames,
                                aggregate=np.median)

# Finally, stack all beat-synchronous features together
# resulting in a feature matrix beat_features of shape (12 + 13 + 13, # beat intervals)
beat_features = np.vstack([beat_chroma, beat_mfcc_delta])
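
To make the aggregation step concrete, here is a rough pure-Python sketch of beat-synchronous aggregation: the columns of a feature matrix are split at the beat frame indices, and each segment is reduced to a single column with an aggregate function such as the mean or the median. The function `sync_columns` and the toy data are hypothetical. This is a simplified illustration of the idea, not how `librosa.util.sync` is implemented, and it assumes the segments should cover everything from frame 0 to the last frame.

```python
from statistics import mean, median


def sync_columns(features, boundaries, aggregate=mean):
    """Aggregate the columns of `features` between consecutive boundary indices.

    `features` is a list of rows, one row per feature dimension.
    Simplified illustration only; not librosa's implementation.
    """
    n_frames = len(features[0])
    # Pad the boundaries so the segments cover the full range [0, n_frames]
    inner = [b for b in boundaries if 0 < b < n_frames]
    edges = sorted(set([0] + inner + [n_frames]))
    segments = list(zip(edges[:-1], edges[1:]))
    # Reduce each segment of each row to a single value
    return [[aggregate(row[start:end]) for start, end in segments]
            for row in features]


# Toy matrix: 2 feature dimensions x 6 frames, with hypothetical beats at frames 2 and 4
toy_features = [
    [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
    [0.0, 0.0, 4.0, 8.0, 2.0, 2.0],
]
toy_beats = [2, 4]

beat_mean = sync_columns(toy_features, toy_beats)
beat_median = sync_columns(toy_features, toy_beats, aggregate=median)

print(beat_mean)
print(beat_median)
```

With two beats inside a six-frame clip, each row is reduced to three values, one per interval. Stacking several such matrices row-wise, as in the last step of the example above, keeps the number of columns fixed and adds up the feature dimensions.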