google speech command dataset的生成

最新推荐文章于 2025-01-02 01:02:14 发布

shuai_wen

最新推荐文章于 2025-01-02 01:02:14 发布

阅读量3.1k

点赞数 1

分类专栏：人工智能文章标签： python 深度学习

本文链接：https://blog.csdn.net/u011279649/article/details/113771142

版权

本文档介绍了如何使用Python和深度学习生成Google Speech Command数据集。内容包括模型设置、数据集划分、预处理音频数据、使用AudioProcessor进行数据处理、获取预处理后的数据集、数据集的样本分布和生成过程。主要涉及的库和工具包括TensorFlow、AudioProcess、MFCC等。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

GitHub - hyperconnect/TC-ResNet: Code for Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

模型要分出几类：

def prepare_words_list(wanted_words):
"""Prepends common tokens to the custom word list.
前置: silence and unknown，不管wanted words是什么都会有这个两个class
Args:
wanted_words: List of strings containing the custom words.

Returns:
List with the standard silence and unknown tokens added.
"""
return [SILENCE_LABEL, UNKNOWN_WORD_LABEL] + wanted_words
SILENCE_LABEL = '_silence_'
UNKNOWN_WORD_LABEL = '_unknown_'
silence/unknown class又有什么特殊的吗？相对custom word list是有比例的，毕竟他们不是模型的最终目的。根据silence/unknown的比例形成数据集，下一步再把他们分成testing/validation and training 训练用的数据集。

文件属于哪个数据集？

根据文件名得到所在的数据集，这里的方法有点特殊，这么做的目的是某个文件始终在某个数据集不会在training和testing等 set间变换。

文件名不变对应的hash值就不变，且val/test percentage不变则最终得到的set 值不变

percentage_hash = ((int(hash_name_hashed, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (100.0 / MAX_NUM_WAVS_PER_CLASS))

percentage_hash = 80.49486488472569

def which_set(filename, validation_percentage, testing_percentage):
"""Determines which data partition the file should belong to.
Args:
filename: File path of the data sample.
validation_percentage: How much of the data set to use for validation.
testing_percentage: How much of the data set to use for testing.

Returns:
String, one of 'training', 'validation', or 'testing'.
"""
base_name = os.path.basename(filename)
hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
percentage_hash = ((int(hash_name_hashed, 16) %
(MAX_NUM_WAVS_PER_CLASS + 1)) *
(100.0 / MAX_NUM_WAVS_PER_CLASS))
if percentage_hash < validation_percentage:
result = 'validation'
elif percentage_hash < (testing_percentage + validation_percentage):
result = 'testing'
else:
result = 'training'
return result

怎样生成google speech command数据集

AudioProcessor.py 对数据文件进行预处理，__init__实现数据集的创建和建立进行预处理的graph, 当调用get_data时生成预处理后的数据集

model_settings的描述

样本的描述(resample rate, clip length)，

怎样进行数据处理(window size, window stride, feature bins, preprocess-对频谱的后处理)，

生成什么(label_count)

def prepare_model_settings(label_count, sample_rate, clip_duration_ms,
window_size_ms, window_stride_ms, feature_bin_count,
preprocess):
"""Calculates common settings needed for all models.

Args:
label_count: How many classes are to be recognized. (包括: silence/unknown and wanted words)
sample_rate: Number of audio samples per second.
clip_duration_ms: Length of each audio clip to be analyzed. //样本的长度
window_size_ms: Duration of frequency analysis window. //分帧: 每帧的长度和步长
window_stride_ms: How far to move in time between frequency windows.
feature_bin_count: Number of frequency bins to use for analysis.//每帧取的特征数
preprocess: How the spectrogram is processed to produce features.

Returns:
Dictionary containing common settings.

Raises:
ValueError: If the preprocessing mode isn't recognized.
"""
desired_samples = int(sample_rate * clip_duration_ms / 1000)
window_size_samples = int(sample_rate * window_size_ms / 1000)
window_stride_samples = int(sample_rate * window_stride_ms / 1000)

length_minus_window = (desired_samples - window_size_samples)
if length_minus_window < 0:
spectrogram_length = 0
else: # window stride samples not window size for overlap
spectrogram_length = 1 + int(length_minus_window / window_stride_samples)

if preprocess == 'mfcc':
average_window_width = -1
fingerprint_width = feature_bin_count
elif preprocess == 'micro':
average_window_width = -1
fingerprint_width = feature_bin_count
else:
raise ValueError('Unknown preprocess mode "%s" (should be "mfcc",'
' "average", or "micro")' % (preprocess))
fingerprint_size = fingerprint_width * spectrogram_length
return {
'desired_samples': desired_samples,
'window_size_samples': window_size_samples,
'window_stride_samples': window_stride_samples,
'spectrogram_length': spectrogram_length, // 描述有多少帧
'fingerprint_width': fingerprint_width, // 每帧的特征数
'fingerprint_size': fingerprint_size, // 每个样本生成的特征数
'label_count': label_count,
'sample_rate': sample_rate,
'preprocess': preprocess,
'average_window_width': average_window_width,
}

prepare_data_index

prepare data and word 对应的index，不是生成了(data, index)这样的样本

赋值了成员变量：data_index, word_to_index

def prepare_data_index(self, silence_percentage, unknown_percentage,
wanted_words, validation_percentage,
testing_percentage):
"""Prepares a list of the samples organized by set and label.

The training loop needs a list of all the available data, organized by
which partition it should belong to, and with ground truth labels attached.
This function analyzes the folders below the `data_dir`, figures out the
right labels for each file based on the name of the subdirectory it belongs to,
and uses a stable hash to assign it to a data set partition.

Args: silence/unknown percentage相对wanted word而言的
silence_percentage: How much of the resulting data should be background.
unknown_percentage: How much should be audio outside the wanted classes.
wanted_words: Labels of the classes we want to be able to recognize.
validation_percentage: How much of the data set to use for validation.
testing_percentage: How much of the data set to use for testing.

Returns:
Dictionary containing a list of file information for each set partition,
and a lookup map for each class to determine its numeric index.

Raises:
Exception: If expected files are not found.
"""
# Make sure the shuffling and picking of unknowns is deterministic.
random.seed(RANDOM_SEED) #next used for shuffle(用于随机得到unknown样本)

# wanted_words_index: directory, key: string word, value: index of list
wanted_words_index = {}
for index, wanted_word in enumerate(wanted_words):
wanted_words_index[wanted_word] = index + 2

最低0.47元/天解锁文章