Reduce cost and horizontally scale deepspeech.pytorch using TorchElastic with Kubernetes.
End-to-End Speech To Text Models Using Deepspeech.pytorch
Deepspeech.pytorch provides training, evaluation and inference of End-to-End (E2E) speech-to-text models, in particular the highly popularised DeepSpeech2 architecture. Deepspeech.pytorch was developed to give users the flexibility and simplicity to scale, train and deploy their own speech recognition models, whilst maintaining a minimalist design. Deepspeech.pytorch is a lightweight package for research iterations and integrations that fills the gap between audio research and production.
Scale Training Horizontally Using TorchElastic
Training production E2E speech-to-text models currently requires thousands of hours of labelled transcription data. In recent cases, we see numbers exceeding 50k hours of labelled audio data. Training on these datasets requires optimised multi-GPU training and hyper-parameter configurations. As we move towards leveraging unlabelled audio data for our speech recognition models with the announcement of wav2vec 2.0, scaling and throughput will continue to be crucial for training larger models across larger datasets.
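At its core, the multi-GPU data-parallel training mentioned above splits each batch across workers, has each worker compute a gradient on its shard, and then averages the gradients (an all-reduce) so every replica applies the same update. A minimal pure-Python sketch of that idea follows; it is only an illustration with a hypothetical toy least-squares model, not deepspeech.pytorch code, and `all_reduce_mean` stands in for what `torch.distributed.all_reduce` would do across real GPUs.

```python
# Illustrative sketch of data-parallel training: each "worker" computes a
# gradient on its shard of the batch, the gradients are averaged (the
# all-reduce step), and every replica applies the identical update.

def local_gradient(w, shard):
    # Hypothetical toy model: fit y = w * x by minimising mean((w*x - y)^2).
    # Returns d/dw of that loss over this worker's shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for torch.distributed.all_reduce followed by dividing by the
    # world size: every worker ends up holding the mean gradient.
    return sum(grads) / len(grads)

def train_step(w, shards, lr):
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * all_reduce_mean(grads)

# Two "workers", each holding half of a batch drawn from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(50):
    w = train_step(w, shards, lr=0.02)
print(round(w, 3))  # converges to the true weight, 3.0
```

In real frameworks this loop is what `torch.nn.parallel.DistributedDataParallel` automates, and TorchElastic's contribution is letting the set of workers grow or shrink between runs without invalidating the training job.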
Multiple advancements in the field have improved training iteration times, such as the growth of cuDNN, introduction of