[Speech Processing] s3prl ASR Debugging Notes

haoyunjyr

Last modified 2024-04-12 19:24:27

Tags: python, deep learning, machine learning, speech recognition

First published 2024-04-09 22:39:37

Article link: https://blog.csdn.net/haoyunjyr/article/details/137566366

Contents

- Installation
- Model training

Installation

Environment: Python 3.10 (as of April 2024, versions above 3.10 are not supported), sox, and torchaudio==2.0.2 (newer versions raise errors).

conda create -n s3prl python=3.10
conda activate s3prl
pip install torchaudio==2.0.2
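
Since both version pins matter here, a quick check before installing s3prl can save a failed run. The following is a minimal sketch, not part of s3prl: `check_environment` and `installed_version` are hypothetical helper names, and the limits (Python no newer than 3.10, torchaudio exactly 2.0.2) are simply the ones reported in this post.

```python
import sys
from importlib import metadata


def check_environment(python_version, torchaudio_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # This post reports that Python versions above 3.10 were not supported (April 2024).
    major, minor = tuple(python_version[:2])
    if (major, minor) > (3, 10):
        problems.append(f"Python {major}.{minor} is newer than 3.10")
    # This post pins torchaudio to 2.0.2; a local build suffix such as "+cu118" is ignored.
    if torchaudio_version is None:
        problems.append("torchaudio is not installed")
    elif torchaudio_version.split("+")[0] != "2.0.2":
        problems.append(f"torchaudio {torchaudio_version} differs from the pinned 2.0.2")
    return problems


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for problem in check_environment(sys.version_info, installed_version("torchaudio")):
        print("WARNING:", problem)
```

Run inside the activated conda environment, the script prints nothing when both pins are satisfied.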

I installed sox by following an online tutorial that does not require sudo.

Installing the code

git clone https://github.com/s3prl/s3prl.git
cd s3prl
pip install -e .

Model training

Data preparation
- Download LibriSpeech as required
- In the config file downstream/asr/config.yaml, set the data root directory
```
downstream_expert:
    datarc:
        libri_root: "root directory of LibriSpeech"
```
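
If you would rather script this edit than make it by hand, a plain text substitution keeps the rest of the YAML file (including comments) untouched. This is only a sketch: `set_libri_root` and `update_config_file` are hypothetical helpers, and they assume `libri_root` appears on a single line as shown above. Any trailing comment on that line is dropped.

```python
import re
from pathlib import Path


def set_libri_root(config_text, new_root):
    """Return config_text with the value of the first libri_root key replaced."""
    # Match only spaces/tabs around the key so the pattern never spans lines.
    pattern = re.compile(r"^([ \t]*libri_root:[ \t]*).*$", flags=re.MULTILINE)
    # Quote the path and escape characters that are special inside a YAML double-quoted string.
    quoted = '"' + new_root.replace("\\", "\\\\").replace('"', '\\"') + '"'
    new_text, count = pattern.subn(lambda m: m.group(1) + quoted, config_text, count=1)
    if count == 0:
        raise ValueError("no libri_root key found in the config text")
    return new_text


def update_config_file(config_path, new_root):
    """Rewrite the config file in place with the new LibriSpeech root."""
    path = Path(config_path)
    path.write_text(set_libri_root(path.read_text(encoding="utf-8"), new_root), encoding="utf-8")
```

For example, `update_config_file("downstream/asr/config.yaml", "/my/LibriSpeech")` would point the config at the same directory used in the preprocessing command in this post.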
- Run the data preparation script

python3 preprocess/generate_len_for_bucket.py -i "/my/LibriSpeech" -o data/librispeech -a .flac --n_jobs 4
         0 : train-clean-100
         1 : train-clean-360
         2 : train-other-500
         3 : dev-clean
         4 : dev-other
         5 : test-clean
         6 : test-other
Please enter the index of splits you wish to use preprocess. (seperate with space): 0 3 5

Preprocessing data in: train-clean-100, 28539 audio files found.
Extracting audio length...
100%|██████████| 28539/28539 [00:57<00:00, 500.28it/s]

Preprocessing data in: dev-clean, 2703 audio files found.
Extracting audio length...
100%|██████████| 2703/2703 [00:09<00:00, 282.02it/s]

Preprocessing data in: test-clean, 2620 audio files found.
Extracting audio length...
100%|██████████| 2620/2620 [00:09<00:00, 270.78it/s]
All done, saved at data/librispeech/len_for_bucket exit.
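
The file counts in the log above (28539, 2703 and 2620) are a handy sanity check that the download is complete. A minimal sketch for counting the audio files yourself follows; `count_audio_files` is a hypothetical helper, and it assumes each split is a directory directly under the LibriSpeech root.

```python
from pathlib import Path


def count_audio_files(libri_root, splits, extension=".flac"):
    """Map each split name to its number of audio files, or None if the split is missing."""
    counts = {}
    for split in splits:
        split_dir = Path(libri_root) / split
        if split_dir.is_dir():
            # Search recursively, since the audio sits in nested subdirectories.
            counts[split] = sum(1 for _ in split_dir.rglob("*" + extension))
        else:
            counts[split] = None
    return counts


if __name__ == "__main__":
    # "/my/LibriSpeech" is the example root used in the command above.
    for name, n in count_audio_files("/my/LibriSpeech", ["train-clean-100", "dev-clean", "test-clean"]).items():
        print(name, n)
```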

Model training

python3 run_downstream.py -n wav2vec2 -m train -u wav2vec2 -d asr -s hidden_states

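where:

-u specifies the pretrained (upstream) model, here wav2vec2
-d specifies the downstream task, here asr
-n specifies the name of the output directory
-s selects which feature to use; hidden_states computes a weighted sum over all hidden layers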
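
To make the `hidden_states` option more concrete, here is a dependency-free sketch of a weighted sum over layers. It assumes the per-layer weights are normalized with a softmax before summing, which is my understanding of how s3prl does it; the function names are hypothetical, and the real implementation works on tensors with learnable weights.

```python
import math


def softmax(weights):
    """Normalize raw weights so they are positive and sum to 1."""
    shift = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp(w - shift) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(hidden_states, layer_weights):
    """Combine layers into one feature sequence.

    hidden_states: list of layers, each a list of frames, each frame a list of floats.
    layer_weights: one raw (unnormalized) weight per layer.
    """
    if len(hidden_states) != len(layer_weights):
        raise ValueError("need exactly one weight per layer")
    norm = softmax(layer_weights)
    num_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    return [
        [sum(norm[l] * hidden_states[l][t][d] for l in range(len(norm))) for d in range(dim)]
        for t in range(num_frames)
    ]
```

With equal raw weights every layer contributes equally, so the result is the plain average of the layers. Training would then shift the weights toward the layers most useful for ASR.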

Model testing

python3 run_downstream.py -m evaluate -t "test-clean" -e [ckpt]

Results on LibriSpeech test-clean:
test-clean loss: 0.13725118339061737
test-clean uer: 1.8757698398733063
test-clean wer: 6.626597687157639
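
For readers unfamiliar with these metrics, the sketch below shows how an error rate of this kind is conventionally computed: the edit distance between reference and hypothesis, divided by the reference length. It assumes `uer` is a unit-level (here character-level) rate and that both numbers are percentages. The exact tokenization s3prl uses depends on its configuration, so this is an illustration rather than its implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent, over words or characters."""
    total_errors = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split() if unit == "word" else list(ref)
        hyp_tokens = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_length += len(ref_tokens)
    return 100.0 * total_errors / total_length
```

For instance, comparing the reference "the cat sat" with the hypothesis "the cat sit" gives one wrong word out of three, while at the character level only one of eleven characters differs. This is why a unit-level rate is typically much lower than the word-level rate.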
