Montreal Forced Aligner (MFA) Basic Usage Tutorial

Zero_to_zero1234

Last modified: 2022-06-27 19:28:28

First published: 2021-10-18 19:23:17

Article link: https://blog.csdn.net/suiyueruge1314/article/details/120832546

1. Acoustic model training
https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html#trained-alignment
Documentation for the latest 2.0 version:
https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/train_acoustic_model.html?highlight=mfa%20train

usage: mfa train [-h] [--config_path CONFIG_PATH] [-o OUTPUT_MODEL_PATH]
                 [-s SPEAKER_CHARACTERS] [-a AUDIO_DIRECTORY]
                 [--phone_set {AUTO,IPA,ARPA,PINYIN}]
                 [--output_format {short_textgrid,long_textgrid,json}]
                 [--include_original_text] [--train_g2p]
                 [-t TEMPORARY_DIRECTORY] [--disable_mp] [-j NUM_JOBS] [-v]
                 [-q] [--clean] [--overwrite] [--debug]
                 [--disable_textgrid_cleanup]
                 corpus_directory dictionary_path output_paths
                 [output_paths ...]

mfa train corpus_directory dictionary_path output_directory

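Most of the parameters are straightforward, but temp_directory (-t) and num_jobs (-j) are worth setting explicitly. With a large training corpus on a multi-core machine, raising num_jobs can speed up training several times over, and pointing temp_directory at a disk with enough room avoids failures caused by the home directory running out of space.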
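
As a rough sketch of the advice above, the following Python helper assembles an `mfa train` command line with both options set. The helper name `build_train_command` and the one-job-per-core default are assumptions made for illustration; only the flags `-t` and `-j` and the positional order come from the usage text, and the paths are the placeholder paths used elsewhere in this post.

```python
import os
import shlex


def build_train_command(corpus_dir, dictionary_path, output_path,
                        temp_dir, num_jobs=None):
    """Return the argv list for an `mfa train` run with -t and -j set."""
    if num_jobs is None:
        # Assumption: one job per logical core is a reasonable starting point.
        num_jobs = os.cpu_count() or 1
    return [
        "mfa", "train",
        corpus_dir, dictionary_path, output_path,
        "-t", temp_dir,        # keep temporary files off the home directory
        "-j", str(num_jobs),   # number of data splits / cores
    ]


cmd = build_train_command(
    "/data/xxxx/prepared_for_mfa/",
    "/data/xxxx/lexicon.txt",
    "/data/xxxx/output/",
    temp_dir="/data/xxxx/temp_files/",
    num_jobs=20,
)
# Print the command instead of running it, so it can be inspected first.
print(shlex.join(cmd))
```

To actually launch training, the list can be handed to `subprocess.run(cmd, check=True)` on a machine where MFA is installed.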

2. Other notes (to be extended later)

mfa align /data/xxxx/prepared_for_mfa/ /data/xxxx/lexicon.txt english /data/xxxx/output/ -t /data/xxxx/temp_files/ -j 20 --clean
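
To make it easier to see which token in the command above fills which argument, here is a small sketch that parses it with a stand-in `argparse` parser. The parser is hypothetical: it mirrors only the arguments that appear in this one command and is not MFA's own parser.

```python
import argparse
import shlex

# Hypothetical stand-in parser covering only the arguments used in the
# example command; it is not MFA's real argument parser.
parser = argparse.ArgumentParser(prog="mfa align")
parser.add_argument("corpus_directory")
parser.add_argument("dictionary_path")
parser.add_argument("acoustic_model_path")
parser.add_argument("output_directory")
parser.add_argument("-t", "--temp_directory")
parser.add_argument("-j", "--num_jobs", type=int)
parser.add_argument("--clean", action="store_true")

command = ("mfa align /data/xxxx/prepared_for_mfa/ /data/xxxx/lexicon.txt "
           "english /data/xxxx/output/ -t /data/xxxx/temp_files/ -j 20 --clean")

# Drop the leading "mfa align" tokens and parse the remaining arguments.
args = parser.parse_args(shlex.split(command)[2:])
for name, value in vars(args).items():
    print(f"{name:20} {value}")
```

Read this way, `english` is the acoustic model (a saved model name rather than a path), and the four positional arguments must appear in exactly this order.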

corpus_directory
Full path to the directory to align

dictionary_path
Full path to pronunciation dictionary, or saved dictionary name (you can use mfa model download dictionary to get MFA dictionaries)

acoustic_model_path
Full path to pre-trained acoustic model, or saved model name (you can use mfa model download acoustic to get pretrained MFA models)

output_directory
Full path to output directory, will be created if it doesn’t exist

-h, --help
show this help message and exit

--config_path <config_path>
Path to config file to use for alignment

-s <speaker_characters>, --speaker_characters <speaker_characters>
Number of characters of file names to use for determining speaker, default is to use directory names

-a <audio_directory>, --audio_directory <audio_directory>
Audio directory root to use for finding audio files

--reference_directory <reference_directory>
Directory containing gold standard alignments to evaluate

--custom_mapping_path <custom_mapping_path>
YAML file for mapping phones across phone sets in evaluations

-t <temporary_directory>, --temp_directory <temporary_directory>, --temporary_directory <temporary_directory>
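Temporary directory root to store MFA created files, default is ~/Documents/MFA (the online documentation shows /home/docs/Documents/MFA, the home directory of the user that built the docs)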

--disable_mp
Disable any multiprocessing during alignment (not recommended), default is False

-j <num_jobs>, --num_jobs <num_jobs>
Number of data splits (and cores to use if multiprocessing is enabled), default is 3

-v, --verbose
Output debug messages, default is False

--clean
Remove files from previous runs, default is False

--overwrite
Overwrite output files when they exist, default is False

--debug
Run extra steps for debugging issues, default is False

--disable_textgrid_cleanup
Disable extra clean up steps on TextGrid output, default is False
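
The `-s` / `--speaker_characters` option is the least obvious of the arguments listed above, so here is a minimal illustration of the behavior its description implies. The function `guess_speaker` and the example file names are hypothetical, and the code paraphrases the help text rather than reproducing MFA's implementation.

```python
from pathlib import PurePosixPath


def guess_speaker(path, speaker_characters=0):
    """Illustrative only: pick a speaker label for one corpus file."""
    p = PurePosixPath(path)
    if speaker_characters > 0:
        # With -s N, the first N characters of the file name name the speaker.
        return p.name[:speaker_characters]
    # Default: use the name of the directory that contains the file.
    return p.parent.name


# One directory per speaker (the default behavior).
print(guess_speaker("corpus/spk01/utt_0001.wav"))

# Flat corpus where the speaker ID is a file-name prefix, as with -s 3.
print(guess_speaker("corpus/A12_utt0001.wav", speaker_characters=3))
```

If the corpus is already organized as one directory per speaker, the option can be left unset.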

Pass a config.yaml file with --config_path, for example:

beam: 10
retry_beam: 40

features:
  type: "mfcc"
  use_energy: false
  use_pitch: true
  frame_shift: 10

training:
  - monophone:
      subset: 10000
      num_iterations: 50
      max_gaussians: 2000
      boost_silence: 1.25

  - triphone:
      subset: 20000
      num_iterations: 50
      num_leaves: 2000
      max_gaussians: 10000
      cluster_threshold: -1
      boost_silence: 1.25
      power: 0.25

  - lda:
      subset: 20000
      num_leaves: 4000
      max_gaussians: 15000
      num_iterations: 40

  - sat:
      subset: 50000
      num_leaves: 4200
      max_gaussians: 40000
      power: 0.2
      silence_weight: 0.2
      fmllr_update_type: "full"

  - pronunciation_probabilities:
      subset: 50000
      silence_probabilities: true

  - sat:
      subset: 150000
      num_leaves: 5000
      max_gaussians: 100000
      power: 0.2
      silence_weight: 0.20
      fmllr_update_type: "full"

  - pronunciation_probabilities:
      subset: 150000
      silence_probabilities: true
      optional: true # Skipped if the corpus is smaller than the subset

  - sat:
      subset: 0
      quick: true # Performs fewer fMLLR estimations
      num_iterations: 20
      num_leaves: 7000
      max_gaussians: 150000
      power: 0.2
      silence_weight: 0.2
      fmllr_update_type: "full"
      optional: true # Skipped if the corpus is smaller than the previous subset
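
The `optional` comments in the config above imply that the last two training blocks are skipped for smaller corpora. The sketch below spells out one reading of that rule. It rests on several assumptions: `subset` is taken to count utterances, `subset: 0` is taken to mean the whole corpus, and the skip rule is inferred from the comments only. `STAGES` and `plan_stages` are hypothetical names, not part of MFA.

```python
# (name, subset, optional) for each training block in the config above.
STAGES = [
    ("monophone", 10000, False),
    ("triphone", 20000, False),
    ("lda", 20000, False),
    ("sat", 50000, False),
    ("pronunciation_probabilities", 50000, False),
    ("sat", 150000, False),
    ("pronunciation_probabilities", 150000, True),
    ("sat", 0, True),
]


def plan_stages(num_utterances, stages=STAGES):
    """Return the stage names expected to run for a corpus of this size."""
    planned = []
    previous_subset = 0
    for name, subset, optional in stages:
        # Assumption: subset 0 means "whole corpus", and such a stage is
        # compared against the previous subset, per the config comment.
        threshold = subset if subset > 0 else previous_subset
        if optional and num_utterances < threshold:
            continue  # optional block skipped for a small corpus
        planned.append(name)
        if subset > 0:
            previous_subset = subset
    return planned


for size in (30000, 200000):
    print(size, plan_stages(size))
```

Under this reading, a corpus of 30,000 utterances stops after the second `sat` block, while a corpus of 200,000 runs all eight blocks.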
