1. Acoustic model training
https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html#trained-alignment
Latest 2.0 version:
https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/train_acoustic_model.html?highlight=mfa%20train
usage: mfa train [-h] [--config_path CONFIG_PATH] [-o OUTPUT_MODEL_PATH]
                 [-s SPEAKER_CHARACTERS] [-a AUDIO_DIRECTORY]
                 [--phone_set {AUTO,IPA,ARPA,PINYIN}]
                 [--output_format {short_textgrid,long_textgrid,json}]
                 [--include_original_text] [--train_g2p]
                 [-t TEMPORARY_DIRECTORY] [--disable_mp] [-j NUM_JOBS] [-v]
                 [-q] [--clean] [--overwrite] [--debug]
                 [--disable_textgrid_cleanup]
                 corpus_directory dictionary_path output_paths
                 [output_paths ...]
mfa train corpus_directory dictionary_path output_directory
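The remaining parameters are straightforward, but two are worth setting explicitly: the temporary directory (-t) and num_jobs (-j). On a large training corpus, num_jobs lets a multi-core machine process the data splits in parallel, which speeds up training roughly in proportion to the number of cores used. Pointing the temporary directory at a large disk keeps MFA's intermediate files out of the home directory, so training does not fail when the home partition runs out of space.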
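As a minimal sketch of the advice above, the following Python helper assembles an `mfa train` command line with `-t` and `-j` set explicitly and reports how much free space the temporary directory's disk has. The function names `build_train_command` and `free_gib` and the "leave one core free" heuristic are assumptions made for illustration and are not part of MFA; the paths are the placeholder paths used elsewhere in these notes.

```python
import os
import shlex
import shutil
from pathlib import Path


def free_gib(path):
    """Return free space in GiB on the filesystem that holds `path`."""
    probe = Path(path).expanduser().resolve()
    # The directory may not exist yet, so walk up to the nearest existing parent.
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free / 2**30


def build_train_command(corpus_dir, dictionary_path, output_path,
                        temp_dir, num_jobs=None):
    """Assemble an `mfa train` argument list with -t and -j set explicitly."""
    if num_jobs is None:
        # Hypothetical heuristic: leave one core free for the rest of the system.
        num_jobs = max(1, (os.cpu_count() or 1) - 1)
    return [
        "mfa", "train",
        str(corpus_dir), str(dictionary_path), str(output_path),
        "-t", str(temp_dir),
        "-j", str(num_jobs),
    ]


if __name__ == "__main__":
    temp_dir = "/data/xxxx/temp_files/"  # placeholder path, as in these notes
    cmd = build_train_command(
        "/data/xxxx/prepared_for_mfa/",
        "/data/xxxx/lexicon.txt",
        "/data/xxxx/output/",
        temp_dir,
    )
    print(f"free space for temp files: {free_gib(temp_dir):.1f} GiB")
    print(shlex.join(cmd))
    # To actually launch training: subprocess.run(cmd, check=True)
```

The script only prints the command; running it through `subprocess.run` is left commented out so the sketch can be tried on a machine without MFA installed.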
2. Additional notes (to be extended)
mfa align /data/xxxx/prepared_for_mfa/ /data/xxxx/lexicon.txt english /data/xxxx/output/ -t /data/xxxx/temp_files/ -j 20 --clean
corpus_directory
Full path to the directory to align
dictionary_path
Full path to pronunciation dictionary, or saved dictionary name (you can use mfa model download dictionary to get MFA dictionaries)
acoustic_model_path
Full path to pre-trained acoustic model, or saved model name (you can use mfa model download acoustic to get pretrained MFA models)
output_directory
Full path to output directory, will be created if it doesn’t exist
-h, --help
show this help message and exit
--config_path <config_path>
Path to config file to use for alignment
-s <speaker_characters>, --speaker_characters <speaker_characters>
Number of characters of file names to use for determining speaker, default is to use directory names
-a <audio_directory>, --audio_directory <audio_directory>
Audio directory root to use for finding audio files
--reference_directory <reference_directory>
Directory containing gold standard alignments to evaluate
--custom_mapping_path <custom_mapping_path>
YAML file for mapping phones across phone sets in evaluations
-t <temporary_directory>, --temp_directory <temporary_directory>, --temporary_directory <temporary_directory>
Temporary directory root to store MFA created files, default is ~/Documents/MFA (shown as /home/docs/Documents/MFA in the online documentation)
--disable_mp
Disable any multiprocessing during alignment (not recommended), default is False
-j <num_jobs>, --num_jobs <num_jobs>
Number of data splits (and cores to use if multiprocessing is enabled), default is 3
-v, --verbose
Output debug messages, default is False
--clean
Remove files from previous runs, default is False
--overwrite
Overwrite output files when they exist, default is False
--debug
Run extra steps for debugging issues, default is False
--disable_textgrid_cleanup
Disable extra clean up steps on TextGrid output, default is False
Pass a config.yaml file via --config_path:
beam: 10
retry_beam: 40
features:
  type: "mfcc"
  use_energy: false
  use_pitch: true
  frame_shift: 10
training:
  - monophone:
      subset: 10000
      num_iterations: 50
      max_gaussians: 2000
      boost_silence: 1.25
  - triphone:
      subset: 20000
      num_iterations: 50
      num_leaves: 2000
      max_gaussians: 10000
      cluster_threshold: -1
      boost_silence: 1.25
      power: 0.25
  - lda:
      subset: 20000
      num_leaves: 4000
      max_gaussians: 15000
      num_iterations: 40
  - sat:
      subset: 50000
      num_leaves: 4200
      max_gaussians: 40000
      power: 0.2
      silence_weight: 0.2
      fmllr_update_type: "full"
  - pronunciation_probabilities:
      subset: 50000
      silence_probabilities: true
  - sat:
      subset: 150000
      num_leaves: 5000
      max_gaussians: 100000
      power: 0.2
      silence_weight: 0.20
      fmllr_update_type: "full"
  - pronunciation_probabilities:
      subset: 150000
      silence_probabilities: true
      optional: true # Skipped if the corpus is smaller than the subset
  - sat:
      subset: 0
      quick: true # Performs fewer fMLLR estimations
      num_iterations: 20
      num_leaves: 7000
      max_gaussians: 150000
      power: 0.2
      silence_weight: 0.2
      fmllr_update_type: "full"
      optional: true # Skipped if the corpus is smaller than the previous subset
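To make the `subset` and `optional` settings in the training section above easier to reason about, here is a small Python sketch that estimates which training blocks would run for a given corpus size. It is only a reading of the comments in the config, not MFA's actual logic: it assumes that `subset` counts utterances, that `subset: 0` means the whole corpus, and that an optional block is skipped when the corpus is smaller than its own subset (or, for `subset: 0`, smaller than the previous block's subset). The names `TRAINING` and `blocks_that_run` are hypothetical.

```python
# Subset sizes and `optional` flags copied from the training section above.
TRAINING = [
    ("monophone", 10000, False),
    ("triphone", 20000, False),
    ("lda", 20000, False),
    ("sat", 50000, False),
    ("pronunciation_probabilities", 50000, False),
    ("sat", 150000, False),
    ("pronunciation_probabilities", 150000, True),
    ("sat", 0, True),
]


def blocks_that_run(training, num_utterances):
    """Return (name, utterances_used) for each block expected to run.

    Assumptions, not MFA's actual code: `subset: 0` means the whole corpus,
    and an optional block is skipped when the corpus is smaller than its own
    subset (or, for subset 0, smaller than the previous block's subset).
    """
    plan = []
    previous_subset = 0
    for name, subset, optional in training:
        threshold = subset if subset > 0 else previous_subset
        if optional and num_utterances < threshold:
            previous_subset = subset
            continue
        used = num_utterances if subset == 0 else min(subset, num_utterances)
        plan.append((name, used))
        previous_subset = subset
    return plan


if __name__ == "__main__":
    for size in (30000, 200000):
        print(size, blocks_that_run(TRAINING, size))
```

Under these assumptions, a corpus smaller than 150000 utterances would stop after the second `sat` block, so the larger `num_leaves` and `max_gaussians` values in the last block only matter for big corpora.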