In the Kaldi aishell s5 recipe, TDNN training fails with the following error:
bash: line 1: 9006 Aborted (core dumped) ( nnet3-train --use-gpu=wait --read-cache=exp/nnet3/tdnn_sp/cache.25 --write-cache=exp/nnet3/tdnn_sp/cache.26 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --l2-regularize-factor=1.0 --backstitch-training-interval=1 --srand=25 "nnet3-copy --learning-rate=0.0014489961134561324 --scale=1.0 exp/nnet3/tdnn_sp/25.mdl - |" "ark,bg:nnet3-copy-egs --frame=2 ark:exp/nnet3/tdnn_sp/egs/egs.26.ark ark:- | nnet3-shuffle-egs --buffer-size=5000 --srand=25 ark:- ark:- | nnet3-merge-egs --minibatch-size=512 ark:- ark:- |" exp/nnet3/tdnn_sp/26.1.raw ) 2>> exp/nnet3/tdnn_sp/log/train.25.1.log >> exp/nnet3/tdnn_sp/log/train.25.1.log
run.pl: job failed, log is in exp/nnet3/tdnn_sp/log/train.25.1.log
The log file contains:
WARNING (nnet3-train[5.5.1068~2-59299]:ReorthogonalizeRt1():natural-gradient-online.cc:241) Cholesky out of expected range, reorthogonalizing with Gram-Schmidt
WARNING (nnet3-train[5.5.1068~2-59299]:ReorthogonalizeRt1():natural-gradient-online.cc:241) Cholesky out of expected range, reorthogonalizing with Gram-Schmidt
ASSERTION_FAILED (nnet3-train[5.5.1068~2-59299]:HouseBackward():qr.cc:123) Assertion failed: (KALDI_ISFINITE(sigma) && "Tridiagonalizing matrix that is too large or has NaNs.")
Solved: lower the learning rate until the error no longer occurs.
If the learning rate is lowered substantially, increase the number of epochs to compensate, so the model still sees enough effective updates. Increasing the minibatch size can also speed up training.
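As a sketch, in the aishell s5 recipe these knobs are normally set in local/nnet3/run_tdnn.sh, which passes them as options to steps/nnet3/train_dnn.py. The option names below follow the nnet3 training scripts; the concrete values are illustrative assumptions, not tested settings:

```shell
# Illustrative fragment (values are example assumptions, tune for your setup):
# lower the learning rates, raise num-epochs to compensate, and enlarge the
# minibatch to speed up each training iteration.
steps/nnet3/train_dnn.py \
  --trainer.optimization.initial-effective-lrate 0.0005 \
  --trainer.optimization.final-effective-lrate 0.00005 \
  --trainer.num-epochs 6 \
  --trainer.optimization.minibatch-size 1024 \
  ... # remaining options as in local/nnet3/run_tdnn.sh
```

After changing these, remove or rename the old exp/nnet3/tdnn_sp directory (or resume from a checkpoint before the divergence) so training restarts with the new schedule.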