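1. Reusing existing lexical translation probability (lex) files
If lex files already exist in the training directory, moses reuses them and does not recompute the lexical translation probabilities.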
(4) generate lexical translation table 0-0 @ Wed Apr 24 17:20:15 CST 2013
moses output: reusing: /**/lex.f2e and /**/lex.e2f
2. The values in moses's mert.log are not BLEU scores
To get the BLEU score of a MERT run, score the run.out output yourself.
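3. Something was added after the \ at the end of an argument line, and the command broke. A line continuation only works when \ is the very last character on the line.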
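The hardest case to spot is invisible trailing whitespace after the backslash. A minimal sketch of a checker follows; `find_broken_continuations` is a hypothetical helper name, not part of moses, and it only looks for spaces or tabs after a final backslash.

```python
import re


def find_broken_continuations(script_text):
    """Return 1-based numbers of lines where a backslash is followed
    only by trailing spaces or tabs, so the continuation is broken."""
    bad = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # The backslash must be the last character for the shell to
        # treat it as a line continuation.
        if re.search(r"\\[ \t]+$", line):
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Line 1 has a space after the backslash; line 2 is fine.
    sample = "cmd --first-arg \\ \n    --second-arg \\\n    --third-arg\n"
    print(find_broken_continuations(sample))
```

Running it over a training or tuning script before launching a long job is cheap compared with finding the problem hours later.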
4.moses-chart:
Either your data contains <s> in a position other than the first word or your language model is missing <s>. Did you build your ARPA using IRSTLM and forget to run add-start-end.sh?
5.moses-chart
Start loading LanguageModel language1.gz : [0.000] seconds
language.gz: line 21955: warning: non-zero probability for <unk> in closed-vocabulary LM
Start loading LanguageModel /language2.gz : [26.000] seconds
moses_chart: File.cc:259: char* File::getline(): Assertion `buffer != 0' failed.
gzip: stdout: Broken pipe
sh: line 1: 6384 Aborted moses.ini -inputtype 0 -show-weights > ./features.list
Exit code: 134
6. On parameter tuning
The simplest method here is to try out a large number of possible settings, and pick what works best. Good values for the weights for phrase translation table (weight-t, short tm), language model (weight-l, short lm), and reordering model (weight-d, short d) are 0.1 to 1; good values for the word penalty (weight-w, short w) are -3 to 3. Negative values for the word penalty favor longer output, positive values favor shorter output.
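The "try many settings" idea can be sketched as a random search over the ranges quoted above. This only generates candidate weight settings; actually decoding and scoring each one is left out. `RANGES` and `sample_settings` are names made up for this illustration, and uniform sampling is an assumption, not something the quote prescribes.

```python
import random

# Ranges taken from the quoted advice: 0.1 to 1 for tm, lm and d,
# and -3 to 3 for the word penalty w.
RANGES = {
    "tm": (0.1, 1.0),
    "lm": (0.1, 1.0),
    "d": (0.1, 1.0),
    "w": (-3.0, 3.0),
}


def sample_settings(n, seed=0):
    """Return n candidate weight settings drawn uniformly from RANGES."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    settings = []
    for _ in range(n):
        candidate = {}
        for name, (low, high) in RANGES.items():
            candidate[name] = round(rng.uniform(low, high), 2)
        settings.append(candidate)
    return settings


if __name__ == "__main__":
    for candidate in sample_settings(3):
        print(candidate)
```

Each candidate would then be written into the decoder configuration, a dev set translated, and the best-scoring setting kept.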
7.moses-chart: ERROR: malformed XML
http://comments.gmane.org/gmane.comp.nlp.moses.user/4121
8.
http://article.gmane.org/gmane.comp.nlp.moses.user/7878/match=truncated+sentence
9. Multi-threaded MERT
--decoder-flags "-threads 8"\
10. Corpus preprocessing and model filtering scripts that ship with moses
clean-corpus-n.perl
filter-model-given-input.pl
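10. moses-chart automatically adds sentence start and end markers to each input sentence
moses does not. Adding start and end markers to the training corpus has no effect, because at translation time the markers are not only considered at the first and last positions of the sentence.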
10.xml-input
das ist ein kleines <np translation="house||dwelling||place" prob="0.5||0.25||0.25">haus</np>
The parameters (-xml-input inclusive, or, -xml-input exclusive) for the
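Markup like the `<np ...>` example in this item can be generated rather than typed by hand. A rough sketch follows; `np_markup` is a hypothetical helper, and it assumes the strings contain no quotes, angle brackets, or `||`, since how the decoder treats escaped characters is not covered in these notes.

```python
def np_markup(source_word, options):
    """Build an <np> tag from (translation, probability) pairs,
    joined with '||' as in the example above."""
    forbidden = ('"', "<", ">", "||")
    for text in [source_word] + [t for t, _ in options]:
        if any(mark in text for mark in forbidden):
            raise ValueError("unsupported character in: %r" % text)
    translations = "||".join(t for t, _ in options)
    probs = "||".join(str(p) for _, p in options)
    return '<np translation="%s" prob="%s">%s</np>' % (
        translations, probs, source_word)


if __name__ == "__main__":
    tag = np_markup("haus", [("house", 0.5), ("dwelling", 0.25), ("place", 0.25)])
    print("das ist ein kleines " + tag)
```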
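11. word alignment
Cause: an encoding error. moses assumes UTF-8 by default. Before running alignment, it is a good idea to clean the corpus with the clean tool that ships with moses.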
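Before aligning, a quick check for lines that are not valid UTF-8 can confirm whether encoding is the problem. A minimal sketch, with `find_non_utf8_lines` as a made-up helper name; it only reports offending lines and does not replace the moses cleaning step.

```python
def find_non_utf8_lines(path):
    """Return 1-based numbers of lines in the file at `path`
    that cannot be decoded as UTF-8."""
    bad = []
    # Read raw bytes so that decoding errors surface per line
    # instead of aborting the whole read.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(number)
    return bad
```

Usage would be something like `find_non_utf8_lines("corpus.f")`, where `corpus.f` stands in for whichever side of the parallel corpus is suspected.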
11. Decoding
Exception: moses/Phrase.cpp:214 in void Moses::Phrase::CreateFromString(Moses::FactorDirection, const std::vector<long unsigned int>&, const StringPiece&, Moses::Word**) threw util::Exception because `nextPos == string::npos'.
Incorrect formatting of non-terminal. Should have 2 non-terms, eg. [X][X]. Current string: [37]
Exit code: 1
Cause: the rule format that moses extracts uses [X] to denote a non-terminal, so the "[" and "]" characters have to be removed from the corpus.
http://smtmoses.blogspot.com/2014/02/moses-support-digest-vol-88-issue-50.html
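Removing the brackets is a one-line transformation per sentence. A minimal sketch, assuming plain deletion is acceptable for the data at hand; `strip_brackets` is a hypothetical name, not a moses script.

```python
# Translation table that deletes both bracket characters.
_BRACKETS = str.maketrans("", "", "[]")


def strip_brackets(line):
    """Delete '[' and ']' from a line and collapse the whitespace
    left behind by standalone bracket tokens."""
    return " ".join(line.translate(_BRACKETS).split())


if __name__ == "__main__":
    print(strip_brackets("see [37] for details"))
```

The same function should be applied to both sides of the parallel corpus and to the input being decoded. Note that a line consisting only of brackets becomes empty, so the corpus may need cleaning again afterwards.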