This time I want to translate English to Chinese, so I choose Chinese for the language model. Go to the directory /home/tianliang/mosesdecoder/srilm/bin/i686-gcc4; we will use "ngram-count" to build a 5-gram model. The process is as follows:
tianliang@ubuntu:~/mosesdecoder/srilm/bin/i686-gcc4$ mkdir test
tianliang@ubuntu:~/mosesdecoder/srilm/bin/i686-gcc4$ cd test
tianliang@ubuntu:~/mosesdecoder/srilm/bin/i686-gcc4/test$ ../ngram-count -text clean.chn -lm chinese.gz -order 5 -unk -wbdiscount -interpolate
This means: we build the Chinese file "clean.chn" into a 5-gram language model, chinese.gz, using Witten-Bell discounting with interpolated estimates; the -unk flag maps out-of-vocabulary words to an <unk> token. chinese.gz is a gzipped ARPA-format file, and its header looks like:
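Here is a minimal sketch of the header of such an ARPA-format model, viewed with zcat; every count except the 3-gram line (which is the figure from my run) is an illustrative placeholder, not real output:
tianliang@ubuntu:~/mosesdecoder/srilm/bin/i686-gcc4/test$ zcat chinese.gz | head
\data\
ngram 1=327
ngram 2=481
ngram 3=594
ngram 4=612
ngram 5=618
\1-grams: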
It shows the number of n-grams of each order in the model. For example, there are 594 distinct 3-grams in our corpus.
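As a quick sanity check, you can score a held-out file with SRILM's ngram tool and look at the reported perplexity; test.chn here is a hypothetical held-out Chinese file prepared the same way as clean.chn:
tianliang@ubuntu:~/mosesdecoder/srilm/bin/i686-gcc4/test$ ../ngram -lm chinese.gz -order 5 -unk -ppl test.chn
Lower perplexity on held-out text generally means the language model fits the domain better.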
The Moses toolkit does a great job of wrapping calls to mkcls and GIZA++ inside a training script and outputting the phrase and reordering tables needed for decoding. The script that does this is called train-factored-phrase-model.perl. In my setup, train-factored-phrase-model.perl is located at