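*Blog post: http://blog.csdn.net/wangxinginnlp/article/details/64921476
*There is no step-by-step instruction for nematus, and the code left me somewhat confused, so I am writing these notes for future reference. [Note: everything in red is a pitfall.]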
Code:
nematus https://github.com/rsennrich/nematus
subword-nmt https://github.com/rsennrich/subword-nmt
Data:
nematus ships with a 1000-sentence-pair English-German (En-De) parallel corpus.
Environment:
The nematus documentation lists the following requirements:
Nematus requires the following packages:
- Python >= 2.7
- numpy
- Theano >= 0.7 (and its dependencies).
we recommend executing the following command in a Python virtual environment: pip install numpy numexpr cython tables theano
the following packages are optional, but highly recommended
- CUDA >= 7 (only GPU training is sufficiently fast)
- cuDNN >= 4 (speeds up training substantially)
you can run Nematus locally. To install it, execute python setup.py install
Open questions:
1. Why are there multiple source dictionaries? Is it to support Linguistic Input Features, with one dictionary per feature?
nematus accepts multiple source dictionaries.
The code in nmt.py that receives the source dictionaries:
train = TextIterator(datasets[0], datasets[1],
dictionaries[:-1], dictionaries[-1],
....)
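The slicing in that call can be tried in isolation. Below is a minimal sketch, assuming `dictionaries` is an ordinary Python list of dictionary file paths with the target dictionary in the last position; the file names and the variable names `source_dicts` and `target_dict` are made up for illustration and do not come from nematus.

```python
# Hypothetical dictionary paths (illustration only, not real nematus files).
dictionaries = [
    "train.en.words.json",  # source dictionary for the first input feature
    "train.en.pos.json",    # source dictionary for a second input feature
    "train.de.json",        # target dictionary, assumed to be listed last
]

# dictionaries[:-1] keeps every entry except the last one, so it is always
# a list, even when only one source dictionary is given.
source_dicts = dictionaries[:-1]

# dictionaries[-1] is the last entry, i.e. a single path rather than a list.
target_dict = dictionaries[-1]

print(source_dicts)
print(target_dict)
```

So whatever number of dictionaries is passed in, everything before the last entry is treated as a source dictionary, which fits the one-dictionary-per-feature reading above.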