(1) Install GIZA++ and mkcls
Since I have already installed GIZA++ and mkcls on my computer, I won't give the full process here; you can refer to the last report about running GIZA++. One thing you should do before installing is delete “-DBINARY_SEARCH_FOR_TTABLE” from the “Makefile” (with newer versions of GIZA++ there is no need to delete it; just build directly).
Just enter the install folder and use the command:
Prepare third-party software:
You need the following free third-party software to build SRILM:
--> gcc version 3.4.3 or higher
GCC is short for “GNU Compiler Collection”. It includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj...). My gcc version is 4.4.1.
--> GNU make and automake
These tools interpret the commands in the Makefile.
To build the program “memscore”. The details are described at http://randspringer.de/boost
--> John Ousterhout's Tcl toolkit, version 7.3 or higher
This is currently used only for some test programs, but is needed for the build to go through without manual intervention. Here I use Tcl 8.5.
A kind of Unix shell. It is very important: if you don't install it, the program will not work correctly!
The following tools are needed at runtime only:
--> GNU awk (gawk)
To interpret many of the utility scripts.
To read/write compressed files.
To read/write .bz2 compressed files.
To read/write .7z compressed files.
In Ubuntu 9.10, the tools mentioned above are all available through the Synaptic Package Manager. There is no need to download them separately; we just find the tools in the package manager and install them. We install these tools first because SRILM needs them to read compressed files and to interpret its scripts. For example, if you don't install “libboost”, you will meet the error “could not detect the boost libraries” when compiling the SCRIPTS.
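Before going on, it can help to check which of these tools are already on the machine. The following is only a minimal sketch: the function name missing_tools is my own, and the command names in the last line (for example tclsh for Tcl) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Report which of the given tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed command names for the tools listed above; adjust to your system.
missing_tools gcc make automake tclsh gawk gzip bzip2
```

If nothing is printed, all of the listed commands were found.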
Before compiling SRILM, we should edit some Makefiles to suit our installation.
1. Go to folder ~/mosesdecoder/srilm and edit the Makefile to point to your directory. Here is my change.
The purpose of the first change is to tell the machine where SRILM should be installed, and the purpose of the second change is to tell it which file to use after that. The file it will point to is: /home/tianliang/moses/common/Makefile.machine.i686-gcc4
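The first change can also be made from the command line instead of a text editor. This is only a sketch under assumptions: it assumes the top-level Makefile holds the install location in a variable named SRILM (possibly commented out), and the function name set_srilm_dir is hypothetical. It does not cover the second change.

```shell
#!/bin/sh
# Set the SRILM variable in a Makefile to the given directory.
# Handles both a commented-out line ("# SRILM = ...") and an active one.
# A backup of the original file is kept as <makefile>.bak.
set_srilm_dir() {
    makefile="$1"
    srilm_dir="$2"
    sed -i.bak "s|^#* *SRILM *=.*|SRILM = ${srilm_dir}|" "$makefile"
}

# Example (paths are the ones used in this report):
# set_srilm_dir ~/mosesdecoder/srilm/Makefile /home/tianliang/mosesdecoder/srilm
```

Check the edited line by eye afterwards; the backup file lets you undo the change.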
2. Go to folder: ~/mosesdecoder/srilm/common and then find the file named: Makefile.machine.i686-gcc4
The purpose of this change is to tell the machine where Tcl is installed.
There is no need to change this one. I changed it only because I want to use gawk; it is up to you!
You should know that different operating systems have different default install paths. The path I used above is my install path; you can use the following command line to get your install path:
$ which <tool name>    (for example: which gawk)
3. Install Now
Now we can install the language model SRILM. Enter the folder “srilm” and compile:
tianliang@ubuntu:~/mosesdecoder/srilm$ make World
This compiles SRILM. I won't give the full output here because it is long. If you want to see the complete output, you can refer to the appendix.
After the successful compilation, you will get three new subdirectories: bin, lib and include. Here is the content of the three folders:
These three folders are built by the program itself as shown below:
If you can see the expected results in the last several lines like below, you can assume that SRILM was installed successfully. In this case, there is no need to run the test in step 4.
Most errors will be shown after the “make” commands; if you don't meet these errors, you can thank GOD!
The most important executable program we will use is “ngram-count” in the folder bin/i686-gcc4. We will use this program to build an N-gram language model. Later you will see it!
If you are not sure whether you have successfully installed SRILM, you can test it using the “test” folder provided with the SRILM package. But before the test, you need to add SRILM to your path and then run its test suite. Of course you don't need these in your path for normal training and decoding with Moses. The environment variable path is:
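The exact value is not reproduced here, so the following is only a sketch. It assumes that the SRILM scripts are in bin and the compiled programs in bin/i686-gcc4 (the folder mentioned above); the function name add_srilm_to_path is my own.

```shell
#!/bin/sh
# Prepend the SRILM script and binary directories to PATH.
# $1: SRILM root directory; $2: machine type (i686-gcc4 in this report).
add_srilm_to_path() {
    srilm_root="$1"
    machine_type="$2"
    PATH="${srilm_root}/bin:${srilm_root}/bin/${machine_type}:${PATH}"
    export PATH
}

# Example with the paths used in this report:
# add_srilm_to_path /home/tianliang/mosesdecoder/srilm i686-gcc4
```

The change only lasts for the current shell session, which is enough for running the test suite.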
The test command is:
tianliang@ubuntu:~/mosesdecoder/srilm$ cd test
tianliang@ubuntu:~/mosesdecoder/srilm/test$ make all
Check the output and look for IDENTICAL and DIFFERS. You can judge whether the program is installed correctly by how many times each word appears. Of course we want to see as many IDENTICAL as possible.
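Counting the two words by eye is tedious, so here is a small sketch that counts them in a saved log. It assumes you saved the test output to a file yourself (the name test.log is just an example); the function name count_results is hypothetical.

```shell
#!/bin/sh
# Count how many lines of a saved test log contain a given word.
count_results() {
    log_file="$1"
    word="$2"
    grep -c "$word" "$log_file"
}

# Example, assuming the output was saved with: make all 2>&1 | tee test.log
# echo "IDENTICAL: $(count_results test.log IDENTICAL)"
# echo "DIFFERS:   $(count_results test.log DIFFERS)"
```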
(3) Install Moses
In the Moses folder structure, there are a lot of folders that we won't use now, because they are projects for Eclipse, Xcode, and Visual Studio, and these are not well maintained and may not be up to date. I'll focus on the Linux command-line method, which is the preferred way to compile.
1. Run regenerate-makefiles.sh to generate the configure files that are needed.
After downloading the decoder from Subversion, you will need to run the following script in the Moses directory.
tianliang@ubuntu:~$ cd mosesdecoder
tianliang@ubuntu:~/mosesdecoder$ cd moses
tianliang@ubuntu:~/mosesdecoder/moses$ ./regenerate-makefiles.sh
The decoder has a good habit here: it prints a notice telling you which commands to type in the following steps, as shown above. What you should know in this process is that version 1.9 or higher of aclocal and automake is required. You can see in the output that the script is “Calling” them. If you didn't install them as mentioned at the beginning of the report, you will meet errors.
2. Run configure to generate the Makefiles used to build the Moses executable files.
At this point, you should choose your preferred LM toolkit. To do that, we can use either the --with-srilm or the --with-irstlm parameter, as follows.
$ ./configure --with-srilm=/path/to/srilm (or --with-irstlm=/path/to/irstlm)
Here I use the first one, because I won't use the memory mapping.
tianliang@ubuntu:~/mosesdecoder/moses$ ./configure --with-srilm=/home/tianliang/mosesdecoder/srilm
You can see from the output that it generates some Makefiles. Without this step we cannot do the next one, because the “make” command cannot compile the program without a Makefile.
3. Compile the program using “make”
We got the Makefiles from the last step. Now we need to edit them before running “make”. There are six Makefiles and we need to change four of them. (We could change all of them, but we won't use all of them this time.)
Go to the Makefiles:
Find the sentence:
Now we can compile the program:
tianliang@ubuntu:~/mosesdecoder/moses$ make -j 4
The -j 4 is optional; make -j X, where X is the number of simultaneous tasks, is a speedier option for machines with multiple processors. My computer has two processors, so I use this parameter. When the build succeeds, you will get lines like below:
If you meet lines like below, you should check that the tools listed at the beginning of the report are installed.
This creates several files we will be using:
• misc/processPhraseTable - Used to binarize phrase tables
• misc/processLexicalTable - Used to binarize reordering tables
• moses-cmd/src/moses - The actual decoder
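For the -j option used in the make step above, the number of tasks can be taken from the machine instead of being typed by hand. This is a minimal sketch; the function name cpu_count is my own, and it falls back to 1 if the processor count cannot be read.

```shell
#!/bin/sh
# Print the number of online processors, falling back to 1 if unknown.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Example: run as many make jobs as there are processors.
# make -j "$(cpu_count)"
```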
Now you can test whether the program is installed correctly. Download the sample models and extract them into your working directory. Here I just download the sample to the moses directory.
tianliang@ubuntu:~/mosesdecoder$ cd moses
tianliang@ubuntu:~/mosesdecoder/moses$ tar xzf sample-models.tgz
tianliang@ubuntu:~/mosesdecoder/moses/sample-models/phrase-model$ ./moses -f moses.ini < in > out
If everything worked out right, this should translate the sentence “das ist ein kleines haus” (in the file in) as “this is a small house” (in the file out).
You can also see it on the screen using the cat command, like below:
tianliang@ubuntu:~/moses/trunk/sample-models/phrase-model$ cat out
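Instead of reading the file by eye, the result can also be checked with a small script. This is just a sketch; the function name check_translation is hypothetical, and it only checks that the expected sentence appears somewhere in the output file.

```shell
#!/bin/sh
# Succeed if the output file contains the expected translation.
check_translation() {
    out_file="$1"
    expected="$2"
    grep -q -F -- "$expected" "$out_file"
}

# Example, run in sample-models/phrase-model after decoding:
# check_translation out "this is a small house" && echo "decoder OK"
```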
Moses uses a set of scripts to support training, tuning, and other tasks. The support scripts used by Moses are "released" by a Makefile which edits their paths to match your local environment. So we should first edit the Makefile to set the paths.
First we should make two folders called “bin” and “target”, and then we can edit the Makefile. My change is like below:
TARGETDIR is the directory where we want the compiled scripts to be copied to, and BINDIR is the directory where GIZA++ and other tools are installed.
Now we should copy the compiled executable programs GIZA++, snt2cooc.out and mkcls to the “bin” folder. We will use them when we compile the SCRIPTS. Once you have done that, compile like below:
tianliang@ubuntu:~/mosesdecoder/moses/scripts$ make release
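The preparation before make release (making the two folders and copying the three programs) can be sketched as follows. This is only a sketch: it assumes both folders are created inside the scripts directory, and the function name prepare_scripts_dirs is my own.

```shell
#!/bin/sh
# Create the "bin" and "target" folders and copy the aligner binaries.
# $1: scripts directory; $2: directory holding the compiled
#     GIZA++, snt2cooc.out and mkcls executables.
prepare_scripts_dirs() {
    scripts_dir="$1"
    tools_dir="$2"
    mkdir -p "${scripts_dir}/bin" "${scripts_dir}/target" || return 1
    for tool in GIZA++ snt2cooc.out mkcls; do
        cp "${tools_dir}/${tool}" "${scripts_dir}/bin/" || return 1
    done
}

# Example (the second path is a placeholder for wherever you built the tools):
# prepare_scripts_dirs ~/mosesdecoder/moses/scripts /path/to/built/tools
```

The function stops at the first missing file, so a failed copy is noticed before make release is run.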
This will create a time-stamped folder named /home/tianliang/mosesdecoder/moses/scripts/target/scripts-YYYYMMDD-HHMM with released versions of all the scripts. You will call these versions when training and tuning Moses. Some Moses training scripts also require a SCRIPTS_ROOTDIR environment variable to be set. The output of make release should indicate this. Most scripts allow you to override this by setting a -scripts-root-dir flag or something similar.
As mentioned in the output, we should export SCRIPTS_ROOTDIR like below:
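The exact folder name depends on when make release was run, so the following is only a sketch. It picks the newest scripts-YYYYMMDD-HHMM folder under the target folder; the function name latest_scripts_dir is hypothetical.

```shell
#!/bin/sh
# Print the most recent scripts-YYYYMMDD-HHMM folder under a target dir.
# The timestamp format sorts chronologically, so the last name is newest.
latest_scripts_dir() {
    target_dir="$1"
    ls -d "${target_dir}"/scripts-* 2>/dev/null | sort | tail -n 1
}

# Example with the target folder used in this report:
# SCRIPTS_ROOTDIR=$(latest_scripts_dir /home/tianliang/mosesdecoder/moses/scripts/target)
# export SCRIPTS_ROOTDIR
```

If make release printed a different path, use the one it printed.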
For the language model and the translation model, please see Part 3, “Record of running Moses: building the Moses language model and translation model (3)”.