How To Compile, Install and Run GIZA++
Partially copied from the original article: http://kwang.blogdns.com/research/how-to-compile-install-run-giza.html
Download GIZA++:
GIZA++ is available here: http://code.google.com/p/giza-pp/
$ wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.2.tar.gz
Compilation/Installation of GIZA++:
The current version of GIZA++ cannot be compiled with gcc 3.4.6. If you run into the same problem, upgrade your gcc. I am using gcc 4.1.2; although it is not the most recent version, it works for GIZA++.
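A quick way to check the installed compiler before building is sketched below. The 4.1.2 threshold is only the version reported as working above, not a documented minimum, and the helper name version_ge is made up for this example. It assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Sketch: succeed if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed threshold: the gcc version the text reports as working.
required=4.1.2

if command -v gcc >/dev/null 2>&1; then
    current=$(gcc -dumpversion)
    if version_ge "$current" "$required"; then
        echo "gcc $current looks recent enough"
    else
        echo "gcc $current is older than $required; consider upgrading"
    fi
else
    echo "gcc not found on PATH"
fi
```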
1. Decompress the tar file; this creates a directory named giza-pp. It contains two sub-folders, one with the GIZA++ toolkit and one with the mkcls tool.
$ tar xvzf giza-pp-v1.0.2.tar.gz
2. Modify the Makefile under giza-pp/GIZA++-v2:
(a) Change INSTALLDIR to a directory appropriate for you (not necessary if you do not invoke make install).
(b) Find the -DBINARY_SEARCH_FOR_TTABLE option and delete it (the original article links to an explanation of why).
$ cd giza-pp/GIZA++-v2
$ vi Makefile
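Step 2(b) can also be done without opening an editor. The following is a minimal sketch, assuming the flag appears literally in the Makefile; the function name strip_ttable_flag is hypothetical, and the original file is kept with a .bak suffix.

```shell
#!/bin/sh
# Sketch: remove the -DBINARY_SEARCH_FOR_TTABLE define from a Makefile.
# A backup of the untouched file is written next to it as <name>.bak.
strip_ttable_flag() {
    makefile=$1
    cp "$makefile" "$makefile.bak"
    # Drop the flag together with the spaces in front of it.
    sed 's/ *-DBINARY_SEARCH_FOR_TTABLE//g' "$makefile.bak" > "$makefile"
}
```

From the giza-pp/GIZA++-v2 directory it would be called as strip_ttable_flag Makefile.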
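3. Modify the file file_spec.h in the GIZA++-v2 directory. The patch enlarges the timestamp buffer and prints a four-digit year: tm_year counts years since 1900, so for any year from 2000 onward the original format writes three digits and overruns the 17-byte buffer.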
*** file_spec.h 2009/07/10 21:38:39 1.1
--- file_spec.h 2009/07/13 11:37:21
***************
*** 37,49 ****
    struct tm *local;
    time_t t;
    char *user;
!   char time_stmp[17];
    char *file_spec = 0;

    t = time(NULL);
    local = localtime(&t);

!   sprintf(time_stmp, "%02d-%02d-%02d.%02d%02d%02d.", local->tm_year,
            (local->tm_mon + 1), local->tm_mday, local->tm_hour,
            local->tm_min, local->tm_sec);
    user = getenv("USER");
--- 37,49 ----
    struct tm *local;
    time_t t;
    char *user;
!   char time_stmp[19];
    char *file_spec = 0;

    t = time(NULL);
    local = localtime(&t);

!   sprintf(time_stmp, "%04d-%02d-%02d.%02d%02d%02d.", 1900 + local->tm_year,
            (local->tm_mon + 1), local->tm_mday, local->tm_hour,
            local->tm_min, local->tm_sec);
    user = getenv("USER");
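The buffer sizes in the patch can be checked with the shell's own printf, which accepts the same format strings. This is only an illustration: the date values are borrowed from the timestamp in the diff header, and 109 stands for what tm_year holds in 2009.

```shell
#!/bin/sh
# tm_year holds years since 1900, so 2009 is stored as 109.
old=$(printf '%02d-%02d-%02d.%02d%02d%02d.' 109 7 13 11 37 21)
new=$(printf '%04d-%02d-%02d.%02d%02d%02d.' $((1900 + 109)) 7 13 11 37 21)

# Each string needs one extra byte for the terminating NUL in C.
echo "$old needs $(( ${#old} + 1 )) bytes"   # 109-07-13.113721. needs 18 bytes
echo "$new needs $(( ${#new} + 1 )) bytes"   # 2009-07-13.113721. needs 19 bytes
```

The old format needs 18 bytes but only 17 were reserved, which is the overflow the patch removes. The new format needs exactly the 19 bytes the patch reserves.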
4. In the giza-pp directory, run make to compile the source code for both GIZA++ and mkcls. This produces the executables for both projects.
Running make install only copies the GIZA++ executable to the directory you specified, so it is optional.
$ make
$ make install
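After the build, a small check like the one below can confirm that the executables exist. It is a sketch with a hypothetical function name, check_executables. In the usage shown afterwards, the mkcls-v2 directory name is an assumption not stated in the text above.

```shell
#!/bin/sh
# Sketch: report which of the given paths exist and are executable.
# Returns 0 if all are present, 1 if any is missing.
check_executables() {
    missing=0
    for f in "$@"; do
        if [ -x "$f" ]; then
            echo "ok      $f"
        else
            echo "missing $f"
            missing=1
        fi
    done
    return "$missing"
}
```

From the giza-pp directory it might be called as check_executables GIZA++-v2/GIZA++ GIZA++-v2/plain2snt.out mkcls-v2/mkcls.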
Running GIZA++ (Training IBM Model):
Let’s say we already have a parallel corpus stored in two files named english and foreign.
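Before converting, it can be worth confirming that the two files line up. The sketch below assumes the usual parallel-corpus layout of one sentence per line, with line N of one file translating line N of the other; the function name same_line_count is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed only if both files contain the same number of lines.
same_line_count() {
    # tr strips the padding some wc implementations add around the count.
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}
```

For example: same_line_count english foreign || echo "corpus files differ in length".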
1. Run plain2snt.out (under GIZA++-v2) to convert the parallel corpus into the GIZA++ format.
$ plain2snt.out english foreign
This produces the following files: english.vcb, foreign.vcb, english_foreign.snt and foreign_english.snt
2. Use IBM translation Model 1 to learn word-to-word translation probabilities:
$ ./GIZA++ -ml 101 -hmmiterations 0 -model1iterations 10 -model2iterations 0 -model3iterations 0 -model4iterations 0 -model5iterations 0 -model6iterations 0 -model1dumpfrequency 10 -s english.vcb -t foreign.vcb -c english_foreign.snt -o prob_table
The file prob_table.t1.10 contains the word-to-word translation probabilities, one entry per line, in the format: source target probability_value
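As a small example of working with this table, the sketch below prints the most probable target for each source entry. It assumes three whitespace-separated fields per line, as described above, and treats the first two as opaque tokens, since they may be numeric word IDs from the .vcb files rather than word strings. The function name best_translations is hypothetical.

```shell
#!/bin/sh
# Sketch: for each source token, keep the target with the highest probability.
# Input lines are assumed to look like: source target probability_value
best_translations() {
    awk '
        {
            # "+ 0" forces a numeric comparison, so values like 1e-07 work too.
            if (!($1 in prob) || $3 + 0 > prob[$1] + 0) {
                prob[$1] = $3
                target[$1] = $2
            }
        }
        END { for (s in prob) print s, target[s], prob[s] }
    ' "$1" | sort
}
```

It would be run as best_translations prob_table.t1.10.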