About Giza++

How To Compile, Install and Run GIZA++

Partially copied from the original article: http://kwang.blogdns.com/research/how-to-compile-install-run-giza.html

Download GIZA++:

GIZA++ is available here: http://code.google.com/p/giza-pp/
$ wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.2.tar.gz

Compilation/Installation of GIZA++:

The current version of GIZA++ does not compile with gcc 3.4.6. If you run into this problem, update your gcc. I am using gcc 4.1.2; it is not the most recent version, but it compiles GIZA++ without problems.

1. Decompress the tar file to get a directory named giza-pp. It contains two subdirectories, one with the GIZA++ toolkit and one with the mkcls tool.
$ tar xvzf giza-pp-v1.0.2.tar.gz

2. Modify the Makefile under giza-pp/GIZA++-v2:
(a) Change INSTALLDIR to a directory appropriate for you (not necessary if you don't run make install).
(b) Find the -DBINARY_SEARCH_FOR_TTABLE option and delete it (Why? See the original article).
$ cd giza-pp/GIZA++-v2
$ vi Makefile
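
Step (b) can also be done without an editor. The following is only a sketch: the CFLAGS_OPT line is a made-up stand-in for the real Makefile contents, and the result is written to a new file so that the original stays available as a backup.

```shell
workdir=$(mktemp -d)

# Hypothetical Makefile fragment standing in for giza-pp/GIZA++-v2/Makefile.
cat > "$workdir/Makefile" <<'EOF'
CFLAGS_OPT = -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DBINARY_SEARCH_FOR_TTABLE
EOF

# Remove the flag together with the blanks in front of it.
sed 's/ *-DBINARY_SEARCH_FOR_TTABLE//' "$workdir/Makefile" > "$workdir/Makefile.new"

cat "$workdir/Makefile.new"
```

After checking the new file, it can be moved over the original Makefile.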

 

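3. Patch the file file_spec.h in the GIZA++-v2 directory as shown in the diff below. tm_year counts years since 1900, so from the year 2000 on the original two-digit year format prints three digits and overflows the 17-byte time_stmp buffer. The patch prints a four-digit year and enlarges the buffer to 19 bytes.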

*** file_spec.h	2009/07/10 21:38:39	1.1
--- file_spec.h	2009/07/13 11:37:21
***************
*** 37,49 ****
    struct tm *local;
    time_t t;
    char *user;
!   char time_stmp[17];
    char *file_spec = 0;
    
    t = time(NULL);
    local = localtime(&t);
    
!   sprintf(time_stmp, "%02d-%02d-%02d.%02d%02d%02d.", local->tm_year, 
  	  (local->tm_mon + 1), local->tm_mday, local->tm_hour, 
  	  local->tm_min, local->tm_sec);
    user = getenv("USER");
--- 37,49 ----
    struct tm *local;
    time_t t;
    char *user;
!   char time_stmp[19];
    char *file_spec = 0;
    
    t = time(NULL);
    local = localtime(&t);
    
!   sprintf(time_stmp, "%04d-%02d-%02d.%02d%02d%02d.", 1900 + local->tm_year, 
  	  (local->tm_mon + 1), local->tm_mday, local->tm_hour, 
  	  local->tm_min, local->tm_sec);
    user = getenv("USER");
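
The buffer sizes in the patch can be checked by formatting a sample timestamp with the shell's printf, which understands the same %02d and %04d conversions as sprintf. A small sketch; the date 2009-07-13 11:37:21 is only an example value.

```shell
# tm_year holds the number of years since 1900, so 2009 is stored as 109.
old=$(printf '%02d-%02d-%02d.%02d%02d%02d.' 109 7 13 11 37 21)
new=$(printf '%04d-%02d-%02d.%02d%02d%02d.' 2009 7 13 11 37 21)

# A C string needs one extra byte for the terminating NUL.
echo "old: $old needs $(( ${#old} + 1 )) bytes, buffer has 17"
echo "new: $new needs $(( ${#new} + 1 )) bytes, buffer has 19"
```

The old string is 17 characters long and therefore needs 18 bytes, one more than time_stmp[17] provides. The new string is 18 characters long and fits exactly into time_stmp[19].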

 


4. In the giza-pp directory, run make to compile the source code of both GIZA++ and mkcls. This produces the executables of both projects.
Running make install only copies the GIZA++ executables to the directory you specified, so this step is optional.
$ make
$ make install

Running GIZA++ (Training IBM Model):

Suppose we already have a parallel corpus stored in two files named english and foreign.
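
The two files are paired line by line, so they should contain the same number of lines. A quick sanity check before conversion might look like the sketch below; the two-sentence corpus and the temporary directory are made up for illustration.

```shell
workdir=$(mktemp -d)

# Toy corpus: line N of "english" is the translation of line N of "foreign".
cat > "$workdir/english" <<'EOF'
the house
the book
EOF
cat > "$workdir/foreign" <<'EOF'
das haus
das buch
EOF

# tr removes the padding that some wc implementations add.
en_lines=$(wc -l < "$workdir/english" | tr -d ' ')
fr_lines=$(wc -l < "$workdir/foreign" | tr -d ' ')

if [ "$en_lines" -eq "$fr_lines" ]; then
  corpus_status="aligned"
else
  corpus_status="mismatch"
fi
echo "$corpus_status: english=$en_lines foreign=$fr_lines"
```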

1. Run plain2snt.out (under GIZA++-v2) to convert the parallel corpus into the GIZA++ format.
$ plain2snt.out english foreign
This produces the following files: english.vcb, foreign.vcb, english_foreign.snt and foreign_english.snt

2. Use IBM translation Model 1 to learn word-to-word translation probabilities:
$ ./GIZA++ -ml 101 -hmmiterations 0 -model1iterations 10 -model2iterations 0 -model3iterations 0 -model4iterations 0 -model5iterations 0 -model6iterations 0 -model1dumpfrequency 10 -s english.vcb -t foreign.vcb -c english_foreign.snt -o prob_table
Only Model 1 is trained here (10 iterations); the iteration counts of all other models are set to 0. The file prob_table.t1.10 contains the word-to-word translation probabilities, one entry per line, in the format: source target probability_value
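
To make the table easier to read, its entries can be joined with the vocabulary files. This is a minimal sketch under two assumptions: that the table stores numeric word IDs rather than the words themselves, and that each .vcb line has the form "id word count". All file contents below are made-up toy data, not real GIZA++ output.

```shell
workdir=$(mktemp -d)

# Toy vocabulary files, assumed format: id word count
cat > "$workdir/english.vcb" <<'EOF'
2 the 3
3 house 2
EOF
cat > "$workdir/foreign.vcb" <<'EOF'
2 das 3
3 haus 2
EOF

# Toy probability table: source_id target_id probability
cat > "$workdir/prob_table.t1.10" <<'EOF'
2 2 0.8
2 3 0.2
3 3 0.9
3 2 0.1
EOF

# Load both vocabularies, then rewrite each table line with words.
# IDs that are missing from a vocabulary are printed unchanged.
readable=$(awk '
  FILENAME == ARGV[1] { src[$1] = $2; next }
  FILENAME == ARGV[2] { tgt[$1] = $2; next }
  {
    s = ($1 in src) ? src[$1] : $1
    t = ($2 in tgt) ? tgt[$2] : $2
    print s, t, $3
  }
' "$workdir/english.vcb" "$workdir/foreign.vcb" "$workdir/prob_table.t1.10")

echo "$readable"
```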

 

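3. Sort the probability table:
$ cat prob_table.t1.10 | sort -k 3 -r -g | sort -k 1 -g --stable > sorted_prob_table.t1.10
The file prob_table.t1.10 has three columns. The first sort orders the lines by the 3rd column (-k 3) using general numeric comparison (-g), from largest to smallest (-r). The second sort orders them by the 1st column and is stable (--stable), so lines with the same 1st column stay in descending order of probability. The output is stored in sorted_prob_table.t1.10.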
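
The effect of the two sorts can be tried on a small table. The sketch below uses made-up entries and assumes GNU sort. It also compares the result with a single sort call that uses two keys; that variant is an alternative suggested here, not part of the original procedure, and the two can differ only when two lines tie on both the 1st and the 3rd column.

```shell
workdir=$(mktemp -d)

# Toy table: source target probability
cat > "$workdir/prob_table.t1.10" <<'EOF'
3 2 0.1
2 3 0.2
3 3 0.9
2 2 0.8
EOF

# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
# Two passes, as in the command above.
two_pass=$(LC_ALL=C sort -k 3 -r -g "$workdir/prob_table.t1.10" | LC_ALL=C sort -k 1 -g --stable)

# One pass: column 1 ascending, then column 3 descending.
one_pass=$(LC_ALL=C sort -k 1,1g -k 3,3gr "$workdir/prob_table.t1.10")

echo "$two_pass"
```

For each source entry, the most probable target now comes first.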

 
