Machine Translation Useful Links: Techniques, Toolkits, Videos


 

TianLiang

2011-12-2

 

Here are some useful links for the machine translation research field, which you can refer to in your research or study. Each entry gives a short description of what the toolkit does, together with its URL. For more details, follow the links.

 

  

 Rule-based Machine Translation Toolkits:
 

(1) Apertium:

Short Description: A free/open-source rule-based machine translation platform.

URL: http://www.apertium.org/

 

(2)  OpenLogos:

Short Description: An open-source port of the Logos machine translation system for Linux.

URL: http://en.wikipedia.org/wiki/OpenLogos

 

(3) Matxin:

Short Description: An open-source transfer-based machine translation engine.

URL: http://matxin.sourceforge.net/

 

   Example-based Machine Translation Toolkits:

 

(1)   Marclator:

Short Description: Marclator is a free example-based machine translation system based on the marker hypothesis, comprising a marker-driven chunker, a collection of chunk aligners, and a simple proof-of-concept monotonic recombinator, or "decoder".

URL: http://www.openmatrex.org/marclator/

 

(2)  PBMBMT:

Short Description: A phrase-based memory-based machine translation system, built on memory-based classifiers.

URL: http://ilk.uvt.nl/mbmt/pbmbmt/

 

(3)  OpenMaTrEx:

Short Description: A free/open-source marker-driven example-based machine translation system.

URL: http://openmatrex.org/

 

   Statistical Machine Translation Toolkits:

 

(1)  cdec:

Short Description: A software package for decoding and alignment.

URL: http://cdec-decoder.org/

 

(2)  Ncode:

Short Description: An open-source statistical machine translation system based on bilingual n-grams.

URL: http://www.limsi.fr/Individu/jmcrego/bincoder/

 

(3)  DoMY™ CE:

Short Description: DoMY™ CE bundles open-source SMT software into one easy-to-install package.

URL: http://www.precisiontranslationtools.com/index.php?option=com_content&view=article&id=1&Itemid=22

 

(4)  Phramer:

Short Description: An open-source statistical phrase-based MT decoder.

URL: http://www.hlt.utdallas.edu/~marian/phramer/

 

(5)  Joshua:

Short Description: Joshua is an open-source statistical machine translation decoder for hierarchical and syntax-based machine translation, written in Java.

URL: http://joshua.sourceforge.net/Joshua/Welcome.html

 

(6)  Moses:

Short Description: The most widely used SMT system, supporting both phrase-based and tree-based models.

URL: http://www.statmt.org/moses/

 

(7)  Thot:

Short Description: A toolkit for training phrase-based models for statistical machine translation. Thot can estimate phrase-based models and find the best phrase-level alignments for a given set of sentence pairs.

URL: http://sourceforge.net/projects/thot/

 

   Combined Machine Translation Systems:

 

(1)   Anusaaraka:

Short Description: Anusaaraka is English-Hindi language-accessing software. It is a machine translation tool that draws on insights from Panini's Ashtadhyayi (grammar rules) and aims to fuse traditional Indian shastras with advanced modern technologies.

URL: http://anusaaraka.iiit.ac.in/

 

(2)  MANY:

Short Description: An MT system combination tool.

URL: http://code.google.com/p/many/

 

(3)  MEMT:

Short Description: System combination; won 6 of 8 language pairs at WMT11.

URL: http://kheafield.com/code/

 

(4)  Cunei:

Short Description: Cunei is a data-driven platform for machine translation.

URL: http://www.cunei.org/

 

   Alignment Tools:

 

(1)  ABBYY Aligner:

Short Description: ABBYY Aligner is a professional tool for aligning parallel texts and creating Translation Memory databases. It finds matching segments in parallel texts and saves them as TMX files for further use in CAT tools, or as RTF files. It is based on ABBYY's linguistic technology and offers an intuitive interface with a wide range of functions.

URL: http://www.abbyy.com/aligner/

 

(2)  GIZA++:

Short Description: GIZA++ is a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model.

URL: http://code.google.com/p/giza-pp/

 

(3)   Anymalign:

Short Description: Anymalign is a multilingual sub-sentential aligner. It can extract lexical equivalences from sentence-aligned parallel corpora. Its main advantage over other similar tools is that it can align any number of languages simultaneously.

URL: http://www.limsi.fr/Individu/alardill/anymalign/

 

(4)  Hunalign:

Short Description: Hunalign aligns bilingual text at the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs.

URL: http://mokk.bme.hu/en/resources/hunalign/

 

(5)  Araya:

Short Description: A bilingual alignment tool and alignment editor that creates TMX files.

URL: http://www.heartsome.de/en/araya.php#TMX

 

(6)  MGIZA++:

Short Description: A word alignment tool based on GIZA++, extended to support multi-threading, resumed training, and incremental training.

URL: http://sourceforge.net/projects/mgizapp/

 

(7)  Berkeley Word Aligner:

Short Description: The Berkeley Word Aligner is a statistical machine translation tool that automatically aligns words in a sentence-aligned parallel corpus, in either a supervised or an unsupervised way.

URL: http://code.google.com/p/berkeleyaligner/

 

(8)  PostCAT:

Short Description: This package contains code for word alignment with IBM Models 1 and 2 and the HMM model. Training uses either standard EM or constrained EM with agreement constraints and substochastic constraints.

URL: http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html

 

(9)  BIA:

Short Description: A suite consisting of a discriminative phrase-based alignment decoder based on linear alignment models, along with training and tuning tools. In the training phase, relative link probabilities are calculated from an initial alignment. The model weights can be tuned directly against MT metrics.

URL: http://code.google.com/p/bia-aligner/

 

(10)  RegAligner:

Short Description: Intended as a replacement for GIZA++: it implements IBM Models 1, 2, 3, and 4 and the HMM model.

URL: https://github.com/Thomas1205/RegAligner

 

(11)  Tree Aligner:

Short Description: A statistical tree-to-tree aligner, which can be used for the automatic generation of parallel treebanks.

URL: http://www.ventsislavzhechev.eu/Home/Software/Software.html

 

(12)  CorpusFiltergraph:

Short Description: A support toolbox for statistical machine translation that extracts, filters, aligns, and transforms text data from multilingual documents into parallel training corpora.

URL: http://sourceforge.net/projects/corpfiltergraph/

 

(13)  tree-alignment-visualizer:

Short Description: Most existing tools were created for visualizing basic word alignments and for manually annotating sentence pairs. At the same time, growing research interest in syntax-augmented machine translation makes the simultaneous visualization of alignment links and parse trees increasingly important. This tool provides such a visualization; by giving insight into the data, it supports research toward better translation systems.

URL: http://code.google.com/p/tree-alignment-visualizer/
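
Several of the aligners listed above (GIZA++, MGIZA++, PostCAT, RegAligner) train IBM Model 1 among other models. As a rough illustration of what that training involves, the following Python sketch runs EM for IBM Model 1 on a toy corpus. It is not the code of any of these toolkits: the function name `train_ibm1`, the toy sentences, and the omission of the NULL word are assumptions made to keep the example short.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability.
    Simplification (assumption): no NULL target word is added.
    """
    src_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each source word over the target words
        # of its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per target word.
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(corpus)
for f in ("das", "haus", "buch"):
    print(f, "| the:", round(t[(f, "the")], 3))
```

On this toy corpus the probability mass for "the" concentrates on "das" as the iterations proceed, because "das" is the only source word that co-occurs with "the" in both of its sentences.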

 

   Evaluation Tools:

 

(1)   NIST/BLEU Confidence Estimator:

Short Description: Confidence interval estimation for MT evaluations.

URL: http://projectile.sv.cmu.edu/research/public/tools/bootStrap/tutorial.htm

 

(2)  EvalTrans:

Short Description: A tool for the automatic and manual evaluation of translations.

URL: http://www-i6.informatik.rwth-aachen.de/web/Software/EvalTrans/index.html

 

(3)  ROUGE:

Short Description: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics and a software package for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.

URL: http://berouge.com/default.aspx

 

(4)  Hjerson:

Short Description: A tool for automatic error classification based on Levenshtein distance, precision, and recall.

URL: http://www.dfki.de/~mapo02/hjerson/

 

(5)  SymEval:

Short Description: SymEval is a translation evaluation toolkit that allows you to compare and score translations. All you need is the source text and the translated texts that you would like to evaluate.

URL: http://sourceforge.net/apps/mediawiki/symeval/index.php?title=Main_Page

 

(6)  METEOR:

Short Description: An automatic metric and toolkit for MT evaluation.

URL: http://www.cs.cmu.edu/~alavie/METEOR/

 

(7)  TERCOM:

Short Description: TERCOM is an implementation of the Translation Error Rate (TER), an error metric for machine translation that measures the number of edits required to change a system output into one of the references.

URL: http://nlp.cs.qc.cuny.edu/snover/

 

(8)  TERcpp:

Short Description: This tool scores machine translation output with the TER metric. The code is based on Snover's algorithm.

URL: http://sourceforge.net/projects/tercpp/

 

(9)  Mteval:

Short Description: An implementation of the BLEU and NIST MT evaluation metrics.

URL: http://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a-20091001.tar.gz

 

(10)  MultEval:

Short Description: A machine translation evaluation toolkit (BLEU, METEOR, TER).

URL: https://github.com/jhclark/multeval

 

(11)  NIST:

Short Description: An open MT evaluation run by NIST.

URL: http://www.itl.nist.gov/iad/mig//tests/mt/
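
The TER entries above describe the metric as the number of edits needed to turn a system output into one of the references. A minimal Python sketch of that idea follows. It is a simplification and not the TERCOM or TERcpp implementation: it counts only word-level insertions, deletions, and substitutions, and leaves out the block shifts that full TER also allows. The function names and the example sentences are illustrative assumptions.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))  # distance from empty hypothesis
    for i, h in enumerate(hyp, start=1):
        cur = [i]  # distance from hyp[:i] to empty reference
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                cur[j - 1] + 1,          # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def ter_without_shifts(hypothesis, references):
    """Edits to the closest reference, divided by average reference length.

    Assumption: block shifts are ignored, so this is an upper bound
    on the edit count that full TER would report.
    """
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    edits = min(word_edit_distance(hyp, ref) for ref in refs)
    avg_ref_len = sum(len(ref) for ref in refs) / len(refs)
    return edits / avg_ref_len


score = ter_without_shifts("the cat sat on mat",
                           ["the cat sat on the mat"])
print(round(score, 3))
```

Here one inserted word repairs the hypothesis against a six-word reference, so the score is 1/6. Lower is better, and an exact match scores 0.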

 

   Language Models:

 

(1)  IRSTLM:

Short Description: A collection of algorithms and data structures suitable for estimating, storing, and accessing very large LMs.

URL: http://hlt.fbk.eu/en/irstlm

 

(2)  KenLM:

Short Description: KenLM is a library that loads language model files and returns probabilities.

URL: http://kheafield.com/code/kenlm/

 

(3)  RandLM:

Short Description: This project deals with space-efficient n-gram-based language models built using randomized representations.

URL: http://randlm.sourceforge.net/

 

(4)  SRILM:

Short Description: A toolkit for building and applying statistical language models.

URL: http://www.speech.sri.com/projects/srilm/
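
The toolkits above all estimate n-gram language models from text and return probabilities for word sequences. The Python sketch below shows the basic idea with a bigram model and add-one smoothing. It is only an illustration under stated assumptions: the class name `BigramLM`, the `<s>` and `</s>` boundary markers, and the smoothing choice are made up for the example, and the real toolkits use more refined smoothing and far more compact data structures.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()  # how often each word is a history
        self.bigrams = Counter()  # how often each (prev, word) pair occurs
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Words the model can predict: every training word plus </s>.
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        """Smoothed estimate of P(word | prev)."""
        return ((self.bigrams[(prev, word)] + 1)
                / (self.history[prev] + len(self.vocab)))

    def logprob(self, sentence):
        """Base-10 log probability of a whole sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log10(self.prob(p, w))
                   for p, w in zip(tokens, tokens[1:]))


lm = BigramLM(["the cat sat", "the dog sat"])
print(lm.logprob("the cat sat") > lm.logprob("sat cat the"))
```

A sentence whose word order matches the training data receives a higher log probability than a scrambled one, which is the property an SMT decoder relies on when it uses a language model to prefer fluent output.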

 

   Part-of-Speech Taggers:

 

(1)   MXPOST:

Short Description: MXPOST was developed by Adwait Ratnaparkhi as part of his PhD thesis. It is a Java implementation of a maximum entropy model. It can be trained for any language for which annotated POS data exists.

URL: ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz 
 
(2)   TreeTagger:

Short Description: TreeTagger is a tool for annotating text with part-of-speech and lemma information.

URL: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

 

   Syntactic Parsers:

 

(1)   Berkeley Parser:

Short Description: The parser focuses on learning probabilistic context-free grammars (PCFGs) that assign a sequence of words its most likely parse tree. It supports a variety of languages and achieves state-of-the-art performance on most of them. The languages are: English, Bulgarian, Arabic, Chinese, French, and German.

URL: http://code.google.com/p/berkeleyparser/

 

(2)  BitPar:

Short Description: BitPar is a parser for highly ambiguous probabilistic context-free grammars (such as treebank grammars). BitPar uses bit-vector operations to speed up the basic parsing operations by parallelization.

URL: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/BitPar.html

 

(3)  Collins:

Short Description: One of the first statistical parsers, developed as part of Michael Collins’s PhD thesis. It also requires MXPOST to be installed.

URL: http://www.cs.columbia.edu/~mcollins/

 

(4)  GenPar:

Short Description: It provides an architecture, a design, and an implementation of an integrated system for statistical machine translation by parsing.

URL: http://nlp.cs.nyu.edu/GenPar/

 

(5)  LoPar:

Short Description: LoPar is an implementation of a parser for head-lexicalized probabilistic context-free grammars, which can also be used for morphological analysis.

URL: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/LoPar.html

 

   Study Videos:

(1)  Phrase-based and factored statistical machine translation videos:

Short Description: A lecture given by Philipp Koehn.

URL: http://videolectures.net/aerfaiss08_koehn_pbfs/

 

(2)  Video and Lectures (视频教程大全):

Short Description: A collection of videos on programming languages.

URL: http://www.spjc8.com/

 

(3)  Boobooke (播布客):

Short Description: This site hosts many language study lectures.

URL: http://www.boobooke.com/index.html

 
