Challenges
- Lexical Ambiguity
- Differing Word Orders
- Syntactic Structure is not Preserved
- Syntactic Ambiguity
- Pronoun Resolution
Classical MT (rule-based)
- Direct MT
– word by word, no syntactic or semantic analysis
– cannot handle long-range reorderings
- Transfer-based approaches
– analysis, transfer, generation
- Interlingua-based translation
– analysis into a language-independent representation, then generation
– what is the representation? the intersection of the ways different languages break down concepts
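As a minimal sketch of the direct (word-by-word) approach listed above, the snippet below looks each word up in a toy bilingual dictionary and keeps the source word order. The `LEXICON` entries and the `direct_translate` helper are hypothetical, chosen only for illustration; real direct systems also apply some local reordering rules.

```python
# Hypothetical toy Spanish-to-English dictionary (illustrative only).
LEXICON = {"la": "the", "casa": "house", "blanca": "white"}

def direct_translate(sentence: str) -> str:
    """Translate word by word, with no syntactic or semantic analysis."""
    words = sentence.lower().split()
    # Unknown words are copied through unchanged.
    return " ".join(LEXICON.get(w, w) for w in words)

print(direct_translate("la casa blanca"))  # the house white
```

The output keeps the Spanish noun-adjective order, which is the kind of word-order mismatch that motivates the analysis and transfer steps of the other approaches.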
Statistical MT
- parallel corpus
- Noisy Channel Model
– $\arg\max_{e} \; p(e)\, p(f \mid e)$
- IBM Model 1
– alignment $a$: $(l+1)^m$ possible alignments
– $a^* = \arg\max_{a} \; p(a \mid f, e, m)$
– $p(f, a \mid e, m) = p(a \mid e, m)\, p(f \mid a, e, m)$
– $p(f \mid e, m) = \sum_{a} p(f, a \mid e, m)$
– translation prob: $p(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$
– the generative process: $p(f, a \mid e, m) = p(a \mid e, m)\, p(f \mid a, e, m) = \frac{1}{(l+1)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$
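The IBM Model 1 quantities above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a trained model: the translation table `T`, the language-model scores `lm`, and all function names are hypothetical, and position 0 of the source sentence is assumed to hold a special `NULL` word, which is why there are l+1 choices per target word.

```python
from itertools import product
from math import prod

# Hypothetical translation table t(f | e); values are made up for illustration.
T = {
    ("la", "the"): 0.7, ("maison", "the"): 0.1,
    ("la", "house"): 0.1, ("maison", "house"): 0.8,
    ("la", "NULL"): 0.2, ("maison", "NULL"): 0.1,
}

def t(f_word, e_word):
    """Look up t(f | e); unseen pairs get probability 0."""
    return T.get((f_word, e_word), 0.0)

def joint(f, a, e):
    """p(f, a | e, m) = 1/(l+1)^m * prod_j t(f_j | e_{a_j})."""
    e0 = ["NULL"] + e              # index 0 is the NULL word
    l, m = len(e), len(f)
    return prod(t(f[j], e0[a[j]]) for j in range(m)) / (l + 1) ** m

def marginal(f, e):
    """p(f | e, m): brute-force sum over all (l+1)^m alignments."""
    l, m = len(e), len(f)
    return sum(joint(f, a, e) for a in product(range(l + 1), repeat=m))

def best_alignment(f, e):
    """a* = argmax_a p(a | f, e, m); under Model 1 each a_j is chosen independently."""
    e0 = ["NULL"] + e
    return [max(range(len(e0)), key=lambda i: t(fw, e0[i])) for fw in f]

def decode(f, candidates, lm):
    """Noisy channel: pick the candidate e maximizing p(e) * p(f | e)."""
    return max(candidates, key=lambda e: lm[tuple(e)] * marginal(f, e))

f = ["la", "maison"]
e = ["the", "house"]
print(best_alignment(f, e))        # [1, 2]
print(joint(f, [1, 2], e))         # 0.7 * 0.8 / 9
# Hypothetical language-model probabilities p(e) for two candidate outputs.
lm = {("the", "house"): 0.9, ("house", "the"): 0.1}
print(decode(f, [["the", "house"], ["house", "the"]], lm))
```

Because the alignment prior is uniform, `marginal` gives both word orders of `e` the same score up to rounding, so in this toy setup it is the language model p(e) that separates "the house" from "house the".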