Paraphrases

A summary of some of what I learned in a massive-data course.

I. Definition of paraphrase: different expressions of the same meaning

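II. Classification of paraphrases

By granularity, paraphrases can be divided into surface paraphrases and structural paraphrases. Surface paraphrases occur at four levels: lexical, phrase, sentence, and discourse. Structural paraphrases occur at two levels: pattern and collocation. By style, paraphrases can be divided into trivial changes, phrase replacement, phrase reordering, sentence splitting and merging, and complex paraphrases.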

III. Applications of paraphrasing:

Machine translation:

Translate unknown terms (phrases)

Expand training data

Rewrite input sentences

Improve automatic evaluation

Tune parameters

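Other applications: question answering, information extraction, information retrieval, summarization, and natural language generation.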

 

IV. Paraphrase identification: classification-based methods and alignment-based methods

4.1   Typical classification-based methods:

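1. Brockett and Dolan, 2005

Features:

String similarity features: sentence length, word overlap, edit distance

Morphological variants

WordNet lexical mappings

Word relation pairs: synonyms

Classifier: SVM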
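The string similarity features listed above can be sketched as follows. This is only a minimal illustration, not the feature set from the paper: the function names `edit_distance` and `string_features` are made up for this note, tokenization is plain whitespace splitting, and word overlap is assumed to be a Jaccard ratio.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def string_features(s1, s2):
    """Return [length difference, word overlap, token-level edit distance].

    Word overlap is computed as a Jaccard ratio, which is an assumption
    made for this sketch.
    """
    t1, t2 = s1.lower().split(), s2.lower().split()
    shared = set(t1) & set(t2)
    union = set(t1) | set(t2)
    overlap = len(shared) / len(union) if union else 1.0
    return [abs(len(t1) - len(t2)), overlap, edit_distance(t1, t2)]


if __name__ == "__main__":
    print(string_features("the cat sat on the mat", "the cat lay on the mat"))
```

A vector like this, computed for each labeled sentence pair, is the kind of input a binary classifier such as an SVM would be trained on.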

2. Finch et al., 2005: uses machine translation evaluation metrics to compute the similarity between sentences

 

Feature vector vec(s1, s2)

vec1(s1, s2): s1 as reference, s2 as MT system output;

vec2(s1, s2): s2 as reference, s1 as MT system output;

vec(s1, s2): average of vec1(s1, s2) and vec2(s1, s2)
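The symmetric construction of vec(s1, s2) can be sketched like this. Clipped n-gram precision is used here only as a stand-in for a real MT evaluation metric; the function names and the choice of `max_n=2` are assumptions for illustration.

```python
from collections import Counter


def ngram_precision(reference, output, n):
    """Clipped n-gram precision of `output` against `reference` (token lists)."""
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    out = Counter(tuple(output[i:i + n]) for i in range(len(output) - n + 1))
    total = sum(out.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in out.items())
    return matched / total


def directional_vec(reference, output, max_n=2):
    """One score per n-gram order, treating `output` as the MT system output."""
    return [ngram_precision(reference, output, n) for n in range(1, max_n + 1)]


def symmetric_vec(s1, s2, max_n=2):
    """Average of the two directional vectors, as in vec(s1, s2)."""
    t1, t2 = s1.split(), s2.split()
    vec1 = directional_vec(t1, t2, max_n)  # s1 as reference, s2 as MT output
    vec2 = directional_vec(t2, t1, max_n)  # s2 as reference, s1 as MT output
    return [(a + b) / 2 for a, b in zip(vec1, vec2)]


if __name__ == "__main__":
    print(symmetric_vec("a b c d", "a b"))
```

Averaging the two directions makes the feature vector independent of which sentence is listed first, which suits paraphrase identification because the relation is symmetric.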

3. Malakasiotis, 2009

Combines several kinds of similarity measures:

String similarity (various levels)

Tokens, stems, POS tags, nouns only, verbs only, …

Different measures

Edit distance, Jaro-Winkler distance, Manhattan distance…

Synonym similarity

Treat synonyms in two sentences as identical words

Syntactic similarity

Parse both sentences with a dependency parser and compute the overlap of their dependencies
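A rough sketch of three of these ingredients follows. Everything in it is illustrative: the synonym table is a hypothetical two-entry example, the dependency triples are assumed to come from some external parser, and dependency overlap is assumed to be a Jaccard ratio, which may differ from the definition in the paper.

```python
from collections import Counter

# Hypothetical synonym table mapping each word to a canonical form.
SYNONYMS = {"purchased": "bought", "automobile": "car"}


def normalize(tokens):
    """Treat synonyms as identical words by mapping them to one form."""
    return [SYNONYMS.get(t, t) for t in tokens]


def manhattan_distance(tokens1, tokens2):
    """Manhattan (L1) distance between the word-count vectors of two sentences."""
    c1, c2 = Counter(tokens1), Counter(tokens2)
    return sum(abs(c1[w] - c2[w]) for w in set(c1) | set(c2))


def dependency_overlap(deps1, deps2):
    """Jaccard overlap of two sets of (head, relation, dependent) triples."""
    d1, d2 = set(deps1), set(deps2)
    if not d1 and not d2:
        return 1.0
    return len(d1 & d2) / len(d1 | d2)


if __name__ == "__main__":
    t1 = "he bought a car".split()
    t2 = "he purchased an automobile".split()
    print(manhattan_distance(t1, t2))
    print(manhattan_distance(normalize(t1), normalize(t2)))
```

Running the same measure on tokens, stems, POS tags and so on yields one feature per (measure, level) combination, which is how many features can be combined in a single classifier.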

4.2  Alignment-based methods:

1. Wu, 2005

Conduct alignment based on Inversion Transduction Grammars (ITG)

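Sensitive to sentence structure; does not use any thesaurus to handle lexical variation

Performance is comparable to that of classification-based methods, and it also performs well at recognizing textual entailment

2. Das and Smith, 2009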

Conduct alignment based on Quasi-Synchronous Dependency Grammar (QG)

Alignment between two dependency trees

Assumption: the dependency trees of two paraphrase sentences should be aligned closely

Summary:

Classification-based methods are still the mainstream approach, since:

The binary classification problem is well defined;

Classification algorithms and tools are readily available;

They can combine various features in a simple way;

They achieve state-of-the-art performance.

V. Paraphrase extraction

1.    Dictionaries

2.    Monolingual parallel corpora

3.    Monolingual comparable corpora

4.    Bilingual parallel corpora

4.1  Takao et al., 2002

Basic idea:

Generating lexical paraphrases using 2-way dictionaries

An English word e1 can be translated to a Japanese word j with an E-J dictionary D1, and then j can be translated back to an English word e2 with a J-E dictionary D2. e1 and e2 are extracted as paraphrases.
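The round trip through the two dictionaries can be sketched as below. The function name and the tiny dictionaries are toy assumptions for illustration, not data from the paper.

```python
def round_trip_paraphrases(word, e2j, j2e):
    """Translate `word` with the E-J dictionary, then back with the J-E one.

    Every English word reached this way, other than `word` itself,
    is returned as a paraphrase candidate.
    """
    candidates = set()
    for j in e2j.get(word, ()):
        for e2 in j2e.get(j, ()):
            if e2 != word:
                candidates.add(e2)
    return candidates


if __name__ == "__main__":
    # Toy dictionaries D1 (E-J) and D2 (J-E), invented for this example.
    D1 = {"big": ["大きい"], "begin": ["始める"]}
    D2 = {"大きい": ["big", "large"], "始める": ["begin", "start"]}
    print(round_trip_paraphrases("big", D1, D2))
```

Because a dictionary entry can cover several senses of a word, candidates produced this way would normally need further filtering.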

4.2  Bannard and Callison-Burch, 2005

Word alignment and phrase extraction

Basic assumption:

If two English phrases e1 and e2 can be aligned with the same foreign phrase f, e1 and e2 are likely to be paraphrases.
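This pivoting assumption is commonly scored by summing over the shared foreign phrases, p(e2 | e1) = sum over f of p(f | e1) * p(e2 | f). A minimal sketch follows, in which the phrase tables and all probabilities are invented toy values and the function name is hypothetical.

```python
from collections import defaultdict


def pivot_paraphrases(e1, p_f_given_e, p_e_given_f):
    """Score candidates e2 by summing p(f | e1) * p(e2 | f) over pivots f."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # a phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    return dict(scores)


if __name__ == "__main__":
    # Toy phrase tables; the probabilities are made up for illustration.
    p_f_given_e = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
    p_e_given_f = {
        "unter kontrolle": {"under control": 0.7, "in check": 0.3},
        "im griff": {"under control": 0.5, "in check": 0.25, "in hand": 0.25},
    }
    ranked = sorted(pivot_paraphrases("under control", p_f_given_e, p_e_given_f).items(),
                    key=lambda kv: kv[1], reverse=True)
    print(ranked)
```

In practice the two tables would be estimated from the word alignment and phrase extraction step mentioned above.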

4.3  Callison-Burch, 2008

Basic idea:

Two paraphrase phrases should have the same syntactic type.

Syntactic constraints are also used when substituting paraphrases in sentences

 

4.4  Kok and Brockett, 2010

Basic idea:

Convert aligned phrases into a graph, extract paraphrases based on random walks and hitting times
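To make the hitting-time idea concrete, here is a small sketch that computes a truncated expected hitting time on a weighted phrase graph by dynamic programming. It is an illustration under stated assumptions, not the procedure from the paper: the graph is a toy example, every node is assumed to have at least one outgoing edge, and the function name is hypothetical.

```python
def truncated_hitting_time(graph, source, target, max_steps):
    """Expected number of steps, capped at `max_steps`, for a random walk
    starting at `source` to first reach `target`.

    `graph` maps each node to {neighbor: edge weight}; transition
    probabilities are the weights normalized per node. Every node is
    assumed to have at least one edge, and every neighbor to be a key.
    """
    h = {node: 0.0 for node in graph}  # with zero steps allowed, all are 0
    for _ in range(max_steps):
        updated = {}
        for node, edges in graph.items():
            if node == target:
                updated[node] = 0.0
                continue
            total = sum(edges.values())
            updated[node] = 1.0 + sum(w / total * h[nbr] for nbr, w in edges.items())
        h = updated
    return h[source]


if __name__ == "__main__":
    # Toy graph: two English phrases linked through one foreign phrase.
    graph = {
        "under control": {"unter kontrolle": 1.0},
        "unter kontrolle": {"under control": 1.0, "in check": 1.0},
        "in check": {"unter kontrolle": 1.0},
    }
    print(truncated_hitting_time(graph, "under control", "in check", 4))
```

Phrases with a smaller hitting time from the query phrase are closer in the graph and are therefore stronger paraphrase candidates.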

5.    Web corpora

6.    Dictionary glosses
