[NLP] Paper Notes: Paraphrasing With Bilingual Parallel Corpora

idevede

Categories: Data Mining, Machine Learning, NLP. Tags: nlp, natural language processing, bilingual parallel corpora

Article link: https://blog.csdn.net/idevede/article/details/80098631


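On paraphrase models, Bannard and Callison-Burch proposed a model based on bilingual parallel corpora [10]. It uses foreign-language translations as a "pivot" to compute the probability P(e_2 | e_1) that phrase e_2 is a paraphrase of e_1. Specifically, let f be a foreign translation shared by e_1 and e_2; the model obtains P(e_2 | e_1) from the product of P(f | e_1) and P(e_2 | f). When e_1 and e_2 share several translations, the paper sums this product over all of them. The model also incorporates a language model to compute the probability of e_2 appearing in the given context.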
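Written out as a formula, the pivot computation looks roughly as follows. The sum over all shared translations f and the argmax over candidates follow the formulation in the Bannard and Callison-Burch paper; the numbers in the worked example afterwards are made up purely for illustration.

```latex
\hat{e}_2 = \arg\max_{e_2 \neq e_1} P(e_2 \mid e_1),
\qquad
P(e_2 \mid e_1) = \sum_{f} P(f \mid e_1)\, P(e_2 \mid f)
```

For example, suppose e_1 has two translations f_a and f_b with P(f_a | e_1) = 0.6 and P(f_b | e_1) = 0.4, and suppose P(e_2 | f_a) = 0.5 and P(e_2 | f_b) = 0.25. Then P(e_2 | e_1) = 0.6 × 0.5 + 0.4 × 0.25 = 0.3 + 0.1 = 0.4.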

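The pivot-based method was applied to a large-scale bilingual parallel corpus [58]. The authors first use word alignment and phrase extraction techniques from machine translation to extract phrase translation pairs from the bilingual parallel corpus, then use the foreign translations as pivots to extract English paraphrase phrases. If English phrases e_1 and e_2 share the pivot f, the method takes the product of the translation probability from e_1 to f and the translation probability from f to e_2 as the paraphrase probability from e_1 to e_2. Inspired by this work, we applied the pivot-based method to the extraction of paraphrase patterns [4]. We use a log-linear model to compute paraphrase probabilities and extract paraphrase patterns such as "consider X" and "take X into consideration".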
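A minimal Python sketch of the pivot step, assuming the two translation tables are already available as nested dictionaries. The function name `pivot_paraphrases`, the table layout, and all probabilities are illustrative assumptions, not something taken from the paper or from any toolkit.

```python
from collections import defaultdict


def pivot_paraphrases(p_f_given_e, p_e_given_f, e1):
    """Return {e2: P(e2 | e1)} by pivoting through every foreign phrase f
    that e1 translates to, summing P(f | e1) * P(e2 | f) over f."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # a phrase is not counted as its own paraphrase
                scores[e2] += p_f * p_e2
    return dict(scores)


# Toy tables with made-up probabilities (hypothetical data).
p_f_given_e = {"under control": {"unter kontrolle": 0.75, "im griff": 0.25}}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.5, "in check": 0.5},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

paraphrases = pivot_paraphrases(p_f_given_e, p_e_given_f, "under control")
best = max(paraphrases, key=paraphrases.get)
```

With these toy tables, "in check" receives 0.75 × 0.5 and "in hand" receives 0.25 × 0.5, so "in check" is the top-ranked paraphrase.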

Scarcity of monolingual parallel corpora: the narrow range of text genres available for monolingual parallel corpora limits the range of contexts in which the paraphrases can be used.

The paper extends a translation method: phrase-based statistical machine translation.
The essence of our method is to align phrases in a bilingual parallel corpus, and equate different English phrases that are aligned with the same phrase in the other language.
Section 2: we rank the extracted paraphrases with a probability assignment
Section 3 describes our experimental setup and includes information about how phrases were selected, how we manually aligned parts of the bilingual corpus, and how we evaluated the paraphrases.
Section 2: aligning phrases (align phrases within sentence pairs)
2.1 Statistical machine translation techniques are used to align phrases within sentence pairs in a bilingual corpus.
Which statistical techniques?
-- recent phrase-based approaches to statistical machine translation

(1) The original formulation of statistical machine translation (Brown et al., 1993) was defined as a word-based operation.
(2) More recent approaches to statistical translation calculate the translation probability using larger blocks of aligned text.
We use the heuristic for phrase alignment described in Och and Ney (2003) which aligns phrases by incrementally building longer phrases from words and phrases which have adjacent alignment points.
(3)
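To make the idea of extracting phrase pairs from word alignments concrete, here is a simplified Python sketch. It uses the common "consistency" criterion (no alignment link may leave the box formed by the two spans) and skips the extension to unaligned boundary words, so it is an approximation of the general technique, not the exact Och and Ney (2003) heuristic. The function name `extract_phrase_pairs` and the `max_len` parameter are assumptions for illustration.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Enumerate phrase pairs consistent with a word alignment.

    alignment is a set of (i, j) pairs linking src[i] to tgt[j].
    A pair of spans is kept when every link touching the target span
    also falls inside the source span.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no link from the target span leaves the source span.
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy sentence pair with a one-to-one alignment (hypothetical data).
src = ["under", "control"]
tgt = ["unter", "kontrolle"]
alignment = {(0, 0), (1, 1)}
pairs = extract_phrase_pairs(src, tgt, alignment)
```

On this toy input the single-word pairs and the two-word pair are all extracted, which mirrors how longer phrases are built up from adjacent aligned words.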

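2.2 How the paraphrase probability is computed:
Equation 3: the translation probabilities are maximum likelihood estimates, obtained by counting how often phrases e and f are aligned in the parallel corpus.
Equation 4: S allows us to re-rank the candidate paraphrases based on additional contextual information
Finally: We produced automatic alignments for it with the Giza++ toolkit (Och and Ney, 2003)
we also developed a gold standard of word alignments for the set of phrases that we wanted to paraphrase
Multiple candidate paraphrases are extracted for each phrase: our method frequently extracts more than one possible paraphrase for each phrase.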
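The maximum likelihood estimate mentioned for Equation 3 can be sketched in Python as relative frequencies over extracted phrase pairs: p(e | f) = count(e, f) / Σ_e' count(e', f). The function name `mle_translation_probs` and the toy counts below are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def mle_translation_probs(phrase_pairs):
    """Estimate p(e | f) as count(e, f) divided by the total count of f.

    phrase_pairs is a list of (e, f) tuples, one per aligned occurrence.
    """
    pair_counts = Counter(phrase_pairs)
    f_totals = Counter(f for _, f in phrase_pairs)
    probs = defaultdict(dict)
    for (e, f), count in pair_counts.items():
        probs[f][e] = count / f_totals[f]
    return dict(probs)


# Toy aligned occurrences (hypothetical data): three vs. one.
observed = [("under control", "unter kontrolle")] * 3 + [("in check", "unter kontrolle")]
p_e_given_f = mle_translation_probs(observed)
```

Swapping the tuple order in the input gives the other direction, p(f | e), with the same function.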

2.1 Aligned phrase pairs in NLP.
2.2 Translation model probabilities, maximum likelihood estimation.

Extending the paraphrase probability.
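One way to picture the extension with sentence context is the rough Python sketch below: each candidate is substituted into the sentence and its paraphrase probability is multiplied by a language model score of the rewritten sentence. The multiplicative combination, the function names `rerank` and `toy_lm`, the bigram table, and every number are assumptions made for illustration; these notes do not specify how the paper combines the two scores.

```python
def rerank(sentence, e1, candidates, lm_score):
    """Rank candidate paraphrases of e1 by P(e2 | e1) times the language
    model score of the sentence with e1 replaced by e2 (assumed combination)."""
    scored = []
    for e2, prob in candidates.items():
        rewritten = sentence.replace(e1, e2, 1)
        scored.append((prob * lm_score(rewritten), e2))
    return [e2 for _, e2 in sorted(scored, reverse=True)]


# Toy bigram "language model" with made-up probabilities (hypothetical).
BIGRAMS = {("is", "in"): 0.5, ("in", "hand"): 0.5, ("in", "check"): 0.1}


def toy_lm(text, floor=0.01):
    """Multiply bigram probabilities, using a small floor for unseen bigrams."""
    words = text.split()
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= BIGRAMS.get(pair, floor)
    return score


candidates = {"in check": 0.4, "in hand": 0.3}
ranked = rerank("the situation is under control", "under control", candidates, toy_lm)
```

In this toy setup "in check" has the higher paraphrase probability, but the language model prefers "in hand" in this sentence, so the context-aware ranking puts "in hand" first.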

Supplement: co-training framework diagram.

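Existing corpus construction efforts share the following features. First, alignment at the sentence level makes it convenient to observe and analyze specific language-transfer phenomena at scale. Second, combining automatic and manual annotation lets research go deeper, from form to semantics, pragmatics, and style. Translation studies based on parallel corpora concentrate on three areas. First, exploring corpus construction techniques: mainly how to use computer technology to build corpora, in particular the processing of Chinese text, English-Chinese alignment, and the role of manual annotation. Second, corpus-based empirical research and theoretical discussion. Taking translation universals as an example, empirical work not only looks at differences between translated and non-translated texts in the target language under the single comparable mode, but also treats the source text as a dimension for analyzing and explaining specific transfer phenomena in translated texts. It covers both macro-level features of translational language and specific language transfers. Third, applications of parallel corpora in translation teaching, including translation teaching assisted by online retrieval platforms and the use of self-built corpora in classroom teaching.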
