Context-Based Word Similarity Computation and Its Applications (thesis, Computer Application Technology)

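This thesis addresses the computation of word similarity and relational similarity in natural language processing. It proposes a semantic similarity algorithm that combines semantic resources with statistical methods, and a relational similarity algorithm based on latent semantic indexing. The semantic similarity algorithm is evaluated on a test set built from the word-substitution questions of the national civil service examination. The relational similarity algorithm is applied to relation classification on a patent corpus, where it improves classification accuracy by 6%. An FAQ similar-question retrieval system and an entity relation classification system were also implemented to further validate both algorithms.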


Shenyang Institute of Aeronautical Engineering, Master's Thesis

Abstract (Chinese version, translated)

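Quantifying the relationships between words is an important research topic in natural language processing, with broad applications in information retrieval, word sense disambiguation, machine translation, and related areas. Taking HowNet as its foundation, this thesis studies measures of semantic similarity and relational similarity between words. It proposes a semantic similarity algorithm that fuses semantic and statistical information, and a relational similarity algorithm based on latent semantic indexing, both of which improve the computed similarity results. The specific contributions are as follows.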

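Existing semantic and relational similarity algorithms fall into two main classes: those based on semantic resources and those based on statistics. The former compute similarity from manually constructed semantic dictionaries or semantic networks. The latter are entirely data-driven: they collect, from a large corpus, the context information that co-occurs with a word and compute similarity from it. This thesis studies the HowNet-based semantic similarity method and, to address its poor performance on words whose sememes belong to different classes, proposes a semantic similarity algorithm that fuses semantics with statistics to improve the final result. It also introduces the word-substitution questions of the national civil service examination as a test set for Chinese word similarity algorithms, which partly remedies the lack of a public Chinese test set for this task. In a comparison of different semantic similarity algorithms on this test set, the proposed algorithm achieved good results.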
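The statistical side of this idea can be illustrated with a minimal sketch: count the words that co-occur with each target word inside a fixed window, then compare the resulting context vectors by cosine similarity. The toy corpus, the window size, and the `fused_similarity` helper with its weight `alpha` are assumptions made for illustration only. The abstract does not state how the thesis combines the HowNet score with the statistical score, so the linear interpolation here is a hypothetical stand-in.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_similarity(semantic_score, statistical_score, alpha=0.5):
    """Hypothetical fusion rule: linear interpolation of the two scores.

    This is NOT taken from the thesis; it only shows where a dictionary-based
    score (e.g. from HowNet) and a corpus-based score could be combined.
    """
    return alpha * semantic_score + (1 - alpha) * statistical_score


# Toy, pre-tokenized corpus (assumption: real use needs a large corpus
# and, for Chinese, a word segmenter).
corpus = [
    ["the", "doctor", "treats", "the", "patient"],
    ["the", "physician", "treats", "the", "patient"],
    ["the", "pilot", "flies", "the", "plane"],
]
vectors = context_vectors(corpus)
sim_close = cosine(vectors["doctor"], vectors["physician"])
sim_far = cosine(vectors["doctor"], vectors["pilot"])
```

In this toy corpus "doctor" and "physician" share identical contexts, so their statistical similarity is higher than that of "doctor" and "pilot". A purely dictionary-based score can miss such pairs when their sememes fall in different classes, which is the gap a fused score is meant to cover.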

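To address the data sparsity that traditional unsupervised or semi-supervised relational similarity methods struggle with, this thesis uses HowNet for synonym expansion and applies singular value decomposition to reduce dimensionality and remove noise, yielding a relational similarity algorithm based on latent semantic indexing. In relation classification experiments on a patent corpus, classification accuracy improves by 6% over traditional SVM classification, reaching 44%.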
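A rough sketch of the latent semantic indexing step, assuming a matrix whose rows are word pairs and whose columns are counts of the textual patterns that connect each pair: a truncated singular value decomposition projects the rows into a low-rank space, and relational similarity is then the cosine between the projected rows. The word pairs, patterns, counts, and the rank `k=2` are all invented for illustration. The HowNet synonym expansion described above is omitted, and the thesis's actual matrix construction is not given in the abstract. The sketch uses NumPy's `numpy.linalg.svd`.

```python
import numpy as np

# Hypothetical word pairs (rows) and joining patterns (columns).
pairs = [("engine", "car"), ("motor", "vehicle"), ("wheel", "car"), ("author", "book")]
patterns = ["X of Y", "X in Y", "X drives Y", "X powers Y", "X writes Y"]

# Invented pattern counts; a real matrix would come from a large corpus.
counts = np.array(
    [
        [2, 1, 3, 4, 0],
        [1, 1, 2, 3, 0],
        [3, 2, 0, 0, 0],
        [1, 0, 0, 0, 5],
    ],
    dtype=float,
)


def lsi_embed(matrix, k):
    """Project rows onto the top-k singular directions (truncated SVD)."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


embedded = lsi_embed(counts, k=2)
sim_related = cosine(embedded[0], embedded[1])    # engine:car vs motor:vehicle
sim_unrelated = cosine(embedded[0], embedded[3])  # engine:car vs author:book
```

Dropping the smaller singular values discards directions that mostly carry noise, so pairs that share few exact patterns can still end up close together in the reduced space. That is the usual motivation for applying latent semantic indexing to sparse data.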

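To further validate the two proposed similarity algorithms, this thesis implements an FAQ similar-question retrieval system and an entity relation classification system, and runs corresponding experiments with both word similarity algorithms.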
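One way a word similarity measure could drive FAQ similar-question retrieval is sketched below. Each stored question is scored by averaging, over the query's words, the best word-level match found in that question, and the highest-scoring question is returned. The abstract does not describe the thesis's retrieval system, so the scoring rule, the `toy_word_sim` function, and the sample questions are all hypothetical.

```python
def sentence_similarity(query, candidate, word_sim):
    """Average, over the query words, of the best match found in the candidate."""
    if not query or not candidate:
        return 0.0
    best = [max(word_sim(q, c) for c in candidate) for q in query]
    return sum(best) / len(best)


def retrieve(query, faq_questions, word_sim, top_k=1):
    """Return the top_k stored questions ranked by similarity to the query."""
    ranked = sorted(
        faq_questions,
        key=lambda question: sentence_similarity(query, question, word_sim),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical stand-in for a real word similarity algorithm.
SYNONYM_SCORES = {frozenset(("reset", "change")): 0.8}


def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return SYNONYM_SCORES.get(frozenset((a, b)), 0.0)


# Pre-tokenized toy FAQ (assumption: Chinese text would need segmentation first).
faq = [
    ["how", "to", "change", "password"],
    ["how", "to", "delete", "account"],
]
best_match = retrieve(["how", "to", "reset", "password"], faq, toy_word_sim)[0]
```

The query matches the first stored question even though it shares no exact word for "reset". This is the kind of case where a word similarity measure helps beyond plain keyword overlap.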

Keywords: word similarity; relational similarity; latent semantic indexing; HowNet


Abstract

The complex relationships between words in natural language need to be analyzed quantitatively in practice. This paper introduces two kinds of word similarity algorithm: one measures semantic similarity between words, and the other measures relation similarity between pairs of words. Both are widely used in natural language processing, for example in information retrieval, information extraction, text classification, word sense disambiguation, and example-based machine translation.

Existing semantic similarity and relation similarity algorithms fall into two main types: those based on semantic resources and those based on statistics. The former compute similarity from manually built semantic dictionaries, while the latter are entirely data-driven, gathering a word's co-occurrence information in context from a large corpus. This paper studies the HowNet-based word similarity algorithm and several statistical word similarity algorithms and, to handle words whose sememes are of different kinds, proposes a new similarity algorithm that combines semantics with statistics. It is the first time that word-substitution questions from the national official tests have been used to demonstrate the efficiency of the alg