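MATLAB source code for semantic similarity: how to calculate the similarity between two sentences (syntactic and semantic)...

I'm supposed to take two sentences at a time and compute whether they are similar, both syntactically and semantically.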

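INPUT1: Obama signs the law. Obama signed a new law.

INPUT2: The bus is stopped here. A vehicle stops here.

INPUT3: Fire in New York. New York is burnt down.

INPUT4: Fire in New York. 50 people died in the New York fire.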

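I don't want to use an ontology tree as the sole solution. I wrote code that computes the Levenshtein distance (LD) between the sentences and then decides whether the second sentence:

can be ignored (INPUT1 and INPUT2),

should replace the first sentence (INPUT3), or

should be stored together with the first sentence (INPUT4).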

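I'm not satisfied with the code, because LD works only at the syntactic level (what other methods are there?). How can semantics be incorporated (for example, that a bus is a kind of vehicle)?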
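One way to illustrate the "bus is a vehicle" case: run the same Levenshtein recurrence over words instead of characters, and first map each word to a canonical form through a synonym table. The table `synonyms` below is a hypothetical, hand-written example rather than a real lexical resource, and the whole thing is a toy sketch, not a semantic similarity measure. The sentence pair is a simplified version of INPUT2.

```matlab
%# Word-level Levenshtein distance with a toy synonym table (sketch).
%# ASSUMPTION: 'synonyms' is a hypothetical, hand-written map used only for
%# illustration; a real system would need a proper lexical resource.
synonyms = containers.Map({'bus', 'car', 'truck'}, ...
                          {'vehicle', 'vehicle', 'vehicle'});

s1 = 'the bus stops here';
s2 = 'a vehicle stops here';

%# Lower-case, trim and split on whitespace into cell arrays of words.
w1 = strsplit(lower(strtrim(s1)));
w2 = strsplit(lower(strtrim(s2)));

%# Replace every word that appears in the table by its canonical form.
for k = 1:numel(w1)
    if isKey(synonyms, w1{k}), w1{k} = synonyms(w1{k}); end
end
for k = 1:numel(w2)
    if isKey(synonyms, w2{k}), w2{k} = synonyms(w2{k}); end
end

%# Levenshtein recurrence with unit costs, but over words, not characters.
n1 = numel(w1);
n2 = numel(w2);
L = zeros(n1 + 1, n2 + 1);
L(:, 1) = (0:n1)';   %# deleting all words of s1
L(1, :) = 0:n2;      %# inserting all words of s2
for ii = 2:n1 + 1
    for jj = 2:n2 + 1
        cost = double(~strcmp(w1{ii-1}, w2{jj-1}));   %# 0 if the words match, else 1
        L(ii, jj) = min([L(ii-1, jj-1) + cost, L(ii-1, jj) + 1, L(ii, jj-1) + 1]);
    end
end
Dword = L(end, end);
fprintf('Word-level distance: %d\n', Dword);
```

Without the table the two sentences are two word edits apart (the/a and bus/vehicle); with it, only one. Tokenization is a plain whitespace split, so punctuation stays attached to the neighbouring word.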

Here is my code:

```matlab
%# As the difference is computed, a decision is made on the new event
%# (string 2): ignore it, let it replace the existing event (string 1), or
%# store it separately. The higher the LD metric, the larger the difference
%# between the two strings. A low difference indicates identical or similar
%# events; a high difference indicates that the new event is a fresh event.
%#
%# str1 and str2 are expected to be char arrays already in the workspace.

%#.........................................................................
%# Calculating the LD between two strings of events.
%#.........................................................................
L1 = length(str1) + 1;
L2 = length(str2) + 1;
L = zeros(L1, L2);   %# DP table: L(i,j) = LD between str1(1:i-1) and str2(1:j-1)

g = +1;   %# cost of an insertion or deletion (gap)
m = +0;   %# a match is cheaper; we seek to minimize
d = +1;   %# a mismatch (substitution) is more costly

%# Boundary conditions: distance from/to the empty string.
L(:, 1) = ((0:L1-1) * g)';
L(1, :) = (0:L2-1) * g;

%# Calculating the required edits.
for idx = 2:L1
    for idy = 2:L2
        if str1(idx-1) == str2(idy-1)
            score = m;
        else
            score = d;
        end
        m1 = L(idx-1, idy-1) + score;   %# match or substitution
        m2 = L(idx-1, idy) + g;         %# deletion of str1(idx-1)
        m3 = L(idx, idy-1) + g;         %# insertion of str2(idy-1)
        L(idx, idy) = min(m1, min(m2, m3));   %# keep only the minimum number of edits
    end
end

%# The LD between the two strings.
D = L(L1, L2);

%#.........................................................................
%# Making a decision on what to do with the new event (string 2).
%#.........................................................................
if D <= 4
    %# The distance is so small that string 2 seems identical to string 1,
    %# so string 2 is ignored and string 1 remains stored.
    store = str1;
elseif D >= 5 && D <= 15
    %# The distance is too large for the strings to be identical, but not
    %# large enough to make string 2 an individual event. String 2 is
    %# somewhat similar to string 1, so it replaces string 1 and is stored.
    store = str2;
else
    %# For all other distances, string 2 is stored along with string 1.
    store = {str1; str2};
end
```
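The thresholds in the code (4 and 15) are absolute character counts, so the same number of edits means more for a short sentence than for a long one. A minimal sketch, assuming a length-independent score is wanted, divides the distance by the length of the longer string. This normalization is an addition, not part of the original code; the classic kitten/sitting pair is used only as a sanity check.

```matlab
%# Length-normalized Levenshtein distance (sketch).
%# Same recurrence as the code above (unit cost for insertion, deletion and
%# substitution); dividing by the longer length is an ASSUMED extension.
str1 = 'kitten';
str2 = 'sitting';

n1 = length(str1);
n2 = length(str2);
L = zeros(n1 + 1, n2 + 1);
L(:, 1) = (0:n1)';   %# distance from each prefix of str1 to the empty string
L(1, :) = 0:n2;      %# distance from the empty string to each prefix of str2
for ii = 2:n1 + 1
    for jj = 2:n2 + 1
        cost = double(str1(ii-1) ~= str2(jj-1));   %# 0 on match, 1 on mismatch
        L(ii, jj) = min([L(ii-1, jj-1) + cost, L(ii-1, jj) + 1, L(ii, jj-1) + 1]);
    end
end

D = L(end, end);                  %# raw edit count
Dnorm = D / max([n1, n2, 1]);     %# 0 = identical, 1 = completely different; the 1 guards against two empty strings
fprintf('D = %d, normalized = %.3f\n', D, Dnorm);
```

Thresholds for `Dnorm` would have to be chosen again; the values 4 and 15 do not carry over to the 0 to 1 scale.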

Any help is appreciated.

2010-09-07

Tinglin


"Semantically": there is no simple textbook algorithm. Natural language (especially English) is a very complex and fickle beast. –

2010-09-07 22:16:49


@Amro: Does the '#' make them gray because they are treated as comments here on SO? –

2010-09-14 08:41:33


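@Lazer: Yes, it's easier on the eyes. I wish StackOverflow would add a feature for code blocks, like '...', so that they are highlighted correctly for that particular language –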

2010-09-14 15:54:46
