PHP/MySQL text similarity: how to calculate the similarity between two strings in MySQL

If I have two strings in MySQL:

SET @a = "Welcome to Stack Overflow";

SET @b = " Hello to stack overflow";

is there a way to get the similarity percentage between those two strings using MySQL?

Here, for example, 3 words match (ignoring case), so the similarity should be something like:

count(words common to @a and @b) / (count(words in @a) + count(words in @b) - count(words in the intersection))

so the result is 3 / (4 + 4 - 3) = 0.6.

Any ideas are highly appreciated!
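The measure described in the question is the Jaccard index taken over words. As a rough sketch only, it could be computed with a stored function along the following lines. The name `word_jaccard` is hypothetical (it is not a MySQL built-in), and the sketch assumes that words are separated by spaces or commas and that case should be ignored.

```sql
-- Hypothetical helper: word-level Jaccard similarity, ignoring case.
-- Assumption: words are separated by spaces (commas also act as separators).
-- In the mysql command-line client, change the DELIMITER before creating it.
CREATE FUNCTION word_jaccard(s1 TEXT, s2 TEXT) RETURNS DECIMAL(5,4)
DETERMINISTIC
BEGIN
    DECLARE list1, list2, set1, set2, word TEXT;
    DECLARE n1, n2, n_common, i, n_tokens INT DEFAULT 0;

    -- Normalise both inputs into comma-separated, lower-case word lists.
    SET list1 = REPLACE(LOWER(TRIM(s1)), ' ', ',');
    SET list2 = REPLACE(LOWER(TRIM(s2)), ' ', ',');
    SET set1 = '', set2 = '';

    -- Collect the distinct words of s1.
    SET n_tokens = CHAR_LENGTH(list1) - CHAR_LENGTH(REPLACE(list1, ',', '')) + 1, i = 1;
    WHILE i <= n_tokens DO
        SET word = SUBSTRING_INDEX(SUBSTRING_INDEX(list1, ',', i), ',', -1);
        IF word <> '' AND FIND_IN_SET(word, set1) = 0 THEN
            SET set1 = IF(set1 = '', word, CONCAT(set1, ',', word)), n1 = n1 + 1;
        END IF;
        SET i = i + 1;
    END WHILE;

    -- Collect the distinct words of s2 and count those that also occur in s1.
    SET n_tokens = CHAR_LENGTH(list2) - CHAR_LENGTH(REPLACE(list2, ',', '')) + 1, i = 1;
    WHILE i <= n_tokens DO
        SET word = SUBSTRING_INDEX(SUBSTRING_INDEX(list2, ',', i), ',', -1);
        IF word <> '' AND FIND_IN_SET(word, set2) = 0 THEN
            SET set2 = IF(set2 = '', word, CONCAT(set2, ',', word)), n2 = n2 + 1;
            IF FIND_IN_SET(word, set1) > 0 THEN
                SET n_common = n_common + 1;
            END IF;
        END IF;
        SET i = i + 1;
    END WHILE;

    -- |A ∩ B| / |A ∪ B|, with the union size obtained as n1 + n2 - n_common.
    IF n1 + n2 - n_common = 0 THEN
        RETURN 0;
    END IF;
    RETURN n_common / (n1 + n2 - n_common);
END
```

With the two strings from the question, the call below should reproduce the 3 / (4 + 4 - 3) figure:

```sql
SELECT word_jaccard('Welcome to Stack Overflow', ' Hello to stack overflow');  -- 0.6000
```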

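Solution

The following stored function computes the Levenshtein (edit) distance between two strings:

-- In the mysql command-line client, change the statement delimiter first
-- (e.g. DELIMITER $$) so the semicolons inside the body do not end the statement.
CREATE FUNCTION `levenshtein`(s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR;
    -- Each matrix row is stored as one byte per cell, so inputs are limited to
    -- 255 characters. VARBINARY keeps the bytes free of charset conversion.
    DECLARE cv0, cv1 VARBINARY(256);

    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;

    IF s1 = s2 THEN
        RETURN 0;
    ELSEIF s1_len = 0 THEN
        RETURN s2_len;
    ELSEIF s2_len = 0 THEN
        RETURN s1_len;
    ELSE
        -- Row 0 of the matrix: distances 0, 1, ..., s2_len.
        WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;

        WHILE i <= s1_len DO
            -- cv0 is the row being built; its first cell is i.
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;

            WHILE j <= s2_len DO
                -- Insertion: cell to the left + 1.
                SET c = c + 1;

                IF s1_char = SUBSTRING(s2, j, 1) THEN
                    SET cost = 0;
                ELSE
                    SET cost = 1;
                END IF;

                -- Substitution: diagonal cell + cost.
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                IF c > c_temp THEN
                    SET c = c_temp;
                END IF;

                -- Deletion: cell above + 1.
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j + 1, 1)), 16, 10) + 1;
                IF c > c_temp THEN
                    SET c = c_temp;
                END IF;

                SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;

            SET cv1 = cv0, i = i + 1;
        END WHILE;
    END IF;

    RETURN c;
END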
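As a quick check, the function can be called directly from a SELECT. The sample words below are illustrative and do not come from the original answer. Characters are compared using the collation of the arguments, so under a case-insensitive collation "Stack" and "stack" cost nothing.

```sql
-- kitten -> sitten -> sittin -> sitting: two substitutions and one insertion.
SELECT levenshtein('kitten', 'sitting') AS distance;  -- 3

-- The strings from the question (result depends on the collation in use).
SET @a = 'Welcome to Stack Overflow', @b = ' Hello to stack overflow';
SELECT levenshtein(@a, @b) AS distance;
```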

To get the result as a percentage (XX%), use this function:

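CREATE FUNCTION `levenshtein_ratio`(s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, max_len INT;

    -- CHAR_LENGTH rather than LENGTH, so multi-byte characters count once,
    -- consistent with the lengths used inside levenshtein().
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2);

    IF s1_len > s2_len THEN
        SET max_len = s1_len;
    ELSE
        SET max_len = s2_len;
    END IF;

    -- Two empty strings are identical; this also avoids dividing by zero.
    IF max_len = 0 THEN
        RETURN 100;
    END IF;

    RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
END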
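A short usage sketch, assuming both functions have been created as above. The sample words are illustrative. Note that this ratio is a character-level measure, so for the strings in the question it will generally not equal the word-overlap figure of 0.6.

```sql
-- Distance 3, longer string has 7 characters: ROUND((1 - 3/7) * 100).
SELECT levenshtein_ratio('kitten', 'sitting') AS pct;  -- 57

-- The strings from the question.
SET @a = 'Welcome to Stack Overflow', @b = ' Hello to stack overflow';
SELECT levenshtein_ratio(@a, @b) AS pct;
```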
