php similartext in Chinese: PHP similar_text usage manual | example code

bz2

Article tags: php similartext Chinese

Well, as mentioned above, similar_text runs in O(N^3) time, where N is the length of the longer string. I've written a longest common subsequence (LCS) version that runs in O(m·n), where m and n are the lengths of str1 and str2. The result is a percentage, and it seems to be exactly the same as the similar_text percentage, but with better performance. Here are the three functions I'm using:

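<?php
function LCS_Length($s1, $s2)
{
    $m = strlen($s1);
    $n = strlen($s2);

    // this (m+1) x (n+1) table is used to compute the LCS length:
    // cell [i][j] holds the LCS length of the first i chars of $s1 and the first j chars of $s2.
    // PHP arrays grow on demand, so no fixed size (and no 128-character limit) is needed.
    $LCS_Length_Table = array();

    // reset the first column and the first row to 0
    for ($i = 0; $i <= $m; $i++) $LCS_Length_Table[$i][0] = 0;
    for ($j = 0; $j <= $n; $j++) $LCS_Length_Table[0][$j] = 0;

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            if ($s1[$i-1] === $s2[$j-1])
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i-1][$j-1] + 1;
            else if ($LCS_Length_Table[$i-1][$j] >= $LCS_Length_Table[$i][$j-1])
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i-1][$j];
            else
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i][$j-1];
        }
    }

    return $LCS_Length_Table[$m][$n];
}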
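Each row of the table depends only on the row above it. So if memory matters for long strings, the table can be cut down to two rows. This reduces memory from O(m·n) to O(n) and leaves the time at O(m·n). The following is a minimal sketch of that variant. The name LCS_Length_TwoRows is hypothetical and does not come from the original post.

```php
<?php
// Hypothetical variant (not from the original post): same LCS length as
// LCS_Length(), but only two rows of the DP table are kept in memory.
function LCS_Length_TwoRows(string $s1, string $s2): int
{
    $m = strlen($s1);
    $n = strlen($s2);

    // $prev is row i-1, $curr is row i; index 0 stands for the empty prefix.
    $prev = array_fill(0, $n + 1, 0);
    for ($i = 1; $i <= $m; $i++) {
        $curr = array_fill(0, $n + 1, 0);
        for ($j = 1; $j <= $n; $j++) {
            if ($s1[$i - 1] === $s2[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;   // extend the diagonal match
            } else {
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }
    return $prev[$n];
}

echo LCS_Length_TwoRows("ABCBDAB", "BDCABA"), "\n"; // 4 (e.g. "BCBA")
```

Each row has n+1 entries, where n is the length of the second argument. Passing the shorter string second therefore keeps the rows as small as possible.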

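function str_lcsfix($s)
{
    $s = str_replace(" ", "", $s);
    // The accent lists were lost to encoding damage; these French accents are an assumption.
    // ereg_replace() was removed in PHP 7, so preg_replace() with /u (UTF-8) is used instead.
    $s = preg_replace("/[éèêë]/u", "e", $s);
    $s = preg_replace("/[àâä]/u", "a", $s);
    $s = preg_replace("/[îï]/u", "i", $s);
    $s = preg_replace("/[ôö]/u", "o", $s);
    $s = preg_replace("/[ùûü]/u", "u", $s);
    $s = preg_replace("/ç/u", "c", $s);
    return $s;
}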

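function get_lcs($s1, $s2)
{
    // strip spaces and fold accented characters, then compare case-insensitively
    $s1 = strtolower(str_lcsfix($s1));
    $s2 = strtolower(str_lcsfix($s2));

    $lcs = LCS_Length($s1, $s2);            // longest common subsequence length
    $ms = (strlen($s1) + strlen($s2)) / 2;  // average length of the two strings

    if ($ms == 0) return 0;                 // both strings empty: avoid division by zero
    return ($lcs * 100) / $ms;
}
?>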
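The following self-contained comparison tests the claim that the percentage matches similar_text. Both approaches use the same formula, 2 × matches / (len1 + len2) × 100. get_lcs writes it as LCS × 100 / average length. They differ only in how "matches" is counted. The characters that similar_text matches always form a common subsequence of the two strings, so its count can never exceed the LCS length. The two percentages agree exactly when similar_text's count equals the LCS length. The helper names lcs_len and lcs_percent are hypothetical. Unlike get_lcs, this sketch does not strip spaces, fold accents or lowercase.

```php
<?php
// Hypothetical helper: LCS length via the usual (m+1) x (n+1) DP table.
function lcs_len(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // Row 0 and column 0 (empty prefixes) start as zeros.
    $t = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));
    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $t[$i][$j] = ($a[$i - 1] === $b[$j - 1])
                ? $t[$i - 1][$j - 1] + 1
                : max($t[$i - 1][$j], $t[$i][$j - 1]);
        }
    }
    return $t[$m][$n];
}

// Same formula as get_lcs(): LCS * 100 / average length == LCS * 200 / (len1 + len2).
// No space stripping, accent folding or lowercasing here.
function lcs_percent(string $a, string $b): float
{
    $total = strlen($a) + strlen($b);
    return $total === 0 ? 0.0 : lcs_len($a, $b) * 200 / $total;
}

foreach ([["World", "Word"], ["1234abc", "abc1x2x3x4"]] as [$a, $b]) {
    $sim = similar_text($a, $b, $pct);   // $pct is filled in by reference
    printf("%s / %s: similar_text = %d (%.2f%%), LCS = %d (%.2f%%)\n",
        $a, $b, $sim, $pct, lcs_len($a, $b), lcs_percent($a, $b));
}
```

For "World" / "Word", both approaches count 4 matching characters and give the same percentage. For the second pair, the LCS ("1234") may come out higher than similar_text's count. The "exactly the same" observation is therefore typical rather than guaranteed.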

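You can skip the call to str_lcsfix if you don't care about accented characters and the like, or you can extend it or modify it for better performance; I don't think ereg is the fastest way to do this. Note that ereg_replace was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so on current PHP versions it has to be replaced with preg_replace, str_replace or strtr.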
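One way to act on the performance remark is to do all the replacements in a single strtr() call with a lookup table, rather than one regex pass per vowel. With an array argument, strtr() tries the longest keys first and never searches text it has already replaced. It works byte-wise, so multi-byte UTF-8 accents are matched as whole byte sequences. This is only a sketch, and no benchmark is implied. The name str_lcsfix_fast is hypothetical, and the accent list is an assumption because the original character lists did not survive encoding.

```php
<?php
// Hypothetical single-pass variant of str_lcsfix(). The accent list below is an
// assumption; extend it as needed. Input is assumed to be UTF-8.
function str_lcsfix_fast(string $s): string
{
    static $map = [
        ' ' => '',                                   // remove spaces
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'î' => 'i', 'ï' => 'i',
        'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c',
    ];
    return strtr($s, $map);                          // one pass over the string
}

echo str_lcsfix_fast("Crème brûlée à la française"), "\n"; // Cremebruleealafrancaise
```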

Hope this helps.

Georges