As mentioned above, similar_text runs in O(N^3). I've written a longest common subsequence (LCS) version that runs in O(m*n), where m and n are the lengths of str1 and str2. The result is a percentage, and it seems to match the similar_text percentage exactly, but with better performance. Here are the 3 functions I'm using:
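<?php
function LCS_Length($s1, $s2)
{
  $m = strlen($s1);
  $n = strlen($s2);

  // this table will be used to compute the LCS length;
  // PHP arrays grow as needed, so no size is declared
  $LCS_Length_Table = array();

  // reset the first column and the first row of the table
  for ($i = 0; $i <= $m; $i++) $LCS_Length_Table[$i][0] = 0;
  for ($j = 0; $j <= $n; $j++) $LCS_Length_Table[0][$j] = 0;
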
  for ($i = 1; $i <= $m; $i++) {
    for ($j = 1; $j <= $n; $j++) {
      if ($s1[$i-1] == $s2[$j-1])
        $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i-1][$j-1] + 1;
      else if ($LCS_Length_Table[$i-1][$j] >= $LCS_Length_Table[$i][$j-1])
        $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i-1][$j];
      else
        $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i][$j-1];
    }
  }
  return $LCS_Length_Table[$m][$n];
}

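function str_lcsfix($s)
{
  $s = str_replace(" ", "", $s);

  // fold accented letters to their base letter. preg_replace with the /u (UTF-8)
  // modifier stands in for ereg_replace, which was removed in PHP 7. The character
  // lists assume UTF-8 input and the usual Latin-1 accents; adjust them to your data.
  $s = preg_replace("/[éèêëÉÈÊË]/u", "e", $s);
  $s = preg_replace("/[áàâãäåÁÀÂÃÄÅ]/u", "a", $s);
  $s = preg_replace("/[íìîïÍÌÎÏ]/u", "i", $s);
  $s = preg_replace("/[óòôõöÓÒÔÕÖ]/u", "o", $s);
  $s = preg_replace("/[úùûüÚÙÛÜ]/u", "u", $s);
  $s = preg_replace("/[ç]/u", "c", $s);

  return $s;
}
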
function get_lcs($s1, $s2)
{
  // ok, now replace all spaces with nothing
  $s1 = strtolower(str_lcsfix($s1));
  $s2 = strtolower(str_lcsfix($s2));

  $lcs = LCS_Length($s1, $s2); // longest common subsequence

  $ms = (strlen($s1) + strlen($s2)) / 2;
  if ($ms == 0) return 0; // both strings empty: avoid dividing by zero

  return (($lcs * 100) / $ms);
}
?>
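The table in LCS_Length holds (m+1) x (n+1) cells, but each row only depends on the row before it, so two rows are enough. Here is a minimal sketch of that idea. The names lcs_length_two_rows and lcs_percent are made up for this example and are not part of the functions above. It skips the accent folding and just compares the percentage with similar_text on one pair of strings.

```php
<?php
// Hypothetical helper (not from the original post): LCS length using two rows
// instead of the full table.
function lcs_length_two_rows(string $s1, string $s2): int
{
    $m = strlen($s1);
    $n = strlen($s2);

    // $prev holds row i-1 of the table, $curr holds row i.
    $prev = array_fill(0, $n + 1, 0);

    for ($i = 1; $i <= $m; $i++) {
        $curr = [0]; // column 0 is always 0
        for ($j = 1; $j <= $n; $j++) {
            if ($s1[$i - 1] === $s2[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }

    return $prev[$n];
}

// Hypothetical wrapper: same percentage formula as get_lcs, without str_lcsfix.
function lcs_percent(string $s1, string $s2): float
{
    $ms = (strlen($s1) + strlen($s2)) / 2;
    if ($ms == 0) {
        return 0.0; // assumption: score two empty strings as 0 to avoid dividing by zero
    }
    return lcs_length_two_rows($s1, $s2) * 100 / $ms;
}

similar_text("World", "Word", $percent);
printf("%.2f %.2f\n", lcs_percent("World", "Word"), $percent); // prints "88.89 88.89"
```

For "World" and "Word" the LCS is "Word" (length 4) and the mean length is 4.5, so the score is 4 * 100 / 4.5, about 88.89. One matching pair does not show that the two functions agree in general, so it is worth testing on your own data.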
You can skip calling str_lcsfix if you aren't worried about accented characters and the like, or you can extend it or modify it for better performance; I suspect the regular-expression replacements are not the fastest way to do this.
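One way to do the folding in str_lcsfix without regular expressions is a single strtr call with a replacement map. This is only a sketch: fold_accents is a made-up name, the map is an illustrative subset of accented letters, and it assumes the strings are UTF-8. Whether it is actually faster is something to measure on your own input.

```php
<?php
// Hypothetical alternative to str_lcsfix (not from the original post).
// Assumes UTF-8 input; the map is an illustrative subset, extend it as needed.
function fold_accents(string $s): string
{
    static $map = [
        ' ' => '', // drop spaces, as str_lcsfix does
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'É' => 'e', 'È' => 'e', 'Ê' => 'e', 'Ë' => 'e',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'À' => 'a', 'Á' => 'a', 'Â' => 'a', 'Ä' => 'a',
        'î' => 'i', 'ï' => 'i', 'í' => 'i', 'ì' => 'i',
        'ô' => 'o', 'ö' => 'o', 'ó' => 'o', 'ò' => 'o',
        'û' => 'u', 'ü' => 'u', 'ú' => 'u', 'ù' => 'u',
        'ç' => 'c', 'Ç' => 'c',
    ];

    // strtr with an array replaces every key it finds in one pass over the string.
    return strtr($s, $map);
}

echo fold_accents("Crème brûlée"), "\n"; // prints "Cremebrulee"
```

The result still has to go through strtolower, as in get_lcs, to lowercase the plain ASCII letters.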
Hope this helps.
Georges