Levenshtein distance (相似度)

最新推荐文章于 2022-12-30 15:29:06 发布

hloveloveu

最新推荐文章于 2022-12-30 15:29:06 发布

阅读量1.5k

点赞数

文章标签： distance integer string function delete algorithm

来源: http://www.freemedialibrary.com/index.php/Levenshtein_distance

Implementations in different languages of the Levenshtein distance algorithm.

[hide]

[ edit]

Introductory Remarks

The implementations of the Levenshtein algorithm on this page are illustrative only. Applications will, in most cases, use implementations which use heap allocations sparingly, in particular when large lists of words are compared to each other. The following remarks indicate some of the variations on this and related topics:

1. Most implementations use one- or two-dimensional arays to store the distances of prefixes of the words compared. In most applications the size of these structures is previously known. This is the case, when, for instance the distance is relevant only if it is below a certian maximally allowed distance (this happens when words are selected from a dictionary to approximately match a given word). In this case the arrays can be preallocated and reused over the various runs of the algorithm over successive words.

2. Using maximally allowed distances reduces the search. To achieve this, include the maximal distance in the parameter list to the algorithm and keep track of the shortest found distance of prefixes of the words compared.

3. Deletion/Insertion and replacement of characters can be assigned different weights. The usual choice is to set them both to 1 (this is the choice for all of the algorithms on this page). Different values for these weights allows for more flexible search strategies in lists of words.

[ edit]

Implementations

[ edit]

C++

const unsigned int cost_del = 1;
const unsigned int cost_ins = 1;
const unsigned int cost_sub = 1;

unsigned int edit_distance( const std::string& s1, const std::string& s2 )
{
  unsigned int n1 = s1.length();
  unsigned int n2 = s2.length();
  unsigned int* dist = new unsigned int[ (n1+1)*(n2+1) ];

  dist[ 0 ] = 0;

  for( unsigned int i = 1; i <= n1; ++i )
    dist[ i*(n2+1) ] = dist[ (i-1)*(n2+1) ] + cost_del;

  for( unsigned int j = 1; j <= n2; ++j )
    dist[ j ] = dist[ j-1 ] + cost_ins;

  for( unsigned int i = 1; i <= n1; ++i )
    for( unsigned int j = 1; j <= n2; ++j )
      {
        unsigned int dist_del = dist[ (i-1)*(n2+1) + (j  ) ] + cost_del;
        unsigned int dist_ins = dist[ (i  )*(n2+1) + (j-1) ] + cost_ins;
        unsigned int dist_sub = dist[ (i-1)*(n2+1) + (j-1) ] + ( s1[i-1] == s2[j-1] ? 0 : cost_sub );
        dist[ i*(n2+1) + j ] = std::min( std::min( dist_del, dist_ins ), dist_sub );
      }

  unsigned int tmp = dist[ n1*(n2+1) + n2 ];
  delete[] dist;

  return tmp;
}

Below is a slightly faster version using only linear space.

const unsigned int cost_del = 1;
const unsigned int cost_ins = 1;
const unsigned int cost_sub = 1;

unsigned int edit_distance( const std::string& s1, const std::string& s2 )
{
  unsigned int n1 = s1.length();
  unsigned int n2 = s2.length();

  unsigned int* p = new unsigned int[ n2+1 ];
  unsigned int* q = new unsigned int[ n2+1 ];
  unsigned int* r;

  p[ 0 ] = 0;
  for( unsigned int j = 1; j <= n2; ++j )
    p[ j ] = p[ j-1 ] + cost_ins;

  for( unsigned int i = 1; i <= n1; ++i )
    {
      q[ 0 ] = p[ 0 ] + cost_del;
      for( unsigned int j = 1; j <= n2; ++j )
        {
          unsigned int d_del = p[ j   ] + cost_del;
          unsigned int d_ins = q[ j-1 ] + cost_ins;
          unsigned int d_sub = p[ j-1 ] + ( s1[i-1] == s2[j-1] ? 0 : cost_sub );
          q[ j ] = std::min( std::min( d_del, d_ins ), d_sub );
      }
      r = p;
      p = q;
      q = r;
    }

  unsigned int tmp = p[ n2 ];
  delete[] p;
  delete[] q;

  return tmp;
}

[ edit]

C#

public int LevenshteinDistance(string str1, string str2)
{ 
	int[ , ] d = new int[str1.Length + 1, str2.Length + 1];

	int i, j, cost;

	for (i = 0; i < str1.Length + 1; i++)
		d[i, 0] = i;
	for (j = 0; j < str2.Length + 1; j++)
		d[0, j] = j;

	for (i = 1; i < str1.Length + 1; i++)
	{
		for (j = 1; j < str2.Length + 1; j++)
		{
			if (str1[i - 1] == str2[j - 1])
				cost = 0;
			else
				cost = 1;

			d[i, j] = minimum3(d[i - 1, j] + 1, 
								d[i, j - 1] + 1, 
								d[i - 1, j - 1] + cost);
		}
	}

	return d[str1.Length, str2.Length];
}
  
public int minimum3(int a, int b, int c)
{
	if ((a < b) && (a < c))
		return a;

	if (b < c)
		return b;

	return c;
}

[ edit]

Common Lisp

A fast Common Lisp version of the Python code below.

(defun levenshtein-distance (a b)
  "Computes the Levenshtein distance between strings a and b."
  (let ((al (length a))
        (bl (length b)))
    ;; Swap a and b if a is longer.
    (when (> al bl)
      (multiple-value-setq (a b al bl) (values b a bl al)))
    (loop
       for i below bl
       for prev = (loop for i from 0 to al collect i)
                then (copy-list current)
       for current = (cons (1+ i) (make-list al :initial-element 0))
       do (loop
             for j below al
             for (c0 . ccons) on current
             for (p1 p0) on prev
             for add = (1+ p1)
             for delete = (1+ c0)
             for change = (+ p0 (if (eql (elt a j) (elt b i))
                                    0 1))
             do (rplaca ccons (min add delete change)))
       finally (return (car (last current))))))

[ edit]

Emacs Lisp

Implementation available from the EmacsWiki.[1]

(defun levenshtein-distance (str1 str2)
  "Return the edit distance between strings STR1 and STR2."
  (if (not (stringp str1))
      (error "Argument was not a string: %s" str1))
  (if (not (stringp str2))
      (error "Argument was not a string: %s" str2))
  (let* ((make-table (function (lambda (columns rows init)
           (make-vector rows (make-vector columns init)))))
         (tref (function (lambda (table x y)
           (aref (aref table y) x))))
         (tset (function (lambda (table x y object)
           (let ((row (copy-sequence (aref table y))))
             (aset row x object)
             (aset table y row) object))))
         (length-str1 (length str1))
         (length-str2 (length str2))
         (d (funcall make-table (1+ length-str1) (1+ length-str2) 0)))
    (let ((i 0) (j 0))
      (while (<= i length-str1)
        (funcall tset d i 0 i)
        (setq i (1+ i)))
      (while (<= j length-str2)
        (funcall tset d 0 j j)
        (setq j (1+ j))))
    (let ((i 1))
      (while (<= i length-str1)
        (let ((j 1))
          (while (<= j length-str2)
            (let* ((cost (if (equal (aref str1 (1- i)) (aref str2 (1- j)))
                             0
                           1))
                   (deletion (1+ (funcall tref d (1- i) j)))
                   (insertion (1+ (funcall tref d i (1- j))))
                   (substitution (+ (funcall tref d (1- i) (1- j)) cost)))
              (funcall tset d i j (min insertion deletion substitution)))
            (setq j (1+ j))))
        (setq i (1+ i))))
    (funcall tref d length-str1 length-str2)))

[ edit]

Haskell

Because Haskell automatically memoizes results of previous calls, it is particularly suited to a simple recursive implementation:

editDistance :: Eq a => [a] -> [a] -> Int
editDistance s [] = length s
editDistance [] t = length t
editDistance (s:ss) (t:ts) = minimum [ (if s == t then 0 else 1) + editDistance ss ts,
                                       1 + editDistance ss (t:ts),
                                       1 + editDistance (s:ss) ts ]

I think this is wrong. The levenshtein algorithm should use no more than O(n²) time, and this is exponential. It's possible to write a memorizing implementation, but Haskell doesn't do it automatically. (The ruby one is wrong, too, and probably some others.)

An O(n²) version:

 distance str1 str2 =
    last $ elems $ foldl update table [ (i,j) | i <- [1..length str1] , j <- [1..length str2] ]
    where table = initial (length str1  , length str2 )
          update table (i,j) =  table // [((i,j),value)]
              where value =
                        minimum [  table ! (i-1 , j)     + 1     -- deletion
                                ,  table ! (i   , j-1)   + 1     -- insertion
                                ,  table ! (i-1 , j-1)   + cost  -- substitution
                                ]
                            where cost = if str1!!(i-1) == str2!!(j-1) then 0 else 1 
 initial (b1,b2) = array ((0,0),(b1,b2)) [ ((i,j), value (i,j)) | i <- [0 .. b1] , j <- [0..b2]]
     where value (0,j) = j
           value (i,0) = i
           value (i,j) = 0

[ edit]

Java

public class LevenshteinDistance {
    private static int minimum(int a, int b, int c) 
    {
        if (a<=b && a<=c)
            return a;
        if (b<=a && b<=c)
            return b;
        return c;
    }

    public static int computeLevenshteinDistance(String str1, String str2) {
        return computeLevenshteinDistance(str1.toCharArray(), str2.toCharArray());
    }

    private static int computeLevenshteinDistance(char[] str1, char[] str2) {
        int[][] distance = new int[str1.length+1][];

        for(int i=0; i<=str1.length; i++)
        {
            distance[i] = new int[str2.length+1];
            distance[i][0] = i;
        }
        for(int j=0; j<str2.length+1; j++)
            distance[0][j]=j;

        for(int i=1; i<=str1.length; i++)
            for(int j=1;j<=str2.length; j++)
                distance[i][j]= minimum(distance[i-1][j]+1,
                    distance[i][j-1]+1, distance[i-1][j-1] + 
                    ((str1[i-1]==str2[j-1])?0:1));
       
        return distance[str1.length][str2.length];
    }
}

[ edit]

Python

Written by Magnus Lie Hetland [2].

def distance(a,b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a,b = b,a
        n,m = m,n
        
    current = range(n+1)
    for i in range(1,m+1):
        previous, current = current, [i]+[0]*n
        for j in range(1,n+1):
            add, delete = previous[j]+1, current[j-1]+1
            change = previous[j-1]
            if a[j-1] != b[i-1]:
                change = change + 1
            current[j] = min(add, delete, change)
            
    return current[n]

[ edit]

Ruby

Because dynamic typing nature of ruby this version works with many kinds of objects. The container must implement [] method and the element '=='.

def editDistance(a, b)
  case
    when a.empty?: b.length
    when b.empty?: a.length
    else [(a[0] == b[0] ? 0 : 1) + editDistance(a[1..-1], b[1..-1]),
          1 + editDistance(a[1..-1], b),
          2 + editDistance(a, b[1..-1])].min
  end
end

--Jonas Fagundes 01:29:28, 2005-08-22 (UTC)

More efficient solution at http://raa.ruby-lang.org/gonzui/source/levenshtein/levenshtein.rb

[ edit]

Scheme

Uses srfi-25 and srfi-42

(define add1 (lambda (x) (+ x 1)))
(define sub1 (lambda (x) (- x 1)))

(define levenshtein-distance
  (lambda (s1 s2)
    (let* ((width (add1 (string-length s1)))
           (height (add1 (string-length s2)))
	   (d (make-array (shape 0 height 0 width) 0)))
      (do-ec (:range x width) (array-set! d 0 x x))
      (do-ec (:range y height) (array-set! d y 0 y))
      (do-ec (:range x (string-length s1))
	     (:range y (string-length s2))
             (array-set!
              d (add1 y) (add1 x)
              (min
               (add1 (array-ref d y (add1 x)))
               (add1 (array-ref d (add1 y) x))
               (+ (array-ref d y x)
                  (if (eqv? (string-ref s1 x)
                            (string-ref s2 y))
                      0
                      1)))))
      (displarray d)
      (array-ref d (sub1 height) (sub1 width)))))

A functional implementation is by Rares Vernica [3]

(define (edit-distance- a b)
  (if (= (length a) 0)
      (length b)
      (if (= (length b) 0)
          (length a)
          (min (+ (edit-distance- (cdr a) (cdr b))
                  (if (eq? (car a) (car b)) 0 1))
               (+ (edit-distance- a (cdr b))
                  1)
               (+ (edit-distance- (cdr a) b)
                  1)))))

(define (edit-distance a b)
  (edit-distance- (string->list a) (string->list b)))

[ edit]

Prolog

Contributed by Stephan Maier. This is an implementation in Prolog which makes use only of ISO-Prolog with the exception of the append(?list, ?list, ?list)-rule which, however, is available on most prolog implementations. The implementation of the algorithm is recursive but the recursion only serves to simulate two loops over the length of the words compared. Efficiency of the implementation is contrasted with rather large parameter set for the rule 'lev(...)'.

% Implementation of the computation of the leveshtein distance of two words.
% This implementation works on all ISO-compatible implemenations of Prolog
% which implement append(?list, ?list, ?list) in the usual way.

% levenshtein(+atom, +atom, -integer): Computes the Levenshtein distance between two given words
% W1: Atom
% W2: Atom
% D : Distance
levenshtein(W1, W2, D) :- atom_length(W1, 0), atom_length(W2, L), D is L, !.
levenshtein(W1, W2, D) :- atom_length(W2, 0), atom_length(W1, L), D is L, !.
levenshtein(W1, W2, D) :- atom_length(W1, L1), initialList(L1, StartRow), lev(W1, W2, [], StartRow, 1, 1, 0, D), !.

% lev(+W1, +W2, +CR, +LR, +P1, +P2, +CD, -D): Rule which represents the actual calculation. This rule
% works recursively simulating a loop over the positions (P1, P2) in the words W1 and W2. The outer
% loop is over P2, the inner loop is over P1. To keep track of previously computed distances two lists 
% are used. 
% W1: Atom
% W2: Atom
% CR: Start of the row (current row)
% LR: Remainder of last row (last row)
% P1: Position in word W1
% P2: Position in word W2
% CD: Current value of distance; this is the last element of CR 
% D : Value of distance

% Verify the end-condition for the recursion first: This is achieved when the P2-counter exceeds the length of W2
lev(_, W2, _, _, _, P2, CD, D) :- atom_length(W2, L2), P2 > L2, D is CD, !.

% The initial rule after P2 has been increased by 1
lev(W1, W2, [], LR, 1, P2, _, D) :- CR = [P2], lev(W1, W2, CR, LR, 1, P2, P2, D), !.

% This is the actual computational rule
lev(W1, W2, CR, [H1, H2|LR], P1, P2, CD, D) :-
		% T1 is the distance at (P1, P2) obtained from removing the letter at P1 from W1 
		T1 is CD + 1,
		% T2 is the distance at (P1, P2) obtained from adding the letter at P2 from W2 
		T2 is H2 + 1,
		charAt(W1, P1, C1), charAt(W2, P2, C2),
		% T3 is the distance at (P1, P2) obtained from exchanging the letters at P1 from W1 and P2 from W2
		(  C1 = C2 -> T3 is H1; T3 is H1 + 1 ),
		% The distance at (P1, P2) is the minimum of the previously computed distances
		min(T1, T2, T3, NCD), 
		% Compute the next value for CR, increase the P1 position by i and the recursively call lev(...)
		append(CR, [NCD], NCR), NP1 is P1 + 1,
		lev(W1, W2, NCR, [H2|LR], NP1, P2, NCD, D), !.

% This rule verifies the end of line condition for the loop over P1 
lev(W1, W2, CR, [_], _, P2, CD, D) :- NP2 is P2 + 1, lev(W1, W2, [], CR, 1, NP2, CD, D), !.

% initialList(+N, -L): Returns a list L of the form [0, 1, 2, ..., N]. 
initialList(N, _) :- (/+ integer(N); N < 1) -> fail.
initialList(1, L) :- L = [0, 1], !.
initialList(N, L) :- N1 is N - 1, L2 = [N], initialList(N1, L1), append(L1, L2, L).

% charAt(+A, ?N, -C): Returns the character at position N in the atom A
% The position is 1-based
% A: The atom
% N: The position at which to extract the character
% C: The character of A at position N 
charAt(A, N, C) :- P is N - 1, sub_atom(A, P, 1, _, C).

% min(...): These rules compute the minimum of the given integer values
% I1, I2, I3: Integer values
% M:          The minimum over the values
min(I1, I2, M) :- integer(I1), integer(I2), ( I1 =< I2 -> M is I1; M is I2).
min(I1, I2, I3, M) :- min(I1, I2, A), min(I2, I3, B), min(A, B, M).

[ edit]

Visual Basic .NET

Imports System.Math

Public Module Levenshtein
        Public Function LevenshteinDistance(ByVal a As String, ByVal b As String) As Integer

            Dim d As Integer(,) = New Integer(a.Length + 1, b.Length + 1) {}
            Dim i As Integer
            Dim j As Integer
            Dim insert_penalty As Integer = 1
            Dim delete_penalty As Integer = 1
            Dim substitute_penalty As Integer = 1
            Dim cost As Integer

            For i = 0 To a.Length
                d(i, 0) = i
            Next
            For j = 0 To b.Length
                d(0, j) = j
            Next

            For i = 1 To a.Length
                For j = 1 To b.Length
                    If (a.Chars(i - 1) = b.Chars(j - 1)) Then
                        cost = 0
                    Else
                        cost = substitute_penalty
                    End If
                    d(i, j) = Min(Min(d(i - 1, j) + insert_penalty, d(i, j - 1) + delete_penalty), d(i - 1, j - 1) + cost)
                Next
            Next

            Return d(a.Length, b.Length)

        End Function

        ' This may also be useful to some
        ' Just split two strings into arrays on spaces or on a lot of different things

        ' A good use would be if you want to check that a user input
        ' a name correctly or misspelled/misordered the name
        Public Function StringArrayLevenshteinDistance(ByVal first As String(), ByVal second As  String()) As Integer
            ' assume first is search string
            ' assume second is find string
            Dim j As Integer
            Dim k As Integer
            Dim value As Integer

            Dim bestMatchIndex = -1
            Dim smallestWordLevenshtein As Integer = -1
            Dim strLevenshtein As Integer = 0

            ' go through the parts of the search string
            For j = 0 To first.Length - 1
                ' go through the parts of the find string
                For k = 0 To second.Length - 1
                    ' find the part that is closest to the one being searched for
                    If (first(j) <> "" And second(k) <> "") Then
                        value = LevenshteinDistance(first(j), second(k))
                        If (value < smallestWordLevenshtein Or smallestWordLevenshtein = -1) Then
                            smallestWordLevenshtein = value
                            bestMatchIndex = k
                        End If
                    End If
                Next

                ' add the smallest lev vaule found to the return value for the strings
                If (smallestWordLevenshtein <> -1) Then
                    strLevenshtein += smallestWordLevenshtein
                    smallestWordLevenshtein = -1
                End If
                ' and drop that section of this string so it's not used later
                If (bestMatchIndex <> -1) Then
                    second.SetValue("", bestMatchIndex)
                    bestMatchIndex = -1
                End If

            Next

            Return strLevenshtein

        End Function
End Module

[ edit]

Visual FoxPro

loFuzzy = CREATEOBJECT("FuzzyMatch")
? loFuzzy.EditDistance("kitten", "sitting")

define class FuzzyMatch as custom

   ** <summary>
   ** Computes the edit distance between two strings
   ** using the Levenshtein algorithm. The edit distance is how many
   ** changes need to be made to make them identical. The edit distance
   ** for identical strings is 0.0, the edit distance for completely
   ** different strings depends on what letters need to change or
   ** be added in order to make them the same.
   ** </summary>
   function EditDistance ( ;
         sSource as string, ;
         sTarget as string ) as decimal
      local ;
         iLenSource as integer, ;
         iLenTarget as integer, ;
         iRow as integer, ;
         iCol as integer

      iLenSource = len( sSource )
      iLenTarget = len( sTarget )

      if iLenSource == 0.0
         return iLenTarget
      endif
      if iLenTarget == 0.0
         return iLenSource
      endif
     
      if iLenSource < iLenTarget
         sTemp = sSource
         sSource = sTarget
         sTarget = sTemp
         
         iLenTemp = iLenSource
         iLenSource = iLenTarget
         iLenTarget = iLenTemp
      endif

      dimension Levenshtein( iLenTarget+1, iLenSource+1 )

      for iRow = 1 to iLenTarget + 1
         Levenshtein[iRow,1] = iRow -1
      endfor

      for iCol = 1 to iLenSource + 1
         Levenshtein[1,iCol] = iCol -1
      endfor

      for iCol = 2 to iLenSource + 1
         for iRow = 2 to iLenTarget + 1
            local ;
               dCost as decimal, ;
               cColChar as string, ;
               cRowChar as string

            dCost = 0.0

            cColChar = substr( sSource, iCol-1, 1 )
            cRowChar = substr( sTarget, iRow-1, 1 )

            if cColChar != cRowChar
               dCost = 1.0
            endif

            Levenshtein[iRow,iCol] = ;
               min( ;
               Levenshtein[iRow-1,iCol] + 1, ;
               Levenshtein[iRow,iCol-1] + 1, ;
               Levenshtein[iRow-1,iCol-1] + dCost )
         endfor
      endfor
      return Levenshtein[iLenTarget+1,iLenSource+1]
   endfunc

   ** <summary>
   ** Compares two strings and returns the % match using
   ** the EditDistance function.
   ** </summary>
   function PercentMatch( ;
         sSource as string, ;
         sTarget as string ) as decimal
      local ;
         iLenSource as integer, ;
         iLenTarget as integer, ;
         dResult as decimal, ;
         dEditDistance as decimal

      iLenSource = len( sSource )
      iLenTarget = len( sTarget )

      if iLenSource + iLenTarget == 0
         return 0.0
      else
         dEditDistance = this.EditDistance( sSource, sTarget)
         dResult = (1.0-(dEditDistance/max(iLenSource,iLenTarget))) * 100.0
         return dResult
      endif
   endfunc
enddefine

[ edit]

Actionscript 2.0

Written by Moorthy S V [4]

class LevenshteinDistance {
  function LevenshteinDistance() {
  }

  private function compute(a:Array, b:Array) {
     if(typeof(a) == "string" || typeof(b) == "string") {
        trace("FATAL ERROR : String found when array needed");
        return false;
     }

     var distance = new Array();	
     for(var g=0; g<a.length+1; g++) {
        distance[g] = 0;
     }

     for(var i=0; i<=a.length; i++) {
        distance[i] = new Array();   
        for(g=0; g<b.length+1; g++) {
           distance[i][g] = 0;
        }
        distance[i][0] = i;
     }

     for(var j=0; j<=b.length+1; j++) { 
        distance[0][j] = j;
     }

     for(var k=1; k<=a.length; k++) {
        for(var m=1; m<=b.length; m++) {
           var x = Number(distance[k-1][m])+1;
           var y = Number(distance[k][m-1])+1;
           var z = Number(distance[k-1][m-1])+((a[k-1]==b[m-1]?0:1));
           distance[k][m] = this.minimum(Number(x), Number(y), Number(z));
        }
     }   
     return distance[a.length][b.length];		
  }

  private function minimum(a:Number, b:Number, c:Number) {
     var result:Number = -1;
     if (Number(a)<=Number(b) && a<=Number(c)) {
        result = Number(a);
     }       

     if (Number(b)<=Number(a) && Number(b)<=Number(c)) {
        result = Number(b);
     }

     if(result == -1 ) {
        result =  Number(c);
     }

     return result;		
  }
}

[ edit]

Perl

printf("/ndistance is: %d", dist("mama", "tatko"));

sub dist{
	my($str1, $str2)=@_;
	my($i, $j, $cost, @temp, @d);
	

	for($i=0; $i<length($str1)+1; $i++){
		for($j=0; $j<length($str2)+1; $j++){
			push(@temp, 0);
		}
		
		push(@d, [@temp]);
	}
	
	for($i=0; $i<length($str1)+1; $i++){
		$d[$i][0]=$i;
	}
	
	for($j=0; $j<length($str2)+1; $j++){
		$d[0][$j]=$j;
	}
	
	
	for($i=1; $i<length($str1)+1; $i++){
		for($j=1; $j<length($str2)+1; $j++){
			if(substr($str1, $i-1, 1) eq substr($str2, $j-1, 1)){
				$cost=0;
			}
			else { $cost=1; }
			
			$d[$i][$j]=min($d[$i-1][$j]+1, $d[$i][$j-1]+1, $d[$i-1][$j-1]+$cost);
		}
	}
	
	
	return $d[length($str1)][length($str2)];

}

sub min{
	Temy($m, $i);
	$m=9999;
	for($i=0; $i<@_; $i++){
		if($_[$i]<$m){ $m=$_[$i]; }
	}
	
	return $m;

}

[ edit]

Ocaml

Uses linear space, and can be curried with just one string to factor arrays allocations

 (* Written by Pierre Etchemaite <petchema@concept-micro.com> *)
 
 type levenshtein_costs = {
   insert_cost : int;
   delete_cost : int;
   replace_cost : int;
 }
 
 let levenshtein_distance lc =
   fun s1 ->
     let l1 = String.length s1 in
     let m = Array.make (l1 + 1) 0 in
     let n = Array.make (l1 + 1) 0 in
     fun s2 ->
       let l2 = String.length s2 in
       m.(0) <- 0;
       for i = 1 to l1 do
         m.(i) <- m.(i - 1) + lc.delete_cost
       done;
       let rec aux j m n =
         if j = l2 then m.(l1)
         else
           let c2 = s2.[j] in
           n.(0) <- m.(0) + lc.insert_cost;
           for i = 1 to l1 do
             let i1 = i - 1 in
             n.(i) <-
               min
                 (min (n.(i1) + lc.delete_cost)
                      (m.(i) + lc.insert_cost))
                 (m.(i1) + 
                   (if s1.[i1] <> c2 then lc.replace_cost else 0))
           done;
           aux (j + 1) n m in
       aux 0 m n

[ edit]

Lua

function levenshtein(string1, string2)
	local str1, str2, distance = {}, {}, {};
	str1.len, str2.len = string.len(string1), string.len(string2);
	string.gsub(string1, "(.)", function(s) table.insert(str1, s); end);
	string.gsub(string2, "(.)", function(s) table.insert(str2, s); end);
	for i = 0, str1.len do distance[i][0] = i; end
	for i = 0, str2.len do distance[0][i] = i; end
	for i = 1, str1.len do
		for j = 1, str2.len do
			local tmpdist = 1;
			if(str1[i-1] == str2[j-1]) then tmpdist = 0; end
			distance[i][j] = math.min(
				distance[i-1][j] + 1, distance[i][j-1]+1, distance[i-1][j-1] + tmpdist);
		end
	end
	return distance[str1.len][str2.len];
end

Retrieved from " http://www.freemedialibrary.com/index.php/Levenshtein_distance"

hloveloveu

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Levenshtein distance (相似度)

Levenshtein distanceFrom Free media libraryImplementations in different languages of the Levenshtein distance algorithm.
复制链接

扫一扫