How to calculate Levenshtein Distance matrix of strings in Python
      str1  str2  str3  str4  ...  strn
str1  0.8   0.4   0.6   0.1   ...  0.2
str2  0.4   0.7   0.5   0.1   ...  0.1
str3  0.6   0.5   0.6   0.1   ...  0.1
str4  0.1   0.1   0.1   0.5   ...  0.6
.     .     .     .     .     ...  .
.     .     .     .     .     ...  .
.     .     .     .     .     ...  .
strn  0.2   0.1   0.1   0.6   ...  0.7
Using a distance function we can calculate the distance between two words. But here I have one list containing n strings. I want to calculate the distance matrix for the whole list, and after that I want to cluster the words.
Solution
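Just use pdist from scipy.spatial.distance, which accepts a custom metric as a callable. Note that pdist expects a 2-D array, so X should have shape (n, 1), for example a column of indices into your list, with the metric comparing the corresponding strings.
Y = pdist(X, levenshtein)
For the levenshtein function itself, you can use the Rosetta Code implementation, as suggested by Tanu.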
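If you would rather not copy an implementation from elsewhere, here is a minimal sketch of the usual two-row dynamic-programming version. It is not the Rosetta Code listing, only an illustration; the function name and the unit cost for every edit are my assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, with insert/delete/substitute costing 1 each."""
    # Keep the shorter string in the inner loop so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

This returns raw edit counts. If you need values in the 0 to 1 range, you would have to normalise them yourself, for example by dividing by the length of the longer string.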
pdist returns a condensed (upper-triangle) vector. If you want the full square matrix, just use squareform on the result:
Y = scipy.spatial.distance.squareform(Y)
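Putting the two calls together could look roughly like the sketch below. The helper name string_distance_matrix and the index-lookup trick are my own assumptions, not part of SciPy. The length_gap metric is only a stand-in so the block runs on its own; swap in any function that takes two strings and returns their Levenshtein distance.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def string_distance_matrix(strings, metric):
    """Return an n x n array with metric(strings[i], strings[j]) at position [i, j]."""
    # pdist needs a 2-D array, so pass a column of indices and look the strings up.
    indices = np.arange(len(strings)).reshape(-1, 1)

    def by_index(u, v):
        # u and v are single-element rows holding positions in `strings`.
        return metric(strings[int(u[0])], strings[int(v[0])])

    condensed = pdist(indices, by_index)  # one value per unordered pair
    return squareform(condensed)          # symmetric, zeros on the diagonal


def length_gap(a, b):
    # Placeholder metric (NOT Levenshtein): absolute difference in length.
    return abs(len(a) - len(b))


words = ["kitten", "sitting", "flaw", "lawn"]
matrix = string_distance_matrix(words, length_gap)
```

The resulting array can be passed to clustering routines that accept a precomputed distance matrix, while the condensed form is what scipy.cluster.hierarchy.linkage expects.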