String Similarity Algorithm (Levenshtein Distance)

Latest recommended article published 2022-04-21 20:57:19

Tuzki


Tags: algorithm, distance, matrix, character, string, equals

/*

String Similarity Algorithm (Levenshtein Distance)

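A string can be turned into another string by inserting a character, deleting a character, or replacing a character. The minimum number of these three operations needed to transform string A into string B is called the similarity distance between A and B. A smaller value means the two strings are more alike.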

For example:
  abc and adc: distance 1
  ababababa and babababab: distance 2
  abcd and acdb: distance 2

String similarity can be computed with the Levenshtein Distance algorithm, also called the edit distance algorithm. It was proposed by the Russian scientist Levenshtein.

Step  Description
1     Set n to be the length of s.
      Set m to be the length of t.
      If n = 0, return m and exit.
      If m = 0, return n and exit.
      Construct a matrix d containing 0..n rows and 0..m columns.
2     Initialize the first column to 0..n.
      Initialize the first row to 0..m.
3     Examine each character of s (i from 1 to n).
4     Examine each character of t (j from 1 to m).
5     If s[i] equals t[j], the cost is 0.
      If s[i] doesn't equal t[j], the cost is 1.
6     Set cell d[i,j] of the matrix equal to the minimum of:
      a. The cell immediately above plus 1: d[i-1,j] + 1 (delete s[i]).
      b. The cell immediately to the left plus 1: d[i,j-1] + 1 (insert t[j]).
      c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost (keep or replace s[i]).
7     After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].

Characters are indexed from 1 in this description (s[1] is the first character of s).

*/

#include <algorithm>  // std::min
#include <iostream>
#include <string>
#include <vector>

using namespace std;

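// Algorithm: Levenshtein (edit) distance between source and target
int ldistance(const string& source, const string& target)
{
    // step 1
    const int n = static_cast<int>(source.length());
    const int m = static_cast<int>(target.length());
    if (m == 0) return n;
    if (n == 0) return m;

    // Construct an (n+1) x (m+1) matrix filled with 0
    typedef vector< vector<int> > Tmatrix;
    Tmatrix matrix(n + 1, vector<int>(m + 1, 0));

    // step 2: first column = 0..n, first row = 0..m
    for (int i = 1; i <= n; i++) matrix[i][0] = i;
    for (int j = 1; j <= m; j++) matrix[0][j] = j;

    // step 3: each character of source
    for (int i = 1; i <= n; i++)
    {
        const char si = source[i - 1];
        // step 4: each character of target
        for (int j = 1; j <= m; j++)
        {
            const char dj = target[j - 1];
            // step 5: substitution cost
            const int cost = (si == dj) ? 0 : 1;
            // step 6
            const int above = matrix[i - 1][j] + 1;        // deletion
            const int left = matrix[i][j - 1] + 1;         // insertion
            const int diag = matrix[i - 1][j - 1] + cost;  // match or replacement
            matrix[i][j] = min(above, min(left, diag));
        }
    }
    // step 7: the distance is in the bottom-right cell
    return matrix[n][m];
}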

int main()
{
    string s;
    string t;
    cout << "source=";
    cin >> s;
    cout << "target=";
    cin >> t;
    int dist = ldistance(s, t);
    cout << "dist=" << dist << endl;
    return 0;
}

Original article:

http://www.cppblog.com/whncpp/archive/2008/09/21/62378.html
