Counting the Frequency of the Longest Incompletely Matched Substring: eaglet's Solution

Article link: https://blog.csdn.net/eaglet/article/details/5009997

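This morning 蛙蛙 (Wawa) posted the article "蛙蛙推荐：[算法练习]最长不完全匹配子串频率计算" (Wawa Recommends: [Algorithm Practice] Counting the Frequency of the Longest Incompletely Matched Substring). After reading it, eaglet wrote an algorithm as well. Tested with the two parameters 蛙蛙 provided, it runs roughly 800 times faster than 蛙蛙's version, and the gap should widen as the strings get longer.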

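Problem description: find how often a particular substring occurs in a long string. The characters of a match do not have to be adjacent in the long string; each character only needs to appear somewhere after the previous one.
Constraint: the long string is at most 500 characters long.
Input: a long string of no more than 500 characters, and a short string.
Output: the last four digits of the number of times the short string occurs in the long string, left-padded with zeros if there are fewer than four digits.
Example: searching "wweell" for "wel" yields 8 matches, so the output should be 0008. The index sequences of the matches are: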
0,2,4
0,2,5
0,3,4
0,3,5
1,2,4
1,2,5
1,3,4
1,3,5
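
To make the matching rule concrete, here is a minimal brute-force sketch that enumerates every increasing index sequence spelling out the short string. It is only an illustration of the problem statement, not 蛙蛙's or eaglet's code, and the names `SubsequenceEnumerator`, `FindMatches` and `Extend` are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SubsequenceEnumerator
{
    // Returns every increasing index sequence in source that spells out sub.
    public static List<int[]> FindMatches(string source, string sub)
    {
        var matches = new List<int[]>();
        if (sub.Length > 0)
        {
            Extend(source, sub, 0, 0, new int[sub.Length], matches);
        }
        return matches;
    }

    // Tries to place sub[subPos] at every position of source from start onwards.
    private static void Extend(string source, string sub, int start, int subPos,
                               int[] current, List<int[]> matches)
    {
        if (subPos == sub.Length)
        {
            // All characters of sub are placed: record a copy of the indices.
            matches.Add((int[])current.Clone());
            return;
        }
        for (int k = start; k < source.Length; k++)
        {
            if (source[k] == sub[subPos])
            {
                current[subPos] = k;
                Extend(source, sub, k + 1, subPos + 1, current, matches);
            }
        }
    }
}
```

Calling `SubsequenceEnumerator.FindMatches("wweell", "wel")` returns the 8 index sequences listed above, in the same order. The running time grows with the number of matches, so this is suitable only for small inputs.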

Algorithm analysis

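The problem asks only for the number of matches. 蛙蛙's approach searches recursively over and over, which makes its complexity very high. Can the complexity be brought down to O(n)? It can. Looking at the problem, it is not hard to see that, for inputs shaped like the example above, the number of occurrences equals the product of the number of times each component of the pattern is hit.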

Take the parameters above: the pattern is wel, split into the three components w, e and l. They occur 2, 2 and 2 times in the source string, so the number of matches is 2*2*2 = 8.
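
The arithmetic can be sketched in a few lines. This is a simplified illustration of the product idea only, not eaglet's implementation: it assumes that every occurrence of one pattern character lies before every occurrence of the next one (as in "wweell" and "wel"), and the names `ProductRule` and `ProductOfCounts` are made up here.

```csharp
using System;

public static class ProductRule
{
    // Multiplies the number of times each character of sub occurs in source.
    // Assumption (for illustration only): all occurrences of sub[k] lie before
    // every occurrence of sub[k + 1], as in "wweell" / "wel".
    public static long ProductOfCounts(string source, string sub)
    {
        long product = 1;
        foreach (char component in sub)
        {
            long count = 0;
            foreach (char c in source)
            {
                if (c == component)
                {
                    count++;
                }
            }
            product *= count;
        }
        return product;
    }
}
```

`ProductRule.ProductOfCounts("wweell", "wel")` returns 8. When the assumption does not hold, for example when the pattern repeats a character the way "welcome" repeats e, counting raw occurrences like this overcounts.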

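Of course, the program also has to take the order of the characters into account, but that is a minor issue and is not discussed here. With this overall idea in place, eaglet wrote the following code: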

public static void Eaglet(string source, string sub)
{
    Console.WriteLine(string.Format("{0:0000}", EagletMatch(source, sub)));
}

private static int EagletMatch(string source, string sub)
{
    // Hit count in source for each component (character) of sub
    int[] hitCountArrary = new int[sub.Length];
    int i = 0; // index of the component of sub currently being matched
    bool lastMatched = false;

    // Scan source from left to right
    foreach (char c in source)
    {
        if (c == sub[i])
        {
            // The current character matches the current component: count it
            hitCountArrary[i]++;
            lastMatched = true;
        }
        else
        {
            if (lastMatched)
            {
                i++;
                if (i >= sub.Length)
                {
                    // Start over from the first component
                    i = 0;
                }
                else
                {
                    if (c == sub[i])
                    {
                        // The current character matches the next component: count it
                        hitCountArrary[i]++;
                    }
                    else
                    {
                        // No match: keep scanning
                        lastMatched = false;
                        if (i >= sub.Length)
                        {
                            // Start over from the first component
                            i = 0;
                        }
                    }
                }
            }
        }
    }

    int result = 1;

    // Multiply the hit counts together
    foreach (int count in hitCountArrary)
    {
        result *= count;
    }

    // Return the number of matches
    return result;
}

This code runs in O(n). Running it on the two parameters 蛙蛙 provided:

private static string math = "welcome to cnblogs";

private static string test_input = "wweellccoommee to cnblogs";

The result is 0128, which agrees with 蛙蛙's result, and the execution time is only 1/800 of 蛙蛙's.
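
As an independent cross-check of the 0128 result, the count can also be computed with the standard subsequence-counting dynamic program. The sketch below is that textbook technique, not eaglet's code; the names `SubsequenceCounter`, `CountModulo` and `Format` are invented for illustration. It runs in O(n*m) time rather than O(n), but it does not rely on matching characters appearing in consecutive groups, and it keeps only the last four digits as the problem requires.

```csharp
using System;

public static class SubsequenceCounter
{
    // Counts the index sequences in source that spell out sub, modulo 10000.
    public static int CountModulo(string source, string sub)
    {
        const int Modulus = 10000;

        // ways[j] = number of ways (mod 10000) to match the first j characters of sub
        int[] ways = new int[sub.Length + 1];
        ways[0] = 1; // the empty prefix is matched exactly once

        foreach (char c in source)
        {
            // Go right to left so one source character is not reused within a match.
            for (int j = sub.Length; j >= 1; j--)
            {
                if (sub[j - 1] == c)
                {
                    ways[j] = (ways[j] + ways[j - 1]) % Modulus;
                }
            }
        }
        return ways[sub.Length];
    }

    // Formats the count as four digits, left-padded with zeros.
    public static string Format(string source, string sub)
    {
        return CountModulo(source, sub).ToString("0000");
    }
}
```

`SubsequenceCounter.Format("wweellccoommee to cnblogs", "welcome to cnblogs")` returns "0128", and `SubsequenceCounter.Format("wweell", "wel")` returns "0008". For "wewel" and "wel" the count is 3, whereas multiplying the raw per-character counts (2, 2 and 1) would give 4, which is why the ordering of the characters matters.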