Find the Most Frequent Subarray in an Array


Given an array of ints, find the most frequent non-empty subarray in it. If more than one subarray has that frequency, return the longest one(s).

Note: Two subarrays are equal if they contain identical elements in the same order.

For example: if input = {4,5,6,8,3,1,4,5,6,3,1}
Result: {4,5,6}
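
As a reference point for the example above, here is a minimal brute-force sketch in Python. It is not the approach discussed in this post; it simply counts every contiguous subarray so that the expected result can be checked on small inputs. The function name `brute_force_most_frequent` is made up for illustration.

```python
from collections import Counter


def brute_force_most_frequent(arr):
    """Count every non-empty contiguous subarray and pick the winners.

    Returns (winners, frequency), where winners lists every subarray that has
    the highest frequency and, among those, the greatest length.
    Only suitable for small inputs: there are O(n^2) subarrays.
    """
    counts = Counter(
        tuple(arr[i:j])
        for i in range(len(arr))
        for j in range(i + 1, len(arr) + 1)
    )
    if not counts:
        return [], 0
    max_freq = max(counts.values())
    longest = max(len(s) for s, c in counts.items() if c == max_freq)
    winners = [
        list(s) for s, c in counts.items()
        if c == max_freq and len(s) == longest
    ]
    return winners, max_freq


data = [4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1]
print(brute_force_most_frequent(data))
```

Returning a list of winners covers the "longest one/s" wording when several subarrays tie.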

1. Build the list of all suffixes of the array and sort it. Use two variables: one to track the length of the longest repeated subarray and the other to track its frequency.

2. Traverse the sorted suffixes to find the most frequent, longest repeated subarray and return it.


The suffix array here is effectively a 2D array: each of its elements is itself an array, namely one suffix of the input. For the given array {4,5,6,8,3,1,4,5,6,3,1} the suffixes are:

{4,5,6,8,3,1,4,5,6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{8,3,1,4,5,6,3,1}
{3,1,4,5,6,3,1}
{1,4,5,6,3,1}
{4,5,6,3,1}
{5,6,3,1}
{6,3,1}
{3,1}
{1}
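
The suffix list above could be produced with a few lines of Python. This is only a sketch: `build_suffixes` is a hypothetical helper name, and it copies each suffix in full, whereas a space-efficient suffix array would normally store just the starting indices.

```python
def build_suffixes(arr):
    """Return every suffix of arr as its own list, longest first."""
    return [arr[i:] for i in range(len(arr))]


data = [4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1]
for suffix in build_suffixes(data):
    print(suffix)
```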
After sorting the suffixes (in descending order here), you'd get:
{8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{5,6,3,1}
{4,5,6,8,3,1,4,5,6,3,1}
{4,5,6,3,1}
{3,1,4,5,6,3,1}
{3,1}
{1,4,5,6,3,1}
{1}
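
In Python, lists already compare element by element, so the ordering shown above can be reproduced with the built-in `sorted`. This is a small sketch; the variable names are chosen for illustration only.

```python
data = [4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1]

# One list per suffix, then sort them in descending lexicographic order.
suffixes = [data[i:] for i in range(len(data))]
sorted_suffixes = sorted(suffixes, reverse=True)

for suffix in sorted_suffixes:
    print(suffix)
```

Ascending order would work just as well for the comparison step; what matters is that suffixes sharing a prefix end up next to each other.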
Checking for matching subarrays is easy in the sorted suffix list: compare the prefixes of adjacent entries. If you traverse the sorted list above and compare adjacent suffixes, you'd see that the prefix [4,5,6] occurs the maximum number of times (2) and is also the longest such prefix. Other subarrays, like [6], [5,6], [3,1] and [1], also occur 2 times, but they are shorter than [4,5,6], which is the required answer.
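
One way to turn the traversal into code is sketched below. It rests on two assumptions that go beyond the text: the highest frequency is always reached by some single element (a subarray cannot occur more often than its first element), and a subarray occurring k times shows up as a prefix shared by k consecutive sorted suffixes. Comparing adjacent pairs, as described above, is the special case k = 2. The function names are hypothetical.

```python
from collections import Counter


def common_prefix_len(a, b):
    """Length of the longest common prefix of two lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def most_frequent_longest_subarray(arr):
    """Return (subarray, frequency); the first winner is returned on ties."""
    if not arr:
        return [], 0
    max_freq = max(Counter(arr).values())
    if max_freq == 1:
        # Every subarray occurs once, so the whole array is the longest.
        return list(arr), 1
    suffixes = sorted(arr[i:] for i in range(len(arr)))
    # lcp[i] = shared prefix length of sorted suffixes i and i + 1.
    lcp = [
        common_prefix_len(suffixes[i], suffixes[i + 1])
        for i in range(len(suffixes) - 1)
    ]
    best = []
    window = max_freq - 1  # max_freq suffixes span max_freq - 1 adjacent pairs
    for start in range(len(lcp) - window + 1):
        shared = min(lcp[start:start + window])
        if shared > len(best):
            best = suffixes[start][:shared]
    return best, max_freq


print(most_frequent_longest_subarray([4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1]))
```

Materialising and sorting full suffixes keeps the sketch short but costs quadratic memory in the worst case; it is meant to illustrate the idea, not to be efficient.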


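(Below is my own line of thinking:

I think it is better to track both the frequency and the subarray itself: current_subarray is the common subarray currently being counted, max_subarray is the longest common subarray with the highest frequency so far, and currfreq and maxfreq are their respective frequency counters.

When two adjacent suffixes have a common subarray (their shared prefix):

If the common subarray equals current_subarray, increment currfreq and leave current_subarray unchanged.

If the common subarray differs from current_subarray, the count for the previous current_subarray is complete. Compare current_subarray with max_subarray and update max_subarray if needed (skip the comparison if current_subarray is empty; the awkward case is when the two frequencies are equal, in which case the lengths have to be compared). Then set current_subarray to the common subarray of these two suffixes and set currfreq to 2. Since it is not yet known whether the count for the new current_subarray is finished, maxfreq is not updated here; a later step takes care of that.

If two adjacent suffixes have no common subarray:

The count for current_subarray is complete. Compare it with max_subarray and update max_subarray and its counter if needed, then set current_subarray to empty.

Keep examining adjacent suffixes up to the last pair. As a final step, compare current_subarray with max_subarray once more for a possible update.)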
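
The note above could be sketched in Python roughly as follows. The variable names current_subarray, max_subarray, currfreq and maxfreq come from the note; the function names `common_prefix` and `track_runs` and the helper `flush` are made up for illustration. The code follows the steps literally and has only been checked against the example input.

```python
def common_prefix(a, b):
    """Longest common prefix of two lists, returned as a list."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


def track_runs(sorted_suffixes):
    """Walk adjacent sorted suffixes, tracking runs of equal shared prefixes."""
    current_subarray, currfreq = [], 0
    max_subarray, maxfreq = [], 0

    def flush():
        # Possibly promote current_subarray; equal frequency falls back to length.
        nonlocal max_subarray, maxfreq
        if not current_subarray:
            return
        longer = len(current_subarray) > len(max_subarray)
        if currfreq > maxfreq or (currfreq == maxfreq and longer):
            max_subarray, maxfreq = current_subarray, currfreq

    for prev, nxt in zip(sorted_suffixes, sorted_suffixes[1:]):
        shared = common_prefix(prev, nxt)
        if shared and shared == current_subarray:
            currfreq += 1
        else:
            flush()
            current_subarray = shared
            currfreq = 2 if shared else 0
    flush()  # final comparison after the last pair
    return max_subarray, maxfreq


data = [4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1]
suffixes = sorted((data[i:] for i in range(len(data))), reverse=True)
print(track_runs(suffixes))
```

One limitation of following the steps literally: a short prefix that continues across runs of longer shared prefixes is undercounted. For [7, 7, 7] this sketch reports [7, 7] with frequency 2, although [7] occurs three times. It also reports an empty result when nothing repeats.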



