给出一个字符串 S,考虑其所有重复子串(S 的连续子串,出现两次或多次,可能会有重叠)。
返回任何具有最长可能长度的重复子串。(如果 S 不含重复子串,那么答案为 ""。)
示例 1:
输入:"banana"
输出:"ana"
示例 2:
输入:"abcd"
输出:""
提示:
2 <= S.length <= 10^5
S 由小写英文字母组成。
来源:力扣(LeetCode)
链接:https://leetcode-cn.com/problems/longest-duplicate-substring
著作权归领扣网络所有。商业转载请联系官方授权,非商业转载请注明出处。
方法一:二分查找 + Rabin-Karp 字符串编码
分析
我们可以把这个问题分解成两个子问题:
从 1 到 N 中选取子串的长度 L;
检查字符串中是否存在长度为 L 的重复子串。
子任务一:二分查找
解决子问题一的最简单的方法是,我们从 L = N - 1 开始,依次递减选取子串的长度,并进行判断。如果发现存在长度为 L 的重复子串,就表示 L 是最长的可能长度。
但我们发现,如果字符串中存在长度为 L 的重复子串,那么一定存在长度为 L0 < L 的重复子串(选取长度为 L 的重复子串的某个长度为 L0 的子串即可),因此我们可以使用二分查找的方法,找到最大的 L。
子任务二:Rabin-Karp 字符串编码
我们可以使用 Rabin-Karp 算法将字符串进行编码,这样只要有两个编码相同,就说明存在重复子串。对于选取的长度 L:
使用长度为 L 的滑动窗口在长度为 N 的字符串上从左向右滑动;
检查当前处于滑动窗口中的子串的编码是否已经出现过(用一个集合存储已经出现过的编码);
若已经出现过,就说明找到了长度为 L 的重复子串;
若没有出现过,就把当前子串的编码加入到集合中。
作者:LeetCode
链接:https://leetcode-cn.com/problems/longest-duplicate-substring/solution/zui-chang-zhong-fu-zi-chuan-by-leetcode/
来源:力扣(LeetCode)
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
package leetCodeTest;
import java.util.HashSet;
public class 最长重复子串 {
public static void main(String[] args) {
String str = "shabhlesyffuflsdxvvvoiqfjacpacgoucvrlshauspzdrmdfsvzinwdfmwrapbzuyrlvulpalqltqshaskpsoiispneszlcgzvygeltuctslqtzeyrkfeyohutbpufxigoeagvrfgkpefythszpzpxjwgklrdbypyarepdeskotoolwnmeibkqpiuvktejvbejgjptzfjpfbjgkorvgzipnjazzvpsjxjscekiqlcqeawsdydpsuqewszlpkgkrtlwxgozdqvyynlcxgnskjjmdhabqxbnnbflsscammppnlwyzycidzbhllvfvheujhnxrfujwmhwiamqplygaujruuptfdjmdqdndyzrmowhctnvxryxtvzzecmeqdfppsjczqtyxlvqwafjozrtnbvshvxshpetqijlzwgevdpwdkycmpsehxtwzxcpzwyxmpawwrddvcbgbgyrldmbeignsotjhgajqhgrttwjesrzxhvtetifyxwiyydzxdqvokkvfbrfihslgmvqrvvqfptdqhqnzujeiilfyxuehhvwamdkkvfllvdjsldijzkjvloojspdbnslxunkujnfbacgcuaiohdytbnqlqmhavajcldohdiirxfahbrgmqerkcrbqidstemvngasvxzdjjqkwixdlkkrewaszqnyiulnwaxfdbyianmcaaoxiyrshxumtggkcrydngowfjijxqczvnvpkiijksvridusfeasawkndjpsxwxaoiydusqwkaqrjgkkzhkpvlbuqbzvpewzodmxkzetnlttdypdxrqgcpmqcsgohyrsrlqctgxzlummuobadnpbxjndtofuihfjedkzakhvixkejjxffbktghzudqmarvmhmthjhqbxwnoexqrovxolfkxdizsdslenejkypyzteigpzjpzkdqfkqtsbbpnlmcjcveartpmmzwtpumbwhcgihjkdjdwlfhfopibwjjsikyqawyvnbfbfaikycrawcbkdhnbwnhyxnddxxctwlywjcisgqfsctzatdgqqauuvgclicdrpjcphysqdjaflpdbmvnhqggixxzcmpsysbwfkzwxzjictnngufpqhcxlbkodyrqlfomlkiefbmcfenugzqnyqqvgpxonmizkpjdlaqyyowjagzkzrzvcrupfyofeftyfvoqorzvxphhdhydnqiyiczfcgzsecxzsoaobwrixcajabjnvtoerzwayjowahrmuixmmkbtchogfizmvbjnpespxngxjxntohzatlpkcmpphmewevpteharnszafbpbexrvnbedieojezdhnyooiivhnhakilvkobxepbksnqrtxxuqhalvtjspyvporalbliiwjciamlhttaydhxoelimuorjnfvebjhcocbkrgbguwdncodskzzoqrzgavsbjcippetltqaxjhkqacwlgmsbxezqubyzeznnsoqegkykzlxohvitbmjcxllbrvgdijyovpjyeaojlyxqwnheyblznwoyikhqiutotpfukyqkvatxotulvlqzfcvskdccuixthzqrwymzccosjmjqjigehcnfphjuuybaxxukconatzseljyirycbhucxmwwftulfwfmyqyprlnsmxzyfmgjctgeunuuexhbrbsaaingqxqrjvpuhbvcmyztmkgenhonajrkzfrqjinjrbmjyinhwvlcmmxvbgvjgfmaoliosmxbonvlzoiqvkxxtoposygcgkcotohcrauivxxvmrghuauadwojxjligrgstczirnvhqpzwgjbvqzlkxltqnqrfxieggnuriytavbnouwhuamdtlspednyckixkhxedjmotiuucewllthhducwgwmgzxsbkqzfnqfynwisvsctyqdoaiypjivtxkxgoyhwhccklbdjoqykaqzljejlizgbehekmkfetvgfstmypmfnyoundudqlorcogbzoznddfalthwpmiewkmvogmzirbprjftbtffjrkrfminnechitfyfaujgtugadqbrskulsjbaunonxolauvsifevpdyurvfocxtkizflcuvltzuhwyhlbxaphifhtgkfktfnnmocpenrlujsuppbbuorvtubuiyszawzftijwhwgdyubjmmodzybiyunksuixnkariubegpdgctbayaynfskkuyhjvegsjwsbppodvhpjdjlzhxixswdncapxyfjspxeqxdfkhockvrzoisikaymoiqzqbjyoscwegfomlnurwboesfiszetebjblaolnovgvfcpnbemwambkhwcgdbhvkoluhjfxlfrfaeedocdilaboesauplmttewlbojkocklhsbzrtzeyhqtmgroupbzlymupmupsvlkzchclujuozzmngjvktzstsvocxrziuxelruwojzaleyrkjkdleavwqxwgjdbtiywqtdtaamrlcjogxufhgvoqpqkgopbtyqchzhexqgecheahjzxapqjdylzjqhzlzssbjmokncxalgasexztnlzfisxxpeerywlrjdohprewwnlwdbtwmfnnxnoolfhqqxzcvoymdbvmaoliedpvwzyvgyrwjguvoqnxrnaeqwvcfrqkwjmlvxovptultyfuxerelpfgctnpdxluqeruxkxqntosggfjqmrnlnkhhilznpycdrnemnktcsmzufpqgiraphzmgfhevzejhavsypohpttnnowfahpxfwmvxgwfuomxemdkzdlzldesmowzmhwoydnsovwykxqyllbmcurlvtwcfwxvvkxfknwwcwfjkzjtonalgijdsulcfagehiptszrcltbbypopdbmdfkyofelmrdmdbceguyxnkheqqtbletpqmjugpckmjyuuvsbqhyzmposwcgscnishluuhnwkyrkujefpgtsqrmcoortgitpdoagdncxlofkqozgngbtmlyyoyodcmcwetdtltupjrtemrjswekkfjvfecmvagyptjjuwsqpjwlxxosqhpssdvjraaicjfwvesyqfbumjjbqytkinpldxopxjzmvpigmberobyzyxwvwmlmbziduqhmbescgkvhgqtalmaxfsjlysmvrizgvrudstiwmaahtqehfbofvqwgqygvseykmgmhgjbxcrtdjqvojvyhohallyewqelzhjeuqmmsqhkluvqsfmxzbqqokehfoqrlqmwpnwojfowqpqebnuggeuvsszgfywceolvabyvbrwatuyherijsdqvpyyhdyradbammmchqkvdbxpbrxzrpfrsiiezvowrfqejibvociujtcwbygvfwojgfnvvwqlqqgipxhrogppzghtnweodaxuqxknnqnajlnsvheiycsvifvoljsncgnunsqcymnyoeeslrjflpprvtksimffvnuvakskdakvmlkpowfpfzdrcfctikhvvbagrvjlzjydnlmspzyynyjjfxnozpjjgjelipswsmfroitqphzsuqgumlnkxksbzhrsvcnfwufofhurmhksvvfjzggbtgrezkrkqmhduyqgwuwxoxaiifemtwrbilftiuhcgpjvqxldrnlzphdffncevlcyrxlpbwuswjfdegexeoooshdfqtqithpfocyowaqeedikssptyvkabhtaeotcwxccgguuotqvypugpcbwzalxwqbjdcokoxjnqhggpbbfeyjiellsikiqqtxpvzmjsfleepjpbxpeicxfcwbpprzgcrjgjaxshewradetsqsvfmcxptmksecfpynqzpctqpogcgokzrkltsbmwxkmynasohpkzjupapngusnvdjfqezqhyikllgkelewwwhhbdjvxdagnnxscjkotbbmhzkqbjwuwidrnvmztikmqjcxmcpgkoudhydmdvberfuvjnhlnfcsbpzmuquvrgogtfwefhqzkmxxgadtvjpxvurxprbsssihviypclwkjfaatzjxtvlzwaacqlwnqetgkldqaqghuihrgxbbpmjfsvaigqrhiiskkfibaeilqptkdsqqfwxeixuxgkiboaqnuaeutjcydnxyxnmattjrrxmthwvyipgazaxgrrjcvdnyxpktsldhluhicyqprxhljyfhawuvoonrwyklcdlmdvsgqrwqqomisksftsfyeifmupvylkjbagzyctuifbsrugqsbrkvskmundmczltpamhmgqespzgrkxebsvubrlmkwyqhjyljnkeqvdxtjxjvzlrubsiiahciwefwsswgssxmvyvgjrobvubcbgjomqajmotbcgqjudneovfbjtjzwqtsovzshmxeqofssukkvcdwlsdtyplrlgwtehnwvhhegtwkwnqqdiajpcaajsylesadaiflruewhrbrogbujbppunsqgytgnyuhnkejhccavaptbydtqhvyatftxcaaljyhhkkadzdhhzawgndunwwgknnbtqaddpszqgummmnomfqmdxqtwjexsbadfdqhnyixjslsaisscocbabivzokkgiinqqzsrtfpzjmxfkqmuzzlelqjtjidjarkwbwlcqrefokrlwdmuzyffdtajnqoimlzzpcgpjjwlqkusefzbgznhexzojxnzxmmedobgvdabtdoiskozrdrjscxwivaekrkyyfynuktmgyziteavdxfctvkfkrmsdwpaywzbkeojeycwdkectydojttisizruilwokhepscqdnjygiakomkhyujaffxjyxqmvkemqihpcdygprdeaxgjbkonfvgtzayfbmgwsskoyxjlknwwtehhhpjllhkcblyaxnbekoidbbyqvdqqsyfcemylmqskpxifcnhmemkkitqtbfwhmyemkzightkjbhlquticivpeeclpamsqoztxvdtcqbsonxyecnhcadtghkjckhrcdfggnqlwurydzbeybqkcfnnbwkciwaqdzgmcrbltvcuftxsqfpxnoombsbeoqxivgtkrjklxatgcorkdrvmngwlekeopgecefzxtprcoajoopxviijxilxfiwuajsbtcctfcqqgzhyjmonwdbyjlnneidyaqhhothzpzaxcthvbxpdcqofaeamxbqjwhunnqwclhcqhagawjsxygorgbuqryzickyplbaivkabbrkibqzqacabbwmnpndaqvbknbqcjuywxjrdbznndomwbbqfgulwczydrhrocebhygriiwxmwtjjyqqiqrjblxuamddlsiocdoysdaacuovljtpgocephnxuugdyggsnhcqiqhulyhlxiwzvhrtbmvjnhacwunckqzhpcthqgsupptdwkfeprbg";
String s = longestDupSubstring(str);
System.out.println("s = " + s);
}
/**
* 暴力枚举算法,超时。
* @param S
* @return
*/
// public static String longestDupSubstring(String S) {
// int len,i,j;
// String res = "";
// for (len = 1;len<=S.length();len++){
// int temp = 0;
// CharSequence charSequence = "";
// CharSequence charSequence1 = "";
// for (i=0;i+len<=S.length();i++){
// temp = 0;
// charSequence = S.subSequence(i, i + len);
// for (j=0;j+len<=S.length();j++){
// charSequence1 = S.subSequence(j, j + len);
// if (charSequence.equals(charSequence1)){
// temp++;
// }
// }
// if (temp > 1)
// res = charSequence.toString();
// }
//
// }
// return res;
// }
/**
* 使用HashSet<String>寻找重复的最长子串,从最长的子串开始遍历,这样可以省去判断长度的步骤,超时。
* @param S
* @return
*/
// public static String longestDupSubstring(String S) {
// int len,i,j;
// HashSet<String> set = new HashSet();
// String charSequence;
// for (len = S.length();len>0;len--){
// set.clear();
// for (i=0;i+len<=S.length();i++){
// charSequence = S.substring(i, i + len);
// if (set.contains(charSequence)){
// return charSequence;
// }else {
// set.add(charSequence);
// }
// }
// }
// return "";
// }
/**
* 具有多项式滚动散列的Rabin Karp。
* 搜索一个给定长度的至少出现2次的子串。
* 如果子串存在,则返回起始位置,否则返回-1。
* @param L
* @param a
* @param modulus
* @param n
* @param nums
* @return
*/
/*
Rabin-Karp with polynomial rolling hash.Search a substring of given length that occurs at least 2 times.Return start position if the substring exits and -1 otherwise.
*/
public static int search(int L, int a, long modulus, int n, int[] nums) {
// compute the hash of string S[:L]
//计算字符串S的哈希值[:L]
long h = 0;
for (int i = 0; i < L; ++i) h = (h * a + nums[i]) % modulus;
// already seen hashes of strings of length L
//已经看到长度为L的字符串的散列
HashSet<Long> seen = new HashSet();
seen.add(h);
// const value to be used often : a**L % modulus
//经常使用的常量值:a**L%模数
long aL = 1;
for (int i = 1; i <= L; ++i) aL = (aL * a) % modulus;
for (int start = 1; start < n - L + 1; ++start) {
// compute rolling hash in O(1) time
//在O(1)时间内计算滚动散列
h = (h * a - nums[start - 1] * aL % modulus + modulus) % modulus;
h = (h + nums[start + L - 1]) % modulus;
if (seen.contains(h)) return start;
seen.add(h);
}
return -1;
}
public static String longestDupSubstring(String S) {
int n = S.length();
// convert string to array of integers to implement constant time slice
//将字符串转换为整数数组以实现常量时间片
int[] nums = new int[n];
for (int i = 0; i < n; ++i) nums[i] = (int)S.charAt(i) - (int)'a';
// base value for the rolling hash function
//滚动哈希函数的基值
int a = 26;
// modulus value for the rolling hash function to avoid overflow
//滚动哈希函数避免溢出的模数值
long modulus = (long)Math.pow(2, 32);
// binary search, L = repeating string length
//二进制搜索,L=重复字符串长度
int left = 1, right = n;
int L;
while (left != right) {
L = left + (right - left) / 2;
if (search(L, a, modulus, n, nums) != -1) left = L + 1;
else right = L;
}
int start = search(left - 1, a, modulus, n, nums);
return start != -1 ? S.substring(start, start + left - 1) : "";
}
}