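Algorithm background
Many algorithms deal with palindromes. The best known is Manacher's algorithm, which finds the longest palindrome in a string in O(N) time. A palindrome is a string that reads the same forwards and backwards, for example xax, babab, and oogelegoo. The longest palindrome in bxax is xax, of length 3; the longest palindrome in cxaxebababfgoogelegood is oogelegoo, of length 9.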
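Manacher's algorithm is currently the best-known way to find the longest palindrome in a string; its time and space complexity are both O(N). Here I introduce several palindrome algorithms, including one that matches Manacher's algorithm in time and space complexity but has a cleaner structure and wider applicability. I first came up with this algorithm while solving a problem on hackerrank, and it took me only four or five hours. It is now commonly called the Palindromic Tree. The Russian researcher Mikhail Rubinchik presented the algorithm in August 2014, and in the same month of the following year published a paper on it titled "EERTREE: An Efficient Data Structure for Processing Palindromes in Strings". His paper shows that he also drew on an IT blog post named "Palindromic tree", at:
http://adilet.org/blog/25-09-14/. From Manacher's invention of his algorithm in 1975 to the appearance of the Palindromic Tree in 2014, roughly 40 years passed.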
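When I worked on this problem, hackerrank rated it only medium difficulty. My design is simple and intuitive; it rests on mathematical induction and backtracking. Before I came up with it, I had studied only a handful of well-known algorithms, such as Dijkstra's algorithm, bubble sort, insertion sort, quicksort, merge sort, and the AVL tree. I had, however, learned all the general design techniques: recursion, brute force, greedy, divide and conquer, dynamic programming, and backtracking. By now I have learned essentially all the named algorithms, such as heapsort, Manacher, segment tree, Fenwick tree, Bellman-Ford, KMP, Aho-Corasick, and Ukkonen. At the time I could hardly believe that nobody had invented this algorithm in all those years. I was also new to hackerrank then: I did not know how to copy its test cases and run them locally, so I debugged online and ended up submitting my code a great many times. The lesson I took from inventing this algorithm is that opportunity favors the prepared. Keep working persistently and keep learning, and sooner or later an opportunity and your knowledge will strike a spark.
Below I introduce these palindrome algorithms: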
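1. Brute-force algorithm
Two nested loops enumerate every substring, and each substring is then checked for being a palindrome, so the time complexity is O(N^3). No auxiliary array is needed beyond the character array of the input, so the space complexity is O(N).
The Java source code is as follows: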
/*
 * Brute-force method to solve the longest palindromic substring challenge
 */
public class BruteForce {
    // stores the length of the longest palindromic substring
    static int maxLen = 0;

    public static void main(String[] args) {
        String str = "banana";
        System.out.println(bruteForce(str));
    }

    public static int bruteForce(String s) {
        int N = s.length();
        char[] strc = s.toCharArray();
        // reset so that repeated calls do not reuse an earlier result
        maxLen = 0;
        // two loops enumerate every substring of the string
        for (int i = 0; i < N; i++) {
            for (int j = i; j < N; j++) {
                if (checkIsPalindrome(i, j, strc)) {
                    /*
                     * the substring from index i to j is a palindrome; keep its
                     * length if it is the longest found so far
                     */
                    maxLen = Math.max(maxLen, j - i + 1);
                }
            }
        }
        return maxLen;
    }

    // check whether the substring from index i to j (inclusive) is a palindrome
    public static boolean checkIsPalindrome(int i, int j, char[] strc) {
        // walk inwards from both ends until the two indices meet, so that
        // every pair is compared, including the middle pair of an even-length substring
        for (int f = i, t = j; f < t; f++, t--) {
            if (strc[f] != strc[t])
                return false;
        }
        return true;
    }
}
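2. Simple dynamic programming algorithm
Two nested loops cost O(N^2), and reading the length of the enclosed palindrome from the dynamic programming table costs O(1), so the time complexity is O(N^2). An N×N array stores the intermediate results, so the space complexity is O(N^2). Observe that a palindrome is built on top of a shorter palindrome, so a single loop that starts from the shortest palindrome and expands towards both sides can save the N^2 space. Since my main topic here is the palindromic tree, I do not describe how to remove the N^2 space. If you are interested, you can implement it yourself, consult other resources online, or contact me.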
/*
 * Dynamic programming to solve longest palindromic substring
 */
public class DynamicProgramming {
    // the length of the longest palindromic substring
    static int maxLen = 0;

    public static void main(String[] args) {
        String str = "bananadefg";
        System.out.println(dynamicProgramming(str));
    }

    public static int dynamicProgramming(String s) {
        int N = s.length();
        // convert once; converting inside the loops would copy the string N^2 times
        char[] strc = s.toCharArray();
        // reset so that repeated calls do not reuse an earlier result
        maxLen = 0;
        /*
         * mem[i][j] holds the length of the substring from i to j if it is a
         * palindrome, -1 if it is not, and 0 if it has not been calculated yet.
         */
        int[][] mem = new int[N][N];
        // two loops to enumerate substrings
        for (int i = 0; i < N; i++) {
            for (int j = i; j < N; j++) {
                dynamicCaclulateMaxLen(mem, i, j, strc);
            }
        }
        return maxLen;
    }

    /*
     * decide whether the substring from index i to j is a palindrome; if so,
     * update the max length, and in either case save the result in the
     * two-dimensional memo array
     */
    public static int dynamicCaclulateMaxLen(int[][] mem, int i, int j, char[] strc) {
        int t = 0;
        // if i is larger than j the range is empty; it counts as a palindrome of length 0
        if (i > j)
            return 0;
        else if (mem[i][j] != 0) {
            // the result for the range i to j has already been calculated, return it
            return mem[i][j];
        } else if (i == j) {
            // a single character is a palindrome of length 1
            t = 1;
        } else {
            t = dynamicCaclulateMaxLen(mem, i + 1, j - 1, strc);
            // if the inner substring is a palindrome and char i equals char j,
            // the length is two plus the length of the inner palindrome
            if (t >= 0 && strc[i] == strc[j]) {
                t += 2;
            } else {
                // can't build a palindrome, set -1 as length
                t = -1;
            }
        }
        mem[i][j] = t;
        maxLen = Math.max(maxLen, mem[i][j]);
        return t;
    }
}
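For readers who want to try the center-expansion idea mentioned before the listing, a minimal sketch might look like the following. The class name ExpandAroundCenter and the method name longestPalindromeLength are made up for this illustration and are not part of the code above. The sketch keeps O(N^2) time but needs only O(1) extra space, because it never stores intermediate results.

```java
public class ExpandAroundCenter {
    // Returns the length of the longest palindromic substring of s.
    public static int longestPalindromeLength(String s) {
        int best = 0;
        // Centers 0..2N-2: an even center sits on a character,
        // an odd center sits in the gap between two characters.
        for (int center = 0; center < 2 * s.length() - 1; center++) {
            int left = center / 2;
            int right = left + center % 2;
            // Grow outwards while the two end characters match.
            while (left >= 0 && right < s.length() && s.charAt(left) == s.charAt(right)) {
                best = Math.max(best, right - left + 1);
                left--;
                right++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestPalindromeLength("bananadefg"));
    }
}
```

Each center is expanded independently, so nothing is shared between iterations; that independence is exactly what Manacher's algorithm removes in the next section.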
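3. Manacher's algorithm
A single pass gives O(N) time. The radius array has 2N+1 entries, so the space complexity is O(N). Observe that a string has 2N+1 positions that can serve as the center of a palindrome: every character and every gap, including the two ends. When one palindrome is contained in another, the left-right symmetry of the enclosing palindrome guarantees that an identical palindrome sits at the mirrored position in its left half. Manacher's algorithm puts this idea to work: instead of looking up sub-palindromes in an N^2 dynamic programming table, it calculates the length of each new palindrome from the palindrome at its mirrored position on the left, as the following code shows: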
/*
 * Manacher's algorithm to solve longest palindromic substring
 */
public class Manacher {
    // stores the length of the longest palindromic substring
    static int maxLen = 0;

    public static void main(String[] args) {
        String str = "banana";
        System.out.println(manacher(str));
    }

    public static int manacher(String s) {
        int N = s.length();
        // an empty string has no palindrome; this also avoids indexing mRadius[1] below
        if (N == 0) {
            maxLen = 0;
            return 0;
        }
        // there are 2*N+1 centers for palindromes
        int N2 = N * 2 + 1;
        char[] strct = s.toCharArray();
        char[] strc = new char[N2];
        // interleave separators so that every center is an index, e.g. "ab" becomes "|a|b|"
        for (int i = 0; i < N; i++) {
            strc[i * 2] = '|';
            strc[i * 2 + 1] = strct[i];
        }
        // set last center
        strc[N2 - 1] = '|';
        // mRadius[i] is the radius of the longest palindrome centered at index i of strc,
        // which equals the length of that palindrome in the original string
        int[] mRadius = new int[N2];
        // set the radius of the palindromes centered at index 0 and index 1
        mRadius[0] = 0;
        mRadius[1] = 1;
        maxLen = 1;
        // m and r are the center and radius of the palindrome that reaches furthest right
        int m = 1, r = 1;
        int m1, l1, r1, rd;
        // one loop to calculate the radius of the palindrome centered at index i
        for (int i = 2; i < N2; i++) {
            if (i <= m + r) {
                // i lies inside the rightmost palindrome: start from the radius at its
                // mirror m1, clipped so that it stays inside that palindrome
                m1 = m - (i - m);
                rd = mRadius[m1];
                rd = m1 - rd >= m - r ? rd : (m1 - (m - r));
                l1 = i - rd - 1;
                r1 = i + rd + 1;
            } else {
                // i lies outside the rightmost palindrome: start from radius 0
                l1 = i - 1;
                r1 = i + 1;
                rd = 0;
            }
            // extend the palindrome by direct comparison
            while (l1 >= 0 && r1 < N2 && strc[l1--] == strc[r1++]) {
                rd++;
            }
            mRadius[i] = rd;
            callen(rd);
            // the palindrome at i reaches further right: make it the new reference
            if (i + rd > m + r) {
                m = i;
                r = rd;
            }
        }
        return maxLen;
    }

    // update the max palindrome length with the radius r of the palindrome just calculated
    public static void callen(int r) {
        maxLen = Math.max(maxLen, r);
    }
}
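The manacher method above returns only the length. To recover the palindrome itself, one also needs to map a center and radius in the separator-padded string back to the original string. The sketch below shows one way to do that. It uses a common textbook formulation of Manacher's algorithm rather than the exact loop above, and the names ManacherSubstring, longestPalindrome, and bestCenter are assumptions made for this illustration.

```java
public class ManacherSubstring {
    // Returns the first longest palindromic substring of s.
    public static String longestPalindrome(String s) {
        int n = s.length();
        if (n == 0) {
            return "";
        }
        // Pad with separators: "aba" becomes "|a|b|a|".
        char[] t = new char[2 * n + 1];
        for (int i = 0; i < n; i++) {
            t[2 * i] = '|';
            t[2 * i + 1] = s.charAt(i);
        }
        t[2 * n] = '|';

        int[] radius = new int[t.length];
        int center = 0;     // center of the palindrome reaching furthest right
        int right = 0;      // rightmost index covered by that palindrome
        int bestCenter = 0; // center of the longest palindrome seen so far
        for (int i = 0; i < t.length; i++) {
            if (i < right) {
                // Reuse the mirrored radius, clipped to the covered region.
                radius[i] = Math.min(right - i, radius[2 * center - i]);
            }
            // Extend beyond the known region by direct comparison.
            while (i - radius[i] - 1 >= 0 && i + radius[i] + 1 < t.length
                    && t[i - radius[i] - 1] == t[i + radius[i] + 1]) {
                radius[i]++;
            }
            if (i + radius[i] > right) {
                center = i;
                right = i + radius[i];
            }
            if (radius[i] > radius[bestCenter]) {
                bestCenter = i;
            }
        }
        // A maximal palindrome in t starts on a separator at an even index 2*k,
        // which corresponds to index k in s; its radius is its length in s.
        int start = (bestCenter - radius[bestCenter]) / 2;
        return s.substring(start, start + radius[bestCenter]);
    }

    public static void main(String[] args) {
        System.out.println(longestPalindrome("banana"));
    }
}
```

The strict comparison when updating bestCenter means that, among palindromes of equal length, the leftmost one is returned.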
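4. Palindromic Tree algorithm
Its time complexity is O(N), and its space complexity is O(N) multiplied by the size of the character set. The idea is this: every palindrome consists of another palindrome with the same character added on both sides, so the next palindrome must be built from the previous one, as shown in the figure below. Construction uses backtracking, and the total backtracking length does not exceed N. If the character set is bounded, say 26 letters, each node can allocate one array that covers all characters, so adding a character on both sides costs constant O(1) time.
My explanation of this algorithm on LeetCode is at the following link.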
import java.util.ArrayList;
import java.util.List;

/*
 * Palindromic Tree to solve longest palindromic substring, also find all distinct palindromes.
 * The input is assumed to contain only the lowercase letters 'a' to 'z'.
 */
public class PalindromicTree {
    // a List to store all distinct palindromes
    public List<Node> distinctPalindrome = new ArrayList<Node>();

    // Node used to represent a palindrome at index start with the length len
    public class Node {
        // link to the next palindrome in the chain: the longest proper palindromic suffix
        Node link;
        // sides[c] is the palindrome obtained by adding character c on both sides of this one
        Node[] sides = new Node[26];
        // start index of this palindrome
        int start;
        // length of this palindrome
        int len;

        public Node(int start, int end) {
            this.start = start;
            this.len = end - start + 1;
        }

        public Node(int len) {
            this.len = len;
        }
    }

    // root of length minus one, used to build one-character palindromes
    Node root1 = new Node(-1);
    // root of length zero, used to build two-character palindromes
    Node root2 = new Node(0);
    // current tail of the chain of palindromes: the longest palindrome ending at the last index
    Node tail = root2;
    // stores the Node of the longest palindromic substring
    Node max = null;

    public PalindromicTree(char[] cs) {
        root2.link = root1;
        for (int i = 0; i < cs.length; i++) {
            build(cs, i);
            max = max == null ? tail : max.len < tail.len ? tail : max;
        }
    }

    public void build(char[] cs, int idx) {
        // convert char to index
        int ci = cs[idx] - 'a';
        Node cursor = tail;
        tail = null;
        Node nextNode = root2;
        // backtrack along the chain to find the new tail
        while (tail == null) {
            int start = idx - cursor.len - 1;
            if (start >= 0 && cs[idx] == cs[start]) {
                if (cursor.sides[ci] != null) {
                    // the palindrome already has a node, which also has its link; reuse it as tail
                    tail = cursor.sides[ci];
                    return;
                } else {
                    // tail is a new node
                    tail = new Node(start, idx);
                    cursor.sides[ci] = tail;
                    distinctPalindrome.add(tail);
                }
            }
            cursor = cursor.link;
        }
        // keep backtracking to find the next node in the chain for the new tail
        while (cursor != null) {
            int start = idx - cursor.len - 1;
            if (start >= 0 && cs[start] == cs[idx]) {
                nextNode = cursor.sides[ci];
                break;
            }
            // go to next cursor
            cursor = cursor.link;
        }
        // link the new tail to the next node
        tail.link = nextNode;
    }

    public static void main(String[] args) {
        String str = "banana";
        char[] cs = str.toCharArray();
        PalindromicTree pt = new PalindromicTree(cs);
        Node max = pt.max;
        System.out.println(String.format("The longest length is %d,longest palindrome is %s", max.len,
                str.substring(max.start, max.start + max.len)));
        for (Node p : pt.distinctPalindrome) {
            System.out.println(new String(cs, p.start, p.len));
        }
    }
}
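The listing above indexes sides with cs[idx] - 'a', so it accepts only the letters 'a' to 'z'. When the character set is not known in advance, one option is to store the edges in a map instead of a fixed array. The sketch below is written under that assumption: MapPalindromicTree, findExtendable, and distinctPalindromes are names invented for this illustration, and the structure follows a common eertree formulation rather than the exact code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapPalindromicTree {
    private static final class Node {
        final int len;                                    // length of this palindrome
        Node link;                                        // longest proper palindromic suffix
        final Map<Character, Node> next = new HashMap<>(); // edges keyed by the added character
        Node(int len) { this.len = len; }
    }

    private final Node oddRoot = new Node(-1);  // imaginary palindrome of length -1
    private final Node evenRoot = new Node(0);  // empty palindrome
    private Node last = evenRoot;               // longest palindromic suffix so far
    private final StringBuilder text = new StringBuilder();
    private final List<String> palindromes = new ArrayList<>();

    public MapPalindromicTree(String s) {
        oddRoot.link = oddRoot;
        evenRoot.link = oddRoot;
        for (int i = 0; i < s.length(); i++) {
            add(s.charAt(i));
        }
    }

    // Follow links from v until text[idx] can be added on both sides.
    // The odd root always matches, so the loop terminates.
    private Node findExtendable(Node v, int idx) {
        while (true) {
            int mirror = idx - v.len - 1;
            if (mirror >= 0 && text.charAt(mirror) == text.charAt(idx)) {
                return v;
            }
            v = v.link;
        }
    }

    private void add(char c) {
        text.append(c);
        int idx = text.length() - 1;
        Node parent = findExtendable(last, idx);
        Node node = parent.next.get(c);
        if (node == null) {
            node = new Node(parent.len + 2);
            // A single character links to the empty palindrome; anything longer
            // links to the extension of a shorter palindromic suffix.
            node.link = node.len == 1
                    ? evenRoot
                    : findExtendable(parent.link, idx).next.get(c);
            parent.next.put(c, node);
            palindromes.add(text.substring(idx - node.len + 1, idx + 1));
        }
        last = node;
    }

    // Distinct palindromic substrings in order of first appearance.
    public List<String> distinctPalindromes() {
        return palindromes;
    }

    public static void main(String[] args) {
        System.out.println(new MapPalindromicTree("banana").distinctPalindromes());
    }
}
```

A HashMap gives expected constant-time edge lookups and uses memory only for edges that exist, at the cost of boxing each character; the fixed array in the listing above is faster when the alphabet really is 26 letters.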