LeetCode 5. Longest Palindromic Substring - Manacher's Algorithm

Hearod

Last modified: 2024-04-26 14:56:39


First published: 2020-08-31 07:05:06

Article link: https://blog.csdn.net/Hearod/article/details/108314280


This article explains how Manacher's Algorithm solves LeetCode 5, Longest Palindromic Substring, and gives a C++ implementation.

Manacher’s Algorithm

References: Link1 (BIT祝威), Link2. The figures and part of the exposition follow Link1; the mathematical derivation follows Link2.

Introduction

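Given a string S of length N.

Reduce the general case to a special one.

Insert a special character that never occurs in S, such as "#", between every pair of adjacent characters of S and at both ends of S. This yields the string T, with $T.length = 2N + 1$.

For example: S = "abaaba", T = "#a#b#a#a#b#a#".

The length of T is therefore always odd, so the odd-length and even-length cases no longer need separate treatment. T has a second benefit as well.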
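The preprocessing step can be sketched as follows. This is a minimal illustration, not the author's code; the helper name `insertSeparators` is hypothetical, and it assumes "#" never occurs in S.

```cpp
#include <cassert>
#include <string>

// Build T by placing '#' between every pair of characters of s and at both ends.
// Hypothetical helper; assumes '#' never occurs in s.
std::string insertSeparators(const std::string& s) {
    assert(s.find('#') == std::string::npos);  // the separator must not appear in s
    std::string t;
    t.reserve(2 * s.size() + 1);
    t.push_back('#');
    for (char ch : s) {
        t.push_back(ch);
        t.push_back('#');
    }
    return t;  // t.size() == 2 * s.size() + 1
}
```

Every palindrome of S, whether of odd or even length, now corresponds to a palindrome of T with a single character at its center.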

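Compute a table P[], where $P[i]$ is the radius (the number of characters on each side of the center, not counting the center itself) of the longest palindromic substring of T centered at $i \in [0, 2N]$.

For example, with S = "abaaba":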

$i$	0	1	2	3	4	5	6	7	8	9	10	11	12
T	#	a	#	b	#	a	#	a	#	b	#	a	#
P	0	1	0	3	0	1	6	1	0	3	0	1	0

Observations:

When T[i] is a character of S, $P[i]$ is the length of the longest palindromic substring of S centered at this character (odd case).
When T[i] is an inserted special character, $P[i]$ is the length of the longest palindromic substring of S centered at the gap between two characters (even case).

So $P[]$ directly gives the lengths of the palindromic substrings of $S$, and the length and center of the longest one follow easily. This is the second benefit of T.
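As a small illustration of how a center in T maps back to S, the sketch below recovers the substring of S from an index $i$ in T and its radius $P[i]$. The function name `palindromeInS` is hypothetical, and the index arithmetic assumes T has the form "#s0#s1#...#" with no extra characters at the ends.

```cpp
#include <cassert>
#include <string>

// Map a center i in T (T = "#s0#s1#...#") and its maximal radius back to S.
// The leftmost position i - radius is always '#', which sits at an even index,
// so the first real character is T[i - radius + 1] = s[(i - radius) / 2].
std::string palindromeInS(const std::string& s, int i, int radius) {
    assert(i - radius >= 0 && (i - radius) % 2 == 0);
    return s.substr((i - radius) / 2, radius);
}
```

With S = "abaaba", the table entry at $i = 6$ with radius 6 gives the whole string, and the entry at $i = 3$ with radius 3 gives "aba".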

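To compute $P[i]$ we have to expand outward from $T[i]$ in both directions. Is there a way to save some of this expansion time?

“Imagine drawing a vertical line through the center of "abaaba". Do you notice that the array P is symmetric about this line? Try the center of "aba" as well; P is symmetric about that center too. This is no coincidence: under a certain condition it is a necessary rule. We will use this rule to avoid recomputing some elements of P.” – Link1 (BIT祝威)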
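For reference, the plain expansion that the algorithm tries to avoid can be sketched as below. This is the naive quadratic baseline, not Manacher's Algorithm itself; the function name `radiiByExpansion` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive O(|T|^2) computation of P: expand around every center of t independently.
std::vector<int> radiiByExpansion(const std::string& t) {
    const int n = static_cast<int>(t.size());
    std::vector<int> p(n, 0);
    for (int i = 0; i < n; ++i) {
        // Grow the radius while both neighbours exist and match.
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n &&
               t[i - p[i] - 1] == t[i + p[i] + 1]) {
            ++p[i];
        }
    }
    return p;
}
```

Running it on T = "#a#b#a#a#b#a#" reproduces the P row of the table above, including its symmetry about index 6.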

Derivation

Define the following variables:

$C$ : the center of the palindrome currently known to include the boundary closest to the right end of $T$
$R$ : the rightmost boundary of the palindrome centered at $C$ , $\therefore T[C+k]=T[C-k], k\in[0,R-C]$
$L$ : the leftmost boundary of the palindrome centered at $C$ , $L = 2 C - R$
$i$ : the position of an element in $T$ whose palindromic span is being determined. $i$ is always to the right of $C$ .
$i^{'}$ : mirrored position of $i$ w.r.t. $C$ . so $i^{'} = 2 C - i$ .

We can use the already-known $P[i']$ to speed up the computation of $P[i]$. For each $i$ there are the following cases:

Case 1: the longest palindrome centered at $i'$ lies strictly inside the palindrome centered at $C$; its left boundary neither reaches nor passes $L$, i.e., $P[i'] < i' - L = R - i$.

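For example: S = "babcbabcbaccba", so T = "#b#a#b#c#b#a#b#c#b#a#c#c#b#a#". Take $C = 11$, which gives $R = 20$.

Here $i = 13$ and $i' = 9$, and clearly $P[13] = P[9] = 1$. Intuitively, by symmetry, $P[i] = P[i']$ must hold. We can also prove rigorously that this always holds in Case 1.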

The proof has two parts:
1. The substring P2 centered at $i$, which is the mirror image about $C$ of the palindromic substring P1 centered at $i'$, is also a palindrome.
2. The mirror images about $C$ of the characters immediately outside P1 do not belong to the palindrome centered at $i$.
Consider $T[i+k]$, $T[i-k]$, $T[i'+k]$ and $T[i'-k]$ for all $k \leq P[i']$:

$\because k \leq P[i'] \quad \therefore k < i' - L = R - i$
$T[i'-k] = T[2C-i-k] = T[C-(i+k-C)]$
$T[i+k] = T[C+(i+k-C)]$
Let $k' = i + k - C$; then $0 \leq k' < i + (R - i) - C = R - C$.
$\therefore T[i'-k] = T[C-k']$ and $T[i+k] = T[C+k']$
$\because T[C+k'] = T[C-k'], \quad k' \in [0, R-C]$
$\therefore T[i'-k] = T[i+k] \quad \forall k \leq P[i'] \qquad (1)$
By the same argument, $T[i'+k] = T[i-k] \quad \forall k \leq P[i'] \qquad (2)$
$\because T[i'-k] = T[i'+k] \quad \forall k \leq P[i']$ (P1 is a palindrome)
$\therefore T[i-k] = T[i+k] \quad \forall k \leq P[i']$, which proves part 1.
When $k = P[i'] + 1$, we still have $k \leq R - i$, so formulas (1) and (2) still hold. However, $T[i'-k] \neq T[i'+k]$, because $P[i']$ is the maximal radius at $i'$.
$\therefore T[i-k] \neq T[i+k], \quad k = P[i'] + 1$, which proves part 2 and gives $P[i] = P[i']$.
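The Case 1 claim can be spot-checked numerically. The sketch below is only a sanity check by direct expansion, not part of the algorithm; the helper names `radiusAt` and `caseOneHolds` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Radius of the longest palindrome of t centered at index c, by direct expansion.
int radiusAt(const std::string& t, int c) {
    const int n = static_cast<int>(t.size());
    int r = 0;
    while (c - r - 1 >= 0 && c + r + 1 < n && t[c - r - 1] == t[c + r + 1]) ++r;
    return r;
}

// Returns true when P[i] == P[i'] for the palindrome centered at C.
// Precondition (Case 1): C < i < R and P[i'] < R - i, where R = C + P[C].
bool caseOneHolds(const std::string& t, int C, int i) {
    const int R = C + radiusAt(t, C);
    const int mirror = 2 * C - i;
    assert(C < i && i < R);
    assert(radiusAt(t, mirror) < R - i);
    return radiusAt(t, i) == radiusAt(t, mirror);
}
```

For T = "#b#a#b#c#b#a#b#c#b#a#c#c#b#a#" with $C = 11$, calling `caseOneHolds(t, 11, 13)` compares $P[13]$ with $P[9]$ and returns true.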

Case 2: the palindrome centred at $i^{'}$ extends beyond the left boundary of the palindrome centred at $C$ , i.e., $P[i']\ge i'-L=R-i$ .

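Case 2 differs from Case 1 because, following the Case 1 derivation, we can only establish $T[i-k] = T[i+k]$ for all $k \leq i' - L$; we cannot determine the relation between $T[i-k]$ and $T[i+k]$ for $k \in (i'-L, P[i']]$. In other words, we only know that $P[i] \ge R - i$, and the exact value has to be found by comparing characters one by one. If $P[i] > R - i$, i.e., the right boundary of the palindrome centered at $i$ passes $R$, then the palindrome centered at $C$ is no longer the one whose right boundary is closest to the right end of $T$. The palindrome centered at $i$ takes its place, so set $C = i$ and update $L, R$ accordingly.
Case 3: if $i \ge R$, the previously computed $P[i']$ gives no useful information, and we can only compare characters one by one.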

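To summarize:

if (i < R)
{
	if (P[i'] < R - i)
		P[i] = P[i'];   // Case 1
	else
	{
		P[i] >= R - i;  // Case 2: check the characters to the right of R one by one
		if (the palindrome at i extends past R)
		{
			C = i;
			update R;
		}
	}
}
else
{
	check one by one;   // Case 3
	if (the palindrome at i extends past R)
	{
		C = i;
		update R;
	}
}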
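A common compact formulation folds the three cases into a single starting value, `min(P[i'], R - i)` when $i < R$ and 0 otherwise, followed by one expansion loop. The sketch below is that variant, not the case-by-case code the article gives. The name `manacherRadii` is hypothetical, and it assumes the input is already padded with two distinct sentinel characters (for example "^" and "$") that occur nowhere else in it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Compute P for t, where t is already padded with distinct sentinels,
// e.g. "^#a#b#$". The sentinels stop every expansion inside the string.
std::vector<int> manacherRadii(const std::string& t) {
    const int n = static_cast<int>(t.size());
    assert(n >= 3 && t.front() != t.back());
    std::vector<int> p(n, 0);
    int C = 0, R = 0;
    for (int i = 1; i < n - 1; ++i) {
        const int mirror = 2 * C - i;
        // Cases 1 and 2: start from the radius guaranteed by symmetry.
        // Case 3 (i >= R): start from 0.
        p[i] = (i < R) ? std::min(p[mirror], R - i) : 0;
        // In Case 1 this loop stops at its first comparison.
        while (t[i + p[i] + 1] == t[i - p[i] - 1]) ++p[i];
        if (i + p[i] > R) {  // the palindrome at i extends past R
            C = i;
            R = i + p[i];
        }
    }
    return p;
}
```

The trade-off is one extra failed comparison per Case 1 position in exchange for shorter code; the asymptotic cost stays linear.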

Complexity Analysis

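Time complexity

Quoted from Link1 (BIT祝威).

In the figure, i is the index, T is the string after inserting "#", "^", and "$", P[i] is the P[i] of the algorithm, and calc[i] is the number of comparisons needed to compute P[i]. Note: different characters are added at the head and the tail so that boundary cases need no special handling; i can then start at 1 and stop before T.length-1.

"V" indicates that the character in this column was compared with a character to its left, which is marked with "X". Green means the two characters matched (a successful comparison); red means they differed (a failed comparison).

Clearly the numbers of "X" and "V" are equal.

You can see that the number of successful comparisons (green "V", growing horizontally) does not exceed N, and the number of failed comparisons (red "V", growing vertically) does not exceed N either, so the algorithm performs at most 2N comparisons, i.e., it runs in O(N) time. (Here N is best read as the length of T: each successful comparison moves the right boundary one step to the right, and each position $i$ fails at most once.)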

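[Figure from Link1: for each index i, the rows T, P[i], and calc[i], with successful and failed comparisons marked by "V" and "X".]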
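The comparison count can also be measured directly. The sketch below instruments the case-by-case loop with a counter; it is an illustration of the bound, not a proof, and the name `countComparisons` is hypothetical. It assumes the input is already padded with distinct sentinels.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Run the algorithm on a sentinel-padded t and count character comparisons.
long countComparisons(const std::string& t) {
    const int n = static_cast<int>(t.size());
    assert(n >= 3 && t.front() != t.back());
    std::vector<int> p(n, 0);
    int C = 0, R = 0;
    long comparisons = 0;
    for (int i = 1; i < n - 1; ++i) {
        const int mirror = 2 * C - i;
        if (i < R && p[mirror] < R - i) {
            p[i] = p[mirror];  // Case 1: no comparison needed
            continue;
        }
        p[i] = (i < R) ? R - i : 0;  // Cases 2 and 3
        while (true) {
            ++comparisons;
            if (t[i + p[i] + 1] != t[i - p[i] - 1]) break;  // at most one failure per i
            ++p[i];  // each success pushes i + p[i] past the old R
        }
        C = i;
        R = i + p[i];
    }
    return comparisons;
}
```

Since R only moves right and stays inside the string, the count returned is at most twice the length of the padded string.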

Space complexity

The algorithm creates a new string $T$ with $T.length = 2N + 3$ and an array P with $P.length = 2N + 3$, so the space complexity is $O(N)$.

C++ Implementation

Runtime: 28 ms, faster than 91.09% of C++ online submissions for Longest Palindromic Substring.

Memory Usage: 7.7 MB, less than 66.76% of C++ online submissions for Longest Palindromic Substring.
To follow the derivation above as closely as possible, the code is not optimized and contains some duplication.

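#include <string>
#include <vector>
using namespace std;

string longestPalindrome(string s) {
    // Preprocess: T = "^#s0#s1#...#$". The sentinels '^' and '$' differ from
    // each other and from '#', so every expansion stops inside T.
    // Assumes s contains none of '^', '#', '$'.
    string T = "^";
    for (char ch : s)
    {
        T += '#';
        T += ch;
    }
    T += "#$";

    const int n = static_cast<int>(T.length());
    vector<int> P(n, 0);
    int C = 0, R = 0;
    int maxIdx = 0; // center (in T) of the longest palindrome found so far

    for (int i = 1; i < n - 1; i++)
    {
        int i_mirror = C - (i - C);
        if (R > i)
        {
            // Case 1: the mirrored palindrome lies strictly inside (L, R)
            if (P[i_mirror] < R - i)
                P[i] = P[i_mirror];
            // Case 2: P[i] >= R - i, so keep expanding beyond R
            else {
                P[i] = R - i;
                while (T[i + P[i] + 1] == T[i - P[i] - 1])
                    P[i]++;
                C = i;
                R = i + P[i];
            }
        }
        // Case 3: i >= R, expand from scratch
        else {
            P[i] = 0;
            while (T[i + P[i] + 1] == T[i - P[i] - 1])
                P[i]++;
            C = i;
            R = i + P[i];
        }

        if (P[i] > P[maxIdx])
            maxIdx = i;
    }

    // T[maxIdx - P[maxIdx]] is '#'; the character after it is
    // s[(maxIdx - 1 - P[maxIdx]) / 2], and the palindrome has length P[maxIdx].
    return s.substr((maxIdx - 1 - P[maxIdx]) / 2, P[maxIdx]);
}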
