Rabin-Karp Algorithm
The Rabin-Karp algorithm is a string matching/searching algorithm developed by Michael O. Rabin and Richard M. Karp. It uses hashing to find candidate matches and brute force comparison to confirm them, and is a good candidate for plagiarism detection.
Important terms
pattern is the string to search for. Let its length be M characters.
text is the whole text in which the pattern is searched. Let its length be N characters.
What is brute force comparison?
In brute force comparison, the characters of the pattern are compared one by one with the corresponding characters of a section of text, until a mismatch is found or all M characters match.
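A minimal Python sketch of brute force comparison, assuming the section of text starts at offset `start`. The names `brute_force_equal` and `naive_search` are hypothetical and used only for illustration. Running the comparison at every offset gives the naive string matching algorithm that Rabin-Karp improves on.

```python
def brute_force_equal(pattern: str, text: str, start: int) -> bool:
    """Compare pattern with text[start:start + M] character by character."""
    m = len(pattern)
    if start < 0 or start + m > len(text):
        return False  # the section would run past the end of the text
    for i in range(m):
        if pattern[i] != text[start + i]:
            return False  # stop at the first mismatch
    return True  # all M characters matched


def naive_search(pattern: str, text: str) -> list[int]:
    """Naive matching: brute force comparison at every offset of text."""
    n, m = len(text), len(pattern)
    return [s for s in range(n - m + 1) if brute_force_equal(pattern, text, s)]


print(naive_search("ab", "cabab"))  # [1, 3]
```

In the worst case this performs up to M character comparisons at each of the N - M + 1 offsets.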
How the Rabin-Karp Algorithm Works
- Calculate the hash value of the pattern.
- Calculate the hash value of the first M characters of the text.
- Compare the two hash values.
- If they are unequal, calculate the hash value of the next M characters of the text (the section shifted one character to the right) and compare again.
- If they are equal, perform a brute force comparison to confirm the match, because different strings can have the same hash value.
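The article does not fix a particular hash function. A common choice, assumed here, is a polynomial hash reduced modulo a prime, which can be updated in constant time when the section moves one character to the right instead of rehashing all M characters. In this sketch `BASE`, `MOD`, `poly_hash` and `roll` are illustrative assumptions, not names from the original text.

```python
BASE = 256             # assumed alphabet size (one byte per character)
MOD = 1_000_000_007    # assumed large prime modulus that keeps hash values small


def poly_hash(s: str) -> int:
    """Hash of s read as a base-BASE number, reduced modulo MOD (Horner's rule)."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def roll(h: int, out_ch: str, in_ch: str, high: int) -> int:
    """Slide the section one character right: drop out_ch, append in_ch.

    high must be BASE**(M-1) % MOD, the weight of the leftmost character.
    """
    h = (h - ord(out_ch) * high) % MOD    # remove the leftmost character
    return (h * BASE + ord(in_ch)) % MOD  # shift left and add the new character


text, m = "abracadabra", 4
high = pow(BASE, m - 1, MOD)
h = poly_hash(text[:m])
for i in range(1, len(text) - m + 1):
    h = roll(h, text[i - 1], text[i + m - 1], high)
    assert h == poly_hash(text[i:i + m])  # rolling update equals a fresh hash
print("rolling hash OK")
```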
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
loop
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
        if (brute force comparison == true)
            stop: pattern found at current position
    if (selected section reaches end of text)
        stop: pattern not found
    hash_t = hash value of next section of text, one character over
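The pseudocode above can be turned into a runnable Python sketch. It assumes a polynomial rolling hash with base 256 and modulus 1,000,000,007 as an illustrative choice; `rabin_karp` is a hypothetical name, and like the pseudocode it stops at the first confirmed match.

```python
def rabin_karp(pattern: str, text: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    high = pow(base, m - 1, mod) if m > 0 else 0  # weight of the leftmost character
    hash_p = hash_t = 0
    for i in range(m):  # hash the pattern and the first M characters of text
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; the slice comparison is the
        # brute force check that confirms it.
        if hash_p == hash_t and text[s:s + m] == pattern:
            return s
        if s + m < n:  # roll hash_t to the next section, one character over
            hash_t = (hash_t - ord(text[s]) * high) % mod
            hash_t = (hash_t * base + ord(text[s + m])) % mod
    return -1
```

For example, `rabin_karp("cad", "abracadabra")` returns 4 and `rabin_karp("xyz", "abracadabra")` returns -1.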
Advantage over Naive String Matching Algorithm
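This technique needs only one hash comparison per M-character section of text, and a brute force comparison is required only when the hash values match. Because equal hash values do not guarantee equal strings, that brute force comparison is still needed to confirm each candidate match.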
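To illustrate that brute force is only needed on a hash match, the following sketch (hypothetical name `rabin_karp_all`) finds every occurrence and counts how many brute force comparisons it performs. It deliberately assumes a small modulus (101) so that different sections can share a hash value; such spurious hits cost an extra brute force check but never produce a wrong match.

```python
def rabin_karp_all(pattern: str, text: str, base: int = 256, mod: int = 101) -> tuple[list[int], int]:
    """Return (match positions, number of brute force comparisons performed)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(base, m - 1, mod)  # weight of the leftmost character
    hash_p = hash_t = 0
    for i in range(m):
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod
    matches, checks = [], 0
    for s in range(n - m + 1):
        if hash_p == hash_t:      # one hash comparison per section
            checks += 1           # brute force only on a hash hit
            if text[s:s + m] == pattern:
                matches.append(s)
        if s + m < n:             # roll to the next section
            hash_t = ((hash_t - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches, checks


matches, checks = rabin_karp_all("abra", "abracadabra abracadabra")
print(matches, checks)
```

With a large modulus the number of brute force comparisons is typically close to the number of real matches; a small modulus such as 101 makes collisions, and therefore extra checks, more likely.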
Translated from: https://www.freecodecamp.org/news/the-rabin-karp-algorithm-explained/