Algorithms and Data Structures Project Practice 6: String Search with the Karp‐Rabin Algorithm

Jifu_M

Article link: https://blog.csdn.net/Jifu_M/article/details/113838220


Karp‐Rabin String Search

Project Introduction
Code Implementation

Project Introduction

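This project implements the Karp‐Rabin string search algorithm.
The program reads a .txt file that holds two character sequences on separate lines. The first line is the target sequence T and the second line is the search sequence S. The program reads both strings and uses the Karp‐Rabin algorithm to find every occurrence of S in T.
For each match, it prints the position in T of the first matching character.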
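The search itself does not rely on the STL or equivalent libraries (std::string is used only to hold the input).
The code uses a simple hash function as an example. You can replace it with a more elaborate one to reduce hash collisions.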

Input file format:

[Screenshot of the input file]
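The first line is sequence T, an arbitrary sequence of DNA bases. In the test file it is more than 1000 characters (columns) long.
The second line is the search sequence S, the contiguous string that we need to find in T.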

Code Implementation

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// d: number of chars in the input alphabet (e.g. 256)
// q: a suitable prime modulus for the hash (e.g. 101)
const int d = 256;
const int q = 101;

int roll(int ht, int c1, int c2, int h);
int pow_mod(int e);
int hash1(const string& str, int m);

int main()
{
	ifstream read;
	string a;
	cout << "Please enter the filename: " << endl;
	cin >> a;

	read.open(a.c_str());
	if (!read)
	{
		cout << "Can't open file" << endl;
		exit(1);
	}

	// First line: target sequence T; second line: search sequence S
	string T;
	string S;
	if (!(read >> T >> S))
	{
		cout << "The file must contain two sequences" << endl;
		exit(1);
	}
	read.close();

	int n = T.length();
	int m = S.length();
	if (m == 0 || m > n)
	{
		return 0; // S cannot occur in T
	}

	int h = pow_mod(m - 1);   // d^(m-1) mod q: weight of the leading char
	int hash_s = hash1(S, m); // hash of S
	int hash_t = hash1(T, m); // hash of the first window T[0..m-1]

	// Window start positions run from 0 to n - m inclusive
	for (int i = 0; i <= n - m; i++)
	{
		if (hash_s == hash_t)
		{
			// Equal hashes may be a collision: compare S with T[i..i+m-1]
			int pn = 0; // position number within S
			while (pn < m && S[pn] == T[i + pn])
			{
				pn++;
			}
			if (pn == m)
			{
				cout << "String found at location: " << i << endl;
			}
		}

		// Roll to the next window: drop T[i], append T[i + m]
		if (i < n - m)
		{
			hash_t = roll(hash_t, (unsigned char)T[i], (unsigned char)T[i + m], h);
		}
	}

	return 0;
}

// Rolling hash fn: calculates the hash value of the next window
int roll(int ht, int c1, int c2, int h)
{
	// Remove leading char c1 and add trailing char c2 to the hash value
	ht = (d * (ht - c1 * h) + c2) % q;
	if (ht < 0)
	{
		ht = ht + q; // % can return a negative value in C++
	}
	return ht;
}

// Returns d^e mod q
int pow_mod(int e)
{
	int p = 1;
	for (int i = 0; i < e; i++)
	{
		p = (p * d) % q;
	}
	return p;
}

// Returns the hash of the first m characters of str
int hash1(const string& str, int m)
{
	int h = 0;
	for (int i = 0; i < m; i++)
	{
		h = (d * h + (unsigned char)str[i]) % q;
	}
	return h;
}

The output is as follows:
[Screenshot of the program output]

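The output shows that AGCTA was found at position 652. Opening the input .txt file confirms that a matching string starts at column 653 (positions are counted from 0, so 652 + 1 = 653).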

Likewise, the second match is at column 732.

[Screenshot]
