Introduction to Algorithms Lab: String Matching and Entropy Coding

木子若鱼

Article link: https://blog.csdn.net/jane_6091/article/details/99442044

Anyone interested in Introduction to Algorithms or related topics is welcome to join the study group: 948097478

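Lab tasks
1. Implementing a string matching algorithm in C
Problem: given a string P, use a string matching algorithm (KMP or BM) to find the position of its first occurrence in a text file t. Then output every position at which it occurs in the text, together with the number of occurrences.
2. Implementing the LZW coding algorithm in C
Problem: given the input character stream "ababcbababaaaaaaa", encode it with LZW.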

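1. String matching
KMP algorithm
(1) Basic principle: pattern matching.
Let the text (called T) be: a b a c a a b a c a b a c a b a a b b
and the pattern (called W) be: a b a c a b
When matching by brute force, we compare T[0] with W[0]; if they are equal we compare the next pair of characters, and so on until a mismatch occurs. At that point we throw away everything learned so far and start again by comparing T[1] with W[0], repeating until either the text is exhausted or a match is found. Discarding the earlier match information makes this very inefficient: for a text of length n and a pattern of length m it needs O(nm) comparisons in the worst case.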
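To make the contrast concrete, here is a minimal C++ sketch of the brute-force procedure just described. The function name `naive_search` is hypothetical and not part of the lab code; unlike the description above it does not stop at the first match but returns every 0-based start position.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: brute-force matching. Try every alignment s of W
// against T, comparing left to right; on a mismatch, discard what matched
// and try s + 1. Worst case: O(n * m) character comparisons.
std::vector<std::size_t> naive_search(const std::string& T, const std::string& W) {
    std::vector<std::size_t> hits;
    if (W.empty() || W.size() > T.size()) return hits;
    for (std::size_t s = 0; s + W.size() <= T.size(); ++s) {
        std::size_t j = 0;
        while (j < W.size() && T[s + j] == W[j]) ++j; // extend the current match
        if (j == W.size()) hits.push_back(s);          // whole pattern matched at s
    }
    return hits;
}
```

For the T and W above, `naive_search("abacaabacabacabaabb", "abacab")` reports positions 5 and 9.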
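The KMP algorithm instead precomputes the internal matching information of the pattern, so that after a mismatch the pattern can be shifted as far as possible, which reduces the number of comparisons.
For example, after a mismatch we would like to shift the pattern as far right as possible before comparing it with the text again. KMP computes the shift as follows: in the part of the pattern that has already matched, find the longest prefix that is also a suffix (excluding the whole matched part itself), then shift the pattern so that this prefix lies on top of that suffix.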
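In the first matching attempt:
T: a b a c a a b a c a b a c a b a a b b
W: a b a c a b
T[5] and W[5] do not match, while T[0] to T[4] matched. T[0] to T[4] ("abaca") is the already matched part of the pattern mentioned above. Its longest prefix that is also a suffix is "a", so we shift W until the two copies of "a" overlap:
T: a b a c a a b a c a b a c a b a a b b
W:         a b a c a b
Matching then resumes from where the previous attempt failed (T[5] is now compared with W[1]), so fewer comparisons are needed and efficiency improves.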
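The shift rule can be checked with a brute-force sketch that does exactly what the paragraph says, with no precomputation. The helpers `longest_border` and `kmp_shift` are hypothetical names used only for illustration. For the example above, the matched part "abaca" has a border "a" of length 1, so the pattern moves 5 - 1 = 4 positions and W[0] lands under T[4].

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest proper prefix of s that is also
// a suffix of s, found by brute force (longest candidate tried first).
std::size_t longest_border(const std::string& s) {
    if (s.empty()) return 0;
    for (std::size_t len = s.size() - 1; len > 0; --len)
        if (s.compare(0, len, s, s.size() - len, len) == 0) return len;
    return 0;
}

// Hypothetical helper: W[0..j-1] matched and W[j] mismatched. Slide W right so
// that the border of the matched part lies over itself:
// shift = matched length - border length.
std::size_t kmp_shift(const std::string& W, std::size_t j) {
    if (j == 0) return 1; // nothing matched yet: move by one position
    return j - longest_border(W.substr(0, j));
}
```

Recomputing the border at every mismatch like this is exactly the cost that the precomputed table removes.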
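However, recomputing the longest common prefix and suffix at every mismatch would itself waste time. So, for a given pattern, we compute in advance how far to shift at every possible mismatch position, and looking up the shift then takes constant time. For example:
j      0 1 2 3 4 5
W[j]   a b a c a b
F(j)   0 0 1 0 1 2
Here F(j) is the length of the longest proper prefix of W[0..j] that is also a suffix of it. When W[j] does not match T[i], set j = F(j-1) and compare W[j] with T[i] again (if j is already 0, move on to T[i+1]).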
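The table can be reproduced with the standard prefix-function computation. This sketch uses the hypothetical names `failure_function` and `kmp_search`. It stores F exactly as defined above (a length), whereas `cal_next` in the main code stores F(j) - 1. The search applies the rule j = F(j-1) and, after a full match, continues with j = F(m-1) so that overlapping occurrences are also reported.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: F[j] = length of the longest proper prefix of W[0..j]
// that is also a suffix of W[0..j].
std::vector<std::size_t> failure_function(const std::string& W) {
    std::vector<std::size_t> F(W.size(), 0);
    std::size_t k = 0; // length of the current border
    for (std::size_t j = 1; j < W.size(); ++j) {
        while (k > 0 && W[j] != W[k]) k = F[k - 1]; // fall back to a shorter border
        if (W[j] == W[k]) ++k;                      // extend the border by one
        F[j] = k;
    }
    return F;
}

// Hypothetical helper: KMP search with the rule "on a mismatch at W[j], set j = F(j-1)".
std::vector<std::size_t> kmp_search(const std::string& T, const std::string& W) {
    std::vector<std::size_t> hits;
    if (W.empty()) return hits;
    const std::vector<std::size_t> F = failure_function(W);
    std::size_t j = 0; // number of pattern characters currently matched
    for (std::size_t i = 0; i < T.size(); ++i) {
        while (j > 0 && T[i] != W[j]) j = F[j - 1];
        if (T[i] == W[j]) ++j;
        if (j == W.size()) {           // full match ending at T[i]
            hits.push_back(i + 1 - j);
            j = F[j - 1];              // keep the border: allows overlapping matches
        }
    }
    return hits;
}
```

`failure_function("abacab")` returns 0 0 1 0 1 2, matching the F(j) row of the table.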

Main code (with key comments):

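#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

void cal_next(const char *str, int *next, int len);
int KMP(const char *str, int slen, const char *ptr, int plen);

// next[q] = (length of the longest proper prefix of str[0..q] that is also its suffix) - 1
void cal_next(const char *str, int *next, int len)
{	
	next[0] = -1;//-1 means no common proper prefix and suffix exists
	int k = -1;//k starts at -1
	for (int q = 1; q <= len - 1; q++)
	{
		while (k > -1 && str[k + 1] != str[q])//if the next characters differ, k becomes next[k]
		{
			k = next[k];//fall back
		}
		if (str[k + 1] == str[q])//if they are equal, k++
		{
			k = k + 1;
		}
		next[q] = k;//store k (longest common prefix/suffix length minus 1) in next[q]
	}
}

// Prints every (possibly overlapping) occurrence of ptr in str and the number of matches;
// returns the 0-based position of the first occurrence, or -1 if there is none
int KMP(const char *str, int slen, const char *ptr, int plen)
{
	if (plen <= 0)
		return -1;
	int count = 0;
	int first = -1;
	int *next = new int[plen];
	cal_next(ptr, next, plen);//compute the next array
	int k = -1;
	for (int i = 0; i < slen; i++)
	{
		while (k > -1 && ptr[k + 1] != str[i])//mismatch after a partial match (k > -1)
			k = next[k];//fall back
		if (ptr[k + 1] == str[i])
			k = k + 1;
		if (k == plen - 1)//k has reached the end of ptr: full match
		{
			int pos = i - plen + 1;
			cout << "Found at position " << pos << endl;
			if (first == -1)
				first = pos;
			count++;
			k = next[k];//keep the matched border so overlapping matches are found too
		}
	}
	cout << "Number of matches: " << count << endl;
	delete[] next;
	return first;
}

int main(){
	ifstream fin("in.txt");
	if (!fin)
	{
		cerr << "Cannot open in.txt" << endl;
		return 1;
	}
	string str;
	getline(fin, str);//read the first line of the text file
	//string str = "bacbababadababacambabacaddababacasdsd";
	const char *ptr = "ababaca";
	int a = KMP(str.c_str(), (int)str.size(), ptr, (int)strlen(ptr));
	cout << "Text: " << str << endl;
	cout << "Pattern: " << ptr << endl;
	cout << "First match position: " << a << endl;
	return 0;
}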

Result:
[Figure: screenshot of the program output]

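Time complexity: O(m + n); space complexity: O(m + n), where n is the length of the text and m the length of the pattern (the text, the pattern, and the m-entry next array).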

2. BM algorithm (implemented here as the Sunday variant)
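(1) Basic principle:
The Sunday method is illustrated by searching for the pattern "search" in the text "substring searching algorithm":

substring searching algorithm
search

A mismatch is found at the second character, so the pattern must be shifted right. The Sunday method looks at the text character immediately after the current window ('i'). However far we shift, this character will take part in the next comparison, so if the next attempt is to succeed, it must occur in the pattern. We therefore shift the pattern so that its rightmost occurrence of this character lines up with it. Since 'i' does not occur in "search", we can skip a whole stretch and start the next comparison at the character after 'i':

substring searching algorithm
       search

This time the very first character mismatches. The character after the window is 'r', which is the third character from the end of the pattern, so we shift the pattern right by three positions to align the two 'r's:

substring searching algorithm
          search

Match found!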
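The worked example can be traced mechanically. The sketch below is a hypothetical helper (`sunday_trace` and `SundayTrace` are illustrative names, not the lab code). It builds the same shift table as `SUNDAY` (patt_size + 1 for characters absent from the pattern, patt_size - i for the rightmost occurrence at index i) and records every window start it tries. On the example it visits windows 0, 7 and 10 as above, then one more at 17 before running off the end of the text.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of a Sunday search: window starts tried and matches found.
struct SundayTrace {
    std::vector<std::size_t> windows;
    std::vector<std::size_t> matches;
};

SundayTrace sunday_trace(const std::string& text, const std::string& patt) {
    SundayTrace tr;
    const std::size_t n = text.size(), m = patt.size();
    if (m == 0 || m > n) return tr;
    std::array<std::size_t, 256> shift;
    shift.fill(m + 1); // character not in patt: jump past it entirely
    for (std::size_t i = 0; i < m; ++i)
        shift[static_cast<unsigned char>(patt[i])] = m - i; // rightmost occurrence wins
    for (std::size_t s = 0; s + m <= n; ) {
        tr.windows.push_back(s);
        if (text.compare(s, m, patt) == 0) tr.matches.push_back(s);
        if (s + m == n) break; // no character after the last window
        s += shift[static_cast<unsigned char>(text[s + m])];
    }
    return tr;
}
```

The match is reported at 0-based position 10; the `SUNDAY` function below prints positions 1-based, so it would report 11.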

(2) Main code (with key comments)

#include <cstring>
#include <iostream>
#include <string>
using namespace std;

// Sunday's quick-search variant of Boyer-Moore: prints every (1-based)
// position where patt occurs in text, followed by the number of matches
void SUNDAY(const char *text, const char *patt)
{
	int count = 0;
	size_t shift[256];
	size_t i, patt_size = strlen(patt), text_size = strlen(text);
	if (patt_size == 0 || patt_size > text_size)
	{
		cout << "Number of matches: 0" << endl << endl;
		return;
	}
	// a character that does not occur in the pattern lets us skip patt_size + 1
	for (i = 0; i < 256; i++)
	{
		shift[i] = patt_size + 1;
	}
	// for a character in the pattern, align its rightmost occurrence
	// with the character just after the current window
	for (i = 0; i < patt_size; i++)
	{
		shift[(unsigned char)patt[i]] = patt_size - i;
	}
	// e.g. for "search": shift['s'] = 6, shift['e'] = 5, and so on
	size_t limit = text_size - patt_size + 1;
	// text[i + patt_size] is at most text[text_size], the terminating '\0'
	for (i = 0; i < limit; i += shift[(unsigned char)text[i + patt_size]])
	{
		// compare the current window with the pattern; report every match
		if (memcmp(text + i, patt, patt_size) == 0)
		{
			cout << "Match at position " << i + 1 << endl;
			count++;
		}
	}
	cout << "Number of matches: " << count << endl;
	cout << endl;
}

int main()
{
	string text, patt;
	cout << "Enter the text:" << endl;
	getline(cin, text);   // whole line, spaces included, e.g. "substring searching algorithm search"
	cout << "Enter the pattern to search for:" << endl;
	getline(cin, patt);   // e.g. "search"
	cout << text << endl;
	cout << patt << endl;
	SUNDAY(text.c_str(), patt.c_str());
	return 0;
}

Result:
[Figure: screenshot of the program output]

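3. LZW coding
The LZW algorithm:
(1) Basic principle: LZW uses a string translation table (the dictionary) T to map input strings to fixed-length codewords, typically 12 bits long. Of the 4096 possible 12-bit codes, 256 represent single characters and the remaining 3840 are assigned to strings as they occur in the input.
Strings in the LZW dictionary are prefix-closed: $\omega K \in T \Rightarrow \omega \in T$, where ω is a string and K a single character; that is, whenever ωK is in the dictionary, its prefix ω is too.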
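With a fixed 12-bit code width, two codes fit exactly into three bytes. As a sketch under an explicit assumption (the lab code below prints codes as decimal numbers and does no bit packing), the hypothetical helpers `pack12` and `unpack12` store codes most-significant-bit first and zero-pad the final byte when the number of codes is odd.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack 12-bit codes MSB-first into bytes;
// the last byte is zero-padded when the code count is odd.
std::vector<std::uint8_t> pack12(const std::vector<std::uint16_t>& codes) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0; // pending bits, right-aligned
    int nbits = 0;            // number of pending bits in buffer
    for (std::uint16_t code : codes) {
        buffer = (buffer << 12) | (code & 0xFFFu);
        nbits += 12;
        while (nbits >= 8) {  // emit every complete byte
            nbits -= 8;
            out.push_back(static_cast<std::uint8_t>(buffer >> nbits));
            buffer &= (1u << nbits) - 1; // drop the bits just emitted
        }
    }
    if (nbits > 0)            // 4 leftover bits when the count is odd
        out.push_back(static_cast<std::uint8_t>(buffer << (8 - nbits)));
    return out;
}

// Hypothetical inverse of pack12; count tells how many codes to read (padding is ignored).
std::vector<std::uint16_t> unpack12(const std::vector<std::uint8_t>& bytes, std::size_t count) {
    std::vector<std::uint16_t> codes;
    std::uint32_t buffer = 0;
    int nbits = 0;
    for (std::uint8_t b : bytes) {
        buffer = (buffer << 8) | b;
        nbits += 8;
        if (nbits >= 12 && codes.size() < count) {
            nbits -= 12;
            codes.push_back(static_cast<std::uint16_t>((buffer >> nbits) & 0xFFFu));
            buffer &= (1u << nbits) - 1;
        }
    }
    return codes;
}
```

Ten 12-bit codes therefore occupy 15 bytes instead of the 20 that 16-bit storage would need.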

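LZW encoding procedure:
Step 1: Initially the dictionary contains all possible roots (single characters), and the current prefix P is empty.
Step 2: Current character C := the next character in the character stream.
Step 3: Is the string P+C in the dictionary?
    (1) If yes: P := P+C (extend P with C).
    (2) If no:
        ① output the codeword for the current prefix P to the codeword stream;
        ② add the string P+C to the dictionary;
        ③ set P := C (P now contains only the character C).
Step 4: Are there more characters in the character stream?
    (1) If yes, go back to Step 2.
    (2) If no:
        ① output the codeword for the current prefix P to the codeword stream;
        ② stop.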
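Steps 1 to 4 map almost line for line onto a `std::map`-based encoder. This is a sketch under stated assumptions, not the author's program: the dictionary starts with all 256 single bytes (codes 0 to 255, matching the 12-bit description above) instead of only 'a' to 'z', new strings get codes from 256 up to the 4096 limit, and the names `lzw_encode` and `lzw_decode` are hypothetical. The decoder includes the special case where a code refers to the entry being defined in that same step, which the `else` branch of `Translatecode` also handles.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

const int kMaxCodes = 4096; // assumption: fixed 12-bit codes

// Hypothetical encoder following steps 1-4.
std::vector<int> lzw_encode(const std::string& input) {
    std::map<std::string, int> dict;
    for (int c = 0; c < 256; ++c)
        dict[std::string(1, static_cast<char>(c))] = c; // step 1: all roots
    int next_code = 256;
    std::string P; // current prefix, initially empty
    std::vector<int> out;
    for (char C : input) {                  // step 2
        std::string PC = P + C;
        if (dict.count(PC)) {
            P = PC;                         // step 3(1): extend P
        } else {
            out.push_back(dict[P]);         // step 3(2)①: output code of P
            if (next_code < kMaxCodes)
                dict[PC] = next_code++;     // step 3(2)②: add P+C
            P = std::string(1, C);          // step 3(2)③: P := C
        }
    }
    if (!P.empty()) out.push_back(dict[P]); // step 4(2)①: flush the last prefix
    return out;
}

// Hypothetical decoder: rebuilds the same dictionary one step behind the encoder.
std::string lzw_decode(const std::vector<int>& codes) {
    std::vector<std::string> dict;
    for (int c = 0; c < 256; ++c)
        dict.push_back(std::string(1, static_cast<char>(c)));
    if (codes.empty()) return "";
    std::string prev = dict.at(static_cast<std::size_t>(codes[0]));
    std::string out = prev;
    for (std::size_t i = 1; i < codes.size(); ++i) {
        const std::size_t code = static_cast<std::size_t>(codes[i]);
        std::string entry;
        if (code < dict.size())
            entry = dict[code];
        else if (code == dict.size())
            entry = prev + prev[0]; // code defined in this very step
        else
            throw std::invalid_argument("invalid LZW code");
        out += entry;
        if (dict.size() < static_cast<std::size_t>(kMaxCodes))
            dict.push_back(prev + entry[0]);
        prev = entry;
    }
    return out;
}
```

For the lab input "ababcbababaaaaaaa" this encoder emits 97 98 256 99 257 260 97 262 263 97, and decoding those ten codes restores the input.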

Main code (with key comments):

#include<iostream>
#include<iomanip>
#include<cstring>
using namespace std;

const int FIRST_CODE = 123;   // codes 97..122 are the roots 'a'..'z'; new strings start here
const int MAX_CODES = 1000;   // number of rows in the dictionary table

char Dictionary[MAX_CODES][1000];  // Dictionary[code] = string for that code ("" if unused)
char ch[1000];                     // string to encode
int a[1000];                       // codes to decode
int s, n;                          // s: code of the current prefix P; n: next free code
int len;                           // number of codes to decode
char h[2], t[1000];                // h: one-character string; t: current string P+C
char p[1000];                      // previously decoded string

// Reset the dictionary to the 26 roots 'a'..'z' (code = ASCII value)
void InitDictionary()
{
	memset(Dictionary, 0, sizeof(Dictionary));
	h[1] = 0;
	for (int i = 97; i <= 122; i++)
	{
		h[0] = char(i);
		strcpy(Dictionary[i], h);
	}
	n = FIRST_CODE;
}

void Editcode()       // encoding
{
	InitDictionary();
	cout << "Enter the string to encode (lowercase letters only):" << endl << endl;
	if (!(cin >> setw(sizeof(ch)) >> ch))
		return;
	t[0] = 0;
	s = 0;
	int j;
	cout << endl;
	cout << "Output codes:" << endl << endl;
	for (int i = 0; ch[i]; i++)
	{
		h[0] = ch[i];
		strcat(t, h);                  // t = P + C
		for (j = 97; j < n; j++)       // is P + C in the dictionary?
			if (strcmp(t, Dictionary[j]) == 0)
				break;
		if (j == n)                    // no: output code of P, add P + C, P = C
		{
			cout << s << " ";
			if (n < MAX_CODES)
				strcpy(Dictionary[n++], t);
			strcpy(t, h);
			s = (unsigned char)h[0];
		}
		else s = j;                    // yes: P = P + C
	}
	cout << s << endl;                 // output code of the final P
}

void Translatecode()    // decoding
{
	InitDictionary();
	cout << "Enter the number of codes to decode:";
	if (!(cin >> len) || len <= 0 || len > 1000)
		return;
	cout << endl;
	cout << "Enter the codes:" << endl;
	for (int j = 0; j < len; j++)
	{
		if (!(cin >> a[j]) || a[j] < 0 || a[j] >= MAX_CODES)
			return;                    // stop on invalid input
	}

	cout << endl;
	cout << "Result:" << endl << endl;
	cout << Dictionary[a[0]];
	strcpy(p, Dictionary[a[0]]);       // p = previously decoded string

	for (int i = 1; i < len; i++)
	{
		if (Dictionary[a[i]][0])       // code already in the dictionary
		{
			cout << Dictionary[a[i]];
			h[0] = Dictionary[a[i]][0];
			if (n < MAX_CODES)         // add p + first character of the current string
				strcpy(Dictionary[n++], strcat(p, h));
			strcpy(p, Dictionary[a[i]]);
		}
		else                           // code not yet defined: it must be p + p[0]
		{
			h[0] = p[0];
			strcat(p, h);
			cout << p;
			if (n < MAX_CODES)
				strcpy(Dictionary[n++], p);
		}
	}
	cout << endl;
}

int main()
{
	while (true)
	{
		cout << "\n\t\t\ta. Encode\t\tb. Decode\t\tq. Quit\n\n";
		cout << "Your choice:";
		char cha;
		if (!(cin >> cha) || cha == 'q' || cha == 'Q')
			break;                     // quit on 'q' or at end of input
		cout << endl;
		if (cha == 'a' || cha == 'A')
		{
			Editcode();
			cout << endl << "Encoding dictionary:" << endl << endl;
			for (int i = 97; i < n; i++)   // six entries per line
			{
				if ((i - 96) % 6 != 0)
					cout << setw(4) << i << setw(6) << Dictionary[i] << "  ";
				else
					cout << setw(4) << i << setw(6) << Dictionary[i] << endl;
			}
			cout << endl;
		}
		else if (cha == 'b' || cha == 'B')
		{
			Translatecode();
			cout << endl;
		}
		else
		{
			cout << "Please enter a valid choice!" << endl;
		}
	}
	return 0;
}

Result:
[Figure: screenshot of the program output]