Daily Summary 1.12

Latest recommended article published on 2024-09-12 19:02:20

不甘心蒟蒻呐

Tags: algorithms, C language

Article link: https://blog.csdn.net/ddddcy2023/article/details/128666811

1. Today I got two KMP problems accepted (AC).

2. I mainly studied hashing.

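[Template] KMP String Matching

Problem description

Given two strings $s_1$ and $s_2$, if the substring of $s_1$ on the interval $[l, r]$ is identical to $s_2$, we say that $s_2$ occurs in $s_1$, and its position of occurrence is $l$.
Find all positions at which $s_2$ occurs in $s_1$.

A border of a string $s$ is a substring $t$ of $s$, other than $s$ itself, such that $t$ is both a prefix and a suffix of $s$.
For $s_2$, you must also find, for each of its prefixes $s'$, the length of its longest border $t'$.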

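Input format

The first line contains a string, $s_1$.
The second line contains a string, $s_2$.

Output format

First output several lines, one integer per line, giving the positions at which $s_2$ occurs in $s_1$ in increasing order.
The last line contains $|s_2|$ integers; the $i$-th integer is the length of the longest border of the prefix of $s_2$ with length $i$.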

Sample input/output

Input #1

ABABABC
ABA

Output #1

1
3
0 0 1

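Notes/Hints

Explanation of sample 1

For the prefix ABA of $s_2$ with length $3$, the string A is both a suffix and a prefix of it, and it is the longest such string, so the longest border length is $1$.

Data scale and constraints

This problem uses bundled testing over multiple test points, with 3 subtasks.

Subtask 1 (30 points): $|s_1| \leq 15$, $|s_2| \leq 5$.
Subtask 2 (40 points): $|s_1| \leq 10^4$, $|s_2| \leq 10^2$.
Subtask 3 (30 points): no special constraints.

For all test points, it is guaranteed that $1 \leq |s_1|, |s_2| \leq 10^6$, and that $s_1$ and $s_2$ contain only uppercase English letters.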

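#include<stdio.h>
#include<string.h>
/* next[]: optimized failure table used while matching; vis[]: true longest-border lengths to print */
long long int next[1000005]={0},vis[1000005];
char s1[1000005],s2[1000005];
long long int l1,l2;
void getnext(char t[],long long int next[])
{
	long long int i,j;
	long long int len=strlen(t);	/* measure once; strlen in the loop condition can cost O(n^2) */
	next[0]=-1;
	i=0,j=-1;
	while(i<len)
	{
		if(j==-1||t[i]==t[j])
		{
			i++;
			j++;
			vis[i]=j;	/* longest border of the prefix of length i */
			if(t[i]==t[j])
			{
				next[i]=next[j];	/* falling back to j would fail on the same character, so skip further */
			}
			else
			{
				next[i]=j;
			}
		}
		else
		{
			j=next[j];
		}
	}
}
void kmp(char s1[],char s2[])
{
	getnext(s2,next);
	long long int j=0,i=0;
	while(i<l1)
	{
		if(j==-1||s1[i]==s2[j])
		{
			j++;
			i++;
		}
		else
		{
			j=next[j];
		}
		if(j>=l2)
		{
			printf("%lld\n",i-l2+1);	/* 1-based start of this occurrence; %lld matches long long */
			j=next[j];	/* next[l2] is the longest border of s2, so overlapping matches are kept */
		}
	}
}
int main()
{
	scanf("%s",s1);
	scanf("%s",s2);
	l1=strlen(s1),l2=strlen(s2);
	kmp(s1,s2);
	for(int i=1;i<=l2;i++)
	{
		printf("%lld ",vis[i]);
	}
	getchar();
	getchar();
	return 0;
}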

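After working through the basic implementation of KMP, I found that it consists of three main parts. The first is the core getnext function, which works out where the pattern should resume after a mismatch at each position; this is the tricky step, where a while loop repeatedly falls back to find the longest prefix that is also a suffix. The second is the main body of KMP, which scans s1 and uses the next array to skip comparisons that cannot succeed. The third is the check for whether all of s2 has been matched, which reports each occurrence and so solves the problem.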
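To make the "longest prefix that is also a suffix" step concrete, here is a minimal sketch of the plain (unoptimized) border table. It is not the code from this post: the name border_table and the 0-based border[] layout are my own assumptions for illustration.

```c
#include <string.h>

/* Hypothetical helper (not from the post): border[i] receives the length of
   the longest proper border of the prefix t[0..i]. */
void border_table(const char *t, int *border)
{
    int n = (int)strlen(t);
    int k = 0;                      /* length of the border being extended */
    if (n == 0)
        return;
    border[0] = 0;                  /* a single character has no proper border */
    for (int i = 1; i < n; i++) {
        /* on a mismatch, fall back to the next shorter border */
        while (k > 0 && t[i] != t[k])
            k = border[k - 1];
        if (t[i] == t[k])
            k++;
        border[i] = k;
    }
}
```

For the pattern ABA from the sample this fills border with 0 0 1, which matches the last line of the expected output.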

Barn Echoes G

Problem description

The cows enjoy mooing at the barn because their moos echo back, although sometimes not completely. Bessie, ever the excellent secretary, has been recording the exact wording of the moo as it goes out and returns. She is curious as to just how much overlap there is.

Given two lines of input (letters from the set a..z, total length in the range 1..80), each of which has the wording of a moo on it, determine the greatest number of characters of overlap between one string and the other. A string is an overlap between two other strings if it is a prefix of one string and a suffix of the other string.

By way of example, consider two moos:

moyooyoxyzooo

yzoooqyasdfljkamo

The last part of the first string overlaps 'yzooo' with the first part of the second string. The last part of the second string overlaps 'mo' with the first part of the first string. The largest overlap is 'yzooo' whose length is 5.

POINTS: 50

Input format

* Lines 1..2: Each line has the text of a moo or its echo

Output format

* Line 1: A single line with a single integer that is the length of the longest overlap between the front of one string and end of the other.

Sample input/output

Input #1

abcxxxxabcxabcd
abcdxabcxxxxabcx

Output #1

11

Notes/Hints

'abcxxxxabcx' is a prefix of the first string and a suffix of the second string.

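#include<stdio.h>
#include<string.h>
#define N 1000005
int next[N];
char s1[N],s2[N];
int l1,l2;
/* next[i] = length of the longest border of the first i characters of s */
void getnext(char s[],int next[])
{
	int i,j;
	int len=strlen(s);	/* measure once instead of calling strlen on every iteration */
	next[0]=-1;
	i=0;
	j=-1;
	while(i<len)
	{
		if(j==-1||s[i]==s[j])
		{
			i++;
			j++;
			next[i]=j;
		}
		else
		{
			j=next[j];
		}
	}
}
/* returns the length of the longest prefix of s2 that is also a suffix of s1 */
int kmp(char s1[],char s2[],int l1)
{
	int i,j;
	getnext(s2,next);
	i=0;
	j=0;
	while(i<l1)
	{
		if(j==-1||s1[i]==s2[j])
		{
			i++;
			j++;
		}
		else
		{
			j=next[j];	/* also handles a full match in the middle, since s2[j] is then '\0' */
		}
	}
	return j;	/* how much of s2 is matched when s1 runs out */
}
int max(int a,int b)
{
	if(a>b)
	{
		return a;
	}
	else
	{
		return b;
	}
}
int main()
{
	scanf("%s",s1);
	scanf("%s",s2);
	l1=strlen(s1);
	l2=strlen(s2);
	/* try both directions: prefix of s2 on suffix of s1, and prefix of s1 on suffix of s2 */
	printf("%d",max(kmp(s1,s2,l1),kmp(s2,s1,l2)));
	getchar();
	getchar();
	return 0;
}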

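This is also a fairly typical KMP problem, and compared with the previous one the KMP part feels plainer. What it does need is a max function: the overlap can be a prefix of s2 matched against a suffix of s1, or a prefix of s1 matched against a suffix of s2, so we run the matching in both directions and take the larger result as the final answer.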
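Because the strings here are at most 80 characters long, a brute-force check is an easy way to cross-check the KMP answer. The sketch below is not the submitted solution; the function names overlap_suffix_prefix and longest_overlap are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical helper: largest k such that the last k characters of a
   equal the first k characters of b. */
int overlap_suffix_prefix(const char *a, const char *b)
{
    int la = (int)strlen(a);
    int lb = (int)strlen(b);
    int k = la < lb ? la : lb;      /* the overlap cannot exceed either length */
    for (; k > 0; k--) {
        if (strncmp(a + la - k, b, (size_t)k) == 0)
            return k;               /* first hit is the longest, since k counts down */
    }
    return 0;
}

/* Try both directions and keep the larger overlap. */
int longest_overlap(const char *a, const char *b)
{
    int x = overlap_suffix_prefix(a, b);
    int y = overlap_suffix_prefix(b, a);
    return x > y ? x : y;
}
```

On the two moos from the statement, moyooyoxyzooo and yzoooqyasdfljkamo, the two directions give 5 and 2, so longest_overlap returns 5.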

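On hashing: from what I learned, a hash is essentially a more convenient way of assigning index labels for lookup. What impressed me most is the question of what to do when the hash function gives two items the same label. With labeling by remainder, for example, collisions are always possible, and the same is true of other labeling schemes. The collision-resolution method I personally like is to open a second array dedicated to the keys that collided. Why is this a good approach? As long as the table is not too full, a reasonable hash function makes collisions fairly rare, so the latecomer in a collision goes into the separate array, and a lookup that fails in the main table simply searches the new array. It is not especially efficient, but I find it relatively simple.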
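The separate-array idea described above can be sketched as follows. This is only an illustration under assumptions of my own: the names OverflowTable, ot_init, ot_insert and ot_contains, and both sizes, are hypothetical, and the overflow array is searched linearly.

```c
#include <string.h>

#define TABLE_SIZE 12       /* assumed size of the main table */
#define OVERFLOW_SIZE 12    /* assumed capacity of the overflow array */

typedef struct {
    int slots[TABLE_SIZE];          /* main table, indexed by hash value */
    char used[TABLE_SIZE];          /* 1 if the matching slot holds a key */
    int overflow[OVERFLOW_SIZE];    /* keys whose home slot was already taken */
    int overflow_count;
} OverflowTable;

void ot_init(OverflowTable *t)
{
    memset(t, 0, sizeof *t);        /* all slots unused, overflow empty */
}

static int ot_hash(int key)
{
    /* remainder method; the adjustment keeps negative keys in range */
    return (key % TABLE_SIZE + TABLE_SIZE) % TABLE_SIZE;
}

/* Returns 0 on success, -1 if the overflow array is full. */
int ot_insert(OverflowTable *t, int key)
{
    int addr = ot_hash(key);
    if (!t->used[addr]) {
        t->slots[addr] = key;
        t->used[addr] = 1;
        return 0;
    }
    if (t->overflow_count >= OVERFLOW_SIZE)
        return -1;
    t->overflow[t->overflow_count++] = key;   /* collision: store the latecomer */
    return 0;
}

/* Returns 1 if key is stored, 0 otherwise. */
int ot_contains(const OverflowTable *t, int key)
{
    int addr = ot_hash(key);
    if (t->used[addr] && t->slots[addr] == key)
        return 1;
    for (int i = 0; i < t->overflow_count; i++) {
        if (t->overflow[i] == key)
            return 1;
    }
    return 0;
}
```

The trade-off is visible in ot_contains: a hit in the main table costs one comparison, while a miss has to scan the whole overflow array, so the scheme stays cheap only while collisions are rare.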

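Here is a C implementation of a hash table. Note that this version resolves collisions with linear probing (open addressing) rather than with a separate overflow array.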

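#include<stdlib.h>
#define HASHSIZE 12	/* number of slots in the table */
#define NULLKEY -32768	/* marks an empty slot, so it cannot be stored as a key */
typedef struct
{
	int *elem;	/* slot array */
	int count;	/* capacity of the table */
}HashTable;
/* allocate the table and mark every slot empty; returns 0 on success, -1 on failure */
int InitHashTable(HashTable *H)
{
	H->count=HASHSIZE;
	H->elem=(int *)malloc(HASHSIZE * sizeof(int));
	if(!H->elem)
	{
		return -1;
	}
	for(int i=0;i<HASHSIZE;i++)
	{
		H->elem[i]=NULLKEY;
	}
	return 0;
}
/* remainder method; the adjustment keeps negative keys inside [0,HASHSIZE) */
int Hash(int key)
{
	return (key % HASHSIZE + HASHSIZE) % HASHSIZE;
}
/* insert with linear probing; returns 0 on success, -1 if the table is full */
int InsertHash(HashTable *H,int key)
{
	int addr,probes;
	addr=Hash(key);
	probes=0;
	while(H->elem[addr]!=NULLKEY)
	{
		if(++probes>=HASHSIZE)
		{
			return -1;	/* every slot is occupied; without this check the loop never ends */
		}
		addr=(addr+1)%HASHSIZE;
	}
	H->elem[addr]=key;
	return 0;
}
/* look up key; on success returns 0 and stores its slot in *addr, otherwise returns -1 */
int SearchHash(HashTable H,int key,int *addr)
{
	*addr=Hash(key);
	while(H.elem[*addr]!=key)
	{
		*addr=(*addr+1)%HASHSIZE;
		/* an empty slot, or a full loop back to the home slot, means the key is absent */
		if(H.elem[*addr]==NULLKEY||*addr==Hash(key))
		{
			return -1;
		}
	}
	return 0;
}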
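
To see where colliding keys end up under linear probing, here is a small standalone sketch. It does not reuse the HashTable struct above; the names SLOTS, EMPTY and probe_insert are assumptions for illustration, and the caller is expected to fill the array with EMPTY first.

```c
#define SLOTS 12        /* assumed table size, same as HASHSIZE above */
#define EMPTY -32768    /* assumed empty-slot marker, same value as NULLKEY */

/* Hypothetical helper: place key by linear probing and return the slot it
   landed in, or -1 if every slot is already occupied. */
int probe_insert(int *table, int key)
{
    int start = (key % SLOTS + SLOTS) % SLOTS;   /* home slot */
    for (int step = 0; step < SLOTS; step++) {
        int addr = (start + step) % SLOTS;       /* wrap around the end */
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}
```

Inserting 12, 24 and 36 into an empty table puts them in slots 0, 1 and 2, because all three hash to 0. A later key 13, whose home slot is 1, is then pushed to slot 3, which shows how one collision chain can delay unrelated keys.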