Finding the Ten Most Frequent Words

_暮雨潇湘_

Category: Trie

Article link: https://blog.csdn.net/SG_SIQing/article/details/12205791

Given 10⁶ English words, each made up of lowercase letters and at most 20 characters long, find the ten words that occur most often.

PS: If the 10th and 11th words occur the same number of times, either one is acceptable.

Analysis:

Trie

We can first record every distinct word in a trie, then partially sort the words by their occurrence counts to find the top ten.
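The trie step can be sketched in isolation as follows. This is a minimal illustration, not the program given later in this post: the names `TrieNode`, `trie_new_node` and `trie_slot` are hypothetical, and the sketch assumes every word consists only of the letters 'a' to 'z'.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names, used only for this sketch. */
typedef struct TrieNode {
	int id;                     /* slot of the word ending here, or -1 */
	struct TrieNode *next[26];  /* one child per lowercase letter */
} TrieNode;

/* Allocate an empty node: no word ends here and it has no children. */
static TrieNode *trie_new_node(void) {
	int i;
	TrieNode *p = (TrieNode *)malloc(sizeof(TrieNode));
	assert(p != NULL);
	p->id = -1;
	for (i = 0; i < 26; i++) {
		p->next[i] = NULL;
	}
	return p;
}

/* Walk the trie along `word`, creating missing nodes on the way.
   Returns the slot of the word; a word seen for the first time gets
   the next free slot, and *n_words (distinct words so far) grows by one. */
static int trie_slot(TrieNode *root, const char *word, int *n_words) {
	TrieNode *p = root;
	for (; *word; word++) {
		int h = *word - 'a';
		assert(h >= 0 && h < 26);  /* assumption: lowercase input only */
		if (p->next[h] == NULL) {
			p->next[h] = trie_new_node();
		}
		p = p->next[h];
	}
	if (p->id == -1) {
		p->id = (*n_words)++;
	}
	return p->id;
}
```

A caller would keep a counter array and do `cnt[trie_slot(root, word, &n)]++` for each input word, so looking up or inserting a word costs time proportional to its length, independent of how many words have been stored.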

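Concretely: scan the words one by one and insert each into the trie. If a word has not been seen before, allocate a slot for it: s[] stores the word and cnt[] stores its number of occurrences, and the trie node where the word ends records the slot index pos. When the same word is scanned again, simply do cnt[pos]++. Once the scan is finished, every distinct word and its count are stored in s[] and cnt[]. What remains is to find the ten largest values in cnt[]; the words in the matching slots are the answer. The partial sort must not reorder cnt[] itself, since that would break its correspondence with s[]. Instead, we partially sort an index array sub[] using the values in cnt[] as keys, then read the first ten indices from sub[] and print the corresponding words.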
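The index-based partial sort described above can be sketched on its own as a quickselect over an index array. The function names `partition_desc` and `select_top` and the 0-based indexing are assumptions made for this sketch; the program in this post uses its own names and starts at index 1.

```c
#include <assert.h>

/* Partition idx[l..r] around the count of idx[r], larger counts first.
   Returns the final position of the pivot. cnt is only read. */
static int partition_desc(const int *cnt, int *idx, int l, int r) {
	int t = idx[r];   /* pivot index; position r becomes the "hole" */
	int pc = cnt[t];
	while (l < r) {
		while (l < r && cnt[idx[l]] > pc) l++;
		if (l < r) idx[r--] = idx[l];  /* fill hole at r, hole moves to l */
		while (l < r && cnt[idx[r]] < pc) r--;
		if (l < r) idx[l++] = idx[r];  /* fill hole at l, hole moves to r */
	}
	idx[l] = t;
	return l;
}

/* Rearrange idx[0..n-1] so that idx[0..k-1] are the indices of the k
   largest counts, in no particular order. */
static void select_top(const int *cnt, int *idx, int n, int k) {
	int l = 0, r = n - 1;
	assert(k >= 0 && k <= n);
	while (l < r) {
		int m = partition_desc(cnt, idx, l, r);
		if (m > k) {
			r = m - 1;       /* the top-k boundary is left of the pivot */
		} else if (m < k - 1) {
			l = m + 1;       /* the top-k boundary is right of the pivot */
		} else {
			break;           /* pivot sits at k-1 or k: first k are settled */
		}
	}
}
```

Only the index array is permuted, so the words and their counts stay in their original slots. Both scans stop at elements equal to the pivot, which keeps the split roughly balanced when many words share the same count (for example, many words that occur exactly once).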

The code is as follows:


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

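#define ch 'a'			// Base letter; use 'A' for uppercase words.
#define maxlen 32		// Buffer size for one word (words are at most 20 letters).
#define maxn (1000000 + 5)	// Max number of distinct words (up to 10^6 words in total).
#define N 10			// Report the N most frequent words.
#define start 1			// Words are stored from index 1; slot 0 is unused.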

typedef struct node {
	int pos;			// The sub in hash.
	struct node* next[26];
}node, *leaf;

leaf root = NULL;
char s[maxn][maxlen];	// The strings appears in order.
int cnt[maxn];			// The number of the string in order.
int sub[maxn];			// The array to be sorted.
int k = start;			// How many diff strings appear.

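// Map 'a'..'z' to 0..25.
static inline int getV(char c) {
	return c - ch;
}

// Allocate an empty trie node: no word ends here and it has no children.
static inline leaf getNode() {
	int i;
	leaf p = (leaf)malloc(sizeof(node));
	if (p == NULL) {	// Out of memory: give up.
		exit(1);
	}
	p->pos = -1;
	for (i = 0; i < 26; i++) {
		p->next[i] = NULL;
	}
	return p;
}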

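// Partition sub[l..r] by count, larger counts first, around the pivot sub[r].
// Returns the pivot's final position.
int partQSort(int l, int r) {
	int t = sub[r];		// Pivot index; position r becomes the hole.
	while (l < r) {
		while (l < r && cnt[sub[l]] > cnt[t]) {
			l++;
		}
		if (l < r) {
			sub[r--] = sub[l];	// Copy the element (not its position) into the hole.
		}
		while (l < r && cnt[sub[r]] < cnt[t]) {
			r--;
		}
		if (l < r) {
			sub[l++] = sub[r];
		}
	}
	sub[l] = t;
	return l;
}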

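// Quickselect: rearrange sub[l..r] so that sub[1..N] hold the N largest counts.
// Written as a loop so that unlucky pivots cannot overflow the call stack.
void getPreN(int l, int r) {
	while (l < r) {
		int m = partQSort(l, r);
		if (m > N + 1) {
			r = m - 1;	// The top-N boundary lies left of the pivot.
		} else if (m < N) {
			l = m + 1;	// The top-N boundary lies right of the pivot.
		} else {
			break;		// Pivot is at N or N + 1: the first N are settled.
		}
	}
}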

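int main() {
	char ts[maxlen];
	int i;
	leaf p = root = getNode();
	memset(cnt, 0, sizeof(cnt));

	while (scanf("%31s", ts) == 1) {	// %31s keeps the word inside ts[maxlen].
		p = root;
		for (i = 0; ts[i]; i++) {
			int h = getV(ts[i]);
			if (h < 0 || h >= 26) {	// Not a lowercase letter: skip this word.
				break;
			}
			if (p->next[h] == NULL) {
				p->next[h] = getNode();
			}
			p = p->next[h];
		}
		if (ts[i]) {
			continue;
		}
		if (p->pos == -1) {		// First occurrence: assign a new slot.
			if (k >= maxn) {	// No free slot left: ignore the word.
				continue;
			}
			strcpy(s[k], ts);
			sub[k] = k;
			p->pos = k++;
		}
		cnt[p->pos]++;
	}

	getPreN(start, k - 1);

	// Print at most N words; there may be fewer than N distinct words.
	for (i = start; i < start + N && i < k; i++) {
		puts(s[sub[i]]);
	}

	return 0;
}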
