Counting the occurrences of each word in an article

DayThinking

Category: Algorithm Analysis. Tags: string, iterator, timer, pair, system, algorithm

Article link: https://blog.csdn.net/sszgg2006/article/details/7773145

The code below was built and run with VS2010.

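The idea of the algorithm is:

Traverse the file from beginning to end, reading one word at a time.
Insert each word into a hash_map and count how many times it occurs.
Traverse the hash_map and push each word, together with its occurrence count, into a priority queue.
Whenever the priority queue holds more than k elements, remove the lowest-priority element (the one with the smallest count), so the queue never holds more than k elements.
Once the hash_map has been fully traversed, the queue contains the k words with the highest occurrence counts.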
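
The last three steps can be sketched in isolation as follows. This is a minimal sketch rather than the listing from this post: it assumes a C++11 compiler, uses std::unordered_map in place of hash_map, and the function name top_k and its parameter k are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original listing): returns the k most
// frequent entries of `counts` as (count, word) pairs, most frequent first.
std::vector<std::pair<int, std::string>> top_k(
    const std::unordered_map<std::string, int>& counts, std::size_t k)
{
    typedef std::pair<int, std::string> Entry;  // (count, word)

    // std::greater turns the queue into a min-heap, so top() is always the
    // entry with the smallest count among those currently kept.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    for (const auto& kv : counts) {
        heap.push(Entry(kv.second, kv.first));
        if (heap.size() > k) {
            heap.pop();  // evict the least frequent of the k+1 candidates
        }
    }

    // The heap yields ascending counts, so fill the result back to front.
    std::vector<Entry> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

Because the heap never holds more than k + 1 entries, selecting the top k out of n distinct words takes O(n log k) time and O(k) extra space, instead of sorting all n entries.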

The implementation is as follows:

// 出现次数最多的K个单词.cpp (the K most frequent words) : Defines the entry point for the console application.
#include "stdafx.h"
#include <hash_map>
#include <string>
#include <fstream>
#include <queue>
#include <iostream>
#include <algorithm>
#include <functional> // greater
#include <utility>    // pair, make_pair
#include <vector>
#include <boost/timer.hpp> 
using namespace std;
using namespace boost;
void top_k_words()// prints the ten most frequent words
{
	timer t;
	ifstream fin;
	fin.open("modern c.txt");
	if (!fin)
	{
		cout<<"cannot open file"<<endl;
		return; // nothing to count if the file could not be opened
	}
	string s;
	hash_map<string,int> countwords;
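	// Use the extraction itself as the loop condition. Testing eof() only after
	// counting would count the previous word again (or an empty string for an
	// empty file) when the final read fails.
	while (fin>>s)
	{
		countwords[s]++;
	}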
	cout<<"Number of distinct words: "<<countwords.size()<<endl;
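	// Min-heap keyed on (count, word): top() is the least frequent entry kept so far.
	priority_queue<pair<int,string>,vector<pair<int,string>>,greater<pair<int,string>>> countmax;
	for(hash_map<string,int>::const_iterator i=countwords.begin();
		i!=countwords.end();i++)
	{
		countmax.push(make_pair(i->second,i->first));
		// Keep at most 10 entries (k = 10) by evicting the smallest count.
		if (countmax.size()>10)
		{
			countmax.pop();
		}
	}
	// Print the remaining words in ascending order of count.
	while(!countmax.empty())
	{
		cout<<countmax.top().second<<" "<<countmax.top().first<<endl;
		countmax.pop();
	}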
	cout<<"time elapsed "<<t.elapsed()<<" s"<<endl;
}
int main(int argc, char* argv[])
{
	top_k_words();

	system("pause");
	return 0;
}
