Finding the 10 Most Frequent Words in an English Text

yicoder

After finding a solution to this problem in an article by v_JULY_v, I implemented it in C++ and found that the C++ code is very concise.

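It mainly uses hash_map and the priority queue priority_queue. Note that priority_queue is part of the standard library, while hash_map is a compiler extension; std::unordered_map is its standard equivalent since C++11.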
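As a minimal sketch of how priority_queue can keep only the k largest items, the block below uses plain integers instead of word counts. The helper name k_largest is hypothetical and does not appear in the original code; the sketch assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not from the original post):
// returns the k largest values of `values` in ascending order.
std::vector<int> k_largest(const std::vector<int>& values, std::size_t k)
{
    // std::greater turns priority_queue into a min-heap: top() is the smallest element.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int v : values)
    {
        heap.push(v);
        if (heap.size() > k)
        {
            heap.pop();  // evict the smallest, so the k largest seen so far remain
        }
    }
    // Draining a min-heap yields its contents from smallest to largest.
    std::vector<int> result;
    while (!heap.empty())
    {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Because the heap never holds more than k + 1 elements, each push or pop costs O(log k), which is why this approach scales well when k is small compared with the number of items.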

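The idea of the algorithm is: count the occurrences of each word in a hash table, then traverse the table while maintaining a min-heap of at most 10 (count, word) entries. Whenever the heap grows past 10 entries, the smallest one is popped, so the 10 most frequent words are what remain at the end.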

The implementation and results are as follows:

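// Prints the 10 most frequent words in the file.
// hash_map and timer are not standard C++. They are assumed here to come from
// a compiler extension header (e.g. <hash_map>) and from boost::timer
// (<boost/timer.hpp>); the original post does not show its includes.
#include <fstream>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>
using namespace std;

void top_k_words()
{
    timer t;  // starts timing on construction
    ifstream fin("modern c.txt");
    if (!fin)
    {
        cout << "cannot open file" << endl;
        return;  // nothing to count without the input file
    }
    string s;
    hash_map<string, int> countwords;  // word -> number of occurrences
    // Testing the extraction itself avoids the eof() pitfall: checking eof()
    // after the read drops a final word that is not followed by whitespace.
    while (fin >> s)
    {
        countwords[s]++;
    }
    cout << "Number of distinct words: " << countwords.size() << endl;
    // Min-heap ordered by (count, word): top() is the least frequent entry kept.
    priority_queue<pair<int, string>, vector<pair<int, string>>, greater<pair<int, string>>> countmax;
    for (hash_map<string, int>::const_iterator i = countwords.begin();
         i != countwords.end(); ++i)
    {
        countmax.push(make_pair(i->second, i->first));
        if (countmax.size() > 10)
        {
            countmax.pop();  // discard the least frequent, keeping the top 10
        }
    }
    // Prints the top 10 in ascending order of count (least frequent first).
    while (!countmax.empty())
    {
        cout << countmax.top().second << " " << countmax.top().first << endl;
        countmax.pop();
    }
    cout << "time elapsed " << t.elapsed() << endl;
}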
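
For readers without hash_map or timer available, the same counting and heap steps can be sketched using only the C++11 standard library. This is an illustrative variant, not the original implementation: the function name top_k_from_stream, the stream parameter, and the parameter k are assumptions introduced here, std::unordered_map stands in for hash_map, and timing is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <istream>
#include <queue>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post):
// returns up to k (word, count) pairs, most frequent first.
std::vector<std::pair<std::string, int>> top_k_from_stream(std::istream& in, std::size_t k)
{
    // Step 1: count whitespace-separated words.
    std::unordered_map<std::string, int> counts;
    std::string word;
    while (in >> word)
    {
        ++counts[word];
    }

    // Step 2: keep at most k entries in a min-heap ordered by (count, word).
    typedef std::pair<int, std::string> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts)
    {
        heap.push(Entry(kv.second, kv.first));
        if (heap.size() > k)
        {
            heap.pop();  // drop the least frequent entry
        }
    }

    // Step 3: drain the heap (ascending) and reverse to get descending order.
    std::vector<std::pair<std::string, int>> result;
    while (!heap.empty())
    {
        result.push_back(std::make_pair(heap.top().second, heap.top().first));
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Passing a std::ifstream opened on the text file reproduces the file-based behavior, while a std::istringstream is convenient for quick checks. As in the original code, words are split on whitespace only, so punctuation and letter case are not normalized: "Word", "word" and "word," count as three different words.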