[C/C++ 07] Word Frequency Count

AllinTome

Last modified: 2024-02-01 16:09:20

First published: 2024-02-01 16:06:54

Article link: https://blog.csdn.net/phoenixFlyzzz/article/details/135975196

I. Problem

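Read a short English passage, discard non-key words such as prepositions, conjunctions, articles, adverbs, and pronouns, and count how many times each remaining word occurs. Then print the top 5 words, ordered by count in descending order, with ties broken by the word itself in ascending character order.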
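
The ordering rule is the part of the task that is easiest to get wrong, so here is a minimal sketch of it in isolation. The names `Entry`, `byCountThenWord`, and `rankWords` are hypothetical and do not appear in the program in this post; the sketch only assumes that each result is a (word, count) pair.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;	// (word, count)

// Hypothetical comparator: higher count first; equal counts fall back
// to ascending word order.
bool byCountThenWord(const Entry& a, const Entry& b)
{
	if (a.second != b.second)
	{
		return a.second > b.second;
	}
	return a.first < b.first;
}

// Hypothetical helper: returns a sorted copy of the entries.
std::vector<Entry> rankWords(std::vector<Entry> words)
{
	std::sort(words.begin(), words.end(), byCountThenWord);
	return words;
}
```

For the made-up entries <pear,2>, <apple,2>, <fig,5>, <kiwi,1>, `rankWords` returns them in the order fig, apple, pear, kiwi. Because the comparator settles ties itself, the result does not depend on the sort being stable.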

II. Algorithm

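1. Create an fstream object from the <fstream> library and read the file's characters into memory.

2. Split the in-memory string on spaces, punctuation marks, and newlines. If a resulting word is one that should be counted, store it in a map container and let the map keep the frequency count.

3. Copy the frequency counts from the map into a vector, sort it, and print the results in the required order.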
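
As a rough illustration of step 2, the sketch below counts words in a string that is already in memory. It is not the approach used in the program in this post: it treats every non-letter character as a delimiter instead of using a fixed delimiter list, and it keeps the excluded words in a `std::set`. The name `countWords` and its parameters are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>

// Hypothetical helper: splits `text` on any non-letter character, lowercases
// each word, skips words found in `stopWords`, and counts the rest.
std::map<std::string, int> countWords(const std::string& text,
                                      const std::set<std::string>& stopWords)
{
	std::map<std::string, int> counts;
	std::string word;

	// Move the word collected so far into the map, unless it is empty or excluded
	auto flush = [&]() {
		if (!word.empty() && stopWords.count(word) == 0)
		{
			++counts[word];
		}
		word.clear();
	};

	for (unsigned char c : text)
	{
		if (std::isalpha(c))
		{
			word += static_cast<char>(std::tolower(c));
		}
		else
		{
			flush();	// any non-letter ends the current word
		}
	}
	flush();	// the text may end without a trailing delimiter

	return counts;
}
```

Since `std::map` iterates its keys in ascending order, copying the result into a vector for step 3 starts from words that are already sorted alphabetically.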

III. Code

#define _CRT_SECURE_NO_WARNINGS 1

#include <iostream>
#include <fstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iterator>
#include <map>
#include <vector>
#include <algorithm>
using namespace std;

// Words excluded from the count: prepositions, conjunctions, articles, adverbs, pronouns
// (the list also holds common forms of "be", question words, and modal verbs)
vector<string> g_delWord = {
	"to", "in", "on", "for", "of", "from", "between", "behind", "by", "about", "at", "with", "than",
	"a", "an", "the", "this", "that", "there",
	"and", "but", "or", "so", "yet",
	"often", "very", "then", "therefore",
	"i", "you", "we", "he", "she", "my", "your", "his", "her", "our", "us", "it", "they", "them",
	"am", "is", "are", "was", "were", "be",
	"when", "where", "who", "what", "how",
	"will", "would", "can", "could"
};

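// Functor: count descending, then word ascending for equal counts
struct Compare
{
	bool operator()(const pair<string, int>& e1, const pair<string, int>& e2) const
	{
		if (e1.second != e2.second)
		{
			return e1.second > e2.second;
		}
		return e1.first < e2.first;
	}
};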

int main()
{
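	// 1. Read the file contents
	//    ofstream: write to a file
	//    ifstream: read from a file
	//    fstream:  read and write a file
	fstream f;

	//    ios::in      open for reading
	//    ios::out     open for writing
	//    ios::app     append; used together with ios::out
	//    ios::trunc   discard existing contents; used together with ios::out
	//    ios::binary  open in binary mode
	f.open("./test1.txt", ios::in);
	if (!f.is_open())
	{
		cout << "file open failed!" << endl;
		return 1;
	}
	// Read at most sizeof(text) - 1 bytes so that the zero-initialized buffer
	// always stays null-terminated, which strtok requires
	char text[4096] = { 0 };
	f.read(text, sizeof(text) - 1);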

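	// 2. Split the text into words and count them in a map
	map<string, int> wordMap;
	const char* cut = " ,.!?;:\t\r\n";	// word delimiters (not exhaustive)
	char* w = strtok(text, cut);
	while (w)
	{
		string word = w;

		// Convert the word to lowercase
		string lwrWord;
		transform(word.begin(), word.end(), back_inserter(lwrWord),
			[](unsigned char c) { return static_cast<char>(::tolower(c)); });

		// Skip words that are on the exclusion list
		if (find(g_delWord.begin(), g_delWord.end(), lwrWord) == g_delWord.end())
		{
			// map::operator[] returns a reference to the mapped value; if the key is
			// absent, it first inserts the key with a value-initialized int (0)
			wordMap[lwrWord]++;
		}

		w = strtok(NULL, cut);
	}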

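	// 3. Sort by frequency
	vector<pair<string, int>> wordVec;
	for (const auto& e : wordMap)
	{
		wordVec.push_back(e);
	}

	// std::sort is not guaranteed to be stable (it is typically implemented as introsort).
	// std::stable_sort keeps equal elements in their original order, which here is
	// the map's ascending key order.
	stable_sort(wordVec.begin(), wordVec.end(), Compare());

	// Print the top 5 entries, or fewer if there are not that many distinct words
	for (size_t i = 0; i < 5 && i < wordVec.size(); ++i)
	{
		cout << "<" << wordVec[i].first << "," << wordVec[i].second << ">" << endl;
	}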

	return 0;
}
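
The listing above reads into a fixed 4096-byte buffer, so a longer file would be cut off. One possible alternative, sketched here under the assumption that the whole file fits comfortably in memory, is to copy the stream into a `std::string`. The helper name `readAll` is hypothetical and is not part of the program above.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `out`.
// Returns false if the file cannot be opened.
bool readAll(const std::string& path, std::string& out)
{
	std::ifstream in(path);
	if (!in.is_open())
	{
		return false;
	}

	std::ostringstream buffer;
	buffer << in.rdbuf();	// copies every remaining character of the stream
	out = buffer.str();
	return true;
}
```

A call such as `readAll("./test1.txt", text)` would take the place of the `f.open` and `f.read` calls. The string then grows to whatever size the file needs, at the cost of one extra copy through the string stream.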

IV. Test

Test document test1.txt

No one can help others as much as you do. 
No one can express himself like you. 
No one can express what you want to convey. 
No one can comfort others in your own way. 
No one can be as understanding as you are. 
No one can feel happy, carefree, and no one can smile as much as you do. 
In a word, no one can show your features to anyone else.
hi, how are you? I love you!

Run result