[Algorithm] Word Frequency Counting for Short English Passages

Contents

Problem

Sample Passages

Sample Output

Algorithm Analysis

Source Code


Problem

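    1. Given three short English passages, count how many times each word occurs in each passage.

    2. Words are separated by spaces, line breaks, or punctuation marks; letter case is ignored.

    3. Print the 5 most frequent words, each with its number of occurrences.

    4. Order the printed words by count (highest first); words with the same count are ordered alphabetically.

    5. Prepositions, articles, conjunctions, adverbs, and pronouns are not counted.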
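Requirement 2 allows any punctuation mark as a separator, whereas the source code below only splits on a fixed set of delimiter characters. As a minimal sketch of a more general tokenizer, assuming that every character that is not a letter (digits and apostrophes included) ends a word, the hypothetical helper `splitWords` below lowercases each word while collecting it:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase words.
// Assumption: any character for which std::isalpha is false (space, newline,
// punctuation, digit, apostrophe, ...) ends the current word.
std::vector<std::string> splitWords(const std::string& text)
{
    std::vector<std::string> words;
    std::string current;
    for (char ch : text)
    {
        // Cast to unsigned char: passing a negative char to isalpha/tolower is undefined behavior
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c))
        {
            current += static_cast<char>(std::tolower(c));
        }
        else if (!current.empty())
        {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
    {
        words.push_back(current);  // last word when the text does not end with a separator
    }
    return words;
}
```

Its result could feed the same stop-word filter and map that the strtok loop feeds. One consequence of the assumption above: a contraction such as "don't" becomes the two words "don" and "t".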

Sample Passages

test1.txt

In the flood of darkness, hope is the light. It brings comfort, faith, and confidence. 
It gives us guidance when we are lost, and gives support when we are afraid.
And the moment we give up hope, we give up our lives. 
The world we live in is disintegrating into a place of malice and hatred, where we need hope and find it harder. 
In this world of fear, hope to find better, but easier said than done, the more meaningful life of faith will make life meaningful.

test2.txt

No one can help others as much as you do. 
No one can express himself like you. 
No one can express what you want to convey. 
No one can comfort others in your own way. 
No one can be as understanding as you are. 
No one can feel happy, carefree, and no one can smile as much as you do. 
In a word, no one can show your features to anyone else.

test3.txt

Keep faith and hope for the future. 
Make your most sincere dreams, and when the opportunities come, they will fight for them. 
It may take a season or more, but the ending will not change. Ambition, best, become a reality. 
An uncertain future, only one step at a time, the hope can realize the dream of the highest. 
We must treasure the dream, to protect it a season, let it in the heart quietly germinal. 
However, we have to gently protect our hearts deep expectations, slowly dream, will achieve new life.

Sample Output

test1.txt: 
hope 4
faith 2
find 2
give 2
gives 2

test2.txt: 
can 8
no 8
one 8
as 6
do 2

test3.txt: 
dream 3
future 2
hope 2
protect 2
season 2

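Algorithm Analysis

1. Read the file: load its contents into a string, text.

2. Split the string: tokenize text using spaces, commas, periods, and line breaks as delimiters.

3. Count with a map: lowercase each token, skip the excluded words, and increment the token's count in a map<string, int>. A std::map keeps its keys sorted, so iterating over it visits the words in alphabetical order.

4. Copy the map entries into a vector and sort them by frequency with stable_sort. Because the sort is stable, words with the same count keep the alphabetical order they had in the map.

5. Print the first 5 entries of the sorted vector: the 5 most frequent words and their counts.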
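Steps 3 to 5 can be checked in isolation. The following is a minimal sketch, assuming the words have already been split and lowercased; `rankWords` is a hypothetical name, and its lambda plays the role of the `compare` struct in the source code:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper covering steps 3-5: count, copy into a vector, stable sort by count.
std::vector<std::pair<int, std::string>> rankWords(const std::vector<std::string>& words)
{
    // Step 3: std::map keeps keys sorted, so later iteration is alphabetical
    std::map<std::string, int> counts;
    for (const auto& w : words)
    {
        ++counts[w];  // operator[] inserts a missing word with count 0, then increments it
    }

    // Step 4: copy as (count, word) pairs, still in alphabetical order
    std::vector<std::pair<int, std::string>> ranked;
    for (const auto& e : counts)
    {
        ranked.emplace_back(e.second, e.first);
    }

    // Compare counts only; stable_sort keeps equal-count words in alphabetical order
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<int, std::string>& l,
                        const std::pair<int, std::string>& r) {
                         return l.first > r.first;  // higher count first
                     });
    return ranked;  // step 5: the first 5 entries are the answer
}
```

Replacing stable_sort with std::sort and the same lambda would leave the order of equal-count words unspecified, which is why rule 4 depends on the sort being stable in this design.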

Source Code

#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <algorithm>
#include <iterator>   // back_inserter
#include <cstring>    // strtok
#include <cctype>     // tolower
using namespace std;

// Words to skip: prepositions, articles, conjunctions, adverbs, and pronouns
// (plus a few auxiliary verbs and question words)
vector<string> g_delWord = {
    "to", "in", "on", "for", "of", "from", "between", "behind", "by", "about", "at", "with", "than",
    "a", "an", "the", "this", "that",
    "and", "but", "or", "so", "yet",
    "often", "very", "then", "therefore",
    "i", "you", "we", "he", "she", "my", "your", "his", "her", "our", "us", "it",
    "am", "is", "are",
    "when", "where", "who", "what",
    "will", "would"
};

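// Order by count only, highest first. Equal counts are not compared here;
// stable_sort keeps such words in the alphabetical order they had in the map.
struct compare
{
    bool operator()(const pair<int, string>& l, const pair<int, string>& r) const
    {
        return l.first > r.first;
    }
};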

int main()
{
    for (int i = 1; i <= 3; ++i)
    {
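        // Build the file name: test1.txt, test2.txt, test3.txt
        string fileName = "test";
        fileName += static_cast<char>('0' + i);
        fileName += ".txt";

        // Open the file read-only (ios::out would be write-only, ios::app append)
        fstream file;
        file.open(fileName, ios::in);
        if (!file)
        {
            cerr << "Cannot open " << fileName << endl;
            continue;
        }
        // Read the whole file into a std::string: no fixed size limit, and the
        // buffer is always '\0'-terminated, which strtok requires
        string text, line;
        while (getline(file, line))
        {
            text += line;
            text += '\n';
        }
        // cout << fileName << ": " << endl;
        // cout << text << endl << endl;

        // Split the string and count the words in the map.
        // '\t' and '\r' are delimiters too, so tabs and Windows line endings do not become tokens.
        map<string, int> mWords;
        const char* s = " ,.\t\r\n";
        char* p = strtok(&text[0], s);   // strtok writes into the buffer, so it needs a writable char*
        while (p)
        {
            string word = p;
            string lwrWord;
            // Convert to lowercase; going through unsigned char avoids undefined behavior for negative chars
            transform(word.begin(), word.end(), back_inserter(lwrWord),
                      [](unsigned char c) { return static_cast<char>(tolower(c)); });

            // Skip prepositions, articles, conjunctions, adverbs, and pronouns
            if (find(g_delWord.begin(), g_delWord.end(), lwrWord) == g_delWord.end())
            {
                // map::operator[] returns a reference to the count, inserting the word with 0 if it is new
                mWords[lwrWord]++;
            }
            p = strtok(NULL, s);
        }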

        // Debug: print every (word, count) pair in the map, five per line
        // int cnt = 0;
        // for (const auto& e: mWords)
        // {
        //     cout << "(" << e.first << ", " << e.second << ")    ";
        //     ++cnt;
        //     if (cnt % 5 == 0)
        //     {
        //         cout << endl;
        //     }
        // }
        // cout << endl <<endl;

        // Copy the map entries into a vector as (count, word) pairs
        vector<pair<int, string>> vWords;   // since C++11, ">>" closing two templates parses correctly
        for (const auto& e : mWords)
        {
            vWords.push_back(make_pair(e.second, e.first));
        }

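        // Sort by count. std::sort is not stable, so equal-count words could lose their
        // alphabetical order; stable_sort keeps it. (A comparator that also compares
        // the words would make std::sort work as well.)
        stable_sort(vWords.begin(), vWords.end(), compare());
        cout << fileName << ": " << endl;
        // Print at most 5 words, in case a passage has fewer distinct counted words
        size_t n = min<size_t>(5, vWords.size());
        for (size_t j = 0; j < n; ++j)
        {
            cout << vWords[j].second << " " << vWords[j].first << endl;
        }
        cout << endl;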
    }

    return 0;
}
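The source code gets the tie-break of rule 4 from stable_sort plus the alphabetical order of the map. The alternative its comment mentions, a comparator that checks both the count and the word, might look like the following minimal sketch, assuming the counts are already in a `std::map<std::string, int>` such as `mWords`. `topK` is a hypothetical helper, and `std::partial_sort` is swapped in for a full sort because only the first 5 entries are printed:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the k most frequent words, by count (descending) then alphabetically.
// Both keys are in the comparator, so the result does not depend on sort stability.
std::vector<std::pair<std::string, int>> topK(const std::map<std::string, int>& counts,
                                              std::size_t k)
{
    std::vector<std::pair<std::string, int>> v(counts.begin(), counts.end());
    k = std::min(k, v.size());  // fewer than k distinct words: return them all

    // partial_sort orders only the first k elements instead of the whole vector
    std::partial_sort(v.begin(), v.begin() + k, v.end(),
                      [](const std::pair<std::string, int>& l,
                         const std::pair<std::string, int>& r) {
                          if (l.second != r.second)
                          {
                              return l.second > r.second;  // higher count first
                          }
                          return l.first < r.first;  // then dictionary order
                      });
    v.resize(k);
    return v;
}
```

Inside `main`, the printing loop could then be written as `for (const auto& e : topK(mWords, 5)) cout << e.first << " " << e.second << endl;`, which also stays in bounds when a passage has fewer than 5 counted words.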

Author: AllinTome