7-2 Word Frequency Count

202321332059 赖嘉宏

Published 2024-06-05 20:34:28


Article link: https://blog.csdn.net/2301_80040876/article/details/139481537


7-2 Word Frequency Count

Score: 20

Author: DS Course Group

Organization: Zhejiang University

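Write a program that, given a passage of English text, counts the number of distinct words in it and reports the words whose frequencies are in the top 10%.

A "word" is a contiguous string of no more than 80 word characters; words longer than 15 characters are truncated to their first 15 word characters. The valid word characters are uppercase and lowercase letters, digits, and underscores; every other character is treated as a word separator.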
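The word rule above can be sketched as a small tokenizer. This is only an illustration under stated assumptions: the helper name `split_words` is hypothetical, it keeps the original letter case, and it leaves the handling of the terminating `#` to the caller.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into words made of letters, digits and '_'.
std::vector<std::string> split_words(const std::string& text) {
    const std::size_t kMaxLen = 15;  // longer words keep only their first 15 characters
    std::vector<std::string> words;
    std::string cur;
    for (char raw : text) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (std::isalnum(c) || c == '_') {
            // Word character: keep it unless the word is already 15 long.
            if (cur.size() < kMaxLen) cur += raw;
        } else if (!cur.empty()) {
            // Any other character ends the current word.
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) words.push_back(cur);  // word that runs to the end of the text
    return words;
}
```

Characters beyond the 15th are skipped but still count as part of the same word, so a long word is never split in two.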

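Input format:

The input is a non-empty passage of text that ends with the symbol #. The input is guaranteed to contain at least 10 distinct words.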

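Output format:

On the first line, print the number of distinct words in the text. Words are case-insensitive; for example, "PAT" and "pat" are the same word.

Then print the words with the top 10% of frequencies in the format frequency:word, in order of decreasing frequency. Ties are printed in increasing dictionary order.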
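The ordering rule can be sketched as a comparator plus a cut-off. The names `Entry` and `top_tenth` are hypothetical, and rounding the 10% down is an assumption taken from the sample, where 23 distinct words give 2 output lines.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, frequency)

// Hypothetical helper: return the top 10% of entries, most frequent first.
std::vector<Entry> top_tenth(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
        if (a.second != b.second) return a.second > b.second;  // higher frequency first
        return a.first < b.first;                              // ties: dictionary order
    });
    entries.resize(entries.size() / 10);  // assumed: integer division, rounding down
    return entries;
}
```

Because the tie-break is part of the comparator, a plain `std::sort` is enough and no stable sort is needed.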

Sample input:

This is a test.

The word "this" is the word with the highest frequency.

Longlonglonglongword should be cut off, so is considered as the same as longlonglonglonee.  But this_8 is different than this, and this, and this...#
this line should be ignored.

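Sample output: (Note: although the word `the` also appears 4 times, we only output the top 10% of words (that is, the first 2 of 23), and `the` ranks third in alphabetical order, so it is not output.)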

23
5:this
4:is

I originally wanted to use map, but converting between string and char with map was too much trouble, so I just used a struct.
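For comparison, the map approach mentioned above might look like the following if the words are kept as `std::string` throughout, which avoids converting to and from `char` arrays. This is a sketch, not the solution used in this post; the helper name `count_words` is hypothetical and it assumes the words have already been split and truncated.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: count case-insensitive occurrences of each word.
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> freq;
    for (std::string w : words) {  // copy each word so it can be lowercased
        for (char& c : w) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        ++freq[w];  // operator[] inserts 0 for a new key before the increment
    }
    return freq;
}
```

The size of the returned map is the number of distinct words, and its entries can be copied into a vector for sorting by frequency.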

#include<iostream>
#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
const int MAXLEN=15;       // words longer than 15 characters are truncated
const int MAXWORDS=100000; // assumed upper bound on distinct words (not given in the problem)
// One distinct word and its number of occurrences.
struct compareto{
    int data;
    char str[MAXLEN+1];
}C[MAXWORDS];
int cmp=0; // number of distinct words stored in C
// A word character is a letter, a digit, or an underscore.
bool isword(int c){
    return (c>='a'&&c<='z')||(c>='A'&&c<='Z')||(c>='0'&&c<='9')||c=='_';
}
// Convert an uppercase letter to lowercase; other characters are unchanged.
char exchang(int c){
    if(c>='A'&&c<='Z'){
        return (char)(c+32);
    }
    return (char)c;
}
// Higher frequency first; ties are broken by increasing dictionary order.
bool cmp1(const compareto &a,const compareto &b){
    if(a.data!=b.data){
        return a.data>b.data;
    }
    return strcmp(a.str,b.str)<0;
}
// Count one occurrence of word, adding it to C if it is new (linear search).
void add(const char *word){
    for(int p=0;p<cmp;p++){
        if(strcmp(word,C[p].str)==0){
            C[p].data++;
            return;
        }
    }
    if(cmp<MAXWORDS){
        strcpy(C[cmp].str,word);
        C[cmp].data=1;
        cmp++;
    }
}
int main(){
    char word[MAXLEN+1];
    int i=0; // number of characters kept for the current word
    int ch;
    // Read until '#' (or end of input); everything after '#' is ignored.
    while((ch=getchar())!=EOF&&ch!='#'){
        if(isword(ch)){
            if(i<MAXLEN){
                word[i++]=exchang(ch);
            }
        }else if(i>0){
            // Any non-word character ends the current word.
            word[i]='\0';
            add(word);
            i=0;
        }
    }
    if(i>0){ // the last word may end right at '#'
        word[i]='\0';
        add(word);
    }
    sort(C,C+cmp,cmp1);
    cout<<cmp<<endl;
    for(int p=0;p<cmp/10;p++){
        cout<<C[p].data<<":"<<C[p].str<<endl;
    }
    return 0;
}
