1071 Speech Patterns

Latest recommended article published on 2024-10-01 23:05:47

ㄣK╰☆ぷ


Tags: algorithms

Article link: https://blog.csdn.net/ASBSIHD/article/details/130850264


People often have a preference among synonyms of the same word. For example, some may prefer "the police", while others may prefer "the cops". Analyzing such patterns can help to narrow down a speaker's identity, which is useful when validating, for example, whether it's still the same person behind an online avatar.

Now given a paragraph of text sampled from someone's speech, can you find the person's most commonly used word?

Input Specification:

Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a carriage return \n. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].

Output Specification:

For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there are more than one such words, print the lexicographically smallest one. The word should be printed in all lower case. Here a "word" is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.

Note that words are case insensitive.

Sample Input:

Can1: "Can a can can a can?  It can!"

Sample Output:

can 5


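Approach:

First, note that a string is also an array: each index holds one character, so we can walk through it index by index to extract words, clear the buffer, and keep scanning.

How do we keep the first token of the sample ("Can1") from being counted?

We don't need to. It is counted like any other word. Because it ends in a digit, "can1" is a different word from plain "can", and since it occurs only once it will not be the answer.

How do we extract one word from the line?

Treat every character outside the set [0-9 A-Z a-z] as a separator, for example " ". As soon as such a character appears, increment the count of the word collected so far, then clear the buffer and continue scanning.

The key point is that a character outside the set means the current word has ended, so that is where we split.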
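The splitting rule described above can be sketched as a standalone function. This is only an illustration under stated assumptions: `split_words` and `check_split_words` are hypothetical names that do not appear in the original post, and the sketch uses `std::isalnum` where the post writes out the character ranges by hand.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits a line into
// lower-cased alphanumeric words, treating every other character as a separator.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    std::string current;  // buffer for the word being collected
    for (unsigned char ch : line) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A separator ends the current word: store it and clear the buffer.
            words.push_back(current);
            current.clear();
        }
    }
    // Flush the last word if the line does not end with a separator.
    if (!current.empty()) words.push_back(current);
    return words;
}

// Self-check using the sample input from the problem statement.
void check_split_words() {
    std::vector<std::string> expected =
        {"can1", "can", "a", "can", "can", "a", "can", "it", "can"};
    assert(split_words("Can1: \"Can a can can a can?  It can!\"") == expected);
}
```

Flushing the buffer once after the loop plays the same role as the `i == str1.size() - 1` check in the solution below.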

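//struct _data
//{
// string word;
// int cnt = 0;
//};
//set<_data>s;
//No need for a container of structs: a hash-style table can hold both pieces of data directly.
//Use a hash table? Yes, that is the way to go (the code below uses std::map for this).
//Rough idea: the table is keyed by the word and the stored value is its number of occurrences.
//Upper and lower case count as the same word, so convert the case before using the word as the key.
//This approach turned out to be correct.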
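The counting idea in these notes could look like the sketch below. The names `to_lower_ascii`, `count_words`, and `check_count_words` are hypothetical and chosen only for illustration. Note that `std::map` is an ordered tree rather than a hash table; the notes use "hash" loosely for any word-to-count lookup.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns a lower-cased copy of an ASCII word.
std::string to_lower_ascii(std::string word) {
    for (char& ch : word) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return word;
}

// Hypothetical helper: the key is the lower-cased word, the value is how
// often it occurred, so "Can", "can" and "CAN" share one entry.
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& word : words) {
        ++counts[to_lower_ascii(word)];  // operator[] creates the entry with 0 on first use
    }
    return counts;
}

// Small usage example.
void check_count_words() {
    std::map<std::string, int> counts = count_words({"Can", "can", "CAN", "it"});
    assert(counts.size() == 2);
    assert(counts["can"] == 3);
    assert(counts["it"] == 1);
}
```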

Solution

#include<iostream>
#include<string>
#include<map>
#include<cctype>
using namespace std;

map<string,int>mp;
bool isright(char s)
{
	if ((s >= '0' && s <='9')  || (s >= 'a' && s <= 'z') || (s >= 'A' && s <= 'Z'))
		// The digits 0-9 are characters too, so compare against '0' and '9'!
		return true;
	
	return false;
}

int main()
{
	string str1, t;
	getline(cin, str1);
	for (int i = 0; i < str1.size(); i++)
	{
		if (isright(str1[i]))
		{
			str1[i] = tolower(str1[i]);
			// I first wrote islower here instead of tolower by mistake:
			// islower only tests whether a character is lower case,
			// tolower converts it to lower case.
			t += str1[i];
		}

		if (!isright(str1[i]) || i == str1.size() - 1)
		{
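			// Do not forget to check that t is non-empty. Otherwise the
			// empty string is counted at every extra separator and the
			// result is wrong. An empty token is not a word.
			if (t.size() != 0)
				mp[t]++;

			t = "";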
		}

		
	}

	int max = 0;
	for (auto it = mp.begin(); it != mp.end(); it++)
	{
		if (it->second > max)
		{
			t = it->first;
			max = it->second;
		}
	}
	cout << t << " " << max;
	return 0;
}

Walkthrough

#include<cctype>
#include<map>
#include<iostream>
#include<string>

using namespace std;
bool isright(char s) {
    if ((s >= 'a' && s <= 'z') || (s >= 'A' && s <= 'Z') || (s >= '0' && s <= '9'))
        return true;
    return false;
}
int main() {
    string str1, t;
    map<string, int> mp;
    getline(cin, str1);
    // The input string is effectively an array: each position holds one character.
    for (int i = 0; i < str1.size(); i++) {
        if (isright(str1[i]) == true) {
            str1[i] = tolower(str1[i]);
            t += str1[i];
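            // += appends the character to the end of the string t.
            // For a std::string, + and += mean concatenation: the character codes
            // are not added together to form a different letter.
            // Each += places one more character in the next position of t.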
        }
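        if (!isright(str1[i]) || i == str1.size() - 1) { // a separator, or the last character of the line
            // Count the word collected so far (skipping empty tokens), then
            // clear t right away so the next word starts from an empty buffer.
            // The map accumulates the number of occurrences of each word.
            if (t.size() != 0) mp[t]++;
            t = "";
        }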
    }
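    int Max = 0;
    // std::map iterates its keys in ascending lexicographic order, and the
    // strict > keeps the first word that reaches the maximum, so ties are
    // resolved in favor of the lexicographically smallest word.
    for (auto it = mp.begin(); it != mp.end(); it++) 
    {
        if (it->second > Max) {
            t = it->first;
            Max = it->second;
        }
    }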
    cout << t << " " << Max;
    return 0;
}
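Both listings get the tie rule ("print the lexicographically smallest one") for free, because `std::map` visits keys in sorted order. If the counts were kept in a real hash table such as `std::unordered_map`, whose iteration order is unspecified, the tie would have to be broken explicitly. A minimal sketch of that variant, with the hypothetical names `most_common` and `check_most_common` (this is not the method used in the post):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: picks the word with the highest count and, among
// equally frequent words, the lexicographically smallest one.
std::pair<std::string, int> most_common(
        const std::unordered_map<std::string, int>& counts) {
    std::string best_word;
    int best_count = 0;
    for (const auto& entry : counts) {
        const bool more_frequent = entry.second > best_count;
        const bool tie_but_smaller =
            entry.second == best_count && entry.first < best_word;
        if (more_frequent || tie_but_smaller) {
            best_word = entry.first;
            best_count = entry.second;
        }
    }
    return {best_word, best_count};
}

// Small usage example: "a" and "b" tie, so "a" is chosen.
void check_most_common() {
    std::unordered_map<std::string, int> counts = {{"b", 2}, {"a", 2}, {"c", 1}};
    std::pair<std::string, int> best = most_common(counts);
    assert(best.first == "a");
    assert(best.second == 2);
}
```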