7-46 Sina Weibo Hot Topics (30 points)

狸吉、

Category: PTA  Tags: data structures, algorithms

Article link: https://blog.csdn.net/qq_41785581/article/details/106676715


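Sina Weibo lets users embed a "topic" in a post: writing the topic text between a pair of "#" characters turns it into a topic link, and clicking the link shows how many people are discussing the same or a similar topic. Sina Weibo also keeps an up-to-date list of hot topics and displays the hottest ones prominently so that users can follow them.

This problem asks you to implement a simplified hot-topic recommender: parse the topics out of a large number of English posts (English, because Chinese word segmentation is troublesome) and find the topic mentioned by the most posts.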

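Input format:

The input first gives a positive integer N (≤10⁵), followed by N lines, each containing one English post of at most 140 characters. Any content enclosed in a nearest pair of # characters is considered a topic; the input guarantees that # characters appear in pairs.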
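
The "nearest pair of #" rule can be illustrated with a small sketch. The helper name `nth_topic` and its signature are assumptions made for this illustration only; it pairs each `#` with the next one and copies out the text in between.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the n-th (0-based) topic of `post` into `out`
   (capacity `cap`). Returns 1 on success, 0 if the post has fewer topics. */
int nth_topic(const char *post, int n, char *out, size_t cap)
{
    if (cap == 0) return 0;
    const char *open = strchr(post, '#');
    while (open != NULL)
    {
        const char *close = strchr(open + 1, '#');
        if (close == NULL) return 0;          /* unpaired '#': no further topic */
        if (n == 0)
        {
            size_t len = (size_t)(close - open - 1);
            if (len >= cap) len = cap - 1;    /* truncate rather than overflow */
            memcpy(out, open + 1, len);
            out[len] = '\0';
            return 1;
        }
        n--;
        open = strchr(close + 1, '#');        /* the next pair starts after this one */
    }
    return 0;
}
```

For the sample post `Another #hot!# #Hot# topic`, index 0 yields `hot!` and index 1 yields `Hot`; the text between the second and third `#` is not a topic because those two characters belong to different pairs.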

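Output format:

On the first line, output the topic mentioned by the most posts; on the second line, output the number of posts that mention it. If that topic is not unique, output the alphabetically smallest one and, on a third line, output And k more ..., where k is the number of other equally hot topics. The input guarantees that at least one topic exists.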
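
The tie-breaking rule can be sketched as a single pass over the counted topics. The function `pick_hottest` and its parameters are hypothetical names for this illustration, and the sketch assumes the topics are already in normalized lowercase form so that `strcmp` gives the required alphabetical order.

```c
#include <string.h>

/* Hypothetical helper: returns the index of the winning topic and stores in
   *others how many further topics share the same maximum count.
   Assumes n >= 1 and that topics[] holds normalized lowercase strings. */
int pick_hottest(const char *const topics[], const int counts[], int n, int *others)
{
    int best = 0;
    *others = 0;
    for (int i = 1; i < n; i++)
    {
        if (counts[i] > counts[best])
        {
            best = i;
            *others = 0;                    /* a new maximum resets the tie count */
        }
        else if (counts[i] == counts[best])
        {
            if (strcmp(topics[i], topics[best]) < 0) best = i;
            (*others)++;                    /* one more topic tied at the maximum */
        }
    }
    return best;
}
```

With the two topics of the sample, "test of topic" and "hot", both mentioned by 2 posts, the sketch selects "hot" and reports 1 other topic, which corresponds to the third output line And 1 more ....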

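Note: two topics are considered the same if they become identical strings after all characters other than English letters and digits are removed and case is ignored, and they also have exactly the same word segmentation. In the output, capitalize the first letter, keep only lowercase English letters and digits otherwise, and separate the words of the original text with a single space.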
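
One way to read this rule is sketched below: every character that is not a letter or digit acts as a word separator, runs of separators collapse into one space, and leading or trailing separators disappear. The helper name `canonical_topic` is an assumption for this illustration, and the sketch relies on the default "C" locale, where `isalnum` accepts only ASCII letters and digits.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes the canonical form of `raw` into `out`
   (capacity `cap`, assumed > 0): lowercase letters and digits only,
   words separated by one space, no leading or trailing space. */
void canonical_topic(const char *raw, char *out, size_t cap)
{
    size_t len = 0;
    int pending_space = 0;                 /* a separator was seen since the last word */
    for (; *raw != '\0'; raw++)
    {
        unsigned char c = (unsigned char)*raw;
        if (isalnum(c))
        {
            /* emit one space between words, never before the first word */
            if (pending_space && len > 0 && len + 1 < cap) out[len++] = ' ';
            pending_space = 0;
            if (len + 1 < cap) out[len++] = (char)tolower(c);
        }
        else
        {
            pending_space = 1;             /* any other character ends the current word */
        }
    }
    out[len] = '\0';
}
```

Under this reading, `Test of topic.` and `test of topic` map to the same canonical string, and `hot!` maps to `hot`. For printing, the first character would additionally be upper-cased, for example with `toupper`.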

Sample input:

4
This is a #test of topic#.
Another #Test of topic.#
This is a #Hot# #Hot# topic
Another #hot!# #Hot# topic

Sample output:

Hot
2
And 1 more ...

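Approach

Read in the topics
Normalize each topic
Build two tables (one serves as the hash table; the other stores the topics inserted into the hash table sequentially, for convenient output later)
1. Handling hash collisions: open addressing, i.e. H_i = (H(key) + d_i) % m, where H(key) is the hash function, m is the length of the hash table, and d_i is the increment sequence.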
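
The probing formula can be made concrete with a toy table. This is a minimal sketch under stated assumptions: it uses the linear increments d_i = i, a deliberately tiny table, and a byte-sum hash, none of which come from the solution itself (the program in the Code section alternates between the right and left side of the home slot). The names `toy_hash`, `find_or_insert`, and `TABLE_SIZE` are hypothetical.

```c
#include <string.h>

#define TABLE_SIZE 11                      /* m: tiny on purpose, so collisions are easy to trigger */

static const char *slots[TABLE_SIZE];      /* NULL marks an empty slot */

/* Toy hash function H(key): sum of the bytes, reduced modulo m. */
static unsigned toy_hash(const char *key)
{
    unsigned sum = 0;
    while (*key != '\0') sum += (unsigned char)*key++;
    return sum % TABLE_SIZE;
}

/* Returns the slot index where `key` is (now) stored, or -1 if the table is full.
   Probes H_i = (H(key) + d_i) % m with the linear increments d_i = i. */
int find_or_insert(const char *key)
{
    unsigned home = toy_hash(key);
    for (unsigned i = 0; i < TABLE_SIZE; i++)
    {
        unsigned pos = (home + i) % TABLE_SIZE;
        if (slots[pos] == NULL)
        {
            slots[pos] = key;              /* empty slot: the key is new, store it here */
            return (int)pos;
        }
        if (strcmp(slots[pos], key) == 0) return (int)pos;   /* already present */
    }
    return -1;
}
```

The keys "ab" and "ba" have the same byte sum (195, and 195 % 11 = 8), so they collide: "ab" lands in slot 8 and "ba" is pushed to slot 9. Looking either key up again follows the same probe sequence and finds the same slot.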

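Analysis

Time complexity: expected O(1) per hash-table lookup or insertion while the load factor stays low, so processing the whole input is expected to take time linear in its total length
Space complexity: O(n) for the stored topics, plus the fixed-size hash table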

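Results

Test case	Hint	Result	Time	Memory
0	sample: tied hot topics; a topic repeated within one post counts only once	Accepted	3 ms	384 KB
1	a single topic	Accepted	3 ms	296 KB
2	different word segmentation counts as 2 different topics; one post may contain several topics	Accepted	2 ms	228 KB
3	maximum N; longest post; longest topic	Accepted	141 ms	768 KB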

Code

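#include <stdio.h>
#include <string.h>
#include <stdlib.h>         // malloc (the non-standard <malloc.h> is not needed)

#define MAXLENGTH   1000000 // Hash table size; the load factor must stay below a certain threshold, or frequent collisions will cause a timeout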

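typedef struct Node
{
    char *topic;            // the normalized topic
    int count;              // number of posts that mention this topic
    int last;               // index of the last post that mentioned this topic (used for deduplication)
}*Node;

Node hashTable[MAXLENGTH];  // the hash table
Node indices[MAXLENGTH];    // compact list of the nodes stored in the hash table, in insertion order
int sumOfTopics;            // total number of distinct topics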

void solution();
void handle(char *text, int numberOfWeibo);     // find the topics in a post
void normalize(char *topic, int numberOfWeibo); // bring a topic into its normalized form
int hash(char *topic);                          // compute the hash value of a topic
int mod(int n);                                 // reduce a value into [0, MAXLENGTH)
void insert(char *topic, int numberOfWeibo);    // insert a topic into the hash table

int main()
{
    solution();
    return 0;
}

void solution()
{
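    int n;
    char buf[256];                      // 140 characters + "\r\n" + '\0', with headroom
    if (scanf("%d", &n) != 1) return;
    fgets(buf, sizeof(buf), stdin);     // consume the rest of the first line
    for (int i = 0; i < n; i++)
    {
        if (!fgets(buf, sizeof(buf), stdin)) break; // gets() is unsafe and was removed in C11
        handle(buf, i);
    }
    if (sumOfTopics == 0) return;       // nothing to report
    Node maxTitle = indices[0];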
    int num = 0;
    for (int i = 1; i < sumOfTopics; i++)
    {
        if (indices[i]->count > maxTitle->count)
        {
            maxTitle = indices[i];
            num = 0;
        }
        else if (indices[i]->count == maxTitle->count)
        {
            if (strcmp(indices[i]->topic, maxTitle->topic) < 0)
            {
                maxTitle = indices[i];
            }
            num++;
        }
    }
    if (maxTitle->topic[0] >= 'a' && maxTitle->topic[0] <= 'z') maxTitle->topic[0] -= 32;
    printf("%s\n%d\n", maxTitle->topic, maxTitle->count);
    if (num)
    {
        printf("And %d more ...\n", num);
    }
}

void handle(char *text, int numberOfWeibo)
{
    char *first;
    char *second;
    char buf[141];
    while ((first = strchr(text, '#')) != NULL && (second = strchr(first + 1, '#')) != NULL)
    {
        strncpy(buf, first + 1, second - first - 1);
        buf[second - first - 1] = '\0';
        normalize(buf, numberOfWeibo);
        text = second + 1;
    }
}

void normalize(char *topic, int numberOfWeibo)
{
    char *text = topic;
    if (!text || !*text) return;
    // Lowercase the letters and replace every character that is not a letter or digit with a space
    while (*text)
    {
        if (*text >= 'A' && *text <= 'Z') *text += 32;
        else if (!(*text >= 'a' && *text <= 'z') && !(*text >= '0' && *text <= '9')) *text = ' ';
        text++;
    }

    char *first = NULL;
    char *second = NULL;
    char *rear = topic + strlen(topic);
    // Remove leading spaces
    text = topic;
    while (*text == ' ') text++;
    memmove(topic, text, rear - text + 1);
    rear -= text - topic;
    text = topic;
    // Collapse runs of spaces in the middle and remove trailing spaces
    for (; *text; text++)
    {
        if (*text == ' ')
        {
            first = second = text;
            while (*second == ' ') second++;
            if (second - first != 1 && second != rear)
            {
                memmove(first + 1, second, rear - second + 1);
                rear -= second - first - 1;
            }
            else if(second == rear)
            {
                *first = '\0';
            }
        }
    }
    // Insert the topic into the hash table (a topic with no letters or digits is skipped)
    if (!*topic) return;
    insert(topic, numberOfWeibo);
}

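int hash(char *topic)
{
    // Polynomial rolling hash, swapped in for the original add-then-shift hash,
    // which left the low 5 bits zero and discarded all but the last few characters.
    // Unsigned arithmetic wraps around on overflow, which is well defined.
    unsigned n = 0;
    while (*topic)
    {
        n = n * 31 + (unsigned char)*topic;
        topic++;
    }
    return (int)(n % MAXLENGTH); // always in [0, MAXLENGTH)
}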

int mod(int n)
{
    while (n < 0) n += MAXLENGTH;
    return n % MAXLENGTH;
}

void insert(char *topic, int numberOfWeibo)
{
    int key = hash(topic);
    int i = 0;
    int j = 0;
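    // Probe outward from the home slot, alternating right and left, until the topic or an empty slot is found
    for (; i < MAXLENGTH / 2; i++)
    {
        j = mod(key + i); // probe to the right
        if (hashTable[j])
        {
            if (!strcmp(topic, hashTable[j]->topic))
            {
                if (hashTable[j]->last == numberOfWeibo) return; // already counted for this post
                ++hashTable[j]->count;
                hashTable[j]->last = numberOfWeibo;
                return; // existing topic updated; do not fall through and insert a duplicate
            }
        }
        else break;
        j = mod(key - i); // probe to the left
        if (hashTable[j])
        {
            if (!strcmp(topic, hashTable[j]->topic))
            {
                if (hashTable[j]->last == numberOfWeibo) return; // already counted for this post
                ++hashTable[j]->count;
                hashTable[j]->last = numberOfWeibo;
                return; // existing topic updated; do not fall through and insert a duplicate
            }
        }
        else break;
    }
    // An empty slot was found: insert a new node into the hash table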
    hashTable[j] = (Node)malloc(sizeof(struct Node));
    hashTable[j]->topic = (char*)malloc(strlen(topic) + 1);
    strcpy(hashTable[j]->topic, topic);
    hashTable[j]->count = 1;
    hashTable[j]->last = numberOfWeibo;
    indices[sumOfTopics++] = hashTable[j];
}
