Matching Weibo Topics (#...#) with a Regular Expression

Author: 深圳市热心市民市民

Category: Spring Boot. Tags: topic, regular expression

Article link: https://blog.csdn.net/u013107634/article/details/89471165

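This article presents a topic-tag parser implemented in Java that extracts topic tags of a specific format from text. It supports the Sina Weibo topic format (#topic#), with a maximum topic length of 40 characters. The parser uses a regular expression to find the matches and returns all topics with duplicates removed.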

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

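/**
 * Utility for extracting #topic# tags from text.
 *
 * @author XXX
 * Date: 2019/3/20
 */
public class RegexpUtil {

    /**
     * Matches one topic tag: 1 to 40 characters other than '#',
     * enclosed in a pair of '#'. Group 1 is the topic name.
     */
    private static final Pattern TOPIC_TAG_PATTERN = Pattern.compile("#([^#]{1,40})#");

    /**
     * Returns the distinct topic names found in content, in order of first appearance.
     */
    public static Set<String> getTopicList(String content) {
        // LinkedHashSet drops duplicates while keeping insertion order.
        Set<String> topicList = new LinkedHashSet<>();
        Matcher matcher = TOPIC_TAG_PATTERN.matcher(content);
        while (matcher.find()) {
            String topicName = matcher.group(1);
            topicList.add(topicName);
        }
        return topicList;
    }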

    public static void main(String[] args) {
        // Sample input with a repeated topic, empty pairs (##) and adjacent topics.
        String str = "#哈哈a###这是一个#好####哈哈a##哈#啊圣诞节疯狂#奥斯卡级代付款##as的开发#";
        Set<String> topicList = getTopicList(str);
        System.out.println(topicList);
    }
}

Output:

[哈哈a, 这是一个, 哈, 奥斯卡级代付款, as的开发]

Sina Weibo's limit on topic length appears to be 40 characters, hence the {1,40} quantifier in the pattern.
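To see what the {1,40} quantifier does at the boundary, here is a minimal sketch. The class name TopicLengthDemo and the repeat helper are made up for illustration; the pattern is the one from the article.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicLengthDemo {

    // Same pattern as in the article: 1 to 40 non-'#' characters between two '#'.
    private static final Pattern TOPIC = Pattern.compile("#([^#]{1,40})#");

    static Set<String> getTopicList(String content) {
        Set<String> topics = new LinkedHashSet<>();
        Matcher matcher = TOPIC.matcher(content);
        while (matcher.find()) {
            topics.add(matcher.group(1));
        }
        return topics;
    }

    // Hypothetical helper: builds a string of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String atLimit = "#" + repeat('a', 40) + "#";   // exactly 40 characters
        String overLimit = "#" + repeat('a', 41) + "#"; // one character too many
        System.out.println(getTopicList(atLimit).size());   // prints 1
        System.out.println(getTopicList(overLimit).size()); // prints 0
        System.out.println(getTopicList("##"));             // prints [] (empty topic)
    }
}
```

A topic that is too long is not truncated; it simply does not match. The lower bound of 1 is what rejects the empty pair ##.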

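Also, for input like #1#2#3#, only 1 and 3 count as topics; 2 does not. Once a topic matches, both of its # characters are consumed, so the closing # of one topic cannot also serve as the opening # of the next.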
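The consumption of both # characters can be observed from the match offsets. The following is a small sketch (the class name ConsumedDelimiterDemo is illustrative only) that uses the article's pattern on #1#2#3#:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedDelimiterDemo {

    // Same pattern as in the article.
    private static final Pattern TOPIC = Pattern.compile("#([^#]{1,40})#");

    // Collects the topics and prints the [start, end) offsets of each full match.
    static List<String> topicsOf(String content) {
        List<String> topics = new ArrayList<>();
        Matcher matcher = TOPIC.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(1)
                    + " matched at [" + matcher.start() + ", " + matcher.end() + ")");
            topics.add(matcher.group(1));
        }
        return topics;
    }

    public static void main(String[] args) {
        // Prints:
        // 1 matched at [0, 3)
        // 3 matched at [4, 7)
        // [1, 3]
        System.out.println(topicsOf("#1#2#3#"));
    }
}
```

The first match ends at offset 3, so the next search starts at the character 2. The # at offset 2 is already behind the search position, which is why 2 never gets an opening delimiter.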

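If you need the result to include repeated topics, change the return type of getTopicList to List and collect the matches into a List (for example an ArrayList) instead of the LinkedHashSet.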
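A minimal sketch of that List-based variant follows. The class name TopicListDemo and the method name getTopicListWithDuplicates are made up for illustration; the choice of ArrayList is an assumption, and any List implementation would work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicListDemo {

    // Same pattern as in the article.
    private static final Pattern TOPIC = Pattern.compile("#([^#]{1,40})#");

    // Returns every matched topic in order of appearance, duplicates included.
    static List<String> getTopicListWithDuplicates(String content) {
        List<String> topics = new ArrayList<>();
        Matcher matcher = TOPIC.matcher(content);
        while (matcher.find()) {
            topics.add(matcher.group(1));
        }
        return topics;
    }

    public static void main(String[] args) {
        // prints [java, regex, java]
        System.out.println(getTopicListWithDuplicates("#java# and #regex# and #java#"));
    }
}
```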