Using regular expressions: matching a hashtag with Python regex

Latest recommended article published on 2024-08-21 00:06:52

Oh?Geostatistics…

Category: Notes. Tags: regular expressions, python

Article link: https://blog.csdn.net/qq_43515555/article/details/108143025

An online tool that matches regular expressions in real time is very handy for testing a pattern such as the one below.
[#].*?\s

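import re

# txt is the text to search; each match runs from "#" up to and including the next whitespace character
print(re.findall(r"[#].*?\s", txt))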

That extracts all of the hashtags.
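
As a minimal sketch of the extraction step, the snippet below runs the same pattern on a made-up string (`sample_text` is an assumption for illustration, not data from this post). It also shows two side effects of `[#].*?\s`: each match keeps its trailing whitespace, and a hashtag at the very end of the text is missed because no whitespace follows it. The alternative pattern `#\w+` is one possible fix, not the method used in the post.

```python
import re

# Hypothetical sample text, invented for illustration only
sample_text = "Great day #sunny at the beach #vacation\nCan't wait to go back #travel"

# Original pattern: "#", then as few characters as possible, then one whitespace character
original = re.findall(r"[#].*?\s", sample_text)
print(original)  # ['#sunny ', '#vacation\n']

# Alternative: "#" followed by one or more word characters
alternative = re.findall(r"#\w+", sample_text)
print(alternative)  # ['#sunny', '#vacation', '#travel']
```

Note that `\w` stops at punctuation such as a hyphen, so the alternative may need adjusting depending on which characters the tags are allowed to contain.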
Next, delete these hashtags (note that the list methods remove and pop work differently).

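li = [1, 2, 3, 4]
li.remove(3)  # remove() deletes the first element equal to 3
print(li)
# Output [1, 2, 4]

li = [1, 2, 3, 4]
li.pop(2)  # pop() deletes (and returns) the element at index 2
print(li)
# Output [1, 2, 4]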
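
Both calls above give the same result only because the value 3 happens to sit at index 2. A short sketch with a hypothetical list of tags (the values are invented for illustration) makes the difference visible: remove deletes the first element equal to its argument and returns None, while pop deletes the element at a given index and returns it.

```python
# Hypothetical list of extracted tags, invented for illustration only
tags = ["#sunny", "#vacation", "#travel", "#sunny"]

# remove(value): deletes the first matching element, returns None
tags.remove("#sunny")
print(tags)  # ['#vacation', '#travel', '#sunny']

# pop(index): deletes the element at that position and returns it
popped = tags.pop(0)
print(popped)  # #vacation
print(tags)    # ['#travel', '#sunny']

# remove raises ValueError when the value is not in the list
try:
    tags.remove("#missing")
except ValueError:
    print("not in list")
```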

Now the word frequency analysis can be run (the hashtags occur so often that they would skew the word counts for the body text).

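# Load the stop words, one per line, from the NLTK English stop word list file
stop_words = []
with open(r"NLTK's list of english stopwords", 'r', encoding='utf-8') as f:
    for line in f:
        stop_words.append(line.strip())

# Count every word that is not a stop word; words is the list of words left after removing the hashtags
dict1 = {}
for word in words:
    if word in stop_words:
        continue
    dict1[word] = dict1.get(word, 0) + 1
dict1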
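
Putting the steps together, the sketch below strips the hashtags from the text with re.sub, splits what is left into lowercase words, and counts them with collections.Counter, which does the same job as the dict1.get loop. The sample text, the small stop word set, and the `#\w+` pattern are all assumptions for illustration; the post itself reads its stop words from a file and does not show how words is built.

```python
import re
from collections import Counter

# Hypothetical inputs, invented for illustration only
sample_text = "The beach was great #sunny and the water was warm #vacation #travel"
stop_words = {"the", "was", "and"}  # stand-in for the NLTK stop word file

# Delete every hashtag so it cannot inflate the counts
body = re.sub(r"#\w+", " ", sample_text)

# Lowercase the text and keep runs of letters and apostrophes as words
words = re.findall(r"[a-z']+", body.lower())

# Count the words that are not stop words
counts = Counter(w for w in words if w not in stop_words)
print(counts.most_common(3))
```

Using a set for the stop words keeps each membership test fast, which matters more as the text grows.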
