Regular Expressions I Commonly Use at Work


chloe_au_yeung

Tags: regular expressions, python

Article link: https://blog.csdn.net/chloe_ou/article/details/118798640

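To make work easier (whenever I come back to re after a while, I have to relearn it all over again), I decided to write down the regular expressions I use most often.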

Extracting hashtags from Weibo

1. Without keeping the #

import re

def hashtag(s):
    # Capture the text between a pair of '#', excluding the '#' themselves.
    pattern = re.compile(r'#([^#][\u4e00-\u9fcc\S]*?[^#])#', re.U)
    return re.findall(pattern, s)

Result:
[Image: Hashtag Extraction]
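As a quick check of the first variant, here is a minimal sketch that runs the same pattern on the sample Weibo text used for the second variant further down. The function name hashtag_without_hash and the choice of sample are illustrative assumptions, not a record of the original result.

```python
import re

def hashtag_without_hash(s):
    # Same pattern as variant 1; the function name is illustrative only.
    pattern = re.compile(r'#([^#][\u4e00-\u9fcc\S]*?[^#])#')
    return re.findall(pattern, s)

s = '#罗云熙[超话]##罗云熙心跳源计划# [鲜花][鲜花]#谁治愈了罗云熙##悦薇水乳# @罗云熙Leo 罗先生节日快乐，组织快困死了。 \u200b'
tags = hashtag_without_hash(s)
print(tags)
# ['罗云熙[超话]', '罗云熙心跳源计划', '谁治愈了罗云熙', '悦薇水乳']
```

Note that this pattern needs at least two characters between the two # signs, so a one-character tag such as #a# is not matched.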

2. Keeping the #

import re

def hashtag(s):
    # Capture each hashtag together with its '#'; the lazy *? stops at the next '#'.
    pattern = re.compile(r'(#[\u4e00-\u9fcc\S\s]*?#)', re.U)
    return re.findall(pattern, s)

s = '#罗云熙[超话]##罗云熙心跳源计划# [鲜花][鲜花]#谁治愈了罗云熙##悦薇水乳# @罗云熙Leo 罗先生节日快乐，组织快困死了。 \u200b'
print(hashtag(s))
# Result:
# ['#罗云熙[超话]#', '#罗云熙心跳源计划#', '#谁治愈了罗云熙#', '#悦薇水乳#']
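The two variants differ only in whether the # signs are part of the returned text, so they could plausibly share one pattern. The sketch below is an alternative I am assuming for illustration, not the code from this post: the names HASHTAG_RE, extract_hashtags, and keep_hash are hypothetical, and the simpler pattern #([^#]+)# also accepts one-character tags and tags containing spaces.

```python
import re

# Hypothetical helper, not from the original post: one pattern serves both variants.
HASHTAG_RE = re.compile(r'#([^#]+)#')

def extract_hashtags(text, keep_hash=False):
    """Return hashtags delimited by a pair of '#', with or without the delimiters."""
    tags = HASHTAG_RE.findall(text)
    if keep_hash:
        return ['#' + tag + '#' for tag in tags]
    return tags

sample = '#罗云熙[超话]##罗云熙心跳源计划# [鲜花][鲜花]#谁治愈了罗云熙##悦薇水乳# @罗云熙Leo'
print(extract_hashtags(sample))
# ['罗云熙[超话]', '罗云熙心跳源计划', '谁治愈了罗云熙', '悦薇水乳']
print(extract_hashtags(sample, keep_hash=True))
# ['#罗云熙[超话]#', '#罗云熙心跳源计划#', '#谁治愈了罗云熙#', '#悦薇水乳#']
```

Like the patterns above, this pairs # signs from left to right, so a stray unpaired # earlier in the text can shift which pairs are treated as hashtags.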

Splitting Chinese into characters and English into words

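import re

def split(s):
    # Tokens: an English word, a single Chinese character, or a number with an optional decimal part.
    reg = re.compile(r'[a-z]+|[\u4e00-\u9fcc]|\d+(?:\.\d+)?', re.U)
    return re.findall(reg, s.lower())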

Result:

a = 'D Rose 10男鞋籃球場上運動鞋'
b = '三叶草POD-S3.1男女鞋经典运动鞋休闲鞋'
# Result:
['d', 'rose', '10', '男', '鞋', '籃', '球', '場', '上', '運', '動', '鞋']
['三', '叶', '草', 'pod', 's', '3.1', '男', '女', '鞋', '经', '典', '运', '动', '鞋', '休', '闲', '鞋']
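One detail worth checking in the number part of the tokenizer: inside square brackets, +, (, ) and ? are literal characters, so a class such as [\d+(\.\d+)?]+ matches any run of digits, dots, and those symbols, while the grouped form \d+(?:\.\d+)? matches only an integer or a decimal. The sketch below contrasts the two on a made-up input; the variable names and the test string are assumptions for illustration.

```python
import re

# Character-class form: inside [...], '+', '(', ')' and '?' are literal characters.
class_form = re.compile(r'[a-z]+|[\u4e00-\u9fcc]|[\d+(\.\d+)?]+')
# Grouped form: digits with an optional decimal part, using a non-capturing group.
grouped_form = re.compile(r'[a-z]+|[\u4e00-\u9fcc]|\d+(?:\.\d+)?')

text = 'pod(3.1)+10鞋'
class_tokens = class_form.findall(text)
grouped_tokens = grouped_form.findall(text)
print(class_tokens)    # ['pod', '(3.1)+10', '鞋']
print(grouped_tokens)  # ['pod', '3.1', '10', '鞋']
```

The group is written as (?:...) because re.findall returns the group contents instead of the whole match when the pattern contains a capturing group.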