Cherish Life, Stay Away from Regular Expressions (An Introduction to the parse Library)

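A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace matched substrings, or to extract substrings that satisfy some condition. Simple regular expressions are fairly easy to pick up, but anything slightly more complex quickly becomes dizzying. Regular expressions are not intuitive and read poorly: others may not understand the ones you write, and you may not understand theirs. Every time I run into a string-matching problem I find myself sighing: this is hard, this is really hard for me!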

 

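A few days ago, though, I came across a library called parse, and I wish I had found it sooner. In short, parse handles the kind of matching you would normally reach for a regular expression to do, in a convenient and readable way, and it is a particularly good fit for simple matching problems. Below is a short introduction to parse that follows the official documentation.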

First, install parse with pip:

pip install parse
# Tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple parse

Once it is installed, we can import it:

from parse import parse

The simplest kind of match. If the string matches, parse returns a Result instance:

print(parse('python', 'python'))

 

If it does not match, it returns None:

print(parse('pthon', 'python'))

 

If you match with "{}", the returned Result instance can be indexed by position:

p = parse("With great {} there must come great {}", "With great power there must come great responsibility")
print(p)

print(p[0])
print(p[1])

If you match with "{}" and give the fields names, the returned Result instance can be indexed by key:

p = parse("With great {noun1} there must come great {noun2}", "With great power there must come great responsibility")
print(p)

print(p['noun1'])
print(p['noun2'])
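For comparison, here is a rough sketch of the same extraction written with the standard re module. The regular expression is my own approximation of what the parse pattern expresses, not something taken from the parse documentation; the named groups play the role of the {noun1} and {noun2} fields.

```python
import re

text = "With great power there must come great responsibility"

# (?P<name>...) named groups stand in for parse's {name} fields.
# This regex is an illustrative equivalent, written by hand.
regex = re.compile(r"With great (?P<noun1>.+?) there must come great (?P<noun2>.+)")

m = regex.fullmatch(text)
nouns = m.groupdict() if m else None
print(nouns)
```

The parse version reads almost like the sentence itself, which is the main appeal for simple cases like this one.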

We can also compile a pattern once and reuse it, much like re.compile:

from parse import compile

pattern = compile("With great {noun1} there must come great {noun2}")
p = pattern.parse("With great power there must come great responsibility")

print(p)

When defining a pattern you can also specify a type conversion (matched values are str by default):

# Without a type conversion
p = parse("My name is {name}, I have shot down {number} mobile suits.", "My name is Amuro Ray, I have shot down 72 mobile suits.")
print(p)
# With a type conversion
p = parse("My name is {name}, I have shot down {number:d} mobile suits.", "My name is Amuro Ray, I have shot down 72 mobile suits.")
print(p)

Adding ":d" inside the "{}" converts the matched string to an integer. Many other types are available:

Type | Characters matched | Output
l | Letters (ASCII) | str
w | Letters, numbers and underscore | str
W | Not letters, numbers and underscore | str
s | Whitespace | str
S | Non-whitespace | str
d | Digits (effectively integer numbers) | int
D | Non-digit | str
n | Numbers with thousands separators (, or .) | int
% | Percentage (converted to value/100.0) | float
f | Fixed-point numbers | float
F | Decimal numbers | Decimal
e | Floating-point numbers with exponent e.g. 1.1e-10, NAN (all case insensitive) | float
g | General number format (either d, f or e) | float
b | Binary numbers | int
o | Octal numbers | int
x | Hexadecimal numbers (lower and upper case) | int
ti | ISO 8601 format date/time e.g. 1972-01-20T10:21:36Z ("T" and "Z" optional) | datetime
te | RFC2822 e-mail format date/time e.g. Mon, 20 Jan 1972 10:21:36 +1000 | datetime
tg | Global (day/month) format date/time e.g. 20/1/1972 10:21:36 AM +1:00 | datetime
ta | US (month/day) format date/time e.g. 1/20/1972 10:21:36 PM +10:30 | datetime
tc | ctime() format date/time e.g. Sun Sep 16 01:03:52 1973 | datetime
th | HTTP log format date/time e.g. 21/Nov/2011:00:07:11 +0000 | datetime
ts | Linux system log format date/time e.g. Nov 9 03:37:44 | datetime
tt | Time e.g. 10:21:36 PM -5:30 | time

 

Stripping extra whitespace while extracting a match:

p = parse("Stop talking, {}.", "Stop talking,    Char Aznable    .")
print(f"Without stripping: {p}")
p = parse("Stop talking, {:^}.", "Stop talking,    Char Aznable    .")
print(f"With stripping: {p}")

":^" strips whitespace on both sides, ":<" strips whitespace on the right, and ":>" strips whitespace on the left.

Case sensitivity:

# Matching is case-insensitive by default
p = parse("char aznable", "Char Aznable")
print(p)
# case_sensitive=True makes matching case-sensitive
p = parse("char aznable", "Char Aznable", case_sensitive=True)
print(p)

Attributes of a Result instance (fixed, named, spans):

p = parse("How dare you {} me, {name}!", "How dare you scheme against me, Char Aznable!")
print(p)

print(p.fixed)

print(p.named)

spans gives the position in the original string of each matched field:

print(p.spans)

We can also define custom conversions for matched fields:

def add_five(x):
    return int(x)+5

p = parse("My name is {}, I am {} years old.", "My name is Banagher Links, I am 16 years old.")
print(p)
# Convert with the add_five function
p = parse("My name is {}, I am {:add_five} years old.", "My name is Banagher Links, I am 16 years old.", dict(add_five=add_five))
print(p)

 

 
