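Complex regular expressions are a pain: hard to write and hard to debug. With just two functions, you can build a complex regex by combining simple ones.
For example, take a string of rules, where {name} refers to a rule defined earlier: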
# rules definition
rules = r'''
protocol = http|https
login_name = [^:@\r\n\t ]+
login_pass = [^@\r\n\t ]+
login = {login_name}(:{login_pass})?
host = [^:/@\r\n\t ]+
port = \d+
optional_port = (?:[:]{port})?
path = /[^\r\n\t ]*
url = {protocol}://({login}[@])?{host}{optional_port}{path}?
'''
Then call the regex_build function to expand the rules above into a dictionary and print it:
# expand patterns in a dictionary
m = regex_build(rules, capture = True)
# list generated patterns
for k, v in m.items():
    print(k, '=', v)
Output:
protocol = (?P<protocol>http|https)
login_name = (?P<login_name>[^:@\r\n\t ]+)
login_pass = (?P<login_pass>[^@\r\n\t ]+)
login = (?P<login>(?P<login_name>[^:@\r\n\t ]+)(:(?P<login_pass>[^@\r\n\t ]+))?)
host = (?P<host>[^:/@\r\n\t ]+)
port = (?P<port>\d+)
optional_port = (?P<optional_port>(?:[:](?P<port>\d+))?)
path = (?P<path>/[^\r\n\t ]*)
url = (?P<url>(?P<protocol>http|https)://((?P<login>(?P<login_name>[^:@\r\n\t ]+)(:(?P<login_pass>[^@\r\n\t ]+))?)[@])?(?P<host>[^:/@\r\n\t ]+)(?P<optional_port>(?:[:](?P<port>\d+))?)(?P<path>/[^\r\n\t ]*)?)
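A regex this complex is very hard to write directly by hand, and hard to debug even if you manage it. With the compositional approach, you can test the small, simple patterns ahead of time and assemble them only when needed, which makes mistakes much less likely. The output above is the result after assembly and substitution.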
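The implementation of regex_build is not shown here, so the following is only a minimal sketch of how such an expansion step might work, not the author's actual code. The name regex_build_sketch, the placeholder pattern, the error handling, and the behavior of capture=False (wrapping in a non-capturing group) are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for regex_build; names and details are assumptions.
# A placeholder is {identifier}; quantifiers such as {2,4} do not match this.
_PLACEHOLDER = re.compile(r'\{([A-Za-z_][A-Za-z0-9_]*)\}')


def regex_build_sketch(rules, capture=True):
    """Expand 'name = pattern' lines, replacing {name} with earlier rules."""
    expanded = {}

    def substitute(match):
        ref = match.group(1)
        if ref not in expanded:
            raise ValueError('undefined rule: %s' % ref)
        return expanded[ref]

    for line in rules.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, body = line.partition('=')  # split at the first '='
        if not sep:
            raise ValueError('expected "name = pattern": %r' % line)
        name = name.strip()
        pattern = _PLACEHOLDER.sub(substitute, body.strip())
        # Wrap every rule in a group so it embeds safely in later rules.
        if capture:
            pattern = '(?P<%s>%s)' % (name, pattern)
        else:
            pattern = '(?:%s)' % pattern  # assumption: non-capturing wrap
        expanded[name] = pattern
    return expanded


demo = regex_build_sketch(r'''
port = \d+
optional_port = (?:[:]{port})?
''')
print(demo['optional_port'])  # (?P<optional_port>(?:[:](?P<port>\d+))?)
```

Because every rule is wrapped in its own group, an alternation such as http|https stays contained when it is substituted into a larger rule. A rule can only reference names defined on earlier lines, which matches the ordering used in the rules string above.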
Now let's match a URL using the url rule from that dictionary:
import re

# match using the "url" rule
pattern = m['url']
s = re.match(pattern, 'https://name:pass@www.baidu.com:8080/haha')
# print the full match
print('matched: "%s"'%s.group(0))
print()
# print the named subgroups
for name in ('url', 'login_name', 'login_pass', 'host', 'port', 'path'):
    print('subgroup:', name, '=', s.group(name))