Python Web Scraping: The re Library

Latest recommended article published on 2022-09-23 11:42:46

李小渣加油鸭~

Article link: https://blog.csdn.net/routing666/article/details/97539290

Regular expressions

Characteristics of regular expressions:

The re library

(Writing patterns as raw strings, r'...', reduces the number of backslashes you have to type.)
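As a minimal sketch of the raw-string point, the snippet below writes the postal-code pattern from this post in both forms and checks that they are the same text. The variable names are only illustrative.

```python
import re

# The same pattern written two ways: as a raw string and as a normal string.
raw_pattern = r'[1-9]\d{5}'
escaped_pattern = '[1-9]\\d{5}'  # the backslash has to be doubled here

# Both literals produce exactly the same sequence of characters.
same_text = raw_pattern == escaped_pattern

# Both therefore find the same postal code in the sample string.
raw_result = re.search(raw_pattern, 'BIT 100081').group(0)
escaped_result = re.search(escaped_pattern, 'BIT 100081').group(0)

print(same_text, raw_result, escaped_result)
```

The raw form is usually easier to read once a pattern contains several backslashes.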

re.search()

>>> import re
>>> match = re.search(r'[1-9]\d{5}','BIT 100081')
>>> if match:
...     print(match.group(0))
...
100081
>>>

re.match()

>>> match = re.match(r'[1-9]\d{5}','BIT 100081')
>>> if match:
...     print(match.group(0))
...

>>> print(match.group(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>>

[No match was found, because re.match() only matches at the start of the string, so match is None.]

>>> match = re.match(r'[1-9]\d{5}','100081 BIT')
>>> if match:
...     print(match.group(0))
...
100081
>>>

re.findall()
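The original post gives no example for this function, so here is a small sketch that reuses the sample string from the other sections. re.findall() returns all non-overlapping matches as a list of strings.

```python
import re

# Every non-overlapping match comes back as a string in a list.
codes = re.findall(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(codes)

# When nothing matches, the result is an empty list rather than None.
none_found = re.findall(r'[1-9]\d{5}', 'BIT TSU')
print(none_found)
```

Because the result is always a list, there is no need for the `if match:` guard used with re.search() and re.match().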

re.split()

>>> import re
>>> re.split(r'[1-9]\d{5}','BIT100081 TSU100084')
['BIT', ' TSU', '']
>>>

>>> re.split(r'[1-9]\d{5}','BIT100081 TSU100084',maxsplit=1)
['BIT', ' TSU100084']
>>>

re.finditer()

>>> for m in re.finditer(r'[1-9]\d{5}','BIT100081 TSU100084'):
...     if m:
...         print(m.group(0))
...
100081
100084
>>>

[re.finditer() returns an iterator that yields one match object per match.]

re.sub()

>>> import re
>>> re.sub(r'[1-9]\d{5}',':zipcode','BIT100081 TSU100084')
'BIT:zipcode TSU:zipcode'
>>>
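re.sub() also accepts a count argument that limits how many matches are replaced. The sketch below reuses the same sample string; the variable names are only illustrative.

```python
import re

text = 'BIT100081 TSU100084'

# By default (count=0) every match is replaced.
replace_all = re.sub(r'[1-9]\d{5}', ':zipcode', text)

# With count=1 only the first match is replaced.
replace_first = re.sub(r'[1-9]\d{5}', ':zipcode', text, count=1)

print(replace_all)
print(replace_first)
```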

re.compile()

(When you call methods on a compiled pattern object, you no longer need to pass the regular expression as an argument.)
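The original post has no example here, so the following is a minimal sketch. The name zipcode_re is an assumption chosen for illustration; the methods called on it are the pattern-object counterparts of the module-level functions shown earlier.

```python
import re

# Compile the pattern once; the resulting object carries the regex with it.
zipcode_re = re.compile(r'[1-9]\d{5}')
text = 'BIT100081 TSU100084'

# Each method takes the string only, not the pattern.
first = zipcode_re.search(text).group(0)
all_codes = zipcode_re.findall(text)
replaced = zipcode_re.sub(':zipcode', text)
parts = zipcode_re.split(text, maxsplit=1)

print(first, all_codes, replaced, parts)
```

Compiling is mainly a convenience when the same pattern is used in several places.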

The match object

>>> import re
>>> m = re.search(r'[1-9]\d{5}','BIT100081 TSU100084')
>>> m.string
'BIT100081 TSU100084'
>>> m.re
re.compile('[1-9]\\d{5}')
>>> m.pos
0
>>> m.endpos
19
>>> m.group(0)
'100081'
>>> m.start()
3
>>> m.end()
9
>>> m.span()
(3, 9)
>>>
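The session above inspects only the first match. As a rough sketch, the same attributes can be collected for every match by combining re.finditer() with group() and span(); the variable names are only illustrative.

```python
import re

text = 'BIT100081 TSU100084'

# Collect the matched text and its (start, end) span for every match.
found = [(m.group(0), m.span()) for m in re.finditer(r'[1-9]\d{5}', text)]
print(found)

# A span can be used to slice the original string back out.
start, end = found[0][1]
first_slice = text[start:end]
print(first_slice)
```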