Python learning notes: applying regular expressions (CSDN blog)

Article link: https://blog.csdn.net/hey_girl_/article/details/49205359

1. Find tip or top in a string

import re
st = "top tip taq twp tep"
res = r"t[io]p"  # [io] matches exactly one character: i or o
print(re.findall(res, st))

Output: ['top', 'tip']
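The character class [io] is one way to say "i or o". Here is a minimal comparison sketch using only the standard re module and the same test string, with an alternation group as the alternative spelling (the variable names are just for illustration):

```python
import re

st = "top tip taq twp tep"

# A character class [io] matches exactly one character from the set
with_class = re.findall(r"t[io]p", st)

# The same match written as a non-capturing alternation (?:i|o)
with_alternation = re.findall(r"t(?:i|o)p", st)

# With a capturing group, findall returns only the group's text
with_capture = re.findall(r"t(i|o)p", st)

print(with_class)        # ['top', 'tip']
print(with_alternation)  # ['top', 'tip']
print(with_capture)      # ['o', 'i']
```

The non-capturing (?:...) form matters here: with a plain capturing group, findall returns only what the group matched instead of the whole match.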

2. Find 't?p' in a string, where '?' stands for any single character other than 'i' or 'o'

import re
st = "top tip taq twp tep"
res = r"t[^io]p"  # [^io] matches any single character except i and o
print(re.findall(res, st))

Output: ['twp', 'tep']
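Note that [^io] is broader than it looks: it matches any single character other than i and o, including spaces, digits and even a newline, which . does not match by default. A small sketch with a made-up test string:

```python
import re

st = "t p t1p t\np tip"

# [^io] matches any single character except i and o, newline included
negated = re.findall(r"t[^io]p", st)
print(negated)  # ['t p', 't1p', 't\np']

# By contrast, . does not match a newline unless re.DOTALL is given
dot = re.findall(r"t.p", st)
print(dot)      # ['t p', 't1p', 'tip']
```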

3. Check whether string s starts with hello

import re
s = "hello world,hello boy"
r = r"^hello"  # ^ anchors the match at the start of the string
print(re.findall(r, s))

Output: ['hello']
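Since the heading asks whether s starts with hello, a True/False answer may be handier than a list. A sketch of two standard ways to get one, compared with an unanchored findall (variable names are illustrative):

```python
import re

s = "hello world,hello boy"

# Without the ^ anchor, findall returns every occurrence
all_hellos = re.findall(r"hello", s)
print(all_hellos)   # ['hello', 'hello']

# re.match only tries at the start of the string, so it answers "starts with?" directly
starts = re.match(r"hello", s) is not None
print(starts)       # True

# For a plain literal prefix, str.startswith gives the same answer without a regex
starts_str = s.startswith("hello")
print(starts_str)   # True
```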

4. Check whether string s ends with boy

import re
s = "hello world,hello boy"
r = r"boy$"  # $ anchors the match at the end of the string
print(re.findall(r, s))

Output: ['boy']
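The same "ends with" check can return a boolean via re.search. The sketch also shows one subtlety of $ in Python, using a made-up string with a trailing newline:

```python
import re

s = "hello world,hello boy"

# re.search scans the whole string; boy$ only succeeds at the end
ends = re.search(r"boy$", s) is not None
print(ends)  # True

# $ also matches just before a single trailing newline...
with_newline_dollar = re.search(r"boy$", "hello boy\n") is not None
print(with_newline_dollar)  # True

# ...while \Z only matches at the very end of the string
with_newline_z = re.search(r"boy\Z", "hello boy\n") is not None
print(with_newline_z)  # False
```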

5. Match a phone number that starts with 010- followed by eight digits

import re
s = "010-77189021"
r = r"^010-\d{8}$"  # \d{8} is exactly eight digits; ^...$ makes the whole string match
print(re.findall(r, s))

Output: ['010-77189021']
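For validating a whole string, re.fullmatch (Python 3.4+) is an alternative to wrapping the pattern in ^...$. In the sketch below, is_010_number is a hypothetical helper name and the test inputs are made up. It also shows that in Python 3, \d matches non-ASCII digits unless re.ASCII is passed:

```python
import re

# fullmatch requires the entire string to match, so ^ and $ are unnecessary
pattern = re.compile(r"010-\d{8}")

def is_010_number(s):
    # hypothetical helper, for illustration only
    return pattern.fullmatch(s) is not None

print(is_010_number("010-77189021"))   # True
print(is_010_number("010-7718902"))    # False (only 7 digits)
print(is_010_number("010-771890211"))  # False (9 digits)

# \d also matches full-width digits such as U+FF17; re.ASCII limits it to 0-9
fullwidth = "010-" + "\uff17" * 8
unicode_ok = pattern.fullmatch(fullwidth) is not None
ascii_ok = re.fullmatch(r"010-\d{8}", fullwidth, re.ASCII) is not None
print(unicode_ok, ascii_ok)  # True False
```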

6. a followed by at least one b

import re
s = "abbbbbbb"
s1 = "a"
r = r"ab+"  # + means one or more of the preceding item
print(re.findall(r, s))
print(re.findall(r, s1))

Output:
['abbbbbbb']
[]
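To see + at work on several candidates at once, here is a sketch with a made-up string, along with the equivalent counted form {1,}:

```python
import re

s = "a ab abbb b"

# + is shorthand for {1,}: one or more of the preceding item
plus = re.findall(r"ab+", s)
counted = re.findall(r"ab{1,}", s)

print(plus)     # ['ab', 'abbb']
print(counted)  # ['ab', 'abbb']
```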

7. a followed by zero or more b's

import re
s = "abbbbbbb"
s1 = "a"
r = r"ab*"  # * means zero or more of the preceding item
print(re.findall(r, s))
print(re.findall(r, s1))

Output:
['abbbbbbb']
['a']
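Here is the same comparison for *, whose counted form is {0,}. The last line illustrates a side effect worth knowing: a pattern that can match zero characters also yields empty strings from findall (test strings are made up):

```python
import re

s = "a ab abbb"

# * is shorthand for {0,}: zero or more of the preceding item
star = re.findall(r"ab*", s)
counted = re.findall(r"ab{0,}", s)
print(star)     # ['a', 'ab', 'abbb']
print(counted)  # ['a', 'ab', 'abbb']

# b* alone can match the empty string, so findall reports empty matches too
empties = re.findall(r"b*", "abb")
print(empties)  # ['', 'bb', '']
```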

8. a followed by zero or one b

import re
s = "abbbbbbb"
s1 = "a"
r = r"ab?"  # ? means zero or one of the preceding item
print(re.findall(r, s))
print(re.findall(r, s1))

Output:
['ab']
['a']
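? is equivalent to the counted form {0,1}, which makes it handy for optional letters. A sketch with a made-up spelling example:

```python
import re

text = "color colour colouur"

# u? makes the u optional: zero or one occurrence, the same as u{0,1}
optional = re.findall(r"colou?r", text)
counted = re.findall(r"colou{0,1}r", text)

print(optional)  # ['color', 'colour']
print(counted)   # ['color', 'colour']
```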

9. Match a followed by b's non-greedily (if there are several b's, match only one)

import re
s = "abbbbbbb"
r = r"ab+?"  # +? is the non-greedy form of +: match as few b's as possible
print(re.findall(r, s))

Output: ['ab']
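Non-greediness is easier to see when something follows the quantifier. The sketch below uses a made-up HTML-like string, plus a counted quantifier on the article's own string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* runs as far as possible, so one match spans from the first < to the last >
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy .*? stops at the earliest > that lets the match succeed
lazy = re.findall(r"<.*?>", html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Counted quantifiers have lazy forms too: {2,5} takes 5 b's, {2,5}? takes 2
greedy_counted = re.findall(r"ab{2,5}", "abbbbbbb")
lazy_counted = re.findall(r"ab{2,5}?", "abbbbbbb")
print(greedy_counted, lazy_counted)  # ['abbbbb'] ['abb']
```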
Some other regular expressions: [reference image not preserved]

Just for fun, here is a small bonus crawler.
It matches all the Chinese characters on the Baidu homepage:

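import re
import urllib.request

def get_html(url):
    # Fetch the page and return its raw bytes
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request) as response:
        html = response.read()
    return html

def get_china(url):
    # Decode the bytes as UTF-8 (assumes the page is UTF-8 encoded)
    html = get_html(url).decode('utf-8')

    # In Python 3 a str pattern is already Unicode and re understands \uXXXX escapes
    # inside a raw string, so the old ur'' prefix is neither needed nor allowed
    r = r'[\u4e00-\u9fa5]+'  # one or more characters in the range U+4E00 to U+9FA5
    china = re.findall(r, html)

    return china

china = get_china("http://www.baidu.com")

for c in china:
    print(c)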
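Because the crawler needs network access, and the page it receives may vary, here is an offline sketch of just the extraction step on a made-up snippet of HTML. find_chinese is a hypothetical helper name:

```python
import re

def find_chinese(text):
    # Same character range as the crawler: U+4E00 to U+9FA5.
    # The + groups consecutive characters into one run per match.
    return re.findall(r"[\u4e00-\u9fa5]+", text)

# Made-up sample; the full-width comma (U+FF0C) is outside the range, so it splits the runs
sample = "<title>百度一下，你就知道</title>"
result = find_chinese(sample)
print(result)  # ['百度一下', '你就知道']
```

The range U+4E00 to U+9FA5 covers the commonly used ideographs. The CJK Unified Ideographs block actually extends to U+9FFF, and rarer characters in the extension blocks fall outside it, so widen the class if those matter.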