Python Cookbook Study Notes ch2_01

MrUncle德鲁

Category: Python Cookbook

Article link: https://blog.csdn.net/FANGLICHAOLIUJIE/article/details/82112823

Chapter 2. (The original post also linked to another copy of these notes here.)

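2.1 Splitting a string on multiple delimiters

Problem: you need to split a string into fields, but the delimiters are not fixed
Solution: the split() method of string objects only handles simple cases; it does not support multiple delimiters or a variable amount of whitespace around them. Use re.split() instead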

line = 'asdf fsff; frf, dfsfe,asd. daffpp'
import re
re.split(r'[.;,\s]\s*',line)

['asdf', 'fsff', 'frf', 'dfsfe', 'asd', 'daffpp']

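When using re.split(), check whether the regular expression contains a capture group in parentheses. If it does, the matched delimiter text also appears in the result list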

fields = re.split(r'(;|,|\s)\s*',line)
fields

['asdf', ' ', 'fsff', ';', 'frf', ',', 'dfsfe', ',', 'asd.', ' ', 'daffpp']
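
Keeping the delimiters can be handy when the string has to be rebuilt later. The following is a minimal sketch using the same `line` as above; the names `values`, `delimiters`, and `rebuilt` are introduced here purely for illustration. Whitespace swallowed by the trailing `\s*` is not captured, so it is not restored.

```python
import re

line = 'asdf fsff; frf, dfsfe,asd. daffpp'
fields = re.split(r'(;|,|\s)\s*', line)

# Even indexes hold the fields, odd indexes hold the captured delimiters.
values = fields[::2]
delimiters = fields[1::2] + ['']  # pad so both lists have the same length

# Re-join each field with the delimiter that followed it.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt)
```

The slicing relies on re.split() alternating field, delimiter, field, and so on, which holds when the pattern has exactly one capture group.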

If you do not want the delimiters in the result but still need parentheses to group parts of the pattern, make the group non-capturing, as in (?:...)

re.split(r'(?:,|;|\s)\s*',line)

['asdf', 'fsff', 'frf', 'dfsfe', 'asd.', 'daffpp']

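2.2 Matching text at the start or end of a string

Problem: you need to check the start or end of a string against a text pattern, such as a filename extension
Solution: use str.startswith() or str.endswith()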

filename = 'spam.txt'
filename.startswith('file:')

False

filename.endswith('.txt')

True

url = 'http://www.python.org'
url.startswith('http://')

True

To check against several possibilities, put all the candidates in a tuple and pass that tuple in

filenames = ['Makefile','foo.c','bar.py','spam.c','sapm.h']
[name for name in filenames if name.endswith(('.c','.h'))]

['foo.c', 'spam.c', 'sapm.h']

any(name.endswith('.py') for name in filenames)

True

from urllib.request import urlopen
def read_data(name):
    if name.startswith(('http:','https:','ftp:')):
        return urlopen(name).read()
    else:
        with open(name) as f:
            return f.read()

Note: both methods require a tuple when several candidates are given. If the candidates are in a list or some other iterable, convert them with tuple() first.

choices = ['http:','ftp:']
url = 'http://www.python.org'
url.startswith(choices)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-23-78cd8b4bba7d> in <module>()
      1 choices = ['http:','ftp:']
      2 url = 'http://www.python.org'
----> 3 url.startswith(choices)


TypeError: startswith first arg must be str or a tuple of str, not list

url.startswith(tuple(choices))

True
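
If the candidates often arrive as lists or sets, the conversion can be wrapped once. This is only a sketch; `starts_with_any` is a hypothetical helper name, not something provided by the standard library.

```python
def starts_with_any(text, prefixes):
    """Return True if text starts with any of the given prefixes.

    Hypothetical helper: accepts any iterable of strings (list, set,
    generator) and converts it to the tuple str.startswith() requires.
    """
    return text.startswith(tuple(prefixes))


choices = ['http:', 'ftp:']
print(starts_with_any('http://www.python.org', choices))
print(starts_with_any('spam.txt', choices))
```

Do not pass a single string as `prefixes`: tuple('http:') splits it into single characters, which is almost certainly not the intended check.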

The checks done by startswith() and endswith() can also be written with slices

filename = 'spam.txt'
filename[-4:] == '.txt'

True

or with a regular expression

import re
url = 'http://www.python.org'
re.match('http:|https:|ftp:',url)

<_sre.SRE_Match object; span=(0, 5), match='http:'>

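2.3 Matching strings with shell wildcard patterns

Problem: you want to match text strings using the wildcard patterns common in Unix shells (such as *.py, Dat[0-9]*.csv)
Solution: the fnmatch module provides two functions, fnmatch() and fnmatchcase()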

from fnmatch import fnmatch,fnmatchcase
fnmatch('foo.txt','*.txt')

True

fnmatch('foo.txt','?oo.txt')

True

fnmatch('Dat45.csv','Dat[0-9]*')

True

names = ['Dat1.csv','Dat2.csv','config.ini','foo.py']
[name for name in names if fnmatch(name,'Dat[1-9].csv')]
#[name for name in names if fnmatch(name,'Dat*.csv')]

['Dat1.csv', 'Dat2.csv']
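
The list comprehension above can also be written with fnmatch.filter() from the same module, which applies one pattern to a whole list. A small sketch with the same `names` list; the variable `matches` is introduced here for illustration.

```python
import fnmatch

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']

# filter() returns the names that match the pattern, in their original order.
matches = fnmatch.filter(names, 'Dat[0-9].csv')
print(matches)
```

Like fnmatch(), filter() follows the platform's case-sensitivity rules.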

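fnmatch() follows the case-sensitivity rules of the underlying operating system, so its result depends on the platform. On a case-sensitive system (as in this session), the match is case-sensitive too

fnmatch('foo.txt','*.Txt')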

False

Use fnmatchcase() instead when the platform should not matter; it matches using exactly the case given in the pattern

fnmatchcase('foo.txt','*.Txt')

False
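
When case should be ignored on every platform, one option is to lowercase both the name and the pattern and then call fnmatchcase(). This is a sketch under that assumption; `fnmatch_nocase` is a hypothetical helper, not part of the fnmatch module.

```python
from fnmatch import fnmatchcase


def fnmatch_nocase(name, pattern):
    """Case-insensitive wildcard match that behaves the same on every OS.

    Hypothetical helper: lowercases both arguments, then matches exactly.
    """
    return fnmatchcase(name.lower(), pattern.lower())


print(fnmatch_nocase('foo.txt', '*.TXT'))
print(fnmatch_nocase('foo.txt', '*.csv'))
```

Lowercasing the pattern also lowercases character classes such as [A-Z], which stays consistent because the name is lowercased as well.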

addresses = [
'5412 N CLARK ST',
'1060 W ADDISON ST',
'1039 W GRANVILLE AVE',
'2122 N CLARK ST',
'4802 N BROADWAY',
]
from fnmatch import fnmatchcase
[addr for addr in addresses if fnmatchcase(addr,'*ST')]

['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']

[addr for addr in addresses if fnmatchcase(addr,'54[0-9][0-9] *CLARK*')]

['5412 N CLARK ST']

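2.4 Matching and searching for text patterns

Problem: you need to match or search for text that follows a specific pattern
Solution: for simple literal text, str.find(), str.endswith(), and str.startswith() are enough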

text = 'yeah, but no, but yeah,but no,but yeah'
text == 'yeah'

False

text.startswith('yeah')

True

text.endswith('no')

False

find() returns the index of the first occurrence of the search text

text.find('no')

10

More complex matching requires regular expressions and the re module.
\d+ matches one or more digits

text1 = '11/27/2012'
text2 = 'Nov 27,2012'
import re
if re.match(r'\d+/\d+/\d+',text1):
    print('yes')
else:
    print('no')

yes

if re.match(r'\d+/\d+/\d+',text2):
    print('yes')
else:
    print('no')

no

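To match the same pattern many times, compile the pattern string into a pattern object first.
match() always starts matching at the beginning of the string; to find matches anywhere in the string, use findall()

datepat = re.compile(r'\d+/\d+/\d+')
if datepat.match(text1):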
    print('yes')
else:
    print('no')

yes

text = 'Today is 11/23/2018.Pycon starts 3/13/2019'
datepat.findall(text)

['11/23/2018', '3/13/2019']
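
When only the first occurrence is needed, search() is the counterpart of match() that scans the whole string. A brief sketch with the same pattern and text:

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/23/2018.Pycon starts 3/13/2019'

# match() only tries at position 0, so it fails on this text.
print(datepat.match(text))

# search() scans forward and returns the first match anywhere in the string.
m = datepat.search(text)
print(m.group(), m.span())
```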

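Regular expressions are often written with parentheses to create capture groups. Capture groups simplify later processing because the contents of each group can be extracted separately

datepat2 = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat2.match('11/23/2018')
m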

<_sre.SRE_Match object; span=(0, 10), match='11/23/2018'>

m.group(0)

'11/23/2018'

m.group(1)

'11'

m.group(2)

'23'

m.group(3)

'2018'

m.groups()

('11', '23', '2018')

month,day,year = m.groups()
year

'2018'
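
The groups are always strings, so they usually need converting before further use. As an illustration only (the notes themselves stop at extracting the strings), this sketch turns the three groups into a datetime.date; the variable `d` is introduced here.

```python
import re
from datetime import date

datepat2 = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat2.match('11/23/2018')

# groups() returns strings; convert them to int before building a date.
month, day, year = (int(g) for g in m.groups())
d = date(year, month, day)
print(d.isoformat())
```

date() raises ValueError for impossible values such as month 13, which the pattern itself does not rule out.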

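findall() searches the text and returns all matches as a list. When the pattern contains capture groups, each match is returned as a tuple of the groups.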

text = 'Today is 11/23/2018.Pycon starts 3/13/2019'
datepat3 = re.compile(r'(\d+)/(\d+)/(\d+)')
datepat3.findall(text)

[('11', '23', '2018'), ('3', '13', '2019')]

for month,day,year in datepat3.findall(text):
    print('{}-{}-{}'.format(year,month,day))

2018-11-23
2019-3-13
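
findall() builds the complete list up front. To process matches one at a time, or to get at positions as well as text, finditer() yields match objects instead. A short sketch with the same pattern and text; the list `found` is introduced here for illustration.

```python
import re

text = 'Today is 11/23/2018.Pycon starts 3/13/2019'
datepat3 = re.compile(r'(\d+)/(\d+)/(\d+)')

found = []
for m in datepat3.finditer(text):
    # Each m is a full match object, so groups and offsets are both available.
    month, day, year = m.groups()
    found.append((m.start(), m.group()))
    print(m.start(), m.group(), year)
```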

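The basic way to use the re module: compile the regular expression string with re.compile(), then call match(), findall(), or finditer() on the result.
match() only anchors at the start of the string, so it can succeed where you did not expect it to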

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat.match('11/27/2012asdafa')
m

<_sre.SRE_Match object; span=(0, 10), match='11/27/2012'>

m.group()

'11/27/2012'

For an exact match, add $ to the end of the regular expression

datepat = re.compile(r'(\d+)/(\d+)/(\d+)$')
# no match: the call returns None, so nothing is displayed
datepat.match('11/27/2012asdafa')

datepat.match('11/27/2012')

<_sre.SRE_Match object; span=(0, 10), match='11/27/2012'>
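
An alternative to the trailing $ is the fullmatch() method of compiled patterns, which requires the whole string to match. The sketch below also shows one difference worth knowing: $ accepts a trailing newline, while fullmatch() does not. The name `dollar` is introduced here for illustration.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

# fullmatch() succeeds only if the entire string matches the pattern.
print(datepat.fullmatch('11/27/2012'))
print(datepat.fullmatch('11/27/2012asdafa'))

# '$' also matches just before a newline at the end of the string.
dollar = re.compile(r'(\d+)/(\d+)/(\d+)$')
print(dollar.match('11/27/2012\n') is not None)
print(datepat.fullmatch('11/27/2012\n') is not None)
```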

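2.5 Searching and replacing text

Problem: you want to search a string for a given pattern and replace it
Solution: for simple literals, use str.replace()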

text = 'yeah,but no,yeah,but no,but yeah'
text.replace('yeah','yea')

'yea,but no,yea,but no,but yea'

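For more complex patterns, use sub() from the re module.
The first argument of sub() is the pattern to match and the second is the replacement pattern. A backslash followed by a digit, such as \3, refers to the capture group with that number in the pattern.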

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
import re
re.sub(r'(\d+)/(\d+)/(\d+)',r'\3-\1-\2',text)

'today is 2012-11-27.PyCon starts 2013-3-13'

If the same pattern is substituted many times, compile it first for better performance

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
import re
datepat = re.compile(r"(\d+)/(\d+)/(\d+)")
datepat.sub(r'\3-\1-\2',text)

'today is 2012-11-27.PyCon starts 2013-3-13'
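
Numbered references such as \3 become hard to read as patterns grow. A possible variation, not used in the notes themselves, is to name the groups with (?P<name>...) and refer to them with \g<name> in the replacement. The names `named` and `result` are introduced here for illustration.

```python
import re

text = 'today is 11/27/2012.PyCon starts 3/13/2013'

# Same date pattern, but each group carries a name instead of a number.
named = re.compile(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)')
result = named.sub(r'\g<year>-\g<month>-\g<day>', text)
print(result)
```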

For more complex substitutions, pass a callback function as the replacement

from calendar import month_abbr
def change_date(m):
    mon_name = month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2),mon_name,m.group(3))

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
datepat.sub(change_date,text)

'today is 27 Nov 2012.PyCon starts 13 Mar 2013'

To also find out how many substitutions were made, use subn()

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
new_text,n = datepat.subn(r'\3-\1-\2',text)

new_text

'today is 2012-11-27.PyCon starts 2013-3-13'
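
The second element of the tuple returned by subn() is the substitution count, which the session above does not display. A self-contained sketch; `unchanged` and `count` are names introduced here.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'today is 11/27/2012.PyCon starts 3/13/2013'

# subn() returns (new_string, number_of_substitutions).
new_text, n = datepat.subn(r'\3-\1-\2', text)
print(n)

# With no match the text comes back untouched and the count is 0.
unchanged, count = datepat.subn(r'\3-\1-\2', 'no dates here')
print(unchanged, count)
```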

2.6 Case-insensitive search and replace

Problem: you want to search and replace text while ignoring case
Solution: pass the re.IGNORECASE flag to the re functions

text = 'UPDATE PYTHON,lower python,Mixed Python'
re.findall('python',text,flags=re.IGNORECASE)

['PYTHON', 'python', 'Python']

re.sub('python','snake',text,flags=re.IGNORECASE)

'UPDATE snake,lower snake,Mixed snake'
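
One limitation visible in the last result is that the replacement always appears in lowercase, whatever the case of the matched text. A possible workaround is a callback that copies the case of each match. This is only a sketch and not part of the notes above; `matchcase` and `replace` are helper names defined here.

```python
import re


def matchcase(word):
    """Build a sub() callback that adapts word to the case of each match."""
    def replace(m):
        matched = m.group()
        if matched.isupper():
            return word.upper()
        if matched.islower():
            return word.lower()
        if matched[0].isupper():
            return word.capitalize()
        return word
    return replace


text = 'UPDATE PYTHON,lower python,Mixed Python'
result = re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
print(result)
```

Only three case styles are handled (all upper, all lower, capitalized); anything else falls back to the replacement word as given.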
