字符串和文本
使用多个分隔符分割字串
使用正则re.split()
方法。
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
字符串开头结尾匹配
>>> filename = 'spam.txt'
>>> filename.endswith('.txt')
True
>>> filename.startswith('file:')
False
多种匹配可能,传入元组作为参数。
字符串匹配和搜索
使用正则。
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
... print('yes')
... else:
... print('no')
...
yes
字符串搜索和替换
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
忽略大小写的搜索替换
>>> text = 'UPPER PYTHON, lower python, Mixed Python'
>>> re.findall('python', text, flags=re.IGNORECASE)
['PYTHON', 'python', 'Python']
>>> re.sub('python', 'snake', text, flags=re.IGNORECASE)
'UPPER snake, lower snake, Mixed snake'
正则的贪婪匹配
正则的*操作符是贪婪的,加?即可转为最短匹配模式。
多行匹配
正则的.是不能匹配换行符的,或通过re.compile(r'.', re.DOTALL)让.匹配任意字符。
删除不需要的字符
strip()删除开头或结尾的字符,无参数则删空格,对应的还有lstrip和rstrip方法,它不能删除字串中间的字符,只能通过replace方法来替换掉。
字符串对齐
ljust()左对齐,rjust()右对齐,center()居中。
>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World '
>>> text.rjust(20)
' Hello World'
>>> text.center(20)
' Hello World '
合并拼接字串
join()方法。
指定列宽格式化字符串
import textwrap
s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
the eyes, not around the eyes, don't look around the eyes, \
look into my eyes, you're under."
print(textwrap.fill(s, 70))
print(textwrap.fill(s, 40, initial_indent=' '))
print(textwrap.fill(s, 40, subsequent_indent=' '))