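Common string operations
1. Stripping whitespace from both ends
str.strip(a): removes the whitespace at both ends of the string and returns a new string; a itself is not changed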
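a = " re "
b = a.strip()    # returns a new string; a is left unchanged
print(repr(a))   # repr() makes the surrounding spaces visible
print(repr(b))
>>> ' re '
>>> 're'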
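2. Joining strings
Building a string with repeated "+" is not recommended: strings are immutable, so each "+" allocates a new string and copies both operands into it, which can get expensive when many pieces are concatenated.
The .join() method is recommended instead. Its documentation reads: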
def join(iterable)
S.join(iterable) -> str
Return a string which is the concatenation of the strings in the iterable. The separator between elements is S
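As a rough illustration of the two approaches, the sketch below builds the same string with repeated "+" and with .join(). The sample data and the variable names (parts, joined_plus, joined_join, dashed) are made up for this example.

```python
# Hypothetical sample data for illustration.
parts = ["2024", "01", "15"]

# Repeated "+": every step builds a brand-new string.
joined_plus = ""
for p in parts:
    joined_plus = joined_plus + p

# .join(): the string the method is called on is the separator
# placed between the elements of the iterable.
joined_join = "".join(parts)
dashed = "-".join(parts)

print(joined_plus)  # 20240115
print(joined_join)  # 20240115
print(dashed)       # 2024-01-15
```

Every element of the iterable has to be a str already; join raises a TypeError otherwise.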
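3. Finding a substring
The .index() and .find() methods do the same thing. They differ only when the substring is not found: find returns -1, while index raises a ValueError.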
def index(sub, start, end)
S.index(sub[, start[, end]]) -> int
Like S.find() but raise ValueError when the substring is not found.
def find(sub, start, end)
S.find(sub[, start[, end]]) -> int
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
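The difference between the two methods can be seen in a small sketch like the following. The sample string and the index_failed flag are illustrative only.

```python
s = "hello world"

print(s.find("o"))       # 4: lowest index where "o" occurs
print(s.find("o", 5))    # 7: the search starts at index 5
print(s.find("xyz"))     # -1: not found

# index() gives the same result on success but raises on failure.
print(s.index("world"))  # 6

index_failed = False
try:
    s.index("xyz")
except ValueError:
    index_failed = True
    print("index() raised ValueError")
```

Since -1 is also a valid index (the last character), the return value of find should be checked before it is used for slicing.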
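6. Strings can be compared directly with "<", ">", "<=" and ">=" (the comparison is lexicographic, by Unicode code point). To test whether one string contains another, use the in keyword.
7. Case conversion: .upper() and .lower()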
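The comparison, containment and case-conversion operations above might be tried out as follows. All strings here are arbitrary examples.

```python
# Comparison goes character by character, by Unicode code point.
lt_words = "apple" < "banana"
lt_case = "Zebra" < "apple"    # uppercase letters sort before lowercase ones
lt_digits = "10" < "9"         # compared as text, not as numbers
print(lt_words, lt_case, lt_digits)  # True True True

# "in" tests whether one string occurs inside another.
print("ell" in "hello")        # True
print("xyz" in "hello")        # False

# upper() and lower() return new strings; the original is unchanged.
name = "Python"
print(name.upper())            # PYTHON
print(name.lower())            # python
print(name)                    # Python
```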
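8. Counting occurrences of a substring
.count() takes the substring to count. The optional arguments are a start and an end position; as with a slice, the end position is not included.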
def count(sub, start, end)
S.count(sub[, start[, end]]) -> int
Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
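A short sketch of how start and end restrict the count, using an arbitrary sample string:

```python
s = "banana"

print(s.count("a"))         # 3
print(s.count("an"))        # 2
print(s.count("a", 2))      # 2: counts within s[2:], i.e. "nana"
print(s.count("a", 0, 3))   # 1: counts within s[0:3], i.e. "ban"; index 3 is excluded

# Occurrences are counted without overlap.
print("aaaa".count("aa"))   # 2
```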
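9. Replacing substrings
.replace() leaves the original string unchanged and returns a new string with the replacements made. old and new are both strings; the optional count is an integer.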
replace(old, new, count)
S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
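For instance, with an arbitrary sample sentence (the variable names are made up for this sketch):

```python
s = "one fish, two fish, red fish"

all_replaced = s.replace("fish", "cat")
first_two = s.replace("fish", "cat", 2)   # only the first 2 occurrences

print(all_replaced)  # one cat, two cat, red cat
print(first_two)     # one cat, two cat, red fish
print(s)             # one fish, two fish, red fish (the original is unchanged)
```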
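The re library
Using re fluently requires a working knowledge of regular expressions first; see the following tutorial:
Regular Expressions, by 菜鸟教程
Now let's look at the re library itself.
The re library mainly provides the following functions: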
This module exports the following functions:
match      Match a regular expression pattern to the beginning of a string.
fullmatch  Match a regular expression pattern to all of a string.
search     Search a string for the presence of a pattern.
sub        Substitute occurrences of a pattern found in a string.
subn       Same as sub, but also return the number of substitutions made.
split      Split a string by the occurrences of a pattern.
findall    Find all occurrences of a pattern in a string.
finditer   Return an iterator yielding a Match object for each match.
compile    Compile a pattern into a Pattern object.
purge      Clear the regular expression cache.
escape     Backslash all non-alphanumerics in a string.
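Let's go through the commonly used ones.
match(pattern, string, flags=0):
Tries to match pattern at the beginning of string. Returns a Match object on success, or None if there is no match.
fullmatch(pattern, string, flags=0):
Like match, except that the pattern must match the entire string rather than just its beginning. Returns a Match object, or None otherwise.
search(pattern, string, flags=0):
Scans the string for the first position where pattern matches and returns a Match object, or None if nothing is found.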
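The three functions differ only in where the pattern is allowed or required to match. A minimal sketch, using an arbitrary sample string and made-up variable names:

```python
import re

text = "abc123"

# match: the pattern must match starting at position 0.
m_letters = re.match(r"[a-z]+", text)    # matches "abc"
m_digits = re.match(r"[0-9]+", text)     # None: the string does not start with a digit

# fullmatch: the pattern must cover the whole string.
f_letters = re.fullmatch(r"[a-z]+", text)        # None: "123" is left over
f_both = re.fullmatch(r"[a-z]+[0-9]+", text)     # matches "abc123"

# search: the first match anywhere in the string.
s_digits = re.search(r"[0-9]+", text)            # matches "123"

print(m_letters.group())  # abc
print(m_digits)           # None
print(f_letters)          # None
print(f_both.group())     # abc123
print(s_digits.group())   # 123
```

Because all three return None on failure, the result should be checked before .group() is called on it.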
sub(pattern, repl, string, count=0, flags=0):
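Finds the places where pattern matches and replaces them with repl, returning the resulting string. With count=0 (the default), every match is replaced.
subn() differs from sub() in that it returns a 2-tuple (str, number), where number is how many replacements were made.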
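A small sketch of sub and subn on an arbitrary sample string (variable names are illustrative):

```python
import re

text = "a1b22c333"

# Replace every run of digits with "#".
replaced_all = re.sub(r"[0-9]+", "#", text)

# count limits the number of replacements (0 means replace all).
replaced_one = re.sub(r"[0-9]+", "#", text, count=1)

# subn returns the new string together with the number of replacements.
result, n = re.subn(r"[0-9]+", "#", text)

print(replaced_all)   # a#b#c#
print(replaced_one)   # a#b22c333
print(result, n)      # a#b#c# 3
```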
split(pattern, string, maxsplit=0, flags=0):
Uses the substrings matched by pattern as delimiters and returns the list of pieces obtained by splitting string at them.
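For example, splitting on any run of commas, semicolons or whitespace (the sample text and names are arbitrary):

```python
import re

text = "a, b;c   d"

# Any run of commas, semicolons or whitespace acts as one delimiter.
fields = re.split(r"[,;\s]+", text)

# maxsplit limits the number of splits; the rest of the string stays intact.
first_split = re.split(r"[,;\s]+", text, maxsplit=1)

print(fields)       # ['a', 'b', 'c', 'd']
print(first_split)  # ['a', 'b;c   d']
```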
findall(pattern, string, flags=0):
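Returns a list of all matched strings. If the pattern contains groups, the list holds the groups instead of the whole matches.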
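What findall returns depends on whether the pattern contains groups. A brief sketch with made-up sample text:

```python
import re

text = "tel: 010-1234, fax: 021-5678"

# No groups: each element is the full matched text.
no_groups = re.findall(r"[0-9]+", text)

# One group: each element is that group's text.
one_group = re.findall(r"([0-9]+)-[0-9]+", text)

# Several groups: each element is a tuple of the groups.
two_groups = re.findall(r"([0-9]+)-([0-9]+)", text)

print(no_groups)    # ['010', '1234', '021', '5678']
print(one_group)    # ['010', '021']
print(two_groups)   # [('010', '1234'), ('021', '5678')]
```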
compile(pattern, flags=0):
"Compile a regular expression pattern, returning a Pattern object."
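Compiles a regular expression and returns a Pattern object.
Next, a few words on the Pattern object returned by compile, the last function above.
The methods of a Pattern object largely mirror the module-level functions above, only without the pattern argument. Usage looks like this, and the other methods follow the same idea: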
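import re
pattern = r"\b[0-9]+?"     # raw string, so \b means a word boundary, not a backspace character
strings = " 1happy day"
cpm = re.compile(pattern)
a = cpm.search(strings)    # Match object for "1"
b = cpm.findall(strings)   # ['1']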
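For Match objects, the most commonly used method is .group(). With no argument it returns the whole match (group 0); given a group number or group name it returns that group. .groups() returns all the groups as a tuple. For the concept of groups, refer to the regular expression tutorial; it is not covered again here. Other methods include start(), end() and span(), which return the start position, the end position, and the (start, end) pair of the match. Each of them accepts an optional group number.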
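These Match methods might be exercised as follows. The pattern, the group name year and the sample text are made up for this sketch.

```python
import re

# One named group (year) and one numbered group.
m = re.search(r"(?P<year>[0-9]{4})-([0-9]{2})", "date: 2024-01 end")

print(m.group())         # 2024-01  (whole match, same as group(0))
print(m.group(1))        # 2024
print(m.group("year"))   # 2024
print(m.group(2))        # 01
print(m.groups())        # ('2024', '01')

print(m.start(), m.end())  # 6 13
print(m.span())            # (6, 13)
print(m.span(2))           # (11, 13): position of group 2 only
```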
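Finally, a note on the flags argument of the functions above.
re.I ignore case
re.M with this flag, '^' and '$' match not only at the start and end of the string but also at the start and end of each line, that is, right after and right before each newline
re.S makes the "." special character match any character at all, including a newline
re.X verbose mode: whitespace in the pattern and comments starting with # are ignored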
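The effect of each flag can be checked with a small sketch like this one. The sample strings and variable names are arbitrary.

```python
import re

# re.I: case-insensitive matching.
ignore_case = re.findall(r"python", "Python PYTHON python", re.I)
print(ignore_case)   # ['Python', 'PYTHON', 'python']

text = "first\nsecond"

# re.M: ^ also matches right after each newline.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)
print(without_m)     # ['first']
print(with_m)        # ['first', 'second']

# re.S: "." also matches the newline character.
without_s = re.search(r"first.second", text)
with_s = re.search(r"first.second", text, re.S)
print(without_s)             # None
print(with_s is not None)    # True

# re.X: whitespace and # comments inside the pattern are ignored.
number = re.compile(r"""
    [0-9]+      # integer part
    \.          # decimal point
    [0-9]+      # fractional part
""", re.X)
found = number.search("pi is 3.14").group()
print(found)         # 3.14
```

Several flags can be combined with the | operator, for example re.I | re.M.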