A brief introduction to Python strings and the re library

hellyou

Categories: python, miscellaneous. Tags: re, strings, python

Article link: https://blog.csdn.net/hellyou/article/details/102750230

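Common string operations

1. Stripping whitespace from both ends
a.strip(): removes leading and trailing whitespace from the string a and returns a new string. a itself is not modified, since strings are immutable.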

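a = " re "
b = a.strip()
# repr() makes the surrounding spaces visible
print(repr(a))  # ' re '  (the original string is unchanged)
print(repr(b))  # 're'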

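2. Joining strings
Building a string with repeated "+" (for example inside a loop) is not recommended: each "+" allocates a new string and copies both operands into it, so the earlier parts get copied again and again.
Prefer .join(), which works out the total size once and copies each piece once. Its documentation reads: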

def join(iterable)
S.join(iterable) -> str

Return a string which is the concatenation of the strings in the iterable. The separator between elements is S
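As a small illustration of the docstring above, here is a sketch of .join() in use. The sample lists are made up for this example.

```python
# Hypothetical sample data, used only for illustration.
words = ["python", "re", "string"]

# The separator is the string that .join() is called on.
joined = ", ".join(words)
print(joined)  # python, re, string

# join() accepts only strings, so other types must be converted first.
numbers = [1, 2, 3]
digits = "-".join(str(n) for n in numbers)
print(digits)  # 1-2-3
```

An empty separator, "".join(parts), simply concatenates the pieces.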

3. Finding a substring
.index() and .find() do the same job. They differ only when the substring is not found: find returns -1, while index raises a ValueError.

def index(sub, start, end)
S.index(sub[, start[, end]]) -> int

Like S.find() but raise ValueError when the substring is not found.

def find(sub, start, end)
S.find(sub[, start[, end]]) -> int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
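The difference between the two methods can be sketched as follows. The sample string is an arbitrary choice for this example.

```python
text = "hello world"  # hypothetical sample string

first_o = text.find("o")       # lowest index where "o" occurs
second_o = text.find("o", 5)   # search only in text[5:]
not_found = text.find("xyz")   # find() signals failure with -1
print(first_o, second_o, not_found)  # 4 7 -1

# index() signals failure with an exception instead.
raised = False
try:
    text.index("xyz")
except ValueError:
    raised = True
print(raised)  # True
```

Because -1 is also a valid index (the last character), the result of find() should be checked before it is used for slicing.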

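6. Strings can be compared directly with "<", ">", "<=" and ">=" (the comparison is lexicographic, by Unicode code point). To test whether one string contains another, use the in keyword.
7. Case conversion: .upper() and .lower()
8. Counting occurrences
.count() takes a substring. The optional arguments are a start and an end position, interpreted as in a slice, so the end position is excluded.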

def count(sub, start, end)
S.count(sub[, start[, end]]) -> int

Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
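Items 6 to 8 above can be sketched together in one short example. All sample strings here are invented for illustration.

```python
# Comparison is lexicographic, by Unicode code point.
lt = "apple" < "banana"
upper_first = "Zebra" < "apple"  # uppercase letters sort before lowercase
print(lt, upper_first)  # True True

# Containment test with the in keyword.
contains = "ell" in "hello"
print(contains)  # True

# Case conversion returns new strings.
print("Hello".upper(), "Hello".lower())  # HELLO hello

# count() with optional start and end (end is excluded, as in a slice).
s = "banana"
total_an = s.count("an")       # occurrences in the whole string
from_2 = s.count("a", 2)       # occurrences in s[2:]
first_3 = s.count("a", 0, 3)   # occurrences in s[0:3]
print(total_an, from_2, first_3)  # 2 2 1

# Occurrences are counted without overlap.
print("aaaa".count("aa"))  # 2
```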

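9. Replacing substrings
.replace() leaves the source string unchanged and returns a new string with the replacements made. old and new are both strings; the optional count is an integer.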

replace(old, new, count)
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new.  If the optional argument count is
given, only the first count occurrences are replaced.
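A minimal sketch of .replace(), with and without the optional count. The sample sentence is made up for this example.

```python
original = "one fish, two fish, red fish"  # hypothetical sample string

all_replaced = original.replace("fish", "cat")     # every occurrence
first_two = original.replace("fish", "cat", 2)     # only the first two

print(all_replaced)  # one cat, two cat, red cat
print(first_two)     # one cat, two cat, red fish
print(original)      # one fish, two fish, red fish
```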

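The re library

Using re fluently requires a working knowledge of regular expressions. See the tutorial below.

Regular expressions, by 菜鸟教程

Now let's look at the re library itself.
The re library mainly provides the following functions: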

This module exports the following functions:
    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a Match object for each match.
    compile   Compile a pattern into a Pattern object.
    purge     Clear the regular expression cache.
    escape    Backslash all non-alphanumerics in a string.

Let's go through the commonly used ones.

match(pattern, string, flags=0):
Tries to match pattern at the beginning of string only. Returns a Match object on success, or None if the start of the string does not match.
 
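fullmatch(pattern, string, flags=0):
Like match, except that the pattern must match the entire string, from the first character to the last. Returns a Match object, or None if any part of the string is left unmatched.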

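search(pattern, string, flags=0):
Scans through string and returns a Match object for the first position where pattern matches, or None if there is no match anywhere.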
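The three functions differ only in where the match is allowed to sit. A minimal sketch, using an arbitrary sample string:

```python
import re

text = "abc123"  # hypothetical sample string

# match(): the pattern must match at the start of the string.
m = re.match(r"[a-z]+", text)
print(m.group())                 # abc
no_match = re.match(r"[0-9]+", text)
print(no_match)                  # None (the string does not start with a digit)

# fullmatch(): the pattern must cover the whole string.
partial = re.fullmatch(r"[a-z]+", text)
print(partial)                   # None ("123" is left over)
whole = re.fullmatch(r"[a-z]+[0-9]+", text)
print(whole.group())             # abc123

# search(): the first match anywhere in the string.
s = re.search(r"[0-9]+", text)
print(s.group())                 # 123
```

Since all three return None on failure, the result should be tested before calling .group() on it.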

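sub(pattern, repl, string, count=0, flags=0):
Replaces each non-overlapping match of pattern in string with repl and returns the resulting string. The default count=0 replaces every match.
subn() takes the same arguments but returns a 2-tuple (new_string, number), where number is how many substitutions were made.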
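A short sketch of sub() and subn(), using a made-up date string:

```python
import re

text = "2019-10-25"  # hypothetical sample string

slashes = re.sub(r"-", "/", text)               # replace every match
first_only = re.sub(r"-", "/", text, count=1)   # replace only the first match
print(slashes)     # 2019/10/25
print(first_only)  # 2019/10-25

# subn() also reports how many replacements were made.
result = re.subn(r"[0-9]+", "N", text)
print(result)      # ('N-N-N', 3)
```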

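split(pattern, string, maxsplit=0, flags=0):
Uses each match of pattern as a separator and returns the list of pieces between the separators. If maxsplit is nonzero, at most maxsplit splits are made and the rest of the string is returned as the last element.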
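Unlike str.split(), re.split() can treat several different separators as one. A minimal sketch with an invented input string:

```python
import re

text = "a, b;c   d"  # hypothetical sample string with mixed separators

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)
print(parts)      # ['a', 'b', 'c', 'd']

# With maxsplit=1, only the first separator is used.
head_tail = re.split(r"[,;\s]+", text, maxsplit=1)
print(head_tail)  # ['a', 'b;c   d']
```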

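findall(pattern, string, flags=0):
Returns a list of all non-overlapping matches of pattern in string. If the pattern contains groups, the list holds the group contents rather than the whole matches.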
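The effect of groups on the result of findall() is easy to overlook, so here is a small sketch with an invented input:

```python
import re

text = "x=1, y=22, z=333"  # hypothetical sample string

# No groups: a list of the matched strings.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['1', '22', '333']

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"([a-z])=([0-9]+)", text)
print(pairs)    # [('x', '1'), ('y', '22'), ('z', '333')]
```

When positions are needed as well, finditer() yields Match objects instead of plain strings.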

compile(pattern, flags=0):
"Compile a regular expression pattern, returning a Pattern object."
That is, it compiles a regular expression and returns a Pattern object that can be reused.

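Next, the Pattern object returned by compile() above.
Its methods largely mirror the module-level functions listed earlier, except that the pattern argument is dropped. Usage looks like this, and the other methods follow by analogy: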

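import re

# Raw string: without the r prefix, "\b" would be a backspace character, not a word boundary
pattern = r"\b[0-9]+?"
strings = " 1happy  day"
cpm = re.compile(pattern)
a = cpm.search(strings)   # Match object for "1", at span (1, 2)
b = cpm.findall(strings)  # ['1']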

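For Match objects, the most commonly used method is .group(). Called with no arguments it returns the whole match; given a group number or group name, it returns that group. To get all groups at once as a tuple, use .groups(). For the concept of groups, see the regular expression tutorial; it is not repeated here. Other methods include start(), end() and span(), which return the start position, the end position, and the (start, end) pair of the match. Each of them accepts an optional group number.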
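The Match methods described above can be sketched as follows. The pattern, the group names year and month, and the input string are all chosen just for this example.

```python
import re

# Two named groups; names and input are hypothetical.
m = re.search(r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})", "date: 2019-10-25")

print(m.group())          # 2019-10   (the whole match)
print(m.group(1))         # 2019      (group by number)
print(m.group("month"))   # 10        (group by name)
print(m.groups())         # ('2019', '10')

print(m.start(), m.end()) # 6 13
print(m.span())           # (6, 13)
print(m.span("month"))    # (11, 13)  (position of one group)
```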

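Finally, the flags argument of the functions above:
re.I Ignore case.
re.M Multiline mode: "^" and "$" also match at the start and end of each line (right after and right before every newline), not only at the start and end of the whole string.
re.S Makes the "." special character match any character at all, including a newline.
re.X Verbose mode: whitespace in the pattern and comments starting with "#" are ignored.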
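Each flag can be seen at work in a short sketch. The sample text and patterns are invented for this example.

```python
import re

text = "Hello\nworld"  # hypothetical two-line sample string

# re.I: case-insensitive matching.
ci = re.findall(r"hello", text, re.I)
print(ci)             # ['Hello']

# re.M: ^ matches at the start of every line, not just the string.
default_lines = re.findall(r"^\w+", text)
multi_lines = re.findall(r"^\w+", text, re.M)
print(default_lines)  # ['Hello']
print(multi_lines)    # ['Hello', 'world']

# re.S: "." also matches the newline.
no_dotall = re.search(r"Hello.world", text)
dotall = re.search(r"Hello.world", text, re.S)
print(no_dotall)      # None
print(dotall is not None)  # True

# re.X: whitespace and # comments inside the pattern are ignored.
decimal = re.compile(r"""
    [0-9]+   # integer part
    \.       # decimal point
    [0-9]+   # fractional part
""", re.X)
found = decimal.search("pi is 3.14").group()
print(found)          # 3.14
```

Several flags can be combined with the | operator, for example re.I | re.M.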
