[Python Basics] Strings and Regular Expressions - CSDN Blog

Article link: https://blog.csdn.net/qq_45695920/article/details/148125282


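Chapter Objectives
1. Common String Operations
2. Three Ways to Format Strings
3. Encoding and Decoding Strings
4. Data Validation
5. Data Processing
6. Regular Expressions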

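Chapter Objectives

Master the common string operations
Become fluent in formatting strings
Master string encoding and decoding
Master data validation
Master data processing
Master the use of regular expressions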

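1. Common String Operations

Strings are an immutable data type in Python: a string method never changes the original string, it returns a new object instead.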
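
To make the immutability concrete, here is a minimal sketch (the variable names are illustrative only). It shows that item assignment on a string fails, while a method such as lower() leaves the original untouched:

```python
s = 'HelloWorld'

# Item assignment is not supported on str, so this raises TypeError.
try:
    s[0] = 'h'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# lower() returns a new string; s itself is unchanged.
lowered = s.lower()
print(assignment_failed)  # True
print(s, lowered)         # HelloWorld helloworld
```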

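Method	Description
str.lower()	Converts all letters in str to lowercase and returns a new string
str.upper()	Converts all letters in str to uppercase and returns a new string
str.split(sep=None)	Splits str on the separator sep and returns a list (with sep=None it splits on runs of whitespace)
str.count(sub)	Returns the number of non-overlapping occurrences of sub in str
str.find(sub)	Searches for sub in str; returns the index of its first occurrence, or -1 if it is not found
str.index(sub)	Same as find(), except that it raises ValueError when sub is not found
str.startswith(s)	Returns True if str starts with the substring s
str.endswith(s)	Returns True if str ends with the substring s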

Example code:

# Case conversion
s1='HelloWorld'
new_s2=s1.lower()
print(s1,new_s2)  # HelloWorld helloworld

new_s3=s1.upper()
print(new_s3)  # HELLOWORLD

# Splitting a string
e_mail='cmm@179.com'
lst=e_mail.split('@')
print('Mailbox name:',lst[0],'Mail server domain:',lst[1])

print(s1.count('o')) # 2, 'o' occurs twice in s1

# Searching
print(s1.find('o')) # 4, index of the first 'o' in s1
print(s1.find('p')) # -1, not found

print(s1.index('o')) # 4
#print(s1.index('p'))  # ValueError: substring not found

print(s1.startswith('H')) # True
print(s1.startswith('P'))  # False
print('demo.py'.endswith('.py')) # True
print('text.txt'.endswith('.txt'))  # True

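Method	Description
str.replace(old,new)	Replaces every occurrence of old in str with new and returns a new string
str.center(width,fillchar)	Centers str within the given width; the padding uses fillchar (a space by default)
str.join(iter)	Concatenates the strings in iter, inserting str between adjacent elements
str.strip(chars)	Removes leading and trailing characters that appear in chars (whitespace when chars is omitted)
str.lstrip(chars)	Removes leading characters that appear in chars
str.rstrip(chars)	Removes trailing characters that appear in chars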

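Example code:

s1='HelloWorld'
# Replacing substrings
new_s=s1.replace('o','你好',1) # the last argument is the maximum number of replacements; by default all occurrences are replaced
print(new_s)  # Hell你好World

# Centering within a given width
print(s1.center(20))
print(s1.center(20,'*'))  # *****HelloWorld*****

# Removing whitespace on both sides
s='    hello    world   '
print(s.strip())   # removes whitespace on the left and right
print(s.lstrip())  # removes whitespace on the left
print(s.rstrip())  # removes whitespace on the right

# Removing specified characters
s3='dl-Helloworld'
print(s3.strip('ld')) # -Hellowor; chars is treated as a set of characters, so the order does not matter
print(s3.lstrip('ld')) # -Helloworld
print(s3.rstrip('ld')) # dl-Hellowor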
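
Two details from the table are easy to misread, so a short sketch may help: strip() treats its argument as a set of characters rather than as a prefix or suffix, and join() puts the separator between elements, not after each one. The sample strings below are illustrative assumptions, not part of the original examples:

```python
# strip() keeps removing characters from both ends as long as they
# belong to the set {'c', 'm', 'o', 'w', 'z', '.'}.
stripped = 'www.example.com'.strip('cmowz.')
print(stripped)  # example

# join() inserts the separator only between adjacent elements.
joined = '-'.join(['a', 'b', 'c'])
print(joined)  # a-b-c
```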

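2. Three Ways to Format Strings

Placeholders

%s: string format
%d: decimal integer format
%f: floating-point format

f-string

A string-formatting syntax introduced in Python 3.6 that uses {} to mark the fields to be replaced

str.format() method

template_string.format(comma-separated arguments)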

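# Formatting with placeholders
name='马冬梅'
age=18
score=98.5
print('Name:%s,Age:%d,Score:%f'%(name,age,score))  # %f shows six decimal places by default: 98.500000
print('Name:%s,Age:%d,Score:%.1f'%(name,age,score))  # %.1f keeps one decimal place: 98.5

# f-string
print(f'Name:{name},Age:{age},Score:{score}')

# Using the str.format() method
print('Name:{0},Age:{1},Score:{2}'.format(name,age,score))
print('Name:{2},Age:{0},Score:{1}'.format(age,score,name))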

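Detailed format specification

:	Fill	Alignment	Width	,	.Precision	Type
Introducer	A single fill character	< left-aligned, > right-aligned, ^ centered	Output width of the field	Thousands separator for numbers	Number of decimal places for floats, or maximum output length for strings	Integer types: b, d, o, x, X; float types: e, E, f, %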

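Example code:

s='helloworld'
print('{0:*<20}'.format(s)) # display width 20, left-aligned, padded with *
print('{0:*>20}'.format(s))
print('{0:*^20}'.format(s))

# Centering
print(s.center(20,'*'))

# Thousands separator (integers and floats only)
print('{0:,}'.format(9852221564))  # 9,852,221,564
print('{0:,}'.format(9852221564.5842))

# Precision of the fractional part of a float
print('{0:.2f}'.format(3.1415926))  # 3.14

# For strings, the precision is the maximum display length
print('{0:.5}'.format('helloworld')) # hello

# Integer types
a=425
print('Binary:{0:b},Decimal:{0:d},Octal:{0:o},Hexadecimal:{0:x},Hexadecimal (uppercase):{0:X}'.format(a))

# Float types
b=3.1415926
print('{0:.2f},{0:.2E},{0:.2e},{0:.2%}'.format(b))  # 3.14,3.14E+00,3.14e+00,314.16%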
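
The format specification after the colon is not tied to str.format(). As a small sketch (the values are made up for illustration), the same spec can be used in an f-string and with the built-in format() function:

```python
score = 98.5
amount = 9852221564

# The same '.1f' spec in an f-string, str.format() and format().
a = f'{score:.1f}'
b = '{0:.1f}'.format(score)
c = format(score, '.1f')
print(a, b, c)  # 98.5 98.5 98.5

# Fill, alignment, width and thousands separator combined in one spec.
d = f'{amount:*>20,}'
print(d)  # *******9,852,221,564
```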

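3. Encoding and Decoding Strings

Encoding a string
To convert a str to bytes, use the string's encode() method

str.encode(encoding='utf-8',errors='strict/ignore/replace')
# errors controls what happens when a character cannot be encoded:
# strict: raise an error immediately (the default)
# ignore: skip the character
# replace: substitute a replacement marker (? when encoding)

Decoding a string
To convert bytes to str, use the decode() method of the bytes type

bytes.decode(encoding='utf-8',errors='strict/ignore/replace')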

Example code:

s='伟大的中国梦'
# Encoding: str --> bytes
scode=s.encode(errors='replace') # utf-8 by default; each of these Chinese characters takes 3 bytes in utf-8
print(scode)

scode_gbk=s.encode('gbk',errors='replace') # each Chinese character takes 2 bytes in gbk
print(scode_gbk)

# Errors during encoding
s2='✌️'
scode1=s2.encode('gbk',errors='replace')
#scode2=s2.encode('gbk',errors='strict')  # UnicodeEncodeError: 'gbk' codec can't encode character '\u270c' in position 0: illegal multibyte sequence
print(scode1)
#print(scode2)

# Decoding: bytes --> str
print(bytes.decode(scode_gbk,'gbk'))
print(bytes.decode(scode,'utf-8'))
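
The errors parameter matters on the decoding side as well. The following sketch reuses the sample string from above and shows the byte lengths per codec, a round trip, and what happens when bytes are decoded with the wrong codec; the variable names are illustrative:

```python
s = '伟大的中国梦'  # 6 Chinese characters

utf8_bytes = s.encode('utf-8')
gbk_bytes = s.encode('gbk')
print(len(utf8_bytes), len(gbk_bytes))  # 18 12

# Decoding with the codec used for encoding restores the original text.
roundtrip_ok = (utf8_bytes.decode('utf-8') == s
                and gbk_bytes.decode('gbk') == s)

# Decoding with the wrong codec fails under the default errors='strict'.
try:
    gbk_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors='replace' substitutes U+FFFD for bytes that cannot be decoded.
lossy = gbk_bytes.decode('utf-8', errors='replace')
print(roundtrip_ok, decode_failed, '\ufffd' in lossy)  # True True True
```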

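4. Data Validation

Data validation means that the program checks whether the data entered by the user is "legal"

Method	Description
str.isdigit()	All characters are digits (Arabic numerals)
str.isnumeric()	All characters are numeric (this also covers Chinese numerals such as 一二三)
str.isalpha()	All characters are letters (Chinese characters included)
str.isalnum()	All characters are digits or letters (Chinese characters included)
str.islower()	All cased characters are lowercase
str.isupper()	All cased characters are uppercase
str.istitle()	Every word starts with an uppercase letter followed only by lowercase letters
str.isspace()	All characters are whitespace (\t, \n, etc.)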

Example code:

# isdigit(): decimal Arabic digits
print('123'.isdigit())  # True
print('一二三'.isdigit())  # False

# All characters are numeric
print('123'.isnumeric()) # True
print('一二三'.isnumeric()) # True
print('壹贰叁肆'.isnumeric()) # True

# All characters are letters (Chinese characters included)
print('hello你好'.isalpha()) # True
print('hello你好123'.isalpha())  # False
print('hello你好一二三'.isalpha()) # True

print('-'*40)
# All characters are digits or letters
print('hello你好'.isalnum()) # True
print('hello你好123'.isalnum())  # True
print('hello你好一二三'.isalnum()) # True
print('#'*40)
# Checking for lowercase
print('helloWorld'.islower()) # False
print('helloworld'.islower()) # True
print('hello你好'.islower()) # True
print('#'*40)
# Checking for uppercase
print('helloWorld'.isupper()) # False
print('helloworld'.isupper()) # False
print('hello你好'.isupper()) # False

# Every word is title-cased
print('Hello'.istitle())  # True
print('HelloWorld'.istitle())  # False
print('Helloworld'.istitle())  # True
print('Hello World'.istitle()) # True
print('hello World'.istitle())  # False

# Checking whether all characters are whitespace
print('-'*40)
print('\t'.isspace())  # True
print(' '.isspace())  # True
print('\n'.isspace())  # True
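
As a sketch of how these checks might be combined to validate user input, the hypothetical helper below accepts an age only if it is a plain ASCII integer in a plausible range. The function name is_valid_age and the 0 to 150 range are assumptions made for this illustration:

```python
def is_valid_age(text):
    """Return True if text is a plain ASCII integer between 0 and 150."""
    text = text.strip()
    # isdigit() alone also accepts characters such as '²', which int()
    # rejects, so isascii() (Python 3.7+) is checked as well.
    if not (text.isascii() and text.isdigit()):
        return False
    return 0 <= int(text) <= 150

print(is_valid_age(' 18 '))    # True
print(is_valid_age('一二三'))   # False
print(is_valid_age('-5'))      # False
print(is_valid_age('²'))       # False
```

Checking before converting avoids wrapping int() in a try/except for the common invalid cases.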

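5. Data Processing

Ways to concatenate strings

Joining strings with the str.join() method
Concatenating with the + operator
Direct concatenation (adjacent string literals)
Concatenating with formatted strings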

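Example code:

s1='hello'
s2='world'
# Concatenating with +
print(s1+s2)  # helloworld

# Using the str.join() method
print(' '.join([s1,s2])) # joined with a space: hello world
print('*'.join(['hello','world','python'])) # joined with *: hello*world*python

# Direct concatenation (adjacent string literals)
print('hello''world')

print('-'*40)
# Concatenating with formatted strings
print('%s%s' % (s1,s2))
print(f'{s1}{s2}')
print('{0}{1}'.format(s1,s2))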

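Three ways to remove duplicate characters from a string

s='helloworldhelloworldhelloworld'
# String concatenation + not in
new_s=''
for item in s:
    if item not in new_s:
        new_s+=item # append the character if it has not been seen yet
print(new_s)  # helowrd

# Indexing + not in
new_s2=''
for i in range(len(s)):
    if s[i] not in new_s2:
        new_s2+=s[i]
print(new_s2)  # helowrd

# Deduplicating with a set, then sorting the list
new_s3=set(s)
lst=list(new_s3)
lst.sort(key=s.index)  # restore the original order by first occurrence
print(''.join(lst))  # helowrd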
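
A further option, not among the three approaches above, is to rely on dictionary keys. This is a minimal sketch assuming Python 3.7 or later, where dictionaries are guaranteed to keep insertion order:

```python
s = 'helloworldhelloworldhelloworld'

# dict keys are unique and keep the order in which they were first inserted.
deduped = ''.join(dict.fromkeys(s))
print(deduped)  # helowrd
```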

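6. Regular Expressions

A regular expression is an expression that follows a specific set of rules; it is a pattern used to match combinations of characters in strings.
A single string describes, and matches, a whole family of strings that follow a given syntactic rule.
In many text editors, regular expressions are commonly used to search for and replace text (strings) that matches a pattern.

Metacharacters:

Special-purpose characters with a special meaning; for example, "^" and "$" match the start and the end of the string, respectively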

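Metacharacter	Description	Example	Result
.	Matches any character (except \n)	'p\nytho\tn'	p, y, t, h, o, \t, n
\w	Matches letters, digits and underscores	'python\n123'	p, y, t, h, o, n, 1, 2, 3
\W	Matches anything other than letters, digits and underscores	'python\n123'	\n
\s	Matches any whitespace character	'python\t123'	\t
\S	Matches any non-whitespace character	'python\t123'	p, y, t, h, o, n, 1, 2, 3
\d	Matches any decimal digit	'python\t123'	1, 2, 3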
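
The rows of the table can be checked with re.findall(), which returns every match as a list. This is only a sketch; the variable names are illustrative:

```python
import re

text = 'python\t123'

digits = re.findall(r'\d', text)
spaces = re.findall(r'\s', text)
non_word = re.findall(r'\W', 'python\n123')
# '.' matches every character except the newline.
any_char = re.findall(r'.', 'p\nytho\tn')

print(digits)    # ['1', '2', '3']
print(spaces)    # ['\t']
print(non_word)  # ['\n']
print(any_char)  # ['p', 'y', 't', 'h', 'o', '\t', 'n']
```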

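Quantifiers

A quantifier specifies how many instances of a character, group, or character class must be present in the input for a match to be found
It limits the number of repetitions

Quantifier	Description	Example	Result
?	Matches the preceding character 0 or 1 times	colou?r	Matches color or colour
+	Matches the preceding character 1 or more times	colou+r	Matches colour or colouu…r
*	Matches the preceding character 0 or more times	colou*r	Matches color, colour or colouu…r
{n}	Matches the preceding character exactly n times	colou{2}r	Matches colouur
{n,}	Matches the preceding character at least n times	colou{2,}r	Matches colouur or colouuu…r
{n,m}	Matches the preceding character at least n and at most m times	colou{2,4}r	Matches colouur, colouuur or colouuuur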
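
The quantifier rows can be verified with re.fullmatch(), which succeeds only when the entire string matches. The helper name matches is an assumption made for this sketch:

```python
import re

def matches(pattern, word):
    """Return True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None

print(matches(r'colou?r', 'color'))          # True, zero 'u' is allowed
print(matches(r'colou+r', 'color'))          # False, '+' needs at least one 'u'
print(matches(r'colou*r', 'colouuur'))       # True
print(matches(r'colou{2}r', 'colouur'))      # True, exactly two 'u'
print(matches(r'colou{2,}r', 'colour'))      # False, only one 'u'
print(matches(r'colou{2,4}r', 'colouuuur'))  # True, four 'u'
```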

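Other characters

Character	Description	Example	Result
Character class []	Matches any one of the characters listed in []	[.?!], [0-9]	[.?!] matches a period, question mark or exclamation mark; [0-9] matches 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Negation ^	Matches any character not listed in []	[^0-9]	Matches any character except 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Alternation |	Matches either the expression to the left or the one to the right of |	\d{18}|\d{15}	Matches a 15-digit or an 18-digit ID number
Escape character	Same as the escape character in Python	\.	Treats . as an ordinary character
Chinese character range	Matches characters in the range \u4e00 to \u9fa5	[\u4e00-\u9fa5]	Matches any single Chinese character
Grouping ()	Changes the scope of a quantifier or of |	six|fourth, (six|four)th	six|fourth matches six or fourth; (six|four)th matches sixth or fourth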
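
The effect of grouping on alternation is the least obvious entry in this table, so here is a short sketch together with the character-class examples. The test strings are assumptions chosen for illustration:

```python
import re

# Without a group, '|' splits the whole pattern: 'six' OR 'fourth'.
no_group = re.fullmatch(r'six|fourth', 'sixth')
# With a group, 'th' follows either alternative: 'sixth' OR 'fourth'.
with_group = re.fullmatch(r'(six|four)th', 'sixth')
print(no_group)                 # None
print(with_group is not None)   # True

# A character class with a range, and the Chinese-character range.
digits = re.findall(r'[0-9]', 'a1b2')
chinese = re.findall(r'[\u4e00-\u9fa5]', 'hello你好')
print(digits)   # ['1', '2']
print(chinese)  # ['你', '好']

# An escaped dot matches only a literal '.'.
dots = re.findall(r'\.', '3.14')
print(dots)  # ['.']
```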

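The re module

A built-in Python module that implements regular-expression operations

Function	Description
re.match(pattern,string,flags=0)	Matches from the start of the string; returns a Match object if the start matches, otherwise None
re.search(pattern,string,flags=0)	Searches the whole string for the first match; returns a Match object on success, otherwise None
re.findall(pattern,string,flags=0)	Searches the whole string for all values that match the regular expression; returns a list
re.sub(pattern,repl,string,count=0,flags=0)	Replaces the substrings of string that match pattern with repl
re.split(pattern,string,maxsplit=0,flags=0)	Splits string wherever pattern matches; similar to the split() method of strings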

Using the match function:

import re # import the re module
pattern=r'\d\.\d+' # one digit, a literal dot, then one or more digits (+ means one or more)
s='I study Python 3.10 every day' # string to be matched
match=re.match(pattern,s,re.I)
print(match) # None, because the string does not start with a match

s2='3.11Python I study every day'
match2=re.match(pattern,s2)
print(match2)  # <re.Match object; span=(0, 4), match='3.11'>

print('Start position of the match:',match2.start())
print('End position of the match:',match2.end())
print('Span of the match:',match2.span())
print('String that was searched:',match2.string)
print('Matched data:',match2.group())

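Using the search function:

import re # import the re module
pattern=r'\d\.\d+' # one digit, a literal dot, then one or more digits (+ means one or more)
s='I study Python 3.10 every day Python2.8 I love you'
match=re.search(pattern,s)

s2='4.10 Python I study every day'
s3='Python I study every day'
match2=re.search(pattern,s2)
match3=re.search(pattern,s3) # None
print(match)  # <re.Match object; span=(15, 19), match='3.10'>
print(match2)  # <re.Match object; span=(0, 4), match='4.10'>
print(match3)  # None

print(match.group())  # 3.10
print(match2.group())  # 4.10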
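
Because both match() and search() return None when nothing is found, calling .group() directly on the result can raise AttributeError. The sketch below contrasts the two functions on the same string and guards against None; the fallback text 'not found' is an arbitrary choice:

```python
import re

pattern = r'\d\.\d+'
s = 'I study Python 3.10 every day'

# match() only looks at the start of the string; search() scans all of it.
m1 = re.match(pattern, s)
m2 = re.search(pattern, s)
print(m1)          # None
print(m2.group())  # 3.10

# Test the result before calling .group() on it.
version = m1.group() if m1 else 'not found'
print(version)  # not found
```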

Using the findall function:

import re # import the re module
pattern=r'\d\.\d+' # one digit, a literal dot, then one or more digits (+ means one or more)
s='I study Python 3.10 every day2.4' # string to be matched
s2='4.10 Python I study every day'
s3='Python I study every day'
lst=re.findall(pattern,s)
lst2=re.findall(pattern,s2)
lst3=re.findall(pattern,s3)
print(lst)  # ['3.10', '2.4']
print(lst2)  # ['4.10']
print(lst3)  # []
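
findall() returns only the matched text. When positions are needed too, re.finditer() yields Match objects instead; and if the pattern contains capturing groups, findall() returns the groups rather than the whole match. A minimal sketch of both behaviours:

```python
import re

s = 'I study Python 3.10 every day2.4'

# finditer() yields Match objects, so the span of each match is available.
found = [(m.group(), m.span()) for m in re.finditer(r'\d\.\d+', s)]
print(found)  # [('3.10', (15, 19)), ('2.4', (29, 32))]

# With capturing groups, findall() returns a tuple of groups per match.
parts = re.findall(r'(\d)\.(\d+)', s)
print(parts)  # [('3', '10'), ('2', '4')]
```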

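Using the sub and split functions:

import re
pattern='黑客|反爬|破解' # sensitive words: "hacker", "anti-scraping", "crack"
s='我想学习Python,想破解一些VIP视频，Python可以实现无底线反爬吗？'
# Using sub
new_s=re.sub(pattern,'xxx',s)
print(new_s)  # 我想学习Python,想xxx一些VIP视频，Python可以实现无底线xxx吗？

# Using split
s2='https://cn.bing.com/search?q=python&form=ANNTH1&refig=6831ce0247514059bd2a8691834662b7&pc=CNNDDB&adppc=EDGEESS'
pattern1='[?&]' # inside [], | would be a literal character, so it is not needed here
lst=re.split(pattern1,s2)
print(lst)  # the base address followed by each query parameter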
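
The count and maxsplit parameters from the function table are not exercised above. The sketch below shows them, along with passing a function as repl, which re.sub() also accepts; the shortened URL and the masking with '*' are assumptions made for this illustration:

```python
import re

s = '我想学习Python,想破解一些VIP视频，Python可以实现无底线反爬吗？'
pattern = re.compile('黑客|反爬|破解')

# count limits how many matches are replaced (0, the default, means all).
first_only = pattern.sub('xxx', s, count=1)
print(first_only)

# repl may be a function; each match is masked with '*' of equal length.
masked = pattern.sub(lambda m: '*' * len(m.group()), s)
print(masked)

# maxsplit limits the number of splits performed.
url = 'https://cn.bing.com/search?q=python&form=ANNTH1'
parts = re.split(r'[?&]', url, maxsplit=1)
print(parts)  # ['https://cn.bing.com/search', 'q=python&form=ANNTH1']
```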