Learning Python, Day 08: Strings and Regular Expressions

Article link: https://blog.csdn.net/wozhomouren/article/details/119086885

16 Strings

1. Concatenating strings: +
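The notes give no example for concatenation, so here is a minimal sketch. The variable names (greeting, name, parts) are illustrative assumptions, not from the original notes. It shows + on two strings, the need to convert non-strings first, and str.join() for combining many pieces.

```python
# Illustrative names; none of these come from the original notes.
greeting = 'hello'
name = 'world'

# + builds a new string from its operands
combined = greeting + ',' + name
print(combined)

# + only works between strings, so convert other types with str() first
count_text = 'count: ' + str(3)
print(count_text)

# str.join() concatenates an iterable of strings with a separator
parts = ['hello', 'hi', 'world']
joined = ','.join(parts)
print(joined)
```

For a handful of pieces + reads fine; when building a string from a list or in a loop, join() is usually the cleaner choice.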

2. Getting the length of a string: len()

sr = 'hello世界'
print(len(sr))
print(len(sr.encode('utf-8')))
print(len(sr.encode('gbk')))
##
7
11
9

3. Slicing strings
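Slicing has no example in the notes either. The sketch below reuses the 'hello世界' string from the len() section; the result variable names are assumptions made for illustration. A slice is written as s[start:stop:step], where stop is exclusive.

```python
sr = 'hello世界'

first_five = sr[0:5]     # characters at indexes 0 through 4
last_two = sr[-2:]       # negative indexes count from the end
every_other = sr[::2]    # step of 2: every second character
reversed_sr = sr[::-1]   # step of -1: the string reversed

print(first_five)
print(last_two)
print(every_other)
print(reversed_sr)
```

Slices work on characters, not bytes, so each Chinese character counts as one position here.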

4. Splitting strings

sr = 'hello 世界 你好'
sr1 = 'hello,世界,你好'
print(sr.split())
print(sr1.split(','))
##
['hello', '世界', '你好']
['hello', '世界', '你好']

5. Searching strings

count()

find()

index()

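sr = 'hello,hi,world'
print(sr.count(','))    # number of occurrences: 2
print(sr.find(','))     # found: returns the index of the first match, 5
print(sr.index(','))    # found: returns the index of the first match, 5
print(sr.find('a'))     # not found: returns -1
print(sr.index('a'))    # not found: raises ValueError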
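Because find() and index() report a missing substring differently, the calling code has to handle each case differently. The following is a small sketch of both styles, using the same sr string; the variable names pos, pos2, and has_comma are assumptions for illustration.

```python
sr = 'hello,hi,world'

# find() signals "not found" with -1, so compare the result before using it
pos = sr.find('a')
found_with_find = pos != -1

# index() raises ValueError instead, so wrap the call in try/except
try:
    pos2 = sr.index('a')
except ValueError:
    pos2 = None

# When only a yes/no answer is needed, the in operator is enough
has_comma = ',' in sr

print(found_with_find, pos2, has_comma)
```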

6. Converting case

str1 = 'Hello,World'
print(str1.lower())
print(str1.upper())
#hello,world
#HELLO,WORLD

7. Stripping whitespace

str1 = "  he llo "
print(str1.strip())
print(str1.lstrip())
print(str1.rstrip())
str2 = "helloh"
print(str2.strip('h'))
#he llo
#he llo 
#  he llo
#ello

8. Replacing

sr=" x hel lo , wor ld! "
print(sr.replace(' ',''))

9. Formatting

sr1 = "hello,{:s}"
print(sr1.format('图图'))

print("asd{}".format([9.0]))
sr ='hello'

print(sr)
#hello,图图
#asd[9.0]
#hello
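A few more formatting variants may help round out the section. This is a sketch only: the names name and price and the sample values are assumptions. It shows positional and named fields, a format spec for decimals and padding, and an f-string as an equivalent of str.format().

```python
name = '图图'
price = 9.0

by_position = 'hello,{}'.format(name)          # fields filled in order
by_name = 'hello,{who}'.format(who=name)       # fields filled by keyword
two_decimals = 'price: {:.2f}'.format(price)   # fixed-point, 2 decimals
padded = '{:>5}|'.format('ab')                 # right-align in width 5
f_string = f'hello,{name}'                     # f-string equivalent

print(by_position)
print(by_name)
print(two_decimals)
print(padded)
print(f_string)
```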

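17 Regular expressions

A regular expression is a pattern, a set of rules written specifically for matching and processing strings.

Common regex syntax: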

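^ matches the start of the string

$ matches the end of the string

. matches any single character except the newline \n

* matches 0 or more repetitions

+ matches 1 or more repetitions

? matches 0 or 1 repetition

{n} matches exactly n repetitions

{n,} matches at least n repetitions

{n,m} matches between n and m repetitions

[xyz] matches x, y, or z

[0-9] matches any single digit

[^A-Z] negates the set: matches any single character that is not in A-Z

[\u4e00-\u9fa5] matches any single Chinese character in that range

\s matches whitespace such as a space, newline \n, form feed \f, or tab \t

\d matches a digit

\w matches word characters: letters, digits, and the underscore; with str patterns in Python 3, Chinese characters count as well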
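The syntax listed above can be checked quickly with the re module. The sample strings and variable names below are assumptions chosen for illustration; each line exercises one or two of the rules.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r'^hello', 'hello,world'))
ends = bool(re.search(r'world$', 'hello,world'))

# ? makes the preceding item optional (0 or 1 time)
optional = re.findall(r'colou?r', 'color colour')

# {n,m} means between n and m repetitions, inclusive
in_range = re.fullmatch(r'\d{2,4}', '123') is not None
too_long = re.fullmatch(r'\d{2,4}', '12345') is not None

# [^A-Z] matches any character that is NOT an uppercase letter
not_upper = re.findall(r'[^A-Z]', 'aBcD')

# [\u4e00-\u9fa5] matches one Chinese character in that range
chinese = re.findall(r'[\u4e00-\u9fa5]', 'hi世界')

# \w also matches Chinese characters when the pattern is a str
words = re.findall(r'\w+', 'py_3 世界!')

# . does not match a newline by default
dot_newline = re.match(r'.', '\n') is not None

print(starts, ends, optional, in_range, too_long)
print(not_upper, chinese, words, dot_newline)
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting backslashes before the regex engine sees them.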

The re module

re.match()

re.match() tries to match at the beginning of the string. It returns a Match object on success and None otherwise.

import re
str1 = "hello,world"
str2 = "world,hello"
str3 = "hello"
print(re.match(r"hello.*",str1).group())
print(re.match(r"hello.*",str2))
print(re.match(r"hello.*",str3))
#
hello,world
None
<re.Match object; span=(0, 5), match='hello'>

re.search(): scans the whole string and returns the first match.

re.findall(): scans the whole string and returns every substring that matches the pattern, as a list.

import re
str1 = "hello,py_css,py_html,python,ty_java"
print(re.search(r"py\w+",str1).group())
print(re.findall(r"py\w+",str1))
##
py_css
['py_css', 'py_html', 'python']

re.sub(): replacing matched substrings

import re
str1 = "恭喜13311321322的用户，喜中双色球大奖"
pattern = r"1[3456789]\d{9}"
str1 = re.sub(pattern,"1xxxxxxxxxx",str1)
print(str1)
##
#恭喜1xxxxxxxxxx的用户，喜中双色球大奖

re.split(): splitting a string on a pattern

import re
str1 = "php@python?js!java!css#html"
print(re.split('[!#@?]',str1))
#['php', 'python', 'js', 'java', 'css', 'html']

Exercises:
Exercise 1: Replace the middle 4 digits of the phone number with *. sr="恭喜13888889999用户喜中500万大奖"

import re
sr="恭喜13888889999用户喜中500万大奖"
res = re.search(r'1[3456789]\d{9}',sr).group()
res = res[0:3]+'****'+res[7:]    # keep the first 3 and the last 4 of the 11 digits
res = re.sub(r'1[3456789]\d{9}',res,sr)
print(res)
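The same masking can also be done in a single re.sub() call with capture groups. This is an alternative sketch, not the approach used in the notes; the pattern and the name masked are assumptions. [3-9] is equivalent to [3456789], and \1 and \2 in the replacement refer to the first and second parenthesized groups.

```python
import re

sr = "恭喜13888889999用户喜中500万大奖"

# Group 1: the first three digits; \d{4}: the middle four to hide;
# group 2: the last four digits
pattern = r'(1[3-9]\d)\d{4}(\d{4})'
masked = re.sub(pattern, r'\1****\2', sr)
print(masked)
```

This avoids the separate search and slice steps, and it masks every phone number in the string rather than only the first one found.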

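Exercise 2: ls= ['123我','_咋hello','这php','么','不帅789','呢???']

import re
ls= ['123我','_咋hello','这php','么','不帅789','呢???']
# Remove every non-Chinese character from the list's string form, then drop '不'
res = re.sub('[^\u4e00-\u9fa5]','',str(ls)).replace('不','')
print(res)
#我咋这么帅呢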

Exercise 3: Replace the domain with https://localhost/

import re
img = (
    ['<img src="http://www.baidu.com/a.jpg" width="250" height="100" />'],
    ['<img src="http://www.taobao.com/b.jpg" width="250" height="100" />'],
    ['<img src="http://www.zijie.com/c.jpg" width="250" height="100" />']
)
for item in img:
    # [^/]+ stops at the first slash after the scheme, so only the host is replaced
    print(re.sub(r'https?://[^/]+/','https://localhost/',item[0]))

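Exercise 4: Remove all punctuation from the string sr. sr="目前这些病例都打过疫苗吗？杨毅表示，这些病例中，绝大部分都接种过疫苗，只有一例不满18岁青年人没有接种。根据在广东、瑞丽疫情中的观察，接种过疫苗的病例在总体上症状都比较轻，转为重型病例的几率是明显比较低的，病程是比较短的。所以说疫苗接种还是有保护作用的，呼吁大家平时做好科学防护，打了疫苗还是要坚持科学预防，比如少去公众场所、坚持戴口罩、保持社交距离等。"

import re
sr="目前这些病例都打过疫苗吗？杨毅表示，这些病例中，绝大部分都接种过疫苗，只有一例不满18岁青年人没有接种。根据在广东、瑞丽疫情中的观察，接种过疫苗的病例在总体上症状都比较轻，转为重型病例的几率是明显比较低的，病程是比较短的。所以说疫苗接种还是有保护作用的，呼吁大家平时做好科学防护，打了疫苗还是要坚持科学预防，比如少去公众场所、坚持戴口罩、保持社交距离等。"
res = re.sub(r"[^\u4e00-\u9fa5\w]","",sr)    # raw string, so \w reaches the regex engine unchanged
print(res)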

Exercise 5: Reorder the lines of the poem by their numbers and print the result. sr="12.一代天骄，成吉思汗，只识弯弓射大雕。9.须晴日，看红装素裹，分外妖娆。2.千里冰封，5.惟余莽莽；3.万里雪飘，4.望长城内外，6.大河上下，1.北国风光，7.顿失滔滔。8.山舞银蛇，原驰蜡象，欲与天公试比高。11.惜秦皇汉武，略输文采；唐宗宋祖，稍逊风骚。13.俱往矣，数风流人物，还看今朝。10.江山如此多娇，引无数英雄竞折腰。"

import re
sr="12.一代天骄，成吉思汗，只识弯弓射大雕。9.须晴日，看红装素裹，分外妖娆。2.千里冰封，5.惟余莽莽；3.万里雪飘，4.望长城内外，6.大河上下，1.北国风光，7.顿失滔滔。8.山舞银蛇，原驰蜡象，欲与天公试比高。11.惜秦皇汉武，略输文采；唐宗宋祖，稍逊风骚。13.俱往矣，数风流人物，还看今朝。10.江山如此多娇，引无数英雄竞折腰。"
# Split into numbered segments: a run of digits plus everything up to the next digit
res = re.findall(r'\d+\D*',sr)
# Sort the segments by their leading number, then join them back together
res.sort(key=lambda seg: int(re.match(r'\d+',seg).group()))
print(''.join(res))