Python Strings-B08


小螺丝2021



Article link: https://blog.csdn.net/wwd2021/article/details/119180642


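Defining strings

A string is a finite sequence of zero or more characters. In a Python program, you write a string by enclosing its characters in single or double quotes. The characters can be special symbols, English letters, Chinese characters, Japanese hiragana or katakana, Greek letters, Emoji, and so on.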

a = 'hello,world'
b = "hello,world"
# A string that starts with three double or single quotes can span multiple lines
c='''
窗前明月光，
疑是地上霜。
'''
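A quick check of the definition above (a sketch; the variable names are only for illustration): the empty string has length 0, a triple-quoted string keeps its line breaks as \n characters, and len() counts characters rather than bytes, so each Chinese character or Emoji counts as one.

```python
empty = ''
print(len(empty))              # 0

# The triple-quoted string from above: the line breaks are part of the value
c = '''
窗前明月光，
疑是地上霜。
'''
print(c.count('\n'))           # 3
print(repr(c))                 # '\n窗前明月光，\n疑是地上霜。\n'

# len() counts characters, not bytes
mixed = 'Python 字符串 😀'
print(len(mixed))              # 12
```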

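Escape characters and raw strings

Inside a string, a \ (backslash) starts an escape sequence: the character after the \ no longer has its usual meaning. For example, \n does not mean a backslash followed by the letter n but a newline, and \t does not mean a backslash followed by t but a tab.

So if a string contains the same quote character that delimits it, or a literal \, that character must be escaped with \.

For example, to print a string wrapped in single or double quotes, write it like this: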

a='\'hello,world\''
b="\"hello,world\""
print(a)            # 'hello,world'
print(b)            # "hello,world"
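The example above only escapes quotes. A minimal sketch of the other cases mentioned in the text (the strings are arbitrary examples): a literal backslash is written as \\, choosing the other quote type avoids escaping altogether, and escapes such as \t and \n are each a single character.

```python
# A literal backslash must itself be escaped
path = 'c:\\Users\\abc'
print(path)               # c:\Users\abc
print(len('\\'))          # 1

# Using the other quote type means no escaping is needed
s = "It's"
t = 'He said "hi"'
print(s, t)               # It's He said "hi"

# \t and \n each count as one character
print(len('a\tb\n'))      # 4
```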

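In a raw string, every character keeps its literal meaning and nothing is treated as an escape sequence. In Python, a raw string literal starts with r or R.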

d = r'c:\Users\Administrator\abc\hello.py'
print(d)              # c:\Users\Administrator\abc\hello.py

# A string with placeholders (a formatted string, or f-string)
e = f'File path: {d}'
print(e)              # File path: c:\Users\Administrator\abc\hello.py
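To see exactly what the r prefix changes, compare lengths (a small sketch with arbitrary strings): in a normal string \n is one character, a newline, while in a raw string it is two characters, a backslash and n. One caveat: a raw string still cannot end with an odd number of backslashes, so r'c:\dir\' is a syntax error.

```python
normal = '\n'
raw = r'\n'
print(len(normal), len(raw))                  # 1 2
print(raw == '\\n')                           # True

# A raw string is the same as the escaped normal string
print(r'c:\new\table' == 'c:\\new\\table')    # True
```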

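In addition, a \ can be followed by a character code:

# \ can also be followed by an octal or hexadecimal number to denote a character
s1 = '\141\142\143\x61\x62\x63'  # the first 3 are octal, the last 3 hexadecimal
print(s1)                        # abcabc
# \u is followed by a Unicode code point (4 hex digits)
s2 = '\u9a86\u660a'
print(s2)                        # 骆昊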
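These escapes are just another way of writing a character's code. A sketch using the built-ins ord() and chr(), which convert between a character and its Unicode code point, shows that each escape names the same character as the corresponding number:

```python
print('\141' == chr(0o141) == 'a')     # True  (octal 141 = 97)
print('\x61' == chr(0x61) == 'a')      # True  (hex 61 = 97)
print(ord('a'))                        # 97

# \u escapes are code points written in hexadecimal
print(hex(ord('骆')), hex(ord('昊')))  # 0x9a86 0x660a
print('\u9a86' == chr(0x9a86))         # True
```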

String operations

a = 'hello, world'      # note the space before world

# Repetition
print(a * 5)          # hello, worldhello, worldhello, worldhello, worldhello, world
# Membership
print('or' in a)      # True
print('wd' in a)      # False

b = 'hello, World'    # space before World, and the W is uppercase

# Comparison operators compare the contents, i.e. the character codes, one character at a time
print(a == b)         # False    
print(a != b)         # True

c = 'goodbye, world'

print(b > c)          # True

d = 'hello, everybody'

print(b >= d)         # False
print(ord('W'), ord('e'))       # 87 101
# Concatenation
e = '!!!'
print(d + e)          # hello, everybody!!!

# Indexing and slicing
a = 'hello, world'

print(a[0], a[-len(a)])         # h h
print(a[len(a) - 1], a[-1])     # d d
print(a[5], a[-7])              # , ,

print(a[2:5])                   # llo
print(a[1:10:2])                # el,wr
print(a[::-1])                  # dlrow ,olleh (reversed)

# Get the length of the string
print(len(a))  # 12 (the space counts too, so the length is 12)

# Iterate over every character in the string
# Option 1: by index
for i in range(len(a)):
    print(a[i])
# Option 2: directly over the characters
for ch in a:
    print(ch)
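Two notes on the operations above, as a sketch. First, comparison works character by character: Python finds the first position where the strings differ and compares the code points there, and if one string is a prefix of the other, the shorter one is smaller. The hypothetical helper first_difference below (not a built-in) makes this visible for b >= d. Second, slicing is forgiving: an out-of-range slice returns an empty string instead of raising IndexError, and enumerate() yields the index and the character together.

```python
def first_difference(s1, s2):
    """Hypothetical helper: index where s1 and s2 first differ, or None if one is a prefix of the other."""
    for i, (x, y) in enumerate(zip(s1, s2)):
        if x != y:
            return i
    return None

b = 'hello, World'
d = 'hello, everybody'
i = first_difference(b, d)
print(i, b[i], d[i], ord(b[i]), ord(d[i]))   # 7 W e 87 101
print(b >= d)                                # False, because 87 < 101
print('abc' < 'abcd')                        # True, a prefix is smaller

a = 'hello, world'
print(repr(a[100:]))                         # '' (no IndexError for slices)
for index, ch in enumerate(a[:3]):
    print(index, ch)                         # 0 h, then 1 e, then 2 l
```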

String methods

Case conversion

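a = 'i LOVE you'
# Convert to uppercase
print(a.upper())         # I LOVE YOU
# Convert to lowercase
print(a.lower())         # i love you
# Capitalize the first letter (and lowercase the rest)
print(a.capitalize())    # I love you
# Capitalize the first letter of every word
print(a.title())         # I Love You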
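A few related methods, as a sketch: swapcase() flips the case of every letter, and casefold() is a more aggressive lower() meant for case-insensitive comparison (for example, the German ß folds to ss). Also note that title() capitalizes any letter that follows a non-letter, so apostrophes produce results such as They'Re.

```python
a = 'i LOVE you'
print(a.swapcase())                                   # I love YOU

# Case-insensitive comparison
print('Straße'.lower() == 'STRASSE'.lower())          # False
print('Straße'.casefold() == 'STRASSE'.casefold())    # True

# A known quirk of title()
print("they're here".title())                         # They'Re Here
```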

Checking properties

b='abc123'

# Are all characters digits?
print(b.isdigit())                # False
# Are all characters letters?
print(b.isalpha())                # False
# Are all characters letters or digits?
print(b.isalnum())                # True
# Are all characters ASCII? (Python 3.7+)
print(b.isascii())                # True

c = '你好呀'
print(c.isascii())                # False

# Does the string start with the given text?
print(c.startswith('你好'))        # True
# Does the string end with the given text?
print(c.endswith('啊'))            # False
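Note that isdigit() is not the same as "this can be passed to int()". A sketch of the differences: isdecimal() accepts only decimal digits, isdigit() also accepts characters such as superscript ², isnumeric() additionally accepts numerals such as the Chinese 三, and none of them accept a minus sign or an empty string. startswith() and endswith() also accept a tuple of candidates.

```python
print('123'.isdecimal(), '123'.isdigit(), '123'.isnumeric())   # True True True
print('²'.isdecimal(), '²'.isdigit(), '²'.isnumeric())         # False True True
print('三'.isdecimal(), '三'.isdigit(), '三'.isnumeric())       # False False True
print('-5'.isdigit(), ''.isdigit())                           # False False

# A tuple means "any of these"
c = '你好呀'
print(c.startswith(('你', '您')))                              # True
```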

Searching

To check whether one string occurs inside another, searching from left to right, use the string's find or index method.

a = 'Oh apple, i love apple.'
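# index searches left to right for the given substring; an optional start position (default 0) says where to begin
# If found, it returns the index of the substring; if not, it raises an error (the program crashes unless it is handled)
print(a.index('apple'))                   # 3
print(a.index('apple', 10))               # 17
# rindex searches from right to left; the returned index is still counted from the left
print(a.rindex('apple'))                  # 17
# A missing substring raises ValueError, so catch it if the program should keep running
try:
    print(a.index('banana'))
except ValueError as err:
    print(err)                            # substring not found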

print(a.find('apple'))                    # 3
print(a.find('apple', 10))                # 17
print(a.rfind('apple'))                   # 17
# Unlike index, find returns -1 instead of raising an error when the substring is not found
print(a.find('banana'))                   # -1
print(a.rfind('banana'))                  # -1
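Because find() accepts a start position, it can be used in a loop to collect every occurrence. Below is a minimal sketch with a hypothetical helper find_all (not part of the str API); if only the number of occurrences is needed, the built-in count() is enough.

```python
def find_all(text, sub):
    """Hypothetical helper: start indexes of all non-overlapping occurrences of sub."""
    if not sub:
        raise ValueError('sub must not be empty')   # an empty sub would loop forever
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match that was found
        start = text.find(sub, start + len(sub))
    return positions

a = 'Oh apple, i love apple.'
print(find_all(a, 'apple'))    # [3, 17]
print(find_all(a, 'banana'))   # []
print(a.count('apple'))        # 2
```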

Formatting strings

Centering, right-justifying and left-justifying in Python:

a = 'hello, world'
# Center
print(a.center(20, '~'))      # ~~~~hello, world~~~~
# Right-justify
print(a.rjust(20, '~'))       # ~~~~~~~~hello, world
# Left-justify
print(a.ljust(20, '~'))       # hello, world~~~~~~~~

b = '123'
# Zero padding (adds zeros on the left)
print(b.zfill(6))             # 000123
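The same alignment can also be written with the format-spec mini-language inside an f-string: a fill character, then ^ (center), > (right) or < (left), then the width. A sketch with a = 'hello, world'; note also that zfill() keeps a leading sign in front of the zeros.

```python
a = 'hello, world'
print(f'{a:~^20}')          # ~~~~hello, world~~~~
print(f'{a:~>20}')          # ~~~~~~~~hello, world
print(f'{a:~<20}')          # hello, world~~~~~~~~

# Zero padding for numbers, and zfill with a sign
print(f'{123:06d}')         # 000123
print('-42'.zfill(5))       # -0042
```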

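Formatted output

c = 1234
d = 123
# Option 1: %-formatting
print('%d+%d=%d' % (c, d, c + d))          # 1234+123=1357
# Option 2: f-strings, a concise syntax introduced in Python 3.6
print(f'{c}+{d}={c + d}')                  # 1234+123=1357
# Option 3: str.format with automatic numbering
print('{}+{}={}'.format(c, d, c + d))      # 1234+123=1357
# Option 4: str.format with explicit positions
print('{0}+{1}={2}'.format(c, d, c + d))   # 1234+123=1357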

For finer control over how each value is rendered in the formatting syntax, refer to the table below.
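A few common format specifiers, as a sketch (the values are arbitrary examples; the same specifiers work in f-strings and in str.format()): .2f for two decimal places, a comma for thousands separators, % for percentages, x/b/o for hexadecimal, binary and octal, and >6 for right alignment in a width of 6.

```python
price = 1234.5678
print(f'{price:.2f}')              # 1234.57
print(f'{price:,.2f}')             # 1,234.57
print('{:.2f}'.format(price))      # 1234.57
print(f'{0.256:.1%}')              # 25.6%
print(f'{255:x} {255:b} {255:o}')  # ff 11111111 377
print(f'[{42:>6}]')                # [    42]
```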

Trimming strings

content = '   马某某是个二货   '

# Remove whitespace from both ends
print(content.strip())        # 马某某是个二货
# Remove whitespace from the left end
print(content.lstrip())       # 马某某是个二货   
# Remove whitespace from the right end
print(content.rstrip())       #    马某某是个二货
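By default, strip() removes all leading and trailing whitespace, including tabs and newlines, not just spaces. Passing an argument changes what is removed, but the argument is treated as a set of characters, not as a prefix or suffix; to remove an exact prefix or suffix, Python 3.9 added removeprefix() and removesuffix(). A sketch with arbitrary example strings:

```python
print(repr('\t  text \n'.strip()))                 # 'text'
print('***hi***'.strip('*'))                       # hi

# The argument is a set of characters, stripped in any order
print('www.example.com'.strip('cmowz.'))           # example

# Exact prefix/suffix removal (Python 3.9+)
print('www.example.com'.removeprefix('www.'))      # example.com
print('hello.py'.removesuffix('.py'))              # hello
```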

Replacing substrings

# replace swaps every occurrence of one substring for another
# strip() first removes the whitespace at both ends
print(content.strip().replace('马', '*'))    # *某某是个二货
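replace() also takes an optional third argument that limits how many occurrences are replaced. Because strings are immutable, replace() (like every str method) returns a new string and leaves the original untouched. A sketch:

```python
s = 'a-b-c-d'
# Replace only the first 2 occurrences
t = s.replace('-', '+', 2)
print(t)      # a+b+c-d
print(s)      # a-b-c-d (unchanged)
```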

Splitting and joining strings

content = 'you go your way, I will go mine.'
content2 = content.replace(',', '').replace('.', '')
# Split on whitespace to get a list
words = content2.split()
print(words, len(words))             # ['you', 'go', 'your', 'way', 'I', 'will', 'go', 'mine'] 8
# Split on spaces, at most 3 times
words = content2.split(' ', maxsplit=3)
print(words, len(words))             # ['you', 'go', 'your', 'way I will go mine'] 4
# Split from right to left, at most 3 times
words = content2.rsplit(' ', maxsplit=3)
print(words, len(words))             # ['you go your way I', 'will', 'go', 'mine'] 4
# Split on commas
items = content.split(',')
for item in items:
    print(item)
# you go your way
#  I will go mine.

contents = [
    '请不要相信我的美丽',
    '更不要相信我的爱情',
    '因为在涂满油彩的面孔下',
    '有着一颗戏子的心'
]
# Join the list elements with the given separator string
print(' '.join(contents))
# Output: 请不要相信我的美丽 更不要相信我的爱情 因为在涂满油彩的面孔下 有着一颗戏子的心
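Two details of split() and join() that often trip people up, as a sketch with arbitrary example values: split() with no argument collapses runs of whitespace and drops empty strings, while split(' ') splits on every single space; and join() only accepts strings, so other values must be converted first. splitlines() is a convenient way to split text into lines.

```python
text = 'a  b '
print(text.split())                    # ['a', 'b']
print(text.split(' '))                 # ['a', '', 'b', '']

# join() needs strings, so convert the numbers first
nums = [1, 2, 3]
print(', '.join(map(str, nums)))       # 1, 2, 3

print('line1\nline2\n'.splitlines())   # ['line1', 'line2']
```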

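Encoding and decoding strings

The two directions: str (string) ---> encode() ---> bytes (byte string)

bytes (byte string) ---> decode() ---> str (string)

Key points:

1. When choosing a character set (encoding), UTF-8 is usually the best choice, and it is also the default for encode() and decode().

2. Encoding and decoding must use the same character set; otherwise the text comes out garbled or decoding fails.

3. ISO-8859-1 cannot represent Chinese characters, so Chinese text saved in it is lost (an "encoding black hole"): depending on the tool, the characters turn into ? or an encoding error is raised.

4. UTF-8 is one implementation of Unicode and is a variable-length encoding: at least 1 byte per character (English letters and digits) and at most 4 bytes (e.g. Emoji); common Chinese characters take 3 bytes.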
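Points 3 and 4 can be checked directly in Python. A sketch: the number of UTF-8 bytes per character grows from 1 to 4, and encoding Chinese text as ISO-8859-1 raises UnicodeEncodeError unless an error handler such as 'replace' is passed, which is where the ? comes from.

```python
# Bytes per character in UTF-8
for ch in ['A', 'é', '中', '😀']:
    print(ch, len(ch.encode('utf-8')))   # A 1, é 2, 中 3, 😀 4

# ISO-8859-1 has no code for Chinese characters
try:
    '中国'.encode('iso-8859-1')
except UnicodeEncodeError as err:
    print('cannot encode:', err)

# With errors='replace', each unencodable character becomes ?
print('中国'.encode('iso-8859-1', errors='replace'))   # b'??'
```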

a = '我爱你中国亲爱的母亲'
# GBK extends GB2312, which in turn extends ASCII
# UTF-8 is one implementation of Unicode (the "universal code")
b = a.encode('utf-8')
print(type(b))                # <class 'bytes'>
print(b, len(b))              # b'\xe6\x88\x91...' 30 (output abbreviated)
c = b'\xe6\x88\x91\xe7\x88\xb1\xe4\xbd\xa0\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xb2\xe7\x88\xb1\xe7\x9a\x84\xe6\xaf\x8d\xe4\xba\xb2'
print(c.decode('utf-8'))      # 我爱你中国亲爱的母亲
# If the encoding and decoding character sets differ, Python may raise UnicodeDecodeError,
# or the text may simply come out garbled
print(c.decode('gbk'))        # 鎴戠埍浣犱腑鍥戒翰鐖辩殑姣嶄翰
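decode() can also be told what to do with bytes it cannot interpret, via the errors parameter. A sketch: a round trip with the same encoding restores the text, decoding UTF-8 bytes as ASCII raises UnicodeDecodeError, and decoding a truncated byte sequence with errors='ignore' silently drops the incomplete character.

```python
a = '我爱你中国亲爱的母亲'
b = a.encode('utf-8')
print(len(a), len(b))                         # 10 30
print(b.decode('utf-8') == a)                 # True

try:
    b.decode('ascii')
except UnicodeDecodeError as err:
    print(type(err).__name__)                 # UnicodeDecodeError

# The first 4 bytes are one complete character (3 bytes) plus 1 byte of the next
print(b[:4].decode('utf-8', errors='ignore')) # 我
```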

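Caesar cipher: a way of encrypting plaintext by replacing each letter with the letter a fixed number of positions later in the alphabet (3 positions here).

message = 'attack at dawn.'
# Build the character translation table
table = str.maketrans(
    'abcdefghijklmnopqrstuvwxyz',
    'defghijklmnopqrstuvwxyzabc'
)
# The translate method applies the table to the string
print(message.translate(table))                 # dwwdfn dw gdzq.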
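The table above hard-codes a shift of 3 and only handles lowercase letters. A generalized sketch follows, using a hypothetical function caesar (the name and the uppercase handling are assumptions, not from the original) built on string.ascii_lowercase, str.maketrans and str.translate. Decrypting is just shifting by the negative amount, which the modulo turns into the equivalent forward shift.

```python
import string

def caesar(text, shift):
    """Hypothetical helper: shift ASCII letters by `shift` places; other characters are unchanged."""
    shift %= 26                      # e.g. -3 becomes 23
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)

secret = caesar('Attack at dawn.', 3)
print(secret)                  # Dwwdfn dw gdzq.
print(caesar(secret, -3))      # Attack at dawn.
```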