[03] Strings and Encoding

Latest recommended article published on 2022-11-07 15:31:30

sun_apollo

Category: Python

Article link: https://blog.csdn.net/sun_apollo/article/details/118085491

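This article explains the Unicode nature of Python strings, describes the differences between ASCII, Unicode, and UTF-8, and shows how to convert between encodings. It covers len(), replace(), ord(), and chr(), as well as three string formatting styles: the % operator, format(), and f-strings.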

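Unicode Encoding

In memory, text is handled uniformly as Unicode.
To save space, however, UTF-8 is used when storing files, sending data over the network, and serving web pages.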

Character	ASCII	Unicode	UTF-8
A	01000001	00000000 01000001	01000001
中	n/a	01001110 00101101	11100100 10111000 10101101
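Bit patterns like these can be computed in Python itself. The following is a minimal sketch: the helper name `bits` is made up for illustration, and `utf-16-be` is assumed as a stand-in for the two-byte "Unicode" column of the table.

```python
def bits(data: bytes) -> str:
    """Render each byte as an 8-digit binary string, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


for ch in ("A", "中"):
    two_byte = bits(ch.encode("utf-16-be"))  # fixed two-byte form (assumption)
    utf8 = bits(ch.encode("utf-8"))          # variable-length form
    print(ch, two_byte, utf8, sep=" | ")
# A | 00000000 01000001 | 01000001
# 中 | 01001110 00101101 | 11100100 10111000 10101101
```

The ASCII letter takes one byte in UTF-8 while the Chinese character takes three, which is why UTF-8 saves space for mostly-English text.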

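Python Strings

In Python 3, strings are Unicode: a str is a sequence of Unicode characters, so one string can mix text from any language.

String Formatting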

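# Method 1: the % operator
# %x is replaced with a hexadecimal integer
# %d is replaced with an integer
# %f is replaced with a floating-point number
# %s is replaced with a string
# Each %? is a placeholder
>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
# %s always works: it converts any value to a string
>>> 'Age: %s. Gender: %s' % (25, True)
'Age: 25. Gender: True'
# Use %% to produce a literal %
>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'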

# Method 2: the str.format() method
# The arguments replace the placeholders {0}, {1}, ... in order
# .2f keeps two decimal places
>>> 'Hello, {0}, 成绩提升了 {1:.2f}%'.format('小明', 17.12345)
'Hello, 小明, 成绩提升了 17.12%'
# A placeholder can use a meaningful key; pass the value as key=value
>>> "{name}'s age is {age}".format(name='小明', age=18)
"小明's age is 18"
# The numbers can be omitted; the arguments are then used in order
>>> "{} {}".format("hello", "world")
'hello world'

# Method 3: f-strings
# A string prefixed with f is an f-string. Unlike a normal string, any {xxx} in it
# is replaced with the value of the corresponding variable or expression.
>>> r = 2.5
>>> s = 3.1415926 * r ** 2
>>> print(f'The area of a circle with radius {r} is {s:.2f}')
The area of a circle with radius 2.5 is 19.63
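To compare the three styles side by side, the short sketch below formats the same values with each of them. The variable names and the sample values are illustrative only.

```python
name = "Michael"
balance = 1234.5

with_percent = "Hi, %s, you have $%.2f." % (name, balance)
with_format = "Hi, {}, you have ${:.2f}.".format(name, balance)
with_fstring = f"Hi, {name}, you have ${balance:.2f}."

print(with_fstring)                                 # Hi, Michael, you have $1234.50.
print(with_percent == with_format == with_fstring)  # True
```

All three produce the same text, so the choice is mostly about readability. The f-string keeps each value next to its placeholder, which tends to be easier to follow in longer templates.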

Encoding Conversion

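# len() returns the length of a string
>>> name = 'gxj'
>>> len(name)
3

# replace() returns a new string with the characters replaced;
# the original string is unchanged because strings are immutable
>>> newname = name.replace('g', 'G')
>>> newname
'Gxj'
>>> name
'gxj'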


# ord() returns the integer Unicode code point of a character
>>> ord('A')
65
>>> ord('中')
20013
>>> bin(ord('中'))
'0b100111000101101'

# chr() converts a Unicode code point back to the character
>>> chr(65)
'A'
>>> chr(20013)
'中'

# Writing characters by their hexadecimal code point
# Format: \u followed by four hexadecimal digits
>>> '\u4e2d\u6587'
'中文'
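The three notations above (ord(), chr(), and the \u escape) all refer to the same code point. As a small illustrative check, with variable names chosen only for this example:

```python
ch = "中"
code_point = ord(ch)

print(code_point)              # 20013
print(hex(code_point))         # 0x4e2d
print(f"U+{code_point:04X}")   # U+4E2D
print(chr(code_point) == ch)   # True: chr() undoes ord()
print("\u4e2d" == ch)          # True: the escape names the same character
```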

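# Unicode strings (str) vs. byte strings (bytes)
>>> 'ABC'   # str: a sequence of Unicode characters
'ABC'
>>> b'ABC'  # bytes: a sequence of bytes; each ASCII character is one byte
b'ABC'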

# encode() converts a str into bytes
# str -> ASCII bytes
>>> 'ABC'.encode("ascii")
b'ABC'
# str -> UTF-8 bytes
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'

# decode() converts bytes back into a str
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
>>> b'ABC'.decode("ascii")
'ABC'
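The conversions above succeed because the encoding fits the data. The sketch below, which goes slightly beyond the examples in this article, shows what happens when it does not: ASCII cannot represent Chinese characters, and a truncated UTF-8 sequence cannot be decoded unless errors are explicitly ignored.

```python
text = "中文"

try:
    text.encode("ascii")  # ASCII has no code for these characters
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc.reason)

data = text.encode("utf-8")
truncated = data[:-1]  # drop the last byte so the final character is incomplete

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)

# errors="ignore" silently drops the bytes that cannot be decoded
print(truncated.decode("utf-8", errors="ignore"))  # 中
```

Ignoring errors loses data, so it is usually better reserved for cases where a partial result is acceptable.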


# len() on a str counts characters
>>> len("ABC")
3
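On a str, len() counts characters, but on bytes it counts bytes, so the two differ once non-ASCII text is involved. A brief illustration using the same sample strings as earlier:

```python
text = "中文"
data = text.encode("utf-8")

print(len(text))  # 2 (characters)
print(len(data))  # 6 (bytes: three per Chinese character in UTF-8)

# For pure ASCII text the two counts are the same
print(len("ABC"), len("ABC".encode("utf-8")))  # 3 3
```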

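# To make the source encoding explicit, start the .py file with the two lines below
# and save the file as UTF-8 without BOM (Python 3 already assumes UTF-8 by default)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-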
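The same idea applies to data files: passing the encoding explicitly to open() avoids depending on the platform default. The sketch below is illustrative; the temporary directory and the file name `demo.txt` are assumptions made for the example.

```python
import codecs
import os
import tempfile

text = "中文 ABC"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Write the text as UTF-8 (the "utf-8" codec does not add a BOM)
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes, then read the file again as text
with open(path, "rb") as f:
    raw = f.read()
with open(path, "r", encoding="utf-8") as f:
    restored = f.read()

print(raw)                                # b'\xe4\xb8\xad\xe6\x96\x87 ABC'
print(raw.startswith(codecs.BOM_UTF8))    # False: no BOM was written
print(restored == text)                   # True
```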