Differences Between Strings and Byte Strings in Python

深海蓝河

Modified on 2024-06-22 12:20:44

First published on 2024-06-22 12:16:59

Article link: https://blog.csdn.net/qq_38413468/article/details/139880422

Differences

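A string (str) is a sequence of Unicode code points. It has no fixed size in bytes until it is encoded, and the number of bytes per character depends on the encoding:
- UTF-8: variable-length encoding, 1 to 4 bytes per character.
- UTF-16: 2 or 4 bytes per character.
- UTF-32: a fixed 4 bytes per character.
A byte string (bytes) is a sequence of raw bytes, each an integer from 0 to 255. A bytes literal may only contain ASCII characters (any other byte value has to be written as an escape such as \xe4), and every element occupies exactly one byte.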
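A quick way to see these sizes is to encode a few characters and count the resulting bytes. The sketch below is illustrative: the sample characters are arbitrary, and 'utf-16-le'/'utf-32-le' are used instead of 'utf-16'/'utf-32' so that no byte-order mark is added to the count.

```python
# Count the bytes each character needs under three Unicode encodings.
# The "-le" variants are used so that no byte-order mark (BOM) is included.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F60A"]  # A, é, 中, 😊

sizes = {}
for ch in samples:
    sizes[ch] = (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    # Print the code point instead of the character to keep the output ASCII-only
    print(f"U+{ord(ch):04X}", sizes[ch])

# len() on a str counts code points; len() on bytes counts bytes
text = "\u4e2d"
data = text.encode("utf-8")
print(len(text), len(data), list(data))  # 1 3 [228, 184, 173]
```

The emoji is the only sample that needs 4 bytes in every encoding, because it lies outside the Basic Multilingual Plane and UTF-16 stores it as a surrogate pair.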

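Definitions

Defining a string object

s = u'hello, world'  # the u prefix is accepted but redundant in Python 3
s = 'hello, world'   # a plain quoted literal is already a str

Defining a byte string object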

s = b'hello, world'
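The two types behave differently even when they hold the same ASCII text. The sketch below shows a few differences that often cause surprises: indexing a bytes object yields an integer, a str never compares equal to a bytes object, and a bytes literal rejects non-ASCII characters at compile time. The compile call is only there to trigger that error without breaking the rest of the script.

```python
s = 'hello, world'
b = b'hello, world'

print(type(s), type(b))  # <class 'str'> <class 'bytes'>
print(s[0], b[0])        # h 104  (indexing bytes gives an int in 0..255)
print(s == b)            # False  (str and bytes never compare equal)

# A non-ASCII character must be written as escaped byte values in a bytes literal
zhong = b'\xe4\xb8\xad'  # UTF-8 bytes of the character U+4E2D
print(zhong.decode('utf-8') == '\u4e2d')  # True

# Writing the character directly inside b'...' is a SyntaxError
try:
    compile("b'\u4e2d'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```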

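Converting between them

From string to byte string: use the encode method

# Encode a string into a byte string
s = "hello, world"
b = s.encode()  # uses 'utf-8' by default
print(b)  # output: b'hello, world'

From byte string to string: use the decode method

# Decode a byte string into a string
b = b"hello, world"
s = b.decode()  # uses 'utf-8' by default
print(s)  # output: hello, world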
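UTF-8 is only the default; both methods also accept an encoding name and an errors argument. A minimal sketch, using GBK (a common Chinese encoding, chosen here only for illustration) to show that the same string yields different bytes, that decoding must use the encoding the bytes were produced with, and that errors changes what happens on failure:

```python
text = "\u4f60\u597d"  # "你好"

# The same str gives different bytes under different encodings
utf8_bytes = text.encode("utf-8")  # 3 bytes per character here
gbk_bytes = text.encode("gbk")     # 2 bytes per character here
print(len(utf8_bytes), len(gbk_bytes))  # 6 4

# Decoding with the encoding that produced the bytes restores the str
print(utf8_bytes.decode("utf-8") == text)  # True
print(gbk_bytes.decode("gbk") == text)     # True

# Decoding with the wrong encoding fails under the default 'strict' handler
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True

# errors='replace' on encode substitutes '?' for characters the codec cannot represent
print(text.encode("ascii", errors="replace"))  # b'??'
```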

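Effect on reading and writing files

In general, if every character in a file can be represented in ASCII, a wrong or missing encoding setting causes no problems, because ASCII characters are stored as the same single bytes in UTF-8 and in other ASCII-compatible encodings such as Latin-1 and GBK. UTF-16 and UTF-32 are exceptions: they use 2 or 4 bytes even for ASCII characters.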
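A minimal check of this point: an ASCII-only string encodes to identical bytes under several ASCII-compatible encodings, while UTF-16 does not. The encodings in the list are just examples.

```python
text = "hello, world"

# ASCII-compatible encodings store ASCII characters as the same single bytes
compatible = ["ascii", "utf-8", "latin-1", "gbk"]
encoded = {enc: text.encode(enc) for enc in compatible}
print(all(data == encoded["ascii"] for data in encoded.values()))  # True

# UTF-16 is not ASCII-compatible: every character here takes 2 bytes
utf16 = text.encode("utf-16-le")
print(len(encoded["ascii"]), len(utf16))  # 12 24
```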

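Reading files

When a file is opened in text mode, Python decodes the data automatically as it is read. If the encoding argument is omitted, Python uses the locale's preferred encoding (as reported by locale.getpreferredencoding(False)). This is UTF-8 on most Linux and macOS systems but is often a legacy code page such as cp1252 or cp936 on Windows, so passing encoding explicitly makes the behavior predictable. A file opened in binary mode ('rb') is not decoded at all and yields bytes.

# Read the file contents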
with open('example.txt', 'r', encoding='utf-8') as f:
    content = f.read()
    print(content)
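To see the decoding step explicitly, the sketch below writes a small UTF-8 file into a temporary directory (an assumption, so the example leaves example.txt alone), then reads it once in text mode and once in binary mode. It also prints the locale encoding that open would fall back to; that value depends on the machine.

```python
import locale
import os
import tempfile

# The encoding open() uses in text mode when encoding= is omitted
print("default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello, \u4e16\u754c")  # "hello, 世界"

    # Text mode: the bytes are decoded into a str
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: the raw bytes come back undecoded
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 9
print(type(as_bytes).__name__, len(as_bytes))  # bytes 13
print(as_bytes.decode("utf-8") == as_text)     # True
```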

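If the file's actual encoding does not match the encoding argument passed to open, reading it usually raises UnicodeDecodeError. With the default 'strict' error handler, the exception is raised as soon as a byte sequence cannot be decoded. (Some encodings, such as latin-1, accept every byte value, so a mismatch with them produces garbled text instead of an error.)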
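The mismatch can be reproduced by writing a file in one encoding and reading it in another. In this sketch the file is written as GBK (chosen only for illustration) into a temporary directory and then read with encoding='utf-8':

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gbk.txt")
    with open(path, "w", encoding="gbk") as f:
        f.write("\u4f60\u597d")  # "你好", stored as 4 GBK bytes

    # Reading with a mismatched encoding under the default 'strict' handler
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        error = None
    except UnicodeDecodeError as exc:
        error = exc

    # Reading with the matching encoding works
    with open(path, "r", encoding="gbk") as f:
        restored = f.read()

print(type(error).__name__)        # UnicodeDecodeError
print(restored == "\u4f60\u597d")  # True
```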

The errors parameter controls how decoding errors are handled:

with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    content = f.read()
    print(content)  # undecodable bytes are dropped

with open('example.txt', 'r', encoding='utf-8', errors='replace') as f:
    content = f.read()
    print(content)  # undecodable bytes are replaced with U+FFFD (�)

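Note: to check a file's encoding, you can use the file command. It infers the encoding from the file's contents, so the result is a best guess rather than a guarantee:

$ file example.txt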
example.txt: Unicode text, UTF-8 text, with no line terminators
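Python's standard library has no encoding detector. One rough approach, shown here as a hypothetical helper guess_encoding, is to try strict decoding with a list of candidate encodings and return the first that succeeds. Like file, this only tells you which encodings are consistent with the bytes, not which one was actually used; the candidate order is an assumption, and latin-1 comes last because it accepts any byte sequence.

```python
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "gbk", "latin-1")) -> Optional[str]:
    """Hypothetical helper: return the first candidate that decodes data without error."""
    for enc in candidates:
        try:
            data.decode(enc)  # strict mode: raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        return enc
    return None  # no candidate accepted every byte


print(guess_encoding("hello, world".encode("utf-8")))  # utf-8
print(guess_encoding("\u4f60\u597d".encode("gbk")))    # gbk
```

Pure ASCII data is always reported as utf-8 with this order, even if it was written as GBK, since the bytes are identical in both.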

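Writing files

When a file is opened in text mode, Python encodes the data automatically as it is written. As with reading, if encoding is omitted the locale's preferred encoding is used, so it is safer to pass it explicitly.

# Write content to the file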
with open('example.txt', 'w', encoding='utf-8') as f:
    f.write("hello, world")
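The encoding argument decides which bytes end up on disk. A minimal sketch that writes the same text (containing two non-ASCII characters, chosen for illustration) under several encodings into a temporary directory and compares the file sizes. Note that 'utf-16' writes a 2-byte byte-order mark first, and that with encoding='ascii' and the default 'strict' handler the write fails with UnicodeEncodeError.

```python
import os
import tempfile

text = "hello, \u4e16\u754c"  # "hello, 世界": 7 ASCII characters + 2 CJK characters

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "utf-16", "gbk"):
        path = os.path.join(tmp, f"out-{enc}.txt")
        with open(path, "w", encoding=enc) as f:
            f.write(text)
        sizes[enc] = os.path.getsize(path)

    # ascii cannot represent the CJK characters, so 'strict' raises
    try:
        with open(os.path.join(tmp, "out-ascii.txt"), "w", encoding="ascii") as f:
            f.write(text)
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

print(sizes)         # {'utf-8': 13, 'utf-16': 20, 'gbk': 11}
print(ascii_failed)  # True
```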

Example code

Reading files

# Create a file containing a non-ASCII character (an emoji)
with open('example.txt', 'w', encoding='utf-8') as f:
    f.write("hello, world 😊")

# Read the file back, explicitly using utf-8
with open('example.txt', 'r', encoding='utf-8') as f:
    content = f.read()
    print(content)  # output: hello, world 😊

# Read the file with the 'ascii' encoding and the 'ignore' error handler
with open('example.txt', 'r', encoding='ascii', errors='ignore') as f:
    content = f.read()
    print(content)  # output: hello, world 

# Read the file with the 'ascii' encoding and the 'replace' error handler
with open('example.txt', 'r', encoding='ascii', errors='replace') as f:
    content = f.read()
    print(content)  # output: hello, world ����  (the emoji's 4 UTF-8 bytes each become U+FFFD)
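The same decoding can be reproduced in memory, which makes it easy to see exactly which characters each error handler produces. A minimal sketch, working on the UTF-8 bytes of the same text rather than on example.txt:

```python
data = "hello, world \U0001F60A".encode("utf-8")  # the emoji is 4 bytes: F0 9F 98 8A

ignored = data.decode("ascii", errors="ignore")
replaced = data.decode("ascii", errors="replace")

# ascii() escapes non-ASCII characters so the output is unambiguous
print(ascii(ignored))   # 'hello, world '
print(ascii(replaced))  # 'hello, world \ufffd\ufffd\ufffd\ufffd'
```

The ascii decoder rejects each of the emoji's bytes on its own, so 'replace' produces one U+FFFD per byte rather than one per character.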

Writing files

# Try to write a character that cannot be encoded, using ascii and the 'ignore' error handler
with open('example.txt', 'w', encoding='ascii', errors='ignore') as f:
    f.write("hello, world 😊")  # the unencodable character is dropped
with open('example.txt', 'r', encoding='ascii') as f:
    print(f.read())  # output: hello, world 

# Try to write a character that cannot be encoded, using ascii and the 'replace' error handler
with open('example.txt', 'w', encoding='ascii', errors='replace') as f:
    f.write("hello, world 😊")  # the unencodable character is replaced with '?'
with open('example.txt', 'r', encoding='ascii') as f:
    print(f.read())  # output: hello, world ?
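Both 'ignore' and 'replace' lose information: once the file is written, the emoji cannot be recovered. If the output must stay ASCII but the data should be preserved, Python's 'backslashreplace' handler is one alternative; it writes an escape sequence that the 'unicode_escape' codec can turn back into the original character. A minimal in-memory sketch:

```python
text = "hello, world \U0001F60A"

lossy = text.encode("ascii", errors="replace")          # b'hello, world ?'
kept = text.encode("ascii", errors="backslashreplace")  # b'hello, world \\U0001f60a'

# The '?' cannot be turned back into the emoji...
print(lossy.decode("ascii") == text)  # False
# ...but the backslash escape can be decoded back to the original text
print(kept.decode("unicode_escape") == text)  # True
```

This round trip only holds when the original text contains no backslashes of its own, because 'unicode_escape' interprets every backslash as the start of an escape sequence.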