【经验】Python 编码中容易犯的错误-CSDN博客

本文链接：https://blog.csdn.net/alun550/article/details/136333470

【经验】Python 编码中容易犯的错误

字符串拼接
numpy
除法取整和取余
路径操作
格式化字符串

字符串拼接

当需要进行大量的字符串拼接时，使用 io.StringIO 而非直接对字符串进行操作，每次对字符串进行 "+=" 之类的操作时，都会在内存中开辟一块新的地址来存放数据，这会涉及到大量的内存申请、释放操作。而使用 StringIO 则是通过不断扩大 buffer 来存放更多字符串

from time import perf_counter

string = ""
s1 = perf_counter()
for i in range(100000):
    string += f"* Some String {i} *"
e1 = perf_counter()

from io import StringIO

string_io = StringIO()
s2 = perf_counter()
for i in range(100000):
    string_io.write(f"* Some String {i} *")
s = string_io.getvalue()
e2 = perf_counter()

print(f"{e1 - s1:f}")
>>> 15.629244
print(f"{e2 - s2:f}")
>>> 0.021929

numpy

避免数据与 numpy.array 的多次变换，numpy.array 是一个非常耗时的操作，如下方法对一亿个数字取最大值

直接对 list 使用 max 函数 — 耗时 0.8 秒
转换为 numpy.array 再使用 np.max 函数 — 耗时 4.017 + 0.022 秒
转换为 numpy.array 再使用 max 函数 — 耗时 4.017 + 5.592 秒

import numpy as np

nums = list(range(10000 ** 2))
array = np.array(nums)  # cost 4.017426 s
m1 = max(nums)  # cost 0.800190s
m2 = np.max(array)  # cost 0.022275 s
m3 = max(array)  # 5.592225 s

除法取整和取余

当需要对两个数字即取整又取余的时候，可以使用 divmod() 函数来代替整除和取余的两个操作

a, b = 10 // 3, 10 % 3
print(a, b)

a, b = divmod(10, 3)
print(a, b)

路径操作

直接使用字符串拼接或者 fstring 的方式很容易出错，建议使用 pathlib 进行路径处理

原始方式

path = "/path/to/datas/file.json"
zipped_file = path.removesuffix(".json") + ".zip"
data_dir = "/".join(path.split("/")[:-1])
other_file = f"{data_dir}/other_file.txt"
deeper_dir = f"{data_dir}/abc/def"

建议方式

from pathlib import Path

path = Path("/path/to/datas/file.json")
zipped_file = path.with_suffix(".zip")
data_dir = path.parent
other_file = data_dir.with_name("other_file.txt")
deeper_dir = data_dir.joinpath("abc", "def")

格式化字符串

在格式化字符串中使用没必要的操作

建议：在使用格式化字符串时，尽量使格式化字符串的语法糖，避免多余操作，fstring 本身性能比大多数操作好得多

pi = 3.1415926
print(f"Pi is {round(pi, 2)}")  # 没有必要
print(f"Pi is {pi:.2f}")  # 直接使用格式化字符串的语法糖

如下直接使用 fstring 进行格式化，比进行 round() 操作要快四倍以上
在这里插入图片描述