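Checking each file name in order to find the next available one works fine for a small number of files, but it quickly becomes slow as the number of files grows.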

Here is a version that finds the next available file name in O(log n) time:

    import os

    def next_path(path_pattern):
        """
        Finds the next free path in a sequentially named list of files

        e.g. path_pattern = 'file-%s.txt':

        file-1.txt
        file-2.txt
        file-3.txt

        Runs in log(n) time where n is the number of existing files in sequence
        """
        i = 1

        # First do an exponential search
        while os.path.exists(path_pattern % i):
            i = i * 2

        # Result lies somewhere in the interval (i/2..i]
        # We call this interval (a..b] and narrow it down until a + 1 = b
        a, b = (i // 2, i)
        while a + 1 < b:
            c = (a + b) // 2  # interval midpoint
            a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)

        return path_pattern % b
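
To make the log(n) claim concrete, here is a minimal sketch that runs the same exponential-then-binary search against an in-memory predicate instead of the file system and counts how many existence checks it performs. The names `next_free_index` and `count_probes` are hypothetical helpers introduced only for this illustration; they are not part of the answer's code. The sketch assumes, as `next_path` does, that the existing indices form a contiguous run 1..n with no gaps.

```python
def next_free_index(exists):
    """Same search as next_path, but over an arbitrary predicate exists(i)."""
    i = 1
    # Exponential phase: double i until we hit an index that is free
    while exists(i):
        i *= 2
    # Binary phase: a is taken (or 0), b is free; shrink until a + 1 == b
    a, b = i // 2, i
    while a + 1 < b:
        c = (a + b) // 2
        a, b = (c, b) if exists(c) else (a, c)
    return b


def count_probes(n_existing):
    """Pretend indices 1..n_existing are taken; return (result, probe count)."""
    probes = []

    def exists(i):
        probes.append(i)  # record every existence check
        return 1 <= i <= n_existing

    return next_free_index(exists), len(probes)


if __name__ == "__main__":
    for n in (0, 5, 100, 10000):
        index, n_probes = count_probes(n)
        print(f"{n} existing -> next index {index}, {n_probes} checks")
```

With 10,000 existing entries the search needs only a few dozen checks, whereas a linear scan needs one check per existing file. If the sequence has gaps, the search may return a free index in the middle of the sequence rather than the one after the last file.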

To measure the speed improvement, I wrote a small test that creates just under 10,000 files:

    for i in range(1, 10000):
        with open(next_path('file-%s.foo'), 'w'):
            pass

And I implemented the naive approach for comparison:

    def next_path_naive(path_pattern):
        """
        Naive (slow) version of next_path
        """
        i = 1
        while os.path.exists(path_pattern % i):
            i += 1
        return path_pattern % i

Here are the results:

Fast version:

    real    0m2.132s
    user    0m0.773s
    sys     0m1.312s

Naive version:

    real    2m36.480s
    user    1m12.671s
    sys     1m22.425s

Finally, note that both approaches are susceptible to race conditions if multiple actors try to create files in the sequence at the same time.
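
One possible way to narrow that race window is to treat the search result only as a starting hint and let the operating system arbitrate by creating the file in exclusive mode. The sketch below is an assumption about how this could be done, not something from the answer above: `create_next_file` and `_next_index` are hypothetical names, and the approach relies on Python's `'x'` open mode, which raises `FileExistsError` if the path already exists.

```python
import os


def _next_index(path_pattern):
    """Exponential then binary search for the first free index (a hint only)."""
    i = 1
    while os.path.exists(path_pattern % i):
        i *= 2
    a, b = i // 2, i
    while a + 1 < b:
        c = (a + b) // 2
        a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)
    return b


def create_next_file(path_pattern):
    """Create and return the next file in the sequence, opened for writing.

    Hypothetical helper: if another process grabs the candidate name
    between the search and the open, move on to the next index.
    """
    i = _next_index(path_pattern)
    while True:
        try:
            # 'x' = exclusive creation; fails if the file already exists
            return open(path_pattern % i, 'x')
        except FileExistsError:
            i += 1  # lost the race for this name, try the following one


if __name__ == "__main__":
    with create_next_file('file-%s.txt') as f:
        f.write('hello\n')
        print('created', f.name)
```

The caller gets an open file object rather than a path, so there is no gap between choosing a name and claiming it. Exclusive creation is generally dependable on local file systems; behavior on network file systems can differ, so that part should be verified for the target environment.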