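This exercise uses a given number of processes to split a given file into that many part files. It can then use the split-info file, together with the part files listed in it, to merge the parts back into a single file with a given name.
Python source code: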
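import os
from multiprocessing import Process
from time import sleep


class Split:
    """
    Split a file into a given number of part files and record the part
    names in a split-info file; the info file and the parts it lists can
    later be merged back into a single file with a chosen name.
    """

    def __init__(self, filename=None, num=None):
        """
        :param filename: name of the file to split
        :param num: number of part files to produce (an int greater than 1)
        """
        self.filename = filename
        self.num = num

    @property
    def filename(self):
        return self.__filename

    @filename.setter
    def filename(self, filename):
        # Reject invalid values instead of silently leaving the attribute unset.
        if not (isinstance(filename, str) or filename is None):
            raise TypeError("filename must be a str or None")
        self.__filename = filename

    @property
    def num(self):
        return self.__num

    @num.setter
    def num(self, num):
        # Accept None or an int greater than 1.
        if not (num is None or (isinstance(num, int) and num > 1)):
            raise ValueError("num must be None or an int greater than 1")
        self.__num = num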

    def is_file(self):
        """
        Check that the file name refers to an existing regular file that
        can be read.
        :return: True if the file exists and is readable, otherwise False
        """
        return (self.filename is not None
                and os.path.isfile(self.filename)
                and os.access(self.filename, os.R_OK))

    def calc_size(self):
        """
        Compute the size of each part from the file size and the number of parts.
        :return: the part size in bytes (file size // num), or -1 if the file
                 is missing or unreadable, or num is not greater than 1
        """
        if not self.is_file() or self.num is None or self.num <= 1:
            return -1
        return os.path.getsize(self.filename) // self.num

    def split(self):
        """
        Compute each part's offset in the original file and its size, then
        start one process per part to copy that byte range into a part file.
        The part file names are written to split_info.txt.
        :return: None
        """
        if not self.is_file():
            # %s rather than %d, because num may be None here
            print("%s does not exist or cannot be read; requested parts: %s" % (self.filename, self.num))
            return
        size = self.calc_size()
        if size < 0:
            print("invalid filename %s or number of parts %s" % (self.filename, self.num))
            return
        total = os.path.getsize(self.filename)
        # Every part is `size` bytes except the last, which also takes the remainder.
        offsets = [i * size for i in range(self.num)]
        sizes = [size] * (self.num - 1) + [total - size * (self.num - 1)]
        tasks = []
        files = []
        for i in range(self.num):
            ext = ".part_" + str(i + 1)
            process = Process(target=self.copy_part, args=(offsets[i], sizes[i], self.filename, ext))
            tasks.append(process)
            files.append(self.filename + ext)
            process.start()
        # Wait for every copy process before recording the result.
        for task in tasks:
            task.join()
        print("split file (%s) to %s" % (self.filename, files))
        self.write_info("split_info.txt", files)

    def write_info(self, filename, lst_info):
        """
        Write the split information to a file, one entry per line.
        :param filename: name of the split-info file
        :param lst_info: list of part file names
        :return: None
        """
        with open(filename, "w") as file:
            for info in lst_info:
                file.write(info + "\n")

    def read_info(self, filename):
        """
        Read a split-info file and return the part file names it lists.
        :param filename: name of the split-info file
        :return: list of part file names, in file order
        """
        # The with block closes the file, which the original version never did.
        with open(filename, "r") as file:
            # One name per line; drop the trailing newline.
            return [line.rstrip("\n") for line in file]
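
    def copy_part(self, offset, size, filename, ext):
        """
        Copy `size` bytes, starting at `offset` in the original file, into a
        newly created part file.
        :param offset: byte offset of the part in the original file
        :param size: number of bytes to copy
        :param filename: name of the original file
        :param ext: suffix appended to filename to form the part file name
        :return: None
        """
        # O_BINARY exists only on Windows, where it disables newline translation.
        binary = getattr(os, "O_BINARY", 0)
        fd_r = os.open(filename, os.O_RDONLY | binary)
        # O_TRUNC empties a leftover part file from an earlier run.
        fd_w = os.open(filename + ext, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | binary)
        try:
            os.lseek(fd_r, offset, os.SEEK_SET)
            remaining = size
            while remaining > 0:
                # os.read may return fewer bytes than requested, so count what arrived.
                bs = os.read(fd_r, min(1024, remaining))
                if not bs:
                    break
                os.write(fd_w, bs)
                remaining -= len(bs)
        finally:
            os.close(fd_r)
            os.close(fd_w)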
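
    def merge_part(self, filename, offset, size, file_part):
        """
        Copy the part file into the target file, starting at byte
        offset * size.
        :param filename: target (merged) file; it must already exist
        :param offset: zero-based index of the part
        :param size: size in bytes of every part except possibly the last
        :param file_part: name of the part file to copy
        :return: None
        """
        # O_BINARY exists only on Windows, where it disables newline translation.
        binary = getattr(os, "O_BINARY", 0)
        fd_w = os.open(filename, os.O_WRONLY | binary)
        fd_r = os.open(file_part, os.O_RDONLY | binary)
        try:
            os.lseek(fd_w, offset * size, os.SEEK_SET)
            bs = os.read(fd_r, 1024)
            while len(bs) > 0:
                os.write(fd_w, bs)
                bs = os.read(fd_r, 1024)
        finally:
            os.close(fd_w)
            os.close(fd_r)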
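
    def merge(self, filename, part_filename_info):
        """
        Merge the part files listed in a split-info file into one file.
        :param filename: name of the merged file to create
        :param part_filename_info: name of the split-info file
        :return: None
        """
        # Create the target, or empty it if it already exists, so no stale bytes survive.
        fd_w = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        os.close(fd_w)
        files = self.read_info(part_filename_info)
        if not files:
            print("no part files listed in %s" % part_filename_info)
            return
        # All parts except the last have the size of the first part,
        # so part i starts at byte i * size.
        size = os.path.getsize(files[0])
        tasks = []
        for i in range(len(files)):
            process = Process(target=self.merge_part, args=(filename, i, size, files[i]))
            tasks.append(process)
            process.start()
        for task in tasks:
            task.join()
        print("merge %s into file %s" % (files, filename))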


if __name__ == "__main__":
    # Split test
    sp = Split(filename="exer10.py", num=4)
    sleep(5)
    sp.split()
    sleep(5)
    # Merge test
    sp.merge("merge.py", "split_info.txt")
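
The offset and size arithmetic used by split() can be tried on its own. The following is a minimal sketch, not part of the listing above: plan_parts is a hypothetical helper name, and it only assumes the rule the listing uses, namely that every part gets total // num bytes and the last part also takes the remainder.

```python
def plan_parts(total, num):
    """Return (offset, size) pairs for splitting `total` bytes into `num` parts.

    Hypothetical helper for illustration; mirrors the arithmetic in split().
    """
    if num <= 1 or total < 0:
        raise ValueError("need num > 1 and total >= 0")
    size = total // num
    plan = [(i * size, size) for i in range(num - 1)]
    # The last part starts after the equal-sized parts and runs to the end of the file.
    last_offset = (num - 1) * size
    plan.append((last_offset, total - last_offset))
    return plan


if __name__ == "__main__":
    print(plan_parts(10, 4))  # [(0, 2), (2, 2), (4, 2), (6, 4)]
```

Because only the last part can differ in size, the merge step can recover every part's position as index * size of the first part.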
Directory structure before running the program:
Test results:
Split:
Four part files, 'exer10.py.part_1', 'exer10.py.part_2', 'exer10.py.part_3' and 'exer10.py.part_4', are copied out of exer10.py, together with a split-info file, split_info.txt, whose contents are:
exer10.py.part_1
exer10.py.part_2
exer10.py.part_3
exer10.py.part_4
D:\PycharmProjects\pythonstart\venv\Scripts\python.exe D:/PycharmProjects/pythonstart/lession3/spliter.py
split file (exer10.py) to ['exer10.py.part_1', 'exer10.py.part_2', 'exer10.py.part_3', 'exer10.py.part_4']
merge ['exer10.py.part_1', 'exer10.py.part_2', 'exer10.py.part_3', 'exer10.py.part_4'] into file merge.py
Process finished with exit code 0
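
As a quick sanity check on the split step, the part files named in the split-info file should all exist, and their sizes should add up to the size of the original. The sketch below is one way to check that. parts_cover_original is a hypothetical helper name, and it assumes the info file holds one part file name per line, with paths resolvable from the current working directory.

```python
import os


def parts_cover_original(original, info_path):
    """Return True if every listed part exists and the part sizes sum to the original size.

    Hypothetical helper for illustration; assumes one part file name per line.
    """
    with open(info_path, "r") as f:
        parts = [line.strip() for line in f if line.strip()]
    if not parts or not all(os.path.isfile(p) for p in parts):
        return False
    return sum(os.path.getsize(p) for p in parts) == os.path.getsize(original)
```

For the run above this would be called as parts_cover_original("exer10.py", "split_info.txt"). It checks sizes only, not content.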
Merge:
After the part files are merged, a file named merge.py appears; its contents are exactly the same as those of exer10.py.
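
The claim that the merged file matches the original can be checked programmatically rather than by eye. A minimal sketch using the standard library's filecmp module follows. same_content is a hypothetical wrapper name, and shallow=False is passed so that the file contents are compared, not just the os.stat() signatures.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two files have identical bytes.

    Hypothetical wrapper for illustration around filecmp.cmp.
    """
    # shallow=False forces a byte-by-byte comparison of the contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For the test above, same_content("exer10.py", "merge.py") should return True if the split and merge round trip preserved every byte.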