Know these and you can cover 99% of file operations in Python


Working with files is one of the most common tasks we do every day. Python has several built-in modules for performing file operations, such as reading files, moving files, getting file attributes, etc. This article summarizes many functions that you need to know to cover the most common file operations and good practices in Python.

Here is a graph of modules/functions you will see in this article. To know more about each operation, please continue reading.

[Figure: overview of the modules and functions covered in this article. Created by Xiaoxu Gao]

Open & Close a file

When you want to read or write a file, the first thing to do is to open it. Python has a built-in function open that opens the file and returns a file object. The type of the file object depends on the mode in which the file is opened. It can be a text file object, a raw binary file, or a buffered binary file. Every file object has methods such as read() and write().

There is a problem in this code block. Can you spot it? We will discuss it later.

file = open("test_file.txt","w+")
file.write("a new line")

The Python documentation lists all the possible file modes. The most common modes are listed in the table. An important rule is that any w-related mode first truncates the file if it exists, or creates a new file if it doesn’t. Be careful with this mode if you don’t want to overwrite the file, and use an append mode (a) if possible.
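As a minimal sketch of how the common modes behave, the snippet below writes to a scratch file in a temporary directory. The file name demo_modes.txt and the variable names are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo_modes.txt")

with open(path, "w") as f:      # "w": create, or truncate if it exists
    f.write("first\n")

with open(path, "a") as f:      # "a": keep the content, write at the end
    f.write("second\n")

with open(path, "r") as f:      # "r": read only, the file must exist
    after_append = f.read()

with open(path, "w") as f:      # "w" again: the previous content is gone
    f.write("third\n")

with open(path) as f:           # the default mode is "r"
    after_truncate = f.read()

try:
    open(path, "x")             # "x": create only, fails if the file exists
except FileExistsError:
    exclusive_failed = True
else:
    exclusive_failed = False

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'third\n'
print(exclusive_failed)      # True
```

Adding + to a mode (r+, w+, a+) opens the file for both reading and writing, and adding b switches from text to binary.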

The problem in the previous code block is that we only opened the file but didn’t close it. It’s important to always close a file when you are done with it. A file object left open can cause unpredictable behavior such as resource leaks or data that is never flushed to disk. There are two ways to make sure that a file is closed properly.

1. Use close()

The first way is to call close() explicitly. A good practice is to put it in a finally block, so that the file is closed in any case. This makes the code more explicit, but on the other hand, the developer has to take responsibility and not forget to close the file.

file = open("test_file.txt", "w+")
try:
    file.write("a new line")
except Exception as e:
    print(f"write failed: {e}")
finally:
    file.close()  # runs whether or not write() raised

2. Use context manager with open(...) as f

The second way is to use a context manager. If you are not familiar with context managers, check out Context Managers and the “with” Statement in Python by Dan Bader. The file object returned by open() implements the __enter__ and __exit__ methods, so the with open() as f statement opens the file on entry and closes it on exit. In effect it wraps a try/finally statement for us, which means we can never forget to close the file.

with open("test_file","w+") as file:
    file.write("a new line")
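To make the mechanism concrete, here is a rough sketch of a hand-written context manager that behaves much like the file object's own. The class name ManagedFile is hypothetical; in practice open() already gives you this behavior.

```python
import os
import tempfile


class ManagedFile:
    """Hypothetical wrapper that mimics what a file object does in a with block."""

    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.file = None

    def __enter__(self):
        # Called when the with block is entered.
        self.file = open(self.path, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the block is left, even if an exception was raised.
        self.file.close()
        return False  # do not swallow exceptions


path = os.path.join(tempfile.mkdtemp(), "test_file.txt")
with ManagedFile(path, "w+") as f:
    f.write("a new line")

print(f.closed)  # True
```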

Is the context manager solution always better than close()? It depends on where you use it. The following example implements 3 different ways of writing 50,000 records to a file. As you can see from the output, the use_context_manager_2() function is extremely slow compared to the others. This is because its with statement sits in a separate function that is called once per record, so the file is opened and closed for every record. Such expensive I/O operations hurt performance tremendously.


def _write_to_file(file, line):
    # Opens and closes the file on every call
    with open(file, "a") as f:
        f.write(line)

def _valid_records():
    # Yields the 50,000 even numbers below 100,000
    for i in range(100000):
        if i % 2 == 0:
            yield i

def use_context_manager_2(file):
    for line in _valid_records():
        _write_to_file(file, str(line))

def use_context_manager_1(file):
    with open(file, "a") as f:
        for line in _valid_records():
            f.write(str(line))

def use_close_method(file):
    f = open(file, "a")
    for line in _valid_records():
        f.write(str(line))
    f.close()
# Finished 'use_close_method' in 0.0253 secs
# Finished 'use_context_manager_1' in 0.0231 secs
# Finished 'use_context_manager_2' in 4.6302 secs
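The timing code itself is not shown above. A sketch of how such a comparison could be measured is given below, assuming a small decorator named timer (a hypothetical helper) and a much smaller number of records so it finishes quickly. Absolute numbers will differ from machine to machine.

```python
import functools
import os
import tempfile
import time


def timer(func):
    """Hypothetical decorator that prints how long a call took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__!r} in {elapsed:.4f} secs")
        return result
    return wrapper


@timer
def open_once(path, records):
    # One open/close for the whole batch.
    with open(path, "a") as f:
        for record in records:
            f.write(record)


@timer
def open_per_record(path, records):
    # One open/close per record, as in use_context_manager_2().
    for record in records:
        with open(path, "a") as f:
            f.write(record)


records = [str(i) for i in range(0, 2000, 2)]
directory = tempfile.mkdtemp()
path_once = os.path.join(directory, "once.txt")
path_each = os.path.join(directory, "each.txt")
open_once(path_once, records)
open_per_record(path_each, records)
```

Both functions produce the same file content; only the number of open/close calls differs.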

Read & Write to a file

After you open a file, you will usually want to read from it or write to it. The file object provides 3 methods to read a file: read(), readline() and readlines().

By default, read(size=-1) returns the entire contents of a file. If the file is larger than the available memory, the optional parameter size lets you limit the number of characters (text mode) or bytes (binary mode) returned.

readline(size=-1) returns an entire line including the character \n at the end. If size is bigger than 0, it returns at most size characters from the line.

readlines(hint=-1) returns all the lines of a file in a list. The optional parameter hint means if the number of characters returned exceeds hint, no more lines will be returned.
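The size parameter of read() can be used to process a large file in fixed-size chunks. The sketch below assumes a chunk size of 4096 characters and a scratch file named big.txt, both chosen only for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    f.write("x" * 10_000)

chunk_size = 4096  # assumed value; tune it to your workload
chunk_lengths = []
with open(path, "r") as f:
    while True:
        chunk = f.read(chunk_size)
        if chunk == "":          # an empty string means end of file
            break
        chunk_lengths.append(len(chunk))

print(chunk_lengths)  # [4096, 4096, 1808]
```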

Among these 3 methods, read() and readlines() are less memory efficient because by default they return the complete file, either as a string or as a list. A more memory-efficient way to iterate over lines is to call readline() repeatedly until it returns an empty string. The empty string "" means the pointer has reached the end of the file.

with open('test.txt', 'r') as reader:
    line = reader.readline()
    while line != "":
        line = reader.readline()
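File objects are also iterable, which gives the same line-by-line behavior with less code. A small sketch, with the file created in a temporary directory only for illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("line 1\nline 2\nline 3\n")

lines = []
with open(path, "r") as reader:
    for line in reader:                  # reads one line at a time
        lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)  # ['line 1', 'line 2', 'line 3']
```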

In terms of writing, there are 2 methods: write() and writelines(). As the names suggest, write() writes a string and writelines() writes a list of strings. Neither adds line separators, so it’s the responsibility of the developer to add \n at the end of each line.

with open("test.txt", "w+") as f:
    f.write("hi\n")
    f.writelines(["this is a line\n", "this is another line\n"])
# >>> cat test.txt 
# hi
# this is a line
# this is another line

If you write text to a special file type such as JSON or CSV, you should use Python’s built-in json or csv module on top of the file object.


import csv
import json

# newline="" lets the csv module control line endings itself
with open("cities.csv", "w+", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["city", "country"])
    writer.writeheader()
    writer.writerow({"city": "Amsterdam", "country": "Netherlands"})
    writer.writerows(
        [
            {"city": "Berlin", "country": "Germany"},
            {"city": "Shanghai", "country": "China"},
        ]
    )
# >>> cat cities.csv 
# city,country
# Amsterdam,Netherlands
# Berlin,Germany
# Shanghai,China

with open("cities.json", "w+") as file:
    json.dump({"city": "Amsterdam", "country": "Netherlands"}, file)

# >>> cat cities.json 
# {"city": "Amsterdam", "country": "Netherlands"}
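Reading the data back works the same way, with csv.DictReader and json.load on top of the file object. The sketch below writes its own small files to a temporary directory first, so the paths are assumptions rather than the files created above.

```python
import csv
import json
import os
import tempfile

directory = tempfile.mkdtemp()
csv_path = os.path.join(directory, "cities.csv")
json_path = os.path.join(directory, "cities.json")

with open(csv_path, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["city", "country"])
    writer.writeheader()
    writer.writerow({"city": "Amsterdam", "country": "Netherlands"})

with open(json_path, "w") as file:
    json.dump({"city": "Amsterdam", "country": "Netherlands"}, file)

with open(csv_path, newline="") as file:
    rows = list(csv.DictReader(file))   # one mapping per data row

with open(json_path) as file:
    data = json.load(file)              # back to a Python dict

print(rows[0]["city"])   # Amsterdam
print(data["country"])   # Netherlands
```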

Move pointer within the file

When we open a file, we get a file handle that points to a certain position. In r and w modes, the handle points to the beginning of the file. In a mode, the handle points to the end of the file.

tell() and seek()

As we read from the file, the pointer moves to the place where the next read will start from, unless we tell the pointer to move around. You can do this using 2 methods: tell() and seek().

tell() returns the current position of the pointer as the number of bytes/characters from the beginning of the file. seek(offset, whence=0) moves the handle to a position offset characters away from whence. whence can be:

  • 0: from the beginning of the file

  • 1: from the current position

  • 2: from the end of the file


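In text mode, only seeks relative to the beginning of the file are allowed (whence=0), with the exception of seek(0, 2), which jumps to the very end of the file. The offset should be 0 or a value previously returned by tell().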

with open("text.txt", "w+") as f:
    f.write("0123456789abcdef")
    f.seek(9)
    print(f.tell()) # 9 (pointer moves to 9, next read starts from 9)
    print(f.read()) # 9abcdef
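In binary mode all three whence values can be used freely. The following sketch uses the constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END (equal to 0, 1 and 2) on a scratch file; the file content is the same 16-character string, chosen for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text.txt")
with open(path, "wb") as f:
    f.write(b"0123456789abcdef")

with open(path, "rb") as f:
    f.seek(-6, os.SEEK_END)      # whence=2: 6 bytes before the end
    tail = f.read()
    f.seek(2, os.SEEK_SET)       # whence=0: 2 bytes from the beginning
    f.seek(3, os.SEEK_CUR)       # whence=1: 3 bytes further on
    position = f.tell()
    next_byte = f.read(1)

print(tail)       # b'abcdef'
print(position)   # 5
print(next_byte)  # b'5'
```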

Understand the file status

The file system of the operating system can tell you a lot of practical information about a file, for example its size and when it was created and modified. To get this information in Python, you can use the os or pathlib module. The two modules have a lot in common, but pathlib is more object-oriented than os.

One way to get the complete status is to use os.stat("text.txt"). It returns a result object with many statistics such as st_size (size of the file in bytes), st_atime (timestamp of the most recent access), st_mtime (timestamp of the most recent modification), etc.

import os

print(os.stat("text.txt"))
# >>> os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597527409, st_mtime=1597527409, st_ctime=1597527409)

You can also get statistics individually using os.path.
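A minimal sketch of the individual os.path getters, assuming a 16-byte scratch file created in a temporary directory for the purpose of the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text.txt")
with open(path, "w") as f:
    f.write("0123456789abcdef")

size = os.path.getsize(path)         # same value as st_size
accessed = os.path.getatime(path)    # same value as st_atime
modified = os.path.getmtime(path)    # same value as st_mtime
changed = os.path.getctime(path)     # same value as st_ctime

print(size)  # 16
```

The three time getters return seconds since the epoch as floats.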





Another way to get the complete status is to use pathlib.Path("text.txt").stat(). It returns the same object as os.stat().


import pathlib

print(pathlib.Path("text.txt").stat())
# >>> os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597528703, st_mtime=1597528703, st_ctime=1597528703)
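The timestamps in the result are seconds since the epoch. As a small sketch, they can be turned into readable dates with datetime; the scratch file below is an assumption made for the example.

```python
import datetime
import pathlib
import tempfile

path = pathlib.Path(tempfile.mkdtemp()) / "text.txt"
path.write_text("0123456789abcdef")

info = path.stat()
modified = datetime.datetime.fromtimestamp(info.st_mtime)

print(info.st_size)           # 16
print(modified.isoformat())   # local time of the last modification
```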

We will compare more aspects of os and pathlib in the following sections.


Copy, Move and Delete a file

Python has many built-in modules to handle file movement. Before you trust the first answer returned by Google, you should be aware that different choices of modules can lead to different performance. Some modules will block the thread until the file movement is done, while others might do it asynchronously.

shutil is the best-known module for moving, copying, and deleting both files and folders. It provides 4 functions that copy a single file: copyfileobj(), copyfile(), copy() and copy2(). The last three are compared below.

copy() vs. copy2(): copy2() is very similar to copy(). The difference is that copy2() also copies the metadata of the file, such as the most recent access time and the most recent modification time. But according to the Python documentation, even copy2() cannot copy all the metadata due to constraints of the operating system.

shutil.copy("1.csv", "copy.csv")
shutil.copy2("1.csv", "copy2.csv")

# 1.csv
# os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)

# copy.csv
# os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)

# copy2.csv
# os.stat_result(st_mode=33152, st_ino=8618983989, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570395)
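The difference can also be checked programmatically. In this sketch the source file gets an artificial modification time in the past via os.utime (the timestamp value is an arbitrary assumption), so the effect is easy to see.

```python
import os
import shutil
import tempfile

directory = tempfile.mkdtemp()
src = os.path.join(directory, "1.csv")
with open(src, "w") as f:
    f.write("a,b\n1,2\n")

# Pretend the source was last modified at a fixed moment in the past.
old_time = 1_000_000_000
os.utime(src, (old_time, old_time))

copy_path = shutil.copy(src, os.path.join(directory, "copy.csv"))
copy2_path = shutil.copy2(src, os.path.join(directory, "copy2.csv"))

copy_mtime = int(os.path.getmtime(copy_path))
copy2_mtime = int(os.path.getmtime(copy2_path))

print(copy2_mtime == old_time)  # True: copy2() preserved the modification time
print(copy_mtime == old_time)   # False: copy() stamped the file with "now"
```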

copy() vs. copyfile(): copy() gives the new file the same permissions as the original file, but copyfile() doesn’t copy the permission mode. Secondly, the destination of copy() can be a directory. If a file with the same name exists there, it will be overwritten; otherwise, a new file will be created. The destination of copyfile(), however, must be the target file name.

shutil.copy("1.csv", "copy.csv")
shutil.copyfile("1.csv", "copyfile.csv")


# 1.csv
# os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)

# copy.csv
# os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)

# copyfile.csv
# permission (st_mode) is changed
# os.stat_result(st_mode=33188, st_ino=8618984694, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)

shutil.copyfile("1.csv", "./source")
# IsADirectoryError: [Errno 21] Is a directory: './source'
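For completeness, a short sketch of the directory case for both functions, run in a temporary directory. The exact exception type raised by copyfile() varies by platform, so the example catches the common base class OSError.

```python
import os
import shutil
import tempfile

directory = tempfile.mkdtemp()
src = os.path.join(directory, "1.csv")
target_dir = os.path.join(directory, "source")
os.mkdir(target_dir)
with open(src, "w") as f:
    f.write("a,b\n")

# copy() accepts a directory and keeps the original file name.
copied = shutil.copy(src, target_dir)
print(os.path.basename(copied))  # 1.csv

# copyfile() needs a full file name and raises for a directory.
copyfile_error = None
try:
    shutil.copyfile(src, target_dir)
except OSError as error:  # IsADirectoryError on Linux/macOS
    copyfile_error = type(error).__name__
print(copyfile_error)
```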



The os module has a function system() that allows you to execute a command in a subshell. You pass the command as a string argument to system(). This has the same effect as running the command in the operating system’s shell, so the commands below assume a Unix-like system. For moving and deleting files, you can also use dedicated functions in the os module.

import os

# copy
os.system("cp 1.csv copy.csv")

# rename/move (two alternatives)
os.system("mv 1.csv move.csv")
os.rename("1.csv", "move.csv")

# delete (two alternatives)
os.system("rm move.csv")
os.remove("move.csv")
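Shell commands such as cp, mv and rm are not available on every platform. A portable sketch of the same three operations without a shell, using shutil, os and pathlib in a temporary directory (the file names are reused from above for illustration):

```python
import os
import pathlib
import shutil
import tempfile

directory = pathlib.Path(tempfile.mkdtemp())
src = directory / "1.csv"
src.write_text("a,b\n")

# Copy, move and delete without spawning a shell.
copied = pathlib.Path(shutil.copy(src, directory / "copy.csv"))
moved = pathlib.Path(shutil.move(str(src), str(directory / "move.csv")))
os.remove(copied)   # os.remove deletes a single file
moved.unlink()      # pathlib equivalent of os.remove

print(sorted(p.name for p in directory.iterdir()))  # []
```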

Copy/Move a file asynchronously


So far, the solutions have all been synchronous, which means the program may block while a huge file is being moved. If you want to make the program asynchronous, you can use the threading, multiprocessing or subprocess module to let the file operation run in a separate thread or a separate process.

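import shutil
import threading
import subprocess
import multiprocessing

src = "1.csv"
dst = "dst_thread.csv"

# Run the copy in a separate thread
thread = threading.Thread(target=shutil.copy, args=[src, dst])
thread.start()
thread.join()  # wait for the copy to finish

# Run the copy in a separate process
# (on Windows and macOS, put this under: if __name__ == "__main__":)
dst = "dst_multiprocessing.csv"
process = multiprocessing.Process(target=shutil.copy, args=[src, dst])
process.start()
process.join()

# Run a shell command in a child process; call() waits and returns its exit status
cmd = "cp 1.csv dst_subprocess.csv"
status = subprocess.call(cmd, shell=True)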
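Another option from the standard library is concurrent.futures, which hands the copy to a worker thread and returns a future. This is a sketch under the assumption that one worker is enough; the file names are made up for the example.

```python
import concurrent.futures
import os
import shutil
import tempfile

directory = tempfile.mkdtemp()
src = os.path.join(directory, "1.csv")
dst = os.path.join(directory, "dst_executor.csv")
with open(src, "w") as f:
    f.write("a,b\n1,2\n")

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(shutil.copy, src, dst)  # returns immediately
    # ... the main thread is free to do other work here ...
    copied_path = future.result()                    # blocks until the copy is done

print(os.path.exists(copied_path))  # True
```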

Search a file

After copying and moving files, you will probably want to search for filenames that match a particular pattern. Python provides a number of built-in functions for you to choose from.

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. It supports wildcard characters such as * ? [].

glob.glob("*.csv") searches for all the files that have csv extension in the current directory. glob module makes it possible to search for files in the subdirectories as well.

>>> import glob
>>> glob.glob("*.csv")
['1.csv', '2.csv']
>>> glob.glob("**/*.csv",recursive=True)
['1.csv', '2.csv', 'source/3.csv']
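The other wildcards work the same way. The sketch below creates a few empty files in a temporary directory (the names are assumptions for the example) and uses a small hypothetical helper, match(), to show what each pattern selects.

```python
import glob
import os
import tempfile

directory = tempfile.mkdtemp()
for name in ["1.csv", "2.csv", "10.csv", "a.csv", "notes.txt"]:
    open(os.path.join(directory, name), "w").close()


def match(pattern):
    """Return the sorted file names in the scratch directory matching pattern."""
    paths = glob.glob(os.path.join(glob.escape(directory), pattern))
    return sorted(os.path.basename(p) for p in paths)


print(match("*.csv"))      # ['1.csv', '10.csv', '2.csv', 'a.csv']
print(match("?.csv"))      # ['1.csv', '2.csv', 'a.csv']
print(match("[0-9].csv"))  # ['1.csv', '2.csv']
```

Here * matches any number of characters, ? matches exactly one character, and [0-9] matches one character from the given range.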



The os module is powerful enough to handle basically every file operation. We can simply list all the files in a directory using os.listdir() and use file.endswith() and file.startswith() to match the pattern. If you want to traverse the directory tree, use os.walk().

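import os

# Current directory only
for file in os.listdir("."):
    if file.endswith(".csv"):
        print(file)

# Current directory and all subdirectories
for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".csv"):
            print(os.path.join(root, file))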



pathlib has a similar function to the glob module. It’s possible to search filenames recursively as well. Compared to the previous solution based on os, pathlib has less code and offers a more object-oriented solution.

from pathlib import Path

p = Path(".")
for name in p.glob("**/*.csv"): # recursive
    print(name)
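Path.rglob() is a shorthand for a glob pattern that starts with **/. A small sketch on a temporary directory tree whose layout mirrors the earlier example (the layout is an assumption made for illustration):

```python
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())
(root / "source").mkdir()
for relative in ["1.csv", "2.csv", "source/3.csv", "source/readme.txt"]:
    (root / relative).touch()

# rglob("*.csv") behaves like glob("**/*.csv")
found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
print(found)  # ['1.csv', '2.csv', 'source/3.csv']
```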

Play around with file path

Working with a file path is another common task that we do. It can be getting the relative path and absolute path of a file. It can also be joining multiple paths and finding the parent directory, etc.

Relative and absolute path


Both os and pathlib offer functions to get the relative path and absolute path of a file or a directory.


import os
import pathlib

print(os.path.abspath("1.txt"))  # absolute
print(os.path.relpath("1.txt"))  # relative

print(pathlib.Path("1.txt").absolute())  # absolute
print(pathlib.Path("1.txt"))  # relative
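A relative path can also be computed against a directory other than the current one. The sketch below uses posixpath and PurePosixPath so that the output is the same on every operating system; the directory names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

base = "/home/user/project"                    # hypothetical directory
target = "/home/user/project/source/1.txt"     # hypothetical file

down_os = posixpath.relpath(target, start=base)
down_pathlib = PurePosixPath(target).relative_to(base)
up_os = posixpath.relpath(base, start=target)

print(down_os)        # source/1.txt
print(down_pathlib)   # source/1.txt
print(up_os)          # ../..
```

Note that relative_to() only works when the target lies under the base, while relpath() can also walk upwards with "..".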

Joining paths


This is how we can join paths in os and pathlib independent of the environment. pathlib uses a slash to create child paths.

import os
import pathlib

print(os.path.join("/home", "file.txt"))
print(pathlib.Path("/home") / "file.txt")
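Both approaches accept more than two segments. One detail worth knowing is that an absolute segment discards everything before it. The sketch again uses the POSIX flavors so the output does not depend on the platform; the path segments are made up.

```python
import posixpath
from pathlib import PurePosixPath

joined_os = posixpath.join("/home", "user", "data", "file.txt")
joined_pathlib = PurePosixPath("/home") / "user" / "data" / "file.txt"
also_joined = PurePosixPath("/home").joinpath("user", "data", "file.txt")

print(joined_os)        # /home/user/data/file.txt
print(joined_pathlib)   # /home/user/data/file.txt

# An absolute segment discards everything before it, in both modules.
reset_os = posixpath.join("/home", "/etc", "hosts")
reset_pathlib = PurePosixPath("/home") / "/etc" / "hosts"
print(reset_os)         # /etc/hosts
print(reset_pathlib)    # /etc/hosts
```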

Getting the parent directory


dirname() is the function to get the parent directory in os, while in pathlib, you can just use Path().parent to get the parent folder.

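import os
import pathlib

# relative path ("source/3.csv" is used here as the example file)
print(os.path.dirname("source/3.csv"))
# source
print(pathlib.Path("source/3.csv").parent)
# source

# absolute path
print(pathlib.Path("source/3.csv").resolve().parent)
# /Users/<...>/project/source
print(os.path.dirname(os.path.abspath("source/3.csv")))
# /Users/<...>/project/source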
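Besides parent, a path object exposes the other components of a path as attributes. A short sketch with a hypothetical path, using PurePosixPath so that no real file is needed:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/project/source/3.csv")  # hypothetical path

print(path.parent)          # /home/user/project/source
print(path.parent.parent)   # /home/user/project
print(path.parents[2])      # /home/user
print(path.name)            # 3.csv
print(path.stem)            # 3
print(path.suffix)          # .csv
```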

os vs. pathlib

Last but not least, I want to briefly talk about os and pathlib. As the Python documentation says, pathlib is a more object-oriented solution than os. It represents each file path as a proper object instead of a string. This brings several advantages to developers: joining multiple paths is easier, behavior is more consistent across operating systems, and methods are directly accessible from the path object.
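To illustrate the difference in style, here is the same small task written with both modules. It is only a sketch; the directory and file names are invented for the example.

```python
import os
import pathlib
import tempfile

directory = tempfile.mkdtemp()

# os: paths are plain strings passed to module-level functions.
os_path = os.path.join(directory, "reports", "2020.csv")
os.makedirs(os.path.dirname(os_path), exist_ok=True)
with open(os_path, "w") as f:
    f.write("a,b\n")
os_name = os.path.basename(os_path)
os_exists = os.path.exists(os_path)

# pathlib: the path is an object and the operations are its methods.
pl_path = pathlib.Path(directory) / "reports" / "2020.csv"
pl_path.parent.mkdir(parents=True, exist_ok=True)
pl_path.write_text("a,b\n")
pl_name = pl_path.name
pl_exists = pl_path.exists()

print(os_name == pl_name, os_exists == pl_exists)  # True True
```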

I hope this article can boost your efficiency in working with files.







