Python File and Directory Access

Article link: https://blog.csdn.net/pipisorry/article/details/38964135

http://blog.csdn.net/pipisorry/article/details/47907589

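Creating a new file in Python

To create a file, simply open it in write mode: with open(json_file, 'w', encoding='utf-8') as out_file:

If the file name includes a directory that does not exist, however, the call fails with: FileNotFoundError: [Errno 2] No such file or directory: 'word_seg\\0'

In that case, check whether the directory exists and create it if it does not, then create the file: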

os.makedirs(os.path.dirname(json_file), exist_ok=True)  # requires import os; creates missing parent directories
with open(json_file, 'w', encoding='utf-8') as out_file:
    pass  # write the file contents here

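Note: if the path you pass as the file name actually refers to a directory, opening it in write ('w') mode fails, because a directory is being opened as if it were a file. On Windows the error looks like this (POSIX systems raise IsADirectoryError instead):

PermissionError: [Errno 13] Permission denied:

The fix is the same as above: make sure the directory exists first, then create the file inside it.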
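The check-then-create steps above can be wrapped in a small helper. This is only a minimal sketch: the function name open_for_write and the sample file name result.json are hypothetical and do not come from the original post.

```python
import os
import tempfile


def open_for_write(path, encoding='utf-8'):
    """Open `path` for writing, creating missing parent directories first.

    Hypothetical helper, for illustration only.
    """
    parent = os.path.dirname(path)
    if parent:  # os.makedirs('') raises, so skip bare file names
        os.makedirs(parent, exist_ok=True)
    return open(path, 'w', encoding=encoding)


# Demo inside a throwaway directory so nothing else on disk is touched.
demo_root = tempfile.mkdtemp()
json_file = os.path.join(demo_root, 'word_seg', '0', 'result.json')
with open_for_write(json_file) as out_file:
    out_file.write('{}')

print(os.path.isfile(json_file))
```

The guard on the empty parent matters because os.path.dirname() returns an empty string for a bare file name, and os.makedirs('') raises FileNotFoundError.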

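pathlib: an object-oriented path library (Python 3 only)

pathlib has been part of the standard library since Python 3.4 (it is not available in Python 2) and can replace os.path for most path handling. Its API is fully object-oriented.

Mapping between os / os.path functions and their PurePath/Path counterparts

Note: although os.path.relpath() and PurePath.relative_to() have overlapping use cases, their semantics differ considerably and they should not be treated as equivalent.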

os and os.path	pathlib
os.path.abspath()	Path.resolve()
os.chmod()	Path.chmod()
os.mkdir()	Path.mkdir()
os.rename()	Path.rename()
os.replace()	Path.replace()
os.rmdir()	Path.rmdir()
os.remove(), os.unlink()	Path.unlink()
os.getcwd()	Path.cwd()
os.path.exists()	Path.exists()
os.path.expanduser()	Path.expanduser() and Path.home()
os.path.isdir()	Path.is_dir()
os.path.isfile()	Path.is_file()
os.path.islink()	Path.is_symlink()
os.stat()	Path.stat(), Path.owner(), Path.group()
os.path.isabs()	PurePath.is_absolute()
os.path.join()	PurePath.joinpath()
os.path.basename()	PurePath.name
os.path.dirname()	PurePath.parent
os.path.samefile()	Path.samefile()
os.path.splitext()	PurePath.suffix
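A few rows of the table can be checked side by side. The sketch below uses posixpath and PurePosixPath so that it behaves the same on every platform; the sample path is hypothetical. It also illustrates the note about relpath() versus relative_to().

```python
import posixpath
from pathlib import PurePosixPath

raw = '/data/logs/app.tar.gz'  # hypothetical sample path
path = PurePosixPath(raw)

# Each entry pairs the os.path-style result with the pathlib result.
pairs = {
    'basename': (posixpath.basename(raw), path.name),
    'dirname': (posixpath.dirname(raw), str(path.parent)),
    'splitext': (posixpath.splitext(raw)[1], path.suffix),
    'isabs': (posixpath.isabs(raw), path.is_absolute()),
    'join': (posixpath.join('/data', 'logs'),
             str(PurePosixPath('/data').joinpath('logs'))),
}
for name, (old_style, new_style) in pairs.items():
    print(name, old_style, new_style, old_style == new_style)

# relpath() may climb out of the base directory with '..' ...
climbed = posixpath.relpath('/data', '/data/logs')

# ... while relative_to() raises unless the path lies inside the base.
try:
    PurePosixPath('/data').relative_to('/data/logs')
    refused = False
except ValueError:
    refused = True

print(climbed, refused)
```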

[pathlib — Object-oriented filesystem paths]

[The Python Library Reference Release2.6 - 16.1.4]

Directory and file operations: os.path — Common pathname manipulations

[Directory and file operations: os.path — Common pathname manipulations]

os — Files and Directories

[os — Files and Directories]

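Python file-reading module: fileinput

The fileinput module iterates over the contents of one or more files. It is convenient for looping over files, formatting output, and performing search-and-replace.
Much of Python's strength lies in its modules, and approaching Python with a C mindset makes it hard to learn well. fileinput makes it easy to walk through every line of text and supports invocations such as python some_script.py file_1.txt file_2.txt.
In essence it behaves like readline(), with additional features.
The module's input() function resembles a file's readlines() method, with one difference:
input() returns an iterator that yields one line at a time and is consumed with a for loop, whereas readlines() reads all lines at once. For large files the former is the more efficient choice, since it never holds the whole file in memory.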

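【Basic form】

fileinput.input([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])

【Defaults】

fileinput.input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)

files: # list of file paths; defaults to sys.argv[1:], or stdin if that is empty; for several files pass ['1.txt', '2.txt', ...]
inplace: # whether standard output is written back into the file; off by default
backup: # extension for the backup file (extension only, e.g. .bak); an existing backup file is overwritten silently
bufsize: # buffer size, default 0; deprecated in Python 3.6 and removed in 3.8, so omit it in current code
mode: # open mode, read-only by default
openhook: # hook that controls how every file is opened, e.g. to set the encoding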
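To make the inplace and backup parameters concrete, here is a minimal sketch of an in-place search-and-replace. The file name notes.txt and its contents are made up for the demo; while inplace=True is active, anything printed goes into the file rather than to the terminal.

```python
import fileinput
import os
import tempfile

# Build a small sample file in a throwaway directory (hypothetical content).
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, 'notes.txt')
with open(target, 'w', encoding='utf-8') as f:
    f.write('foo one\nfoo two\n')

# inplace=True redirects print() into the file being read;
# backup='.bak' keeps the original as notes.txt.bak.
with fileinput.input(files=[target], inplace=True, backup='.bak') as lines:
    for line in lines:
        print(line.replace('foo', 'bar'), end='')

with open(target, encoding='utf-8') as f:
    rewritten = f.read()
with open(target + '.bak', encoding='utf-8') as f:
    original = f.read()

print(repr(rewritten), repr(original))
```

Passing end='' avoids doubling the newline, since each line read from the file already ends with one.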

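Common functions in the fileinput module

fileinput.input() # returns an object that can be iterated over in a for loop

fileinput.filename()    # returns the name of the file currently being read
fileinput.lineno()      # returns the cumulative line number of the line just read, counted across all files
fileinput.filelineno()  # returns the line number within the current file
fileinput.isfirstline() # checks whether the line just read is the first line of its file
fileinput.isstdin()     # checks whether the last line was read from stdin
fileinput.close()       # closes the sequence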
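The difference between lineno() and filelineno() is easiest to see when iterating over more than one file. The following sketch creates two small files (names and contents are hypothetical) and records what each helper returns per line.

```python
import fileinput
import os
import tempfile

# Create two tiny sample files in a throwaway directory.
demo_dir = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'a1\na2\n'), ('b.txt', 'b1\n')]:
    file_path = os.path.join(demo_dir, name)
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(text)
    paths.append(file_path)

records = []
try:
    for line in fileinput.input(files=paths):
        records.append((
            os.path.basename(fileinput.filename()),  # current file
            fileinput.lineno(),        # cumulative count across files
            fileinput.filelineno(),    # count within the current file
            fileinput.isfirstline(),   # True on the first line of each file
            line.rstrip('\n'),
        ))
finally:
    fileinput.close()  # reset the module-level state

for record in records:
    print(record)
```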

[fileinput — Iterate over lines from multiple input streams]

[Introduction to the fileinput module in Python]

[Python file replacement with the fileinput module]

皮皮Blog

The tempfile module

Generates temporary files and directories.

tempfile.mkdtemp(suffix=None, prefix=None, dir=None)

Creates a temporary directory in the most secure manner possible. There are no race conditions in the directory’s creation. The directory is readable, writable, and searchable only by the creating user ID.

The user of mkdtemp() is responsible for deleting the temporary directory and its contents when done with it.

The prefix, suffix, and dir arguments are the same as for mkstemp(). mkdtemp() returns the absolute pathname of the new directory.

For example:

import tempfile

path = tempfile.mkdtemp(suffix='.txt', prefix='pi')
print(path)

/tmp/piv5dwqs01.txt

>>> with tempfile.TemporaryFile() as fp:
...     fp.write(b'Hello world!')
...     fp.seek(0)
...     fp.read()
b'Hello world!'
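Because mkdtemp() leaves cleanup to the caller, a common pattern is to pair it with shutil.rmtree(), or to use tempfile.TemporaryDirectory(), which removes the directory automatically. A minimal sketch of both follows; the file name scratch.txt is hypothetical.

```python
import os
import shutil
import tempfile

# Manual cleanup: mkdtemp() only creates the directory.
path = tempfile.mkdtemp(prefix='pi')
try:
    with open(os.path.join(path, 'scratch.txt'), 'w', encoding='utf-8') as f:
        f.write('temporary data')
finally:
    shutil.rmtree(path)  # delete the directory and everything in it

# Automatic cleanup: the directory disappears when the with block ends.
with tempfile.TemporaryDirectory(prefix='pi') as auto_path:
    existed_inside = os.path.isdir(auto_path)

print(os.path.exists(path), existed_inside, os.path.exists(auto_path))
```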

[tempfile — Generate temporary files and directories]

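Finding files with glob()

Most Python functions have long, descriptive names, but unless you already know it from elsewhere, you may not guess what a function named glob() does.
It is like a more powerful version of listdir(): it lets you search for files by pattern matching.
import glob
# get all py files
files = glob.glob('*.py')
print(files)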

# Output
# ['arg.py', 'g.py', 'shut.py', 'test.py']

You can search for several file types like this:

import itertools as it, glob
def multiple_file_types(*patterns):
    return it.chain.from_iterable(glob.glob(pattern) for pattern in patterns)
for filename in multiple_file_types("*.txt", "*.py"): # add as many file-type arguments as needed
    print(filename)

# output
#=========#
# test.txt
# arg.py
# g.py
# shut.py
# test.py
If you want the absolute path of each file, call realpath() on the returned value:
import itertools as it, glob, os
def multiple_file_types(*patterns):
    return it.chain.from_iterable(glob.glob(pattern) for pattern in patterns)

for filename in multiple_file_types("*.txt", "*.py"): # add as many file-type arguments as needed
    realpath = os.path.realpath(filename)
    print(realpath)

# output
#=========#
# C:\xxx\pyfunc\test.txt
# C:\xxx\pyfunc\arg.py
# C:\xxx\pyfunc\g.py
# C:\xxx\pyfunc\shut.py
# C:\xxx\pyfunc\test.py

[glob — Unix style pathname pattern expansion]

fnmatch: shell-style pattern matching for file names

fnmatch.fnmatch(name, pattern)

Tests whether name matches pattern and returns True or False.

The following example lists all .py files in the current directory:

>>> import fnmatch
>>> import os
>>> for file in os.listdir('.'):
...     if fnmatch.fnmatch(file, '*.py'):
...         print(file)
...
allfile.py
list_get.py
test.py

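If the operating system is case-insensitive, both arguments are normalized to all lower case or all upper case before the comparison in fnmatch.fnmatch().

fnmatch.fnmatchcase(name, pattern)

A platform-independent, case-sensitive version of fnmatch.fnmatch().

fnmatch.filter(names, pattern)

Filters a list of names and returns the list of those that match the pattern, for example: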

>>> files=['tags', 'readme.txt', 'allfile.py', 'test.py']
>>> fnmatch.filter(files, '*.py')
['allfile.py', 'test.py']
>>> fnmatch.filter(files, '[tx]*')
['tags', 'test.py']
>>> fnmatch.filter(files, '[tr]*')
['tags', 'readme.txt', 'test.py']
>>> fnmatch.filter(files, '*[tr]*')
['tags', 'readme.txt', 'test.py']
>>> fnmatch.filter(files, '?[a]*')
['tags']

Note: [seq] matches any single character in seq.

fnmatch.translate(pattern)

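Translates a pattern. fnmatch converts the glob pattern into a regular expression and then uses the re module to compare the name against it. translate() is the public API for converting a glob pattern into a regular expression. The output below is what Python 3.7 prints; Python 2 printed .*\.py\Z(?ms), and the exact text can differ between versions.

>>> import fnmatch
>>> pattern = '*.py'
>>> print(fnmatch.translate(pattern))
(?s:.*\.py)\Z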

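Unix shell-style patterns

* matches any sequence of characters, including none
? matches any single character
[seq] matches any single character in seq
[!seq] matches any single character not in seq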
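The four wildcard forms listed above can be tried out on a fixed list of names. This sketch uses fnmatchcase() so the result does not depend on the operating system's case rules; the name list is made up for illustration.

```python
from fnmatch import fnmatchcase

names = ['tags', 'readme.txt', 'allfile.py', 'test.py', 'a.py']
patterns = ['*.py', '?.py', '[tr]*', '[!tr]*']

# Map each pattern to the names it matches, case-sensitively.
matches = {
    pattern: [name for name in names if fnmatchcase(name, pattern)]
    for pattern in patterns
}

for pattern, matched in matches.items():
    print(pattern, matched)
```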

[fnmatch — Unix filename pattern matching]

linecache

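Reads a specific line from a file (tested on a 1 GB file with acceptable performance).

import linecache

line = linecache.getline(filename, linenum)  # linenum is 1-based

lines = linecache.getlines(filename)  # a list with one element per line
Note: linecache provides line-oriented random access and is usable on large files. It reads the file into a cache up front, so later accesses to the same file no longer go to disk.

Once you no longer need the cached contents, clear the cache at the end with linecache.clearcache() to release the memory.

The module caches file contents in memory, so it consumes memory; how large a file you can open, and how fast, depends on how much memory you have.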
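A short end-to-end sketch of the calls described above, run against a small sample file created for the demo (the file name and contents are hypothetical). Note that getline() returns an empty string, rather than raising, when the line number is out of range.

```python
import linecache
import os
import tempfile

# Create a three-line sample file in a throwaway directory.
demo_dir = tempfile.mkdtemp()
filename = os.path.join(demo_dir, 'sample.txt')
with open(filename, 'w', encoding='utf-8') as f:
    f.write('first\nsecond\nthird\n')

second = linecache.getline(filename, 2)    # line numbers start at 1
missing = linecache.getline(filename, 99)  # out of range: empty string
all_lines = linecache.getlines(filename)   # every line, served from the cache

linecache.clearcache()  # release the cached file contents

print(repr(second), repr(missing), all_lines)
```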

【Counting the lines of a file and reading a specific line in Python】

Reading a file from the second line through line k

1. data = open(filename)
next(data)  # or data.readline(); skips the first line
for e in data:
    print(e)

2. lines = f.readlines()[1:]  # f is an already opened file object
for l in lines:
    print(l)

3. a = linecache.getlines('a.txt')[1:]  # use [1:k] to stop at line k

Reading a file from line i through line j

1. Get lines 1-4 of a.txt

>>> a = linecache.getlines('a.txt')[0:4]

2. lnum = 0
with open('pit.txt', 'r') as fd:
    for line in fd:
        lnum += 1
        if 10 <= lnum <= 13:
            print(line)
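As an alternative to counting lines by hand, itertools.islice can select the range directly. This is a different technique from the two shown above, offered as a minimal sketch; the helper name read_line_range and the sample file are hypothetical.

```python
import os
import tempfile
from itertools import islice


def read_line_range(path, start, stop):
    """Return lines start..stop (1-based, inclusive) of a text file.

    Hypothetical helper: islice stops reading after line `stop`,
    so the rest of the file is never loaded.
    """
    with open(path, 'r', encoding='utf-8') as fd:
        return list(islice(fd, start - 1, stop))


# Demo on a generated 20-line file.
demo_dir = tempfile.mkdtemp()
sample = os.path.join(demo_dir, 'pit.txt')
with open(sample, 'w', encoding='utf-8') as f:
    for number in range(1, 21):
        f.write('line %d\n' % number)

selected = read_line_range(sample, 10, 13)
print(selected)
```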

[linecache — Random access to text lines]

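[Using the Python linecache module to read files, explained in detail]
Python Cookbook, 2nd edition (Chinese translation), section 2.4, "Reading a Specific Line from a File" (Luther Blissett)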

shutil — High-level file operations

shutil.copyfile(src, dst, *, follow_symlinks=True)

shutil.copy(src, dst, *, follow_symlinks=True)

shutil.copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False)

shutil.rmtree(path, ignore_errors=False, onerror=None)

Delete an entire directory tree; path must point to a directory (but not a symbolic link to a directory). If ignore_errors is true, errors resulting from failed removals will be ignored; if false or omitted, such errors are handled by calling a handler specified by onerror or, if that is omitted, they raise an exception.

Recursively delete a directory (including all subdirectories, even non-empty ones):

target = os.path.join(CWD, 'middlewares/metadata')  # CWD is defined elsewhere
if os.path.exists(target):
    shutil.rmtree(target)  # requires import os, shutil

shutil.move(src, dst, copy_function=copy2)

shutil.chown(path, user=None, group=None)
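The functions listed in this section can be combined into a small copy, move, and delete workflow. The sketch below runs entirely inside a scratch directory; all directory and file names in it are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()  # scratch area for the demo
src_dir = os.path.join(root, 'src')
os.makedirs(os.path.join(src_dir, 'sub'))
data_file = os.path.join(src_dir, 'sub', 'data.txt')
with open(data_file, 'w', encoding='utf-8') as f:
    f.write('payload')

# copy(): dst is a directory here, so the file keeps its base name.
copied_file = shutil.copy(data_file, root)
# copytree(): the destination directory must not exist yet.
tree_copy = shutil.copytree(src_dir, os.path.join(root, 'backup'))
# move(): renames backup/ to archive/ and returns the new path.
moved = shutil.move(tree_copy, os.path.join(root, 'archive'))
# rmtree(): removes src/ and everything below it.
shutil.rmtree(src_dir)

results = {
    'copied_file': os.path.basename(copied_file),
    'archive_has_data': os.path.isfile(os.path.join(moved, 'sub', 'data.txt')),
    'backup_gone': not os.path.exists(tree_copy),
    'src_gone': not os.path.exists(src_dir),
}
print(results)

shutil.rmtree(root)  # clean up the scratch area
```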

[shutil — High-level file operations]

Other related libraries

[stat — Interpreting stat() results]

[filecmp — File and Directory Comparisons]

[macpath — Mac OS 9 path manipulation functions]

from:http://blog.csdn.net/pipisorry/article/details/47907589

ref: [File and Directory Access]

[os — Miscellaneous operating system interfaces — Python 3.11.5 documentation]

[The Python Library Reference Release2.6 - 11.1]