Reading and Writing Files in Python

Latest recommended article published on 2022-07-05 13:04:14

dawnredwood59

Article link: https://blog.csdn.net/dawnredwood59/article/details/82995627

File reading and writing:
https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001431917715991ef1ebc19d15a4afdace1169a464eecc2000
http://www.runoob.com/python/file-methods.html
CSV
https://www.python.org/dev/peps/pep-0305/#id8
https://docs.python.org/3.7/library/csv.html#module-csv
https://www.cnblogs.com/pyxiaomangshe/p/8026483.html
DOCX
https://python-docx.readthedocs.io/en/latest/index.html
https://www.cnblogs.com/ontheway703/p/5266041.html
XLSX
https://www.cnblogs.com/ontheway703/p/5264517.html

I. The file object (file-like object)

Part of the standard library; no installation is required.

1. A file object is created with the open function:
f = open('/path/1.txt', mode='r')

2. Parameters of the open function:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

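file: required; the path of the file (relative or absolute), or an integer file descriptor.
mode: the mode in which the file is opened; defaults to 'r'.
buffering: the buffering policy. 0 switches buffering off (binary mode only), 1 selects line buffering (text mode only), and a value greater than 1 sets the buffer size in bytes. The default of -1 lets Python choose a suitable policy.
encoding: the encoding used to decode data read from the file and to encode data written to it, typically utf-8 or gbk. It applies to text mode only; when omitted, a platform-dependent default is used.
errors: how encoding and decoding errors are handled (text mode only). Common values are strict and ignore. With strict, a malformed character raises an exception; with ignore, the offending data is skipped and the program carries on, which can silently lose data.
newline: controls how line endings are handled. Allowed values are None, '', '\n', '\r' and '\r\n'. It applies to text mode only.
closefd: defaults to True, which is required when file is a file name. It may be False only when file is a file descriptor, in which case the descriptor stays open after the file object is closed. A file descriptor is a non-negative integer; on Unix-like systems, opening a file returns one.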
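The effect of encoding, errors and newline is easiest to see by writing one small file and reading it back in different ways. The following is only a minimal sketch: the scratch file demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# newline='' disables newline translation on write, so '\r\n' is stored as-is.
with open(path, mode="w", encoding="utf-8", newline="") as f:
    f.write("你好\r\nworld\r\n")

# newline=None (the default) translates '\r\n' to '\n' when reading.
with open(path, encoding="utf-8") as f:
    translated = f.read()

# newline='' keeps the original line endings when reading.
with open(path, encoding="utf-8", newline="") as f:
    raw = f.read()

# Decoding the UTF-8 bytes as ASCII fails under errors='strict' (the default)...
try:
    with open(path, encoding="ascii") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...while errors='ignore' silently drops the bytes that cannot be decoded.
with open(path, encoding="ascii", errors="ignore") as f:
    ignored = f.read()

print(repr(translated))
print(repr(raw))
print(strict_failed, repr(ignored))
```

Note that errors='ignore' loses the two Chinese characters without any warning, which is why strict is usually the safer choice.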

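3. Common modes:

r or rt: the default; read in text mode.
rb: read in binary mode.
w or wt: write in text mode; the file is created if it does not exist and emptied if it does.
wb: write in binary mode; existing content is emptied as well.
a: append; writes always go to the end of the file.
a+: read and write; writes always go to the end of the file.
w+: read and write; unlike a+, the file is emptied when opened.
r+: read and write; unlike a+, writes can go to any position in the file. The file must already exist.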
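The differences between these modes can be checked with a short experiment. This is a sketch only; the scratch file modes.txt and the strings written to it are invented for the example.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # 'w' creates the file (or empties an existing one)
    f.write("abcdef")

with open(path, "a") as f:      # 'a' always writes at the end
    f.write("XYZ")

with open(path, "r+") as f:     # 'r+' keeps the content and can overwrite in place
    f.write("12")               # replaces 'ab' at the start of the file

with open(path) as f:
    after_r_plus = f.read()

with open(path, "a+") as f:     # 'a+' can read, but writes still go to the end
    f.seek(0)
    f.write("!")                # appended, despite the seek(0) above
    f.seek(0)
    after_a_plus = f.read()

with open(path, "w+") as f:     # 'w+' empties the file as soon as it is opened
    after_w_plus = f.read()

print(after_r_plus, after_a_plus, repr(after_w_plus))
```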

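4. Methods of the file object:

file.close(): closes the file. A closed file can no longer be read or written.
next(file): returns the next line of the file. (In Python 2 this was written file.next(); that method no longer exists in Python 3.)
file.read([size]): reads up to size characters (bytes in binary mode); reads to the end of the file if size is omitted or negative.
file.readline([size]): reads one whole line, including the trailing "\n"; if size is given, at most size characters are read.
file.readlines([sizeint]): reads all lines and returns them as a list. If sizeint > 0, reading stops once the lines read so far total about sizeint characters, which helps limit memory use on large files.
file.seek(offset[, whence]): sets the current position in the file.
file.tell(): returns the current position in the file.
file.truncate([size]): truncates the file to size bytes; size defaults to the current position.
file.write(str): writes a string to the file and returns the number of characters written.
file.writelines(sequence): writes a sequence of strings to the file. No line breaks are added, so each string must carry its own newline where one is wanted.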
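A short, hedged walk-through of several of these methods on one scratch file follows; the file name methods.txt and its three lines are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "methods.txt")

with open(path, "w", encoding="utf-8", newline="") as f:
    written = f.write("first\n")            # returns the number of characters written
    f.writelines(["second\n", "third\n"])   # no newlines are added automatically

with open(path, encoding="utf-8", newline="") as f:
    line1 = f.readline()    # one line, trailing newline included
    pos = f.tell()          # remember the position after the first line
    rest = f.readlines()    # the remaining lines as a list
    f.seek(0)               # jump back to the start
    head = f.read(3)        # the first three characters
    f.seek(pos)             # return to the remembered position
    line2 = next(f)         # Python 3 form of file.next()

with open(path, "r+", encoding="utf-8", newline="") as f:
    f.truncate(5)           # keep only the first 5 bytes of the file
    truncated = f.read()

print(written, repr(line1), rest, head, repr(line2), truncated)
```

In text mode, tell() should not be called right after next(f) on the same file object, because iteration disables position reporting until the next seek(); the sketch above avoids that combination.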

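5. The with statement:
Why f.close() must be called once reading or writing is finished:

A file must be closed after use because the file object holds operating system resources, and the number of files that can be open at the same time is limited.
When writing, data is usually not written to disk right away; it is held in a memory buffer and written out later. Calling close() flushes whatever is still buffered. If close() is forgotten, only part of the data may reach the disk and the rest may be lost.

try ... finally guarantees the call, but writing it out every time is tedious: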

# open() sits outside the try block: if it fails, there is no f to close
f = open('/path/to/file', 'r')
try:
    print(f.read())
finally:
    f.close()

f = open('/Users/michael/test.txt', 'w')
try:
    f.write('Hello, world!')
finally:
    f.close()

Python provides the with statement, which calls close() for us automatically:

with open('/path/to/file', 'r') as f:
    print(f.read())
    
with open('/Users/michael/test.txt', 'w') as f:
    f.write('Hello, world!')

This is equivalent to the try ... finally version above, but the code is more concise and there is no need to call f.close() explicitly.
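That the with statement really closes the file, even when the block raises an exception, can be confirmed through the closed attribute. A minimal sketch, using a made-up scratch file and a simulated error:

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as f:
    f.write("Hello, world!")
closed_after_with = f.closed      # True: leaving the block called f.close()

try:
    with open(path) as g:
        raise RuntimeError("simulated failure while the file is open")
except RuntimeError:
    pass
closed_after_error = g.closed     # True: close() also runs when the block raises

with open(path) as h:
    content = h.read()            # the data written earlier was flushed by close()

print(closed_after_with, closed_after_error, content)
```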

II. The csv module

Part of the standard library; no installation is required.

1. The reader function:
reader(csvfile, dialect='excel', **fmtparams)

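Parameters:

csvfile: any object that supports iteration and yields strings, such as a file object or a list. In Python 3 a file object should be opened with newline=''. (Opening in binary mode with the "b" flag applies only to Python 2.)
dialect: the formatting convention; the default is the excel style, which separates fields with commas (,). Custom dialects are also supported and are registered with register_dialect, described later.
fmtparams: format parameters that override individual settings of the chosen dialect.

Methods: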
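As an illustration of these three parameters, the sketch below passes a plain list as csvfile and overrides two settings of the excel dialect through format parameters. The sample rows are invented for the example.

```python
import csv

# Any iterable of strings works as csvfile; here a list stands in for a file.
lines = ["name;city", "Ann;'New York; NY'", "Bob;Paris"]

# delimiter and quotechar are format parameters that override the 'excel' dialect.
reader = csv.reader(lines, dialect="excel", delimiter=";", quotechar="'")
rows = list(reader)

print(rows)
```

The quoted field keeps its embedded semicolon, because the separator is only recognised outside the quote characters.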

csvreader.__next__():
Return the next row of the reader’s iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).

Attributes:

csvreader.dialect：
A read-only description of the dialect in use by the parser.
csvreader.line_num：
The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.
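The difference between line_num and the number of records shows up as soon as a quoted field contains a line break. The following sketch uses made-up data held in an io.StringIO object instead of a real file:

```python
import csv
import io

# Sample data (invented): the second record has a quoted field with a newline in it.
data = 'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'

reader = csv.reader(io.StringIO(data, newline=""))

header = next(reader)            # first record
after_header = reader.line_num   # 1 source line consumed so far

first = next(reader)             # second record spans two source lines
after_first = reader.line_num

second = next(reader)            # third record
after_second = reader.line_num

print(header, first, second)
print(after_header, after_first, after_second)
print(repr(reader.dialect.delimiter))
```

Three records have been returned at the end, but line_num reports four source lines.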

2. The writer function:
writer(csvfile, dialect='excel', **fmtparams)

Methods:

csvwriter.writerow(row)：
Write the row parameter to the writer’s file object, formatted according to the current dialect.
csvwriter.writerows(rows)：
Write all elements in rows (an iterable of row objects as described above) to the writer’s file object, formatted according to the current dialect.

Attributes:

csvwriter.dialect：
A read-only description of the dialect in use by the writer.
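A minimal sketch of writerow and writerows follows. To keep it self-contained it writes into an io.StringIO buffer, which here stands in for a file opened with open(..., 'w', newline=''); the rows are sample data.

```python
import csv
import io

# In-memory stand-in for a file opened with open(..., 'w', newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

writer.writerow(["Qty", "Id", "Desc"])
writer.writerows([
    [3, "101", "Spam"],
    [4, "631", "Spam, spam, eggs, and spam"],  # contains commas, so it gets quoted
])

output = buffer.getvalue()
print(output)
print(repr(writer.dialect.lineterminator))
```

Non-string values such as the integers above are converted with str() before being written, and the excel dialect ends each row with '\r\n'.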

3. The register_dialect function: define a custom dialect
register_dialect(name, [dialect, ]**fmtparams)

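name: the name of the custom dialect. The default dialect is called 'excel'; a custom one could be called 'mydialect', for example.
[dialect, ]**fmtparams: the dialect's format parameters, such as delimiter (the field separator, a comma by default), quotechar and quoting; see the official documentation for the full list.

4. The unregister_dialect function: remove a custom dialect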
unregister_dialect(name)
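Registering, using and removing a dialect could look like the sketch below. The dialect name 'pipes' and the sample row are made up for illustration.

```python
import csv
import io

# 'pipes' is a made-up dialect name used only for this illustration.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)
registered = "pipes" in csv.list_dialects()

# Write one row with the custom dialect into an in-memory buffer.
buffer = io.StringIO(newline="")
csv.writer(buffer, dialect="pipes").writerow(["a", "b|c", "d"])
written = buffer.getvalue()

# Read the row back with the same dialect.
parsed = next(csv.reader(io.StringIO(written, newline=""), dialect="pipes"))

# Remove the dialect again once it is no longer needed.
csv.unregister_dialect("pipes")
still_registered = "pipes" in csv.list_dialects()

print(repr(written), parsed, registered, still_registered)
```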

5. Common snippets:

Get the column headers from a file:

import csv

filename = '1.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    header_row = next(reader)  # the first row holds the column headers

    for index, column_header in enumerate(header_row):
        print(index, column_header)

Get a single column from a file:

import csv

filename = '1.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row

    highs = []
    for row in reader:
        highs.append(row[4])  # index 4 is the fifth column

    print(highs)
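When the file has a header row, csv.DictReader (mentioned in the reader documentation quoted earlier) allows a column to be picked by name instead of by index. The sketch below first creates a small sample file; the file name weather.csv and its column names are assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical sample file; the column names are invented for this illustration.
path = os.path.join(tempfile.mkdtemp(), "weather.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([
        ["date", "high", "low"],
        ["2018-10-01", "25", "14"],
        ["2018-10-02", "27", "16"],
    ])

with open(path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)      # the first row supplies the dictionary keys
    fieldnames = reader.fieldnames
    highs = [int(row["high"]) for row in reader]  # values arrive as strings

print(fieldnames)
print(highs)
```

Looking a column up by name keeps working if the column order in the file changes, which an index such as row[4] does not.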

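III. The docx module (python-docx)

Not part of the standard library; it has to be installed. The package named docx on PyPI is a different, older package that is not fully compatible with Python 3, so install python-docx instead, either with pip install python-docx or from a downloaded wheel:
Find python_docx-x.x.x-py2.py3-none-any.whl at https://www.lfd.uci.edu/~gohlke/pythonlibs/
After downloading it, install it with: pip install python_docx-x.x.x-py2.py3-none-any.whl

1. Reading and writing a file: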

import docx

doc = docx.Document('1.docx')
# doc = docx.Document()  creates a new, empty Document object without reading a file
doc.save('2.docx')

2. Tables:

Writing tables

import docx

doc = docx.Document()

# Create a table with 4 rows and 3 columns (1 header row here, 3 data rows added below)
table = doc.add_table(rows=1, cols=3, style='Table Grid')

header_cells = table.rows[0].cells
header_cells[0].text = 'Name'
header_cells[1].text = 'Id'
header_cells[2].text = 'Desc'

data_lines = 3
for i in range(data_lines):
    cells = table.add_row().cells
    cells[0].text = "Name%d" %i
    cells[1].text = "Id%d" %i
    cells[2].text = "Desc%d" %i

# Create a table with 2 rows and 4 columns
rows = 2
cols = 4
table = doc.add_table(rows=rows, cols=cols)

val = 1
for i in range(rows):
    cells = table.rows[i].cells
    for j in range(cols):
        cells[j].text = str(val * 10)
        val += 1

doc.save('tmp.docx')

Reading tables

import docx

doc = docx.Document('tmp.docx')
for table in doc.tables:  # iterate over all tables
    print('----table------')
    for row in table.rows:  # iterate over all rows of the table
        # Method 1
        for cell in row.cells:
            print(cell.text + '\t', end='')
        print()
        # Method 2
        row_str = '\t'.join([cell.text for cell in row.cells])
        # str.join() puts the separator between the items
        print(row_str)

3. Paragraphs:

import docx

doc = docx.Document('1.docx')
content = '\n'.join([para.text for para in doc.paragraphs])
print(content)

4. Putting it all together:

from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

document.add_picture('1.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')


dawnredwood59

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Python 文件读写

PEP：https://www.python.org/dev/peps/pep-0305/#id8文档：https://docs.python.org/3.7/library/csv.html#module-csv
复制链接

扫一扫