Notes on 《写给系统管理员的 Python 脚本编程指南》 (A Python Scripting Guide for System Administrators), Chapter 9: Working with Different File Types

Topics covered in this chapter:

Working with PDF files

Working with Excel files

Working with CSV files

Working with text files

9.1 Working with PDF files

The PyPDF2 module handles PDF files: reading a PDF, getting its page count, extracting text, and rotating pages.

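9.1.1 Reading a PDF file and getting the page count

# example: read a PDF and print its page count
# Note: PdfFileReader and numPages are the legacy PyPDF2 names; PyPDF2 3.0 and
# later replace them with PdfReader and len(reader.pages).
import PyPDF2

# PDF files are binary, so open them in 'rb' mode.
with open('habits_2007.pdf', 'rb') as f:
    pf = PyPDF2.PdfFileReader(f)
    print("PDF pages: ", pf.numPages)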

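9.1.2 Extracting text

# example: extract the text of the first page
# Note: getPage and extractText are legacy names (reader.pages[i] and
# extract_text() in PyPDF2 3.0 and later).
import PyPDF2

with open('habits_2007.pdf', 'rb') as f:
    pf = PyPDF2.PdfFileReader(f)
    # Page indexes are zero-based, so the first page is index 0.
    first_page = pf.getPage(0)
    first_page_content = first_page.extractText()
    print(first_page_content)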

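9.1.3 Rotating pages

# example: rotate every page 90 degrees clockwise and save a new PDF
# Note: PdfFileWriter, rotateClockwise, and addPage are legacy names
# (PdfWriter, rotate, and add_page in PyPDF2 3.0 and later).
import PyPDF2

with open('habits_2007.pdf', 'rb') as f:
    pdf_reader = PyPDF2.PdfFileReader(f)
    pdf_writer = PyPDF2.PdfFileWriter()
    for pg_num in range(pdf_reader.numPages):
        page = pdf_reader.getPage(pg_num)
        page.rotateClockwise(90)
        pdf_writer.addPage(page)
    # Write while the source file is still open: pages are read lazily.
    with open('rotated.pdf', 'wb') as w:
        pdf_writer.write(w)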

9.2 Working with Excel files

Excel files can be handled with xlrd, Pandas, and openpyxl.

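9.2.1 Using the xlrd module

# example: read a single cell
# Note: xlrd 2.0 and later read only the legacy .xls format; opening an
# .xlsx file requires xlrd < 2.0 (or openpyxl, see 9.2.3).
import xlrd

excel_file = r"sample.xlsx"
book_obj = xlrd.open_workbook(excel_file)
excel_sheet = book_obj.sheet_by_index(0)

# cell_value(row, column), both zero-based: first row, second column.
cell_value = excel_sheet.cell_value(0, 1)
print(cell_value)

# example: read the column names
import xlrd

excel_file = r"sample.xlsx"
book_obj = xlrd.open_workbook(excel_file)
excel_sheet = book_obj.sheet_by_index(0)

# The column names are the cells of the first row (row 0).
for i in range(excel_sheet.ncols):
    print(excel_sheet.cell_value(0, i))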

9.2.2 Using the Pandas module

# example: read an Excel file
import pandas

excel_file = r'sample.xlsx'
df = pandas.read_excel(excel_file)

# head() shows the first five rows.
print(df.head())

# output: sheet with a header row only
Empty DataFrame
Columns: [ID, First Name]
Index: []

# output: sheet with data
   ID First Name Last Name
0   1      zhang       san
1   2       wang        wu
2   3         li        si

# example: read selected columns
import pandas

excel_file = r'sample.xlsx'

# Zero-based column positions: read the second and third columns only.
cols = [1, 2]
df = pandas.read_excel(excel_file, sheet_name='Sheet1', usecols=cols)
print(df.head())

9.2.3 Using the openpyxl module

# example: create an Excel file
from openpyxl import Workbook

book_obj = Workbook()
excel_sheet = book_obj.active  # the default worksheet

# Assign values to individual cells by coordinate.
excel_sheet['A1'] = 'Name'
excel_sheet['A2'] = 'Student'
excel_sheet['B1'] = 'age'
excel_sheet['B2'] = '24'

book_obj.save('text.xlsx')
print('Excel created successfully')

# example: append several rows of values
from openpyxl import Workbook

book_obj = Workbook()
excel_sheet = book_obj.active

rows = [
    (11, 12, 13),
    (21, 22, 23),
    (31, 32, 33),
    (41, 42, 43),
]

# append() adds each tuple as the next row of the sheet.
for row in rows:
    excel_sheet.append(row)

book_obj.save('append_values.xlsx')
print('Excel created successfully')

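# example: read a range of cells
import openpyxl

book_obj = openpyxl.load_workbook('sample.xlsx')
excel_sheet = book_obj.active

# Slicing the sheet returns a tuple of rows; each row is a tuple of cells.
cells = excel_sheet['A1':'C3']
for c1, c2, c3 in cells:
    # str() keeps the width specifier valid for empty (None) cells.
    print("{0:10} {1:10} {2:10}".format(str(c1.value), str(c2.value), str(c3.value)))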

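9.3 Working with CSV files

In a CSV file, the first line can be either a header or data, and columns are separated by commas.

Python ships with a built-in module for CSV files: csv.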

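9.3.1 Reading a CSV file

# example: print every row of a CSV file
import csv

# newline='' lets the csv module handle line endings itself.
with open('sample.csv', 'r', newline='') as csv_file:
    read_csv = csv.reader(csv_file)
    for row in read_csv:
        print(row)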
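
When the first row of the file is a header, csv.DictReader can use it as the field names, so each data row comes back as a dictionary. The following is a minimal sketch rather than code from the book: the sample data and the read_records helper are hypothetical, and an in-memory io.StringIO stands in for sample.csv.

```python
import csv
import io

# Hypothetical sample data standing in for sample.csv; the first row is a header.
SAMPLE = "name,age\nzhang san,23\nli si,24\n"


def read_records(text_stream):
    """Return each data row as a dict keyed by the header row."""
    reader = csv.DictReader(text_stream)
    return [dict(row) for row in reader]


records = read_records(io.StringIO(SAMPLE))
for record in records:
    # Every value is a string; convert explicitly where a number is needed.
    print(record["name"], int(record["age"]))
```

Passing an open file object instead of the StringIO works the same way, since DictReader only needs an iterable of lines.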

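9.3.2 Writing a CSV file

# example: write a header row and three data rows
import csv

write_rows = [['name', 'age'], ['zhang san', 23], ['li si', 24], ['wang wu', 25]]

# newline='' prevents extra blank lines between rows on Windows.
with open('csv_write.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(write_rows)

print('Create CSV Write file successfully')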
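
Fields that themselves contain commas or quotes are a common stumbling block. The sketch below, with made-up rows and a temporary file name chosen only for illustration, suggests how csv.writer quotes such fields by default and how csv.reader recovers them unchanged.

```python
import csv
import os
import tempfile

# Hypothetical rows: one field contains a comma, another contains quotes.
rows = [["name", "note"], ["zhang san", "likes tea, coffee"], ["li si", 'said "hi"']]

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "quoted.csv")

    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # Read the raw text to see what was actually written to disk.
    with open(path, "r", newline="", encoding="utf-8") as f:
        raw_text = f.read()

    # Parse the file again to check that the values survive the round trip.
    with open(path, "r", newline="", encoding="utf-8") as f:
        round_trip = list(csv.reader(f))

print(raw_text)
print(round_trip == rows)
```

Building CSV lines by hand with ",".join(...) would not add this quoting, which is one reason to prefer the csv module.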

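9.4 Working with text files

Python's built-in functions handle text files. The mode passed to open determines whether a file is created, read, overwritten, or appended to.

Modes:

r : read-only; raises FileNotFoundError (an OSError) if the file does not exist

r+ : read and write; raises FileNotFoundError (an OSError) if the file does not exist

w : write-only; creates the file if it does not exist, overwrites it if it does

w+ : read and write; creates the file if it does not exist, overwrites it if it does

a : append; creates the file if it does not exist, otherwise writes at the end

a+ : append and read; creates the file if it does not exist, otherwise writes at the end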
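
The behaviour of the modes listed above can be checked with a short experiment. This is a sketch only: the file name modes.txt and the temporary directory are assumptions made so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical scratch file

    # "r" on a missing file raises FileNotFoundError (a subclass of OSError).
    try:
        with open(path, "r"):
            missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    with open(path, "w") as f:   # "w" creates the file
        f.write("first\n")
    with open(path, "w") as f:   # "w" again discards the old content
        f.write("second\n")
    with open(path, "a") as f:   # "a" keeps the content and writes at the end
        f.write("third\n")
    with open(path, "r+") as f:  # "r+" opens for reading and writing, no truncation
        content = f.read()

print(missing_raised)
print(content)
```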

9.4.1 The open function

Syntax:

open("path/to/file", "mode")

Example:

f = open("test.txt", "a")  # open the file in append mode
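
Besides the mode, open accepts an encoding argument for text files, and adding b to the mode (as in the 'rb' used for the PDF examples) returns raw bytes instead of text. A small sketch under those assumptions; the file name and the sample text are made up for illustration.

```python
import os
import tempfile

text = "你好, admin"  # hypothetical content with non-ASCII characters

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "greeting.txt")

    # Text mode: an explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()    # a str, decoded from UTF-8

    # Binary mode: no decoding happens, the result is a bytes object.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)
```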

9.4.2 The close function

Syntax:

file_obj.close()

Example:

f = open("test.txt")  # the default mode is "r"

f.close()
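
Calling close by hand is easy to forget, especially when an exception interrupts the code in between. The sketch below shows two common ways to guarantee the file is closed: a with block and try/finally. The scratch file path is an assumption for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")  # hypothetical scratch file

    # Context manager form: the file is closed when the block exits.
    with open(path, "w") as f:
        f.write("Hello")
    closed_after_with = f.closed

    # Manual form: try/finally runs close() even if read() raises.
    f = open(path)
    try:
        data = f.read()
    finally:
        f.close()
    closed_after_finally = f.closed

print(closed_after_with, closed_after_finally, data)
```

The with form is usually the shorter and safer choice; the manual form is shown only to make clear what it does for you.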

9.4.3 Writing to a text file

Syntax:

file_obj.write("content")

Example:

f = open("test.txt", "w")

f.write("Hello")

f.close()  # closing flushes the buffered text to disk
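
Two details of writing are worth seeing in action: write returns the number of characters written, and neither write nor writelines adds line breaks for you. A minimal sketch, with a made-up file name and sample lines:

```python
import os
import tempfile

lines = ["alpha", "beta", "gamma"]  # hypothetical sample lines

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "write_demo.txt")

    with open(path, "w") as f:
        # write() returns how many characters it wrote.
        count = f.write("Hello\n")
        # writelines() writes each string as-is, so add "\n" explicitly.
        f.writelines(line + "\n" for line in lines)

    with open(path, "r") as f:
        content = f.read()

print(count)
print(content)
```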

9.4.4 Reading a text file

Syntax:

file_obj.read()

Example:

f = open("test.txt", "r")

data = f.read()  # read the whole file into one string

print(data)

f.close()
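
read loads the whole file into memory, which can be wasteful for large logs. As a sketch of the alternatives, a file object can also be read one line at a time with readline or by iterating over it. The file name and its three lines are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical scratch file

    with open(path, "w") as f:
        f.write("one\ntwo\nthree\n")

    with open(path, "r") as f:
        # readline() returns the next line, including its trailing newline.
        first = f.readline()
        # Iterating continues from the current position, one line at a time.
        rest = [line.rstrip("\n") for line in f]

print(repr(first))
print(rest)
```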

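9.5 Summary

PyPDF2 handles PDF files; xlrd, Pandas, and openpyxl handle Excel files; the built-in csv module handles CSV files; and built-in functions such as open handle text files.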
