Python Office Automation: Word to PDF, Inserting a Blank Page into Odd-Page PDFs, and Merging PDFs

Latest recommended article published on 2023-02-16 20:58:13

yimenren

Original link: https://blog.csdn.net/m0_48010654/article/details/112605971

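This blog post shows how Python can be used for office automation: batch-converting Word documents to PDF with the docx2pdf module, inserting a blank page into PDFs with an odd number of pages so that they suit double-sided printing, and merging PDF files with the PyPDF2 module. The code examples are detailed and fit everyday office automation needs.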

Summary generated automatically by CSDN

Reposted from:

https://blog.csdn.net/m0_48010654/article/details/112605971

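Preface
I. Office Automation Basics
1. Batch processing: import os
2. Batch processing: building a file list
3. Batch processing: loops
II. Batch Word-to-PDF Conversion
III. Inserting a Blank Page into Odd-Page PDFs
IV. Merging PDFs
Preface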
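I am new to Python and am learning it mainly for office automation. The use case is everyday office work, and the problems to solve are batch Word-to-PDF conversion, merging PDFs, inserting a blank page into PDFs with an odd number of pages (so that the merged PDF prints correctly double-sided), inserting Excel content into Word to batch-generate weekly reports, and so on.
Because I am still a beginner and have not yet learned how to bundle the code into a ready-to-use tool, I can only post the code piece by piece. The code below is what has worked well for me. The Python version is 3.8.
Special note: the code was collected from the web and lightly modified for convenience. Many thanks to the bloggers who wrote it.
I have also used Python to insert Excel content into Word to batch-generate files and to turn scanned images into PDFs with text recognition; these are left out for reasons of space. Thanks to those who came before me on CSDN.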

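I. Office Automation Basics
Office automation mainly relies on tools for handling Word, Excel, and PDF files, plus a way to process those files in batches.
Batch processing needs two things: the os module, to set the path of the files to process or of the folder to put them in, and loop statements. The modules used in the code can be installed with pip install module_name --index-url https://pypi.douban.com/simple, where --index-url selects a mirror, which makes the download much faster and helps avoid download errors.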

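1. Batch processing: import os
import os
os.getcwd() -- returns the current working directory (it takes no arguments).
os.chdir(r'c:----') -- changes the current working directory. Remember the "r" prefix so that backslashes in the path are not treated as escape characters.
os.walk(path) -- walks everything under path, yielding (root directory, subdirectories, files) for each directory it visits.
os.listdir(path) -- lists the entries directly inside path (files inside subfolders are not included).
PDF files can be picked out in any of the following ways:
1. os.path.splitext(file)[1] == ".pdf", which splits the file name from its extension.
2. file.endswith(".pdf")
3. file.split(".")[-1] == "pdf"
Further reading: "[Python] The difference between os.path.splitext() and os.path.split()"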
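
The three filtering methods listed above can be compared side by side. The following is only a minimal sketch: the helper names (is_pdf_splitext, is_pdf_endswith, is_pdf_split) and the sample file names are made up for illustration, and the comparison is made case-insensitive on the assumption that ".PDF" files should also match.

```python
import os

def is_pdf_splitext(name):
    # os.path.splitext splits at the last dot: "a.b.pdf" -> ("a.b", ".pdf")
    return os.path.splitext(name)[1].lower() == ".pdf"

def is_pdf_endswith(name):
    # Plain suffix test on the lower-cased name
    return name.lower().endswith(".pdf")

def is_pdf_split(name):
    # Take the last piece after splitting on dots; require at least one dot
    parts = name.lower().split(".")
    return len(parts) > 1 and parts[-1] == "pdf"

# Hypothetical file names, including one with two dots and one with no extension
sample_names = ["report.pdf", "notes.v2.PDF", "summary.docx", "pdf"]
for name in sample_names:
    print(name, is_pdf_splitext(name), is_pdf_endswith(name), is_pdf_split(name))
```

All three agree on these names. Taking the last piece of the split (index -1) rather than the second piece (index 1) is what keeps the third method correct for names such as notes.v2.PDF.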

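2. Batch processing: building a file list
Define a function that returns a list of full file paths: os.walk traverses the folder, os.path.join() joins each directory path with a file name, and an if clause keeps only the file names with a PDF extension.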

import os

def getFileName(filedir):
    # Walk filedir and its subfolders and collect the full path of every PDF file
    file_list = [os.path.join(root, filespath)
                 for root, dirs, files in os.walk(filedir)
                 for filespath in files
                 if filespath.lower().endswith('.pdf')
                 ]
    return file_list
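
For comparison, the same kind of file list can be built with the standard pathlib module. This is a sketch under assumptions, not the method used in this post: list_pdfs is a hypothetical name, and the demo runs against a throwaway folder created with tempfile so that it does not depend on any real files.

```python
import tempfile
from pathlib import Path

def list_pdfs(folder):
    """Return the sorted paths of all PDF files under folder, subfolders included."""
    return sorted(
        str(p) for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

# Demo on a temporary folder holding two PDFs and one text file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.pdf").write_bytes(b"")
    (root / "sub" / "b.PDF").write_bytes(b"")
    (root / "c.txt").write_bytes(b"")
    found_names = sorted(Path(p).name for p in list_pdfs(tmp))
    print(found_names)
```

Sorting the result gives a repeatable order, which matters later when the order of the list decides the order of the pages in a merged file.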

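3. Batch processing: loops
for i in range(n):    # n is the number of iterations
    todo()            # the operation to perform
or
for item in some_list:    # some_list is any list
    todo()                # the operation to perform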
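
As a concrete illustration of the two loop forms, the short sketch below loops a fixed number of times and then loops over a list. The file names are made up, and the numbering prefix is just one example of a per-file operation.

```python
# Hypothetical file names to process
file_names = ["a.docx", "b.docx", "c.docx"]

# Form 1: repeat a fixed number of times
for i in range(3):
    print("pass", i)

# Form 2: visit every item of a list; enumerate also supplies a running number
renamed = []
for index, name in enumerate(file_names, start=1):
    renamed.append(f"{index:02d}_{name}")
print(renamed)
```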

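II. Batch Word-to-PDF Conversion
This uses the docx2pdf module, which is concise and raises fewer errors than the win32 approach. The folder path is read with input(), so backslashes in it are not treated as escape characters and there is no need to type "\\".
map combined with a lambda builds the full path of each file.
# Word to PDF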

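# pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple/ docx2pdf

from docx2pdf import convert
import os

# Folder containing the files to convert
director = input("Enter the path of the folder to convert: ")
# Build the full path of every entry in the folder
FileList = map(lambda x: os.path.join(director, x), os.listdir(director))
for file in FileList:
    # docx2pdf is documented for .docx files; a .doc file may fail and is reported below
    if file.lower().endswith((".docx", ".doc")):
        print(file)
        try:
            # splitext removes only the extension, even if the path contains other dots
            convert(file, os.path.splitext(file)[0] + ".pdf")
        except Exception as e:
            print("could not convert:", e)
print("finish")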

Reference: https://blog.csdn.net/cqcre/article/details/107218349
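
One detail worth checking in any Word-to-PDF script is how the output file name is derived. Cutting the path at its first dot goes wrong as soon as a folder or file name contains an extra dot, whereas os.path.splitext removes only the final extension. The sketch below compares the two; pdf_path_for and the sample path are made up for illustration.

```python
import os

def pdf_path_for(docx_path):
    """Return docx_path with its final extension replaced by .pdf."""
    root, _ext = os.path.splitext(docx_path)
    return root + ".pdf"

# Hypothetical path whose folder name contains a dot
path = r"C:\reports\v1.2\weekly.docx"

first_dot_result = path.split(".")[0] + ".pdf"   # cuts at the dot inside "v1.2"
splitext_result = pdf_path_for(path)             # cuts only the ".docx" extension

print(first_dot_result)
print(splitext_result)
```

With the first approach the PDF would be written next to the v1.2 folder under the wrong name; with the second it lands beside the Word file.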

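III. Inserting a Blank Page into Odd-Page PDFs
Code source: "Processing PDFs with Python: appending a blank page to odd-page PDFs"
The code comes from Zhihu; the only change is that the file paths are now read with input().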

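# Append a blank page to PDFs with an odd number of pages
# Written for PyPDF2 < 3.0; version 3.0 removed PdfFileReader/PdfFileWriter in favor of PdfReader/PdfWriter
import os,PyPDF2
pathofcwd = input("Enter the path of the folder containing the PDFs to process: ")
# ^ folder holding the PDFs to process
class pdfReader:
    # ^ a class that holds all the PDF-handling code
    blankPdfPath = input("Enter the path of the blank PDF file: ")
    # ^ location of the blank-page PDF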
    def __init__(self,pdfPath):
        self.pdfPath = pdfPath
        self.blankPageFile, self.blankPage = self.openAndReadit(self.blankPdfPath)
        self.pdfFile, self.pdfReader = self.openAndReadit(self.pdfPath)
    
    def openAndReadit(self,pdfpath):
        """
        generate the pdfReader object for given path in parameter
        """
        pdfFile = open(pdfpath, 'rb')
        pdfReader = PyPDF2.PdfFileReader(pdfFile)
        return (pdfFile,pdfReader)

    def appendBlank(self):
        """
        no para, return a pdf writer with blankPage appended
        """
        pdfWriter = PyPDF2.PdfFileWriter()
        for pageNum in range(self.pdfReader.numPages):
            pageObj = self.pdfReader.getPage(pageNum)
            pdfWriter.addPage(pageObj)
        # add the blank page:
        pdfWriter.addPage(self.blankPage.getPage(0))
        return pdfWriter
    
    def closeAllFile(self):
        self.blankPageFile.close()
        self.pdfFile.close()

os.chdir(pathofcwd)
fileList = os.listdir()

pdfList = filter(
    lambda e:os.path.splitext(e)[1]=='.pdf',
    fileList
)
# ^ filter the file list so that only PDFs remain

pdfReaderList = map(
    lambda e:pdfReader(e),
    pdfList
)
# ^ build a pdfReader object for each PDF path

pdfReaderList = filter(
    lambda e: e.pdfReader.numPages % 2 == 1,
    pdfReaderList
)
# ^ keep only the pdfReader objects whose PDF has an odd number of pages

pdfReaderList = list(pdfReaderList)

for reader in pdfReaderList:
    pdfAddBlankWriter = reader.appendBlank()
    outputPath = os.path.splitext(reader.pdfPath)[0]+'_addBlank'+'.pdf'
    pdfOutputFile = open(outputPath,'wb')
    pdfAddBlankWriter.write(pdfOutputFile)
    pdfOutputFile.close()
    reader.closeAllFile()
    print("saved as: %s" % outputPath)
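
Why the blank page matters for double-sided printing can be seen with a little arithmetic. In a duplex print job the odd-numbered pages are the front sides of the sheets, so every document in the merged PDF should start on an odd page. The sketch below is pure bookkeeping with no PDF library involved; the function names and the page counts are made up for illustration.

```python
def needs_blank_page(page_count):
    # An odd page count leaves the back of the last sheet free
    return page_count % 2 == 1

def start_pages(page_counts, pad_odd):
    """Return the 1-based page on which each document starts in the merged PDF."""
    starts = []
    next_page = 1
    for count in page_counts:
        starts.append(next_page)
        extra = 1 if pad_odd and needs_blank_page(count) else 0
        next_page += count + extra
    return starts

counts = [3, 2, 5]  # hypothetical page counts of three documents
unpadded = start_pages(counts, pad_odd=False)
padded = start_pages(counts, pad_odd=True)
print(unpadded)  # [1, 4, 6]
print(padded)    # [1, 5, 7]
```

Without padding, the second and third documents start on pages 4 and 6, that is, on the back of the previous document's last sheet. With one blank page appended to each odd-page document, every document starts on an odd page and therefore on a fresh sheet.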

IV. Merging PDFs
The only change is that the file paths are now read with input().
Code source: "Merging PDF Files with Python"

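# -*- coding:utf-8-*-
# Merge all the PDF files in one folder with the PyPDF2 module
# Only two values need to be supplied: the folder holding the PDFs (file_dir) and the output file name (outfile)
# Written for PyPDF2 < 3.0; version 3.0 removed PdfFileReader/PdfFileWriter in favor of PdfReader/PdfWriter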

import os
from PyPDF2 import PdfFileReader, PdfFileWriter
import time

# Use os.walk to find every PDF file under the given folder (subfolders included)
# and return their full paths
def getFileName(filedir):

    file_list = [os.path.join(root, filespath)
                 for root, dirs, files in os.walk(filedir)
                 for filespath in files
                 if filespath.lower().endswith('.pdf')
                 ]
    return file_list

# Merge all the PDF files found under a folder
def MergePDF(filepath, outfile):

    output = PdfFileWriter()
    outputPages = 0
    pdf_fileName = getFileName(filepath)

    if pdf_fileName:
        for pdf_file in pdf_fileName:
            print("Path: %s" % pdf_file)

            # Read the source PDF (named reader so the built-in input() is not shadowed)
            reader = PdfFileReader(open(pdf_file, "rb"))

            # Number of pages in the source PDF
            pageCount = reader.getNumPages()
            outputPages += pageCount
            print("Pages: %d" % pageCount)

            # Add each page to the output
            # (narrow the range to select specific pages)
            for iPage in range(pageCount):
                output.addPage(reader.getPage(iPage))

        print("Total pages after merging: %d." % outputPages)
        # Write to the target PDF file
        outputStream = open(os.path.join(filepath, outfile), "wb")
        output.write(outputStream)
        outputStream.close()
        print("PDF files merged!")

    else:
        print("No PDF files to merge!")

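# Main function
def main():
    file_dir = input("Enter the path of the folder containing the PDFs: ")  # source folder; the only value to change
    outfile = input("Enter the output file name: ")  # name of the merged PDF

    # Start timing after the prompts so that typing time is not counted
    time1 = time.time()
    MergePDF(file_dir, outfile)
    time2 = time.time()
    print('Total time: %s s.' % (time2 - time1))

main()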
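
Two practical points affect the merged result: the order of the file list decides the order of the pages, and directory listings come back in arbitrary order; and if the script is run a second time, an output file left in the same folder would itself be picked up as an input. The sketch below shows one possible way to handle both before merging. It is an assumption-laden illustration, not part of the original code: natural_key and merge_order are hypothetical helpers, and the file names are made up.

```python
import os
import re

def natural_key(path):
    """Sort key that compares the digit runs in a file name as numbers."""
    name = os.path.basename(path).lower()
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

def merge_order(pdf_paths, outfile):
    """Return pdf_paths in natural order, leaving out an earlier output file."""
    kept = [p for p in pdf_paths if os.path.basename(p) != outfile]
    return sorted(kept, key=natural_key)

# Hypothetical folder contents, including the result of a previous run
paths = ["10_summary.pdf", "2_body.pdf", "merged.pdf", "1_cover.pdf"]
ordered = merge_order(paths, "merged.pdf")
print(ordered)  # ['1_cover.pdf', '2_body.pdf', '10_summary.pdf']
```

A plain alphabetical sort would place 10_summary.pdf before 2_body.pdf; comparing the digit runs as numbers keeps the files in the order their numbering suggests.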

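Code sources
[1] Word to PDF
[2] Processing PDFs with Python: appending a blank page to odd-page PDFs
[3] Merging PDF Files with Python
----------------
Copyright notice: this is an original article by CSDN blogger "echo-无声呻吟", licensed under CC 4.0 BY-SA. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/m0_48010654/article/details/112605971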
