docxtpl/python-docx

雪球干死黄旭东

Last modified: 2024-06-12 11:18:39


First published: 2019-11-26 15:08:01

Article link: https://blog.csdn.net/yycoolsam/article/details/103255271


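Summary (auto-generated by CSDN): This article describes how to generate reports automatically in Python with docxtpl, covering template setup, chart image generation, and inserting data and images into the template to produce a Word file. It also covers fixes for related errors such as ImportError: cannot import name 'soft_unicode', and shares several techniques for converting HTML and PDF content into Word, PDF, and Markdown files, using tools such as libreoffice, comtypes, and pdfkit.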

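Automated report generation with docxtpl (based on a Word template)
Converting HTML page content to a Word document with docx
Converting PDF/Word files to Word documents with comtypes (Windows only)
Converting PDF/Word files to Word documents with libreoffice (works on Linux)
- Calling libreoffice directly
Converting PDF/Word files to Word documents with OpenOffice (works on Linux)
Converting HTML page content to a PDF document with pdfkit
Converting PDF content to a docx document with pdf2docx
- Error fix: ImportError: DLL load failed while importing cv2
- Error fix: TypeError: object of type 'NoneType' has no len()
Converting PDF to Markdown files with pdfminer and markdown
- "Failed to build cryptography" while installing pdfminer.six
- Installing pandoc
Using pyecharts offline
Consolidation
- MSGraph
- MSWord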

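Automated report generation with docxtpl (based on a Word template)

Before reading this article, you may want to look at my earlier articles on automated report generation.
PS: I used to write on Sina Blog, but moved here after my posts kept getting deleted.

Generating Word documents from text with python-docx (http://blog.sina.com.cn/s/blog_1473990d60102x4fr.html)
Generating Word documents from text with tushare (http://blog.sina.com.cn/s/blog_1473990d60102x4fs.html)
Automated API documentation generation (http://blog.sina.com.cn/s/blog_1473990d60102yhr3.html)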

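Special thanks

https://blog.csdn.net/u012117917/article/details/41604711 CSS color code reference and color chart
https://blog.csdn.net/sunchengquan/article/details/80369304 Batch-processing Word documents with Python in practice
https://blog.csdn.net/zhang__ao/article/details/80745873 echarts title configuration
This article is a practical implementation built on the three articles above.

This article has three parts:

Template setup
Image generation
Inserting data and images into the template to generate a Word file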

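Template setup

[Image: screenshot of the Word template]
In the template, {{qe}}, {{mie}} and similar tags are placeholders. Their names must match the keys used in the code.
Notation a (in the screenshot) is a placeholder for a single value.
Notation b is a placeholder for table data.
The table data has the following format:

[{'city': '广东省', 'mie': 13885, 'toimie': '5476.71', 'qie': 18158, 'toiqie': '15736.47'},
 {'city': '浙江省', 'mie': 2350, 'toimie': '2861.79', 'qie': 1059, 'toiqie': '859.79'}]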
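
Because every placeholder in the template has to match a key in the rendering context, a quick pre-check can catch mismatches early. The following is a minimal sketch using only the standard library. The helper names `find_placeholders` and `missing_keys` are hypothetical, and `template_text` is a made-up string standing in for text taken from the Word template. It only recognizes simple `{{ name }}` tags, not loop variables or attribute access.

```python
import re

# Matches simple Jinja2-style placeholders such as {{ qe }} or {{mie}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_placeholders(template_text):
    """Return the set of simple variable names used in the template text."""
    return set(PLACEHOLDER.findall(template_text))


def missing_keys(template_text, context):
    """Return placeholder names that have no matching key in the context."""
    return sorted(find_placeholders(template_text) - set(context))


# Hypothetical template text and context, for illustration only.
template_text = "Quarter {{ quarter }}: {{qe}} companies invested {{ toimie }}."
context = {"quarter": 3, "qe": 120}
print(missing_keys(template_text, context))
```

Here the context lacks a `toimie` key, so that name is reported as missing before any rendering is attempted.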

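Image generation

Here I first generate HTML with pyecharts, then take a screenshot of the rendered HTML chart.
The method is snapshot_selenium with the Chrome browser, as described at http://pyecharts.org/#/zh-cn/render_images. For the detailed setup, see my earlier post:

High-Tech Enterprise Certification website (page screenshots) (http://blog.sina.com.cn/s/blog_1473990d60102xaf0.html)
For Chrome, put the downloaded and unzipped chromedriver.exe into the Scripts folder under the Python installation path (or, if you use Anaconda, the Scripts folder under the Anaconda installation path).
I won't go into further detail here.
The code is as follows.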

    # ************************************ Charts *****************************************
    from pyecharts.charts import Bar, Grid, Pie
    from pyecharts import options as opts

    # Bar chart
    subtitle = ''
    title = '图1.1：***********投资前10省分布'  # "Figure 1.1: top 10 provinces by *** investment"
    path_html = './picture/1.小册子-总体情况.html'
    path_png = "./picture/1.小册子-总体情况.png"
    # table_1 is a pandas DataFrame prepared earlier (not shown)
    draw_data = table_1[:10].sort_values(['toimie'], ascending=True)
    x_label = draw_data['city'].to_list()
    y_data_1 = ['%.2f' % i for i in draw_data['toimie'].to_list()]
    y_data_2 = ['%.2f' % i for i in draw_data['toiqie'].to_list()]
    init_opts = opts.InitOpts(width="480px", height="360px")
    plt = (
        Bar(init_opts=init_opts)
            .set_global_opts(title_opts=opts.TitleOpts(title=title,  # title
                                                       # subtitle=subtitle,  # subtitle
                                                       pos_left='center', pos_top='bottom',  # title position
                                                       title_textstyle_opts={'fontSize': 10.5}
                                                       ),
                             yaxis_opts=opts.AxisOpts(),  # Y-axis settings
                             xaxis_opts=opts.AxisOpts(  # X-axis settings
                                 # type_="category"  # axis type
                             ),
                             legend_opts=opts.LegendOpts(type_='scroll',  # legend
                                                         orient='vertical',  # legend layout direction
                                                         pos_left="center", pos_top='center'  # legend position
                                                         ),
                             tooltip_opts=opts.TooltipOpts(trigger='axis'),
                             toolbox_opts=opts.ToolboxOpts(),  # toolbox
                             # datazoom_opts=opts.DataZoomOpts(),  # zoom
                             )
            .set_series_opts(label_opts=opts.LabelOpts(  # is_show=False,  # whether to show values
                                                       position="right",  # label position
                                                       ))
            # .extend_axis(  # second axis
            #     yaxis=opts.AxisOpts()
            # )
            .add_xaxis(x_label)
            # Series names: "total **** investment (100 million CNY)" and
            # "total investment in **** (100 million CNY)"
            .add_yaxis('****投资总额(亿元)', y_data_1,
                       label_opts=opts.LabelOpts(position='right',  # label position
                                                 font_weight='bolder',  # label font weight
                                                 # color='#FFC8B4'
                                                 ),
                       # color='#FFC8B4'
                       )
            .add_yaxis('对****投资总额(亿元)', y_data_2,
                       label_opts=opts.LabelOpts(position='right', font_weight='bolder'))
            .reversal_axis()  # swap the X and Y axes
    )
    grid = Grid(init_opts=init_opts)
    grid.add(plt, grid_opts=opts.GridOpts(pos_top='5'))  # only pos_top is used, to adjust the offset from the top
    grid.render(path_html)


    # Rose (pie) chart
    subtitle = '图1.2：**********互投行业明细'  # "Figure 1.2: mutual investment by industry"
    title = '外圈:***投资总额  内圈:对****投资总额'  # "Outer ring: total *** investment; inner ring: total investment in ****"
    path_html_1 = './picture/1.小册子-总体情况_1.html'
    path_png_1 = "./picture/1.小册子-总体情况_1.png"
    # table_2 is a pandas DataFrame prepared earlier (not shown)
    codes = table_2['code'].to_list()
    toimie_values = table_2['toimie'].to_list()
    toiqie_values = table_2['toiqie'].to_list()
    code_num = len(codes)
    # code_num = 10
    draw_data = [[codes[i], '%.2f' % toimie_values[i]] for i in range(code_num)]
    draw_data_1 = [[codes[i], '%.2f' % toiqie_values[i]] for i in range(code_num)]
    init_opts_pie = opts.InitOpts(width="640px", height="480px")
    plt = (
        Pie(init_opts=init_opts_pie)
            .set_global_opts(title_opts=opts.TitleOpts(title=title,  # title
                                                       subtitle=subtitle,  # subtitle
                                                       pos_left='center', pos_bottom='0',  # title position
                                                       title_textstyle_opts={  # main title style
                                                           'fontSize': 16.5,        # font size
                                                           "fontWeight": "bolder",  # bold
                                                           "color": "#444444"       # font color
                                                       },
                                                       subtitle_textstyle_opts={  # subtitle style
                                                           'fontSize': 16.5,
                                                           "fontWeight": "bolder",
                                                           "color": "#000000"
                                                       }
                                                       ),
                             yaxis_opts=opts.AxisOpts(),  # Y-axis settings
                             xaxis_opts=opts.AxisOpts(  # X-axis settings
                                 # type_="category"  # axis type
                             ),
                             legend_opts=opts.LegendOpts(type_='scroll',  # legend
                                                         orient='vertical',  # legend layout direction
                                                         pos_left="left", pos_top='center'  # legend position
                                                         ),
                             tooltip_opts=opts.TooltipOpts(trigger='axis'),
                             toolbox_opts=opts.ToolboxOpts(),  # toolbox
                             # datazoom_opts=opts.DataZoomOpts(),  # zoom
                             )
            .set_series_opts(label_opts=opts.LabelOpts(  # is_show=False,  # whether to show values
                position="right",  # label position
            ))
            .add(
                "对****投资总额",  # inner ring
                draw_data_1,
                radius=["15%", "30%"],
                # center=["25%", "50%"],  # center position
                # rosetype="radius",
                label_opts=opts.LabelOpts(is_show=True, formatter="{b}: {c}", font_weight='bolder'),
            )
            .add(
                "****投资总额",  # outer ring
                draw_data,
                radius=["65%", "80%"],
                # center=["75%", "50%"],
                # rosetype="area",
                label_opts=opts.LabelOpts(is_show=True, formatter="{b}: {c}", font_weight='bolder'),
            )
    )
    grid_1 = Grid(init_opts=init_opts_pie)
    grid_1.add(plt, grid_opts=opts.GridOpts(pos_top='5'))  # only pos_top is used, to adjust the offset from the top
    grid_1.render(path_html_1)


    # Convert the HTML files to images
    from pyecharts.render import make_snapshot
    from snapshot_selenium import snapshot
    make_snapshot(snapshot, grid.render(path_html), path_png, delay=2, pixel_ratio=2)
    make_snapshot(snapshot, grid_1.render(path_html_1), path_png_1, delay=2, pixel_ratio=2)
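
The snapshot step depends on the chromedriver executable being discoverable, which is why the setup notes place it in the Scripts folder. A small check before calling make_snapshot can make a missing driver easier to diagnose. This is a sketch only: `find_driver` is a hypothetical helper name, and it assumes the driver is located through the PATH environment variable.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver not found on PATH; the snapshot step may fail")
else:
    print("using driver at", driver_path)
```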

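Inserting data and images into the template to generate a Word file

This step inserts the generated data and images into the Word document, using three packages: jinja2, docxtpl, and docx.
The code is as follows.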

    import jinja2
    from docxtpl import DocxTemplate
    from docxtpl import InlineImage
    from docx.shared import Mm, Inches, Pt

    tpl = DocxTemplate(r'./source/from/1.小册子-总体情况.docx')

    def fmt(spec, value):
        # Format a number; missing values (NaN) are shown as '-' (behavior added 2019-11-29)
        text = spec % value
        return '-' if text == 'nan' else text

    # Convert each DataFrame into a list of dicts, one dict per table row
    table_1 = [{'city': row.city,
                'mie': fmt('%.0f', row.mie),
                'toimie': fmt('%.2f', row.toimie),
                'qie': fmt('%.0f', row.qie),
                'toiqie': fmt('%.2f', row.toiqie)}
               for index, row in table_1.iterrows()]
    table_2 = [{'code': row.code,
                'mie': fmt('%.0f', row.mie),
                'toimie': fmt('%.2f', row.toimie),
                'qie': fmt('%.0f', row.qie),
                'toiqie': fmt('%.2f', row.toiqie)}
               for index, row in table_2.iterrows()]

    # year, quarter, qe, mie, ... are scalar values computed earlier (not shown);
    # the keys must match the placeholder names in the template
    context = {
        'year': year,
        'quarter': quarter,
        'qe': qe,
        'mie': mie,
        'toimie': toimie,
        'me': me,
        'qie': qie,
        'toiqie': toiqie,
        'pic_1': InlineImage(tpl, path_png, width=Mm(125)),
        'pic_2': InlineImage(tpl, path_png_1, width=Mm(100)),
        'table_1': table_1,
        'table_2': table_2,
    }
    jinja_env = jinja2.Environment(autoescape=True)
    tpl.render(context, jinja_env)
    tpl.save(r'./result/1.小册子-*****总体情况.docx')

Result
[Image: screenshot of the generated Word document]

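Added 2022-07-19: fixing ImportError: cannot import name 'soft_unicode' from 'markupsafe'.

This error is caused by an update to the markupsafe package (soft_unicode was removed in version 2.1.0), so a temporary workaround is to downgrade markupsafe to version 2.0.1 with pip install --upgrade markupsafe==2.0.1. After the downgrade, however, another error appears when running

# python-docx
from docxtpl import DocxTemplate
tpl = DocxTemplate("filename.docx")
tpl.paragraphs

This raises AttributeError: 'NoneType' object has no attribute 'paragraphs'. The fix is to call tpl.get_docx() first, which triggers loading of the file.
Before the call:
[Image: screenshot before calling tpl.get_docx()]
After the call:

The file contents can now be read.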
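
To tell whether an environment is affected before anything fails at import time, the installed markupsafe version can be inspected. The sketch below uses only the standard library. `has_soft_unicode` and `installed_markupsafe_version` are hypothetical helper names, and the version parsing assumes a plain "major.minor.patch" version string.

```python
from importlib import metadata


def has_soft_unicode(markupsafe_version):
    """Return True if this MarkupSafe version still ships soft_unicode (removed in 2.1.0)."""
    parts = markupsafe_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) < (2, 1)


def installed_markupsafe_version():
    """Return the installed MarkupSafe version string, or None if it is not installed."""
    try:
        return metadata.version("MarkupSafe")
    except metadata.PackageNotFoundError:
        return None


version = installed_markupsafe_version()
if version is None:
    print("MarkupSafe is not installed")
else:
    print(version, "provides soft_unicode:", has_soft_unicode(version))
```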

Added 2023-04-13: reading paragraphs, tables, and other content with docxtpl

import pandas as pd
from docxtpl import DocxTemplate
tpl = DocxTemplate("112.docx")
tpl.get_docx()  # trigger loading of the file
# Read paragraph text
for i in tpl.paragraphs:
    print(i.text)
# Read tables
def getDocxTableToDF(table):
    '''
    Convert a docx table into a DataFrame
    :param table: docx.table.Table object
    :return: df
    '''
    total = []
    for row in table.rows:
        r_list = []
        for col in row.cells:
            r_list.append(col.text)
            # print(col.text)
        total.append(r_list)
    total = pd.DataFrame.from_records(total)
    return total

for table in tpl.tables:
    df = getDocxTableToDF(table)
    print(df)
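
When the first row of a table holds the column names, it can be handy to get the rows back as a list of dicts, the same shape used for table_1 in the template context. The following is a sketch of that idea without pandas. `table_to_records` and `fake_table` are hypothetical names, and the stand-in objects only mimic the rows, cells, and text attributes used in the code above, so the example runs without a .docx file.

```python
from types import SimpleNamespace


def table_to_records(table):
    """Convert a table (rows -> cells -> text) into a list of dicts keyed by the header row."""
    rows = [[cell.text for cell in row.cells] for row in table.rows]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]


def fake_table(grid):
    """Build a stand-in object with the same .rows/.cells/.text shape as a docx table."""
    return SimpleNamespace(
        rows=[
            SimpleNamespace(cells=[SimpleNamespace(text=text) for text in line])
            for line in grid
        ]
    )


demo = fake_table([["city", "mie"], ["广东省", "13885"], ["浙江省", "2350"]])
print(table_to_records(demo))
```

With a real document, the same function could be applied to each item of tpl.tables, assuming the first row is a header.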

Converting HTML page content to a Word document with docx

Reference: https://stackoverflow.com/questions/55041766/how-to-add-waltchunk-and-its-relationship-with-python-docx

from docx.opc.constants import RELATIONSHIP_TYPE as RT
from docx.opc.part import Part
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx import Document

from lxml.etree import tostring
from lxml import html
import requests

def add_alt_chunk(doc: Document, html_text: str):
    # Store the HTML as a separate altChunk part and reference it from the document body
    package = doc.part.package
    partname = package.next_partname('/word/altChunk%d.html')
    alt_part = Part(partname, 'text/html', html_text.encode(), package)
    r_id = doc.part.relate_to(alt_part, RT.A_F_CHUNK)
    alt_chunk = OxmlElement('w:altChunk')
    alt_chunk.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt_chunk)

if __name__ == "__main__":
    # Fetch the target HTML page, then use XPath to extract the part of the page we want
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36',
    }
    # link is the URL of the page to convert (not given in the original)
    response = requests.get(link, headers=headers)
    tree = html.fromstring(response.text)
    article = tree.xpath('''/html/body/div[@class="wrap"]//article[@class="articleCon"]''')[0]  # main article content

    # Create a document to hold the extracted content
    doc = Document()
    article_str = tostring(article, encoding="utf8")  # serialize the element back to bytes
    add_alt_chunk(doc, article_str.decode("utf8"))
    doc.save("111.docx")

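Original page
[Image: screenshot of the original page]
After conversion

However, a docx file produced this way cannot be read directly with docxtpl: paragraphs and other content do not come out. It has to be processed with the method below before the paragraph information can be read.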
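
A likely reason is that the HTML is stored as a separate part inside the .docx package and only referenced from the document body, so no real paragraphs exist until Word itself imports the chunk. Since a .docx file is a zip archive, this can be checked with the standard library. The sketch below uses a hypothetical helper, `list_alt_chunks`, and a small in-memory archive that stands in for a file such as 111.docx.

```python
import io
import zipfile


def list_alt_chunks(docx_file):
    """Return the names of embedded altChunk parts inside a .docx package."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("word/altChunk")
        )


# Build a tiny stand-in package in memory (not a valid Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/altChunk1.html", "<p>hello</p>")

print(list_alt_chunks(buffer))
```

The function also accepts a file path, so it could be pointed at the saved document to see whether its content still lives in an altChunk part.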

Converting PDF/Word files to Word documents with comtypes (Windows only)

Reference: https://pythonhosted.org/comtypes/
Reference: https://github.com/enthought/comtypes
Reference: https://stackoverflow.com/questions/6011115/doc-to-pdf-using-python
Install the package with pip install comtypes.

import os
import comtypes.client
word = comtypes.client.CreateObject('Word.Application')  # start Word through COM
wdFormatPDF = 17  # SaveAs format code for PDF
PDFFormatwd = 16  # SaveAs format code for Word (docx)
source_docx = "111.docx"
# file_path is the folder containing the files (not defined in the original)
source_abspath = os.path.abspath(os.path.join(file_path, source_docx))
doc = word.Documents.Open(source_abspath)

target_pdf = '112.pdf'
target_docx = '112.docx'
# Absolute paths are used because Word may resolve relative paths against its own working directory
doc.SaveAs(os.path.abspath(os.path.join(file_path, target_pdf)), FileFormat=wdFormatPDF)
doc.SaveAs(os.path.abspath(os.path.join(file_path, target_docx)), FileFormat=PDFFormatwd)

doc.