python-docx is a Python library for creating and updating Microsoft Word (.docx) files.
GitHub: https://github.com/python-openxml/python-docx
python-docx documentation: https://python-docx.readthedocs.io/en/latest/
wang@wang:~$ git clone https://github.com/python-openxml/python-docx.git
Install lxml:
The lxml package depends on other packages, so install those dependencies first:
wang@wang:~$ sudo apt-get install libxml2-dev libxslt-dev python-dev
Then install lxml:
wang@wang:~$ sudo apt-get install python-lxml
Install lxml before running the command below. Running it directly, without lxml, fails with: error: Could not find suitable distribution for Requirement.parse('lxml>=2.3.2')
wang@wang:~/python-docx$ python setup.py install
...
Installed /usr/local/lib/python2.7/dist-packages/python_docx-0.8.10-py2.7.egg
Processing dependencies for python-docx==0.8.10
Searching for lxml==3.5.0
Best match: lxml 3.5.0
You can check the installed lxml version like this:
wang@wang:~$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> print etree.LXML_VERSION
(3, 5, 0, 0)
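The same check can be scripted. The following is a minimal sketch, assuming the minimum version is the `lxml>=2.3.2` requirement quoted in the error message above; `meets_minimum` is a hypothetical helper name, not part of lxml or python-docx.

```python
def meets_minimum(version, minimum=(2, 3, 2)):
    """Return True if a version tuple such as (3, 5, 0, 0) is at least `minimum`.

    Hypothetical helper: the default minimum mirrors the 'lxml>=2.3.2'
    requirement shown in the setup.py error message.
    """
    # Compare only as many components as the minimum specifies.
    return tuple(version[:len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed")
    else:
        # etree.LXML_VERSION is the same tuple printed in the session above.
        print("lxml %s ok: %s" % (etree.LXML_VERSION,
                                  meets_minimum(etree.LXML_VERSION)))
```

Comparing tuples avoids the pitfalls of comparing version strings, where '10' would sort before '9'.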
As a test, use the library to write a .docx file:
wang@wang:~$ vim testpy.py
With the following content:
from docx import Document
from docx.shared import Inches

# Open a blank document based on the default template, much like the
# document Word gives you when you start a new one with built-in defaults.
document = Document()

# Add a heading: the first argument is the text, the second is the level.
document.add_heading('Document Title', 0)

# Add a paragraph.
p = document.add_paragraph('A plain paragraph having some ')  # paragraph text
p.add_run('bold').bold = True  # make the run "bold" bold
p.add_run(' and some ')
p.add_run('italic.').italic = True  # make the run "italic." italic

document.add_heading('Heading, level 1', level=1)

# Apply paragraph styles.
document.add_paragraph('Intense quote', style='Intense Quote')
document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

# Add a picture.
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table with 1 row and 3 columns.
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells  # first row
hdr_cells[0].text = 'Qty'   # first cell of the first row
hdr_cells[1].text = 'Id'    # second cell of the first row
hdr_cells[2].text = 'Desc'  # third cell of the first row

# Append one row per record and fill its cells.
for qty, item_id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc

# Add a page break.
document.add_page_break()
# Anything added after this point starts on the next page,
# even if the page above is not full.

document.save('demo.docx')
Also place an image in the home directory and name it monty-truth.png.
Then run:
wang@wang:~$ python testpy.py
This creates a demo.docx file in the home directory.
Next, use the library to read a .docx file, printing part of the file generated above:
wang@wang:~$ vim read_testpy.py
With the following content:
# -*- coding:utf-8 -*-
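import sys

import docx

# Path of the .docx file to read, passed as the first command-line argument.
path = sys.argv[1]
document = docx.Document(path)

# Print the text of every paragraph in the document body.
for para in document.paragraphs:
    print(para.text)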
wang@wang:~$ python read_testpy.py demo.docx
Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
first item in ordered list
Reference: https://www.cnblogs.com/ontheway703/p/5266041.html