1. Install the python-docx library
In a cmd window, type pip install python-docx, press Enter, and wait for the installation to finish.
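2. Word document structure
Whole document -- document
Paragraph -- paragraph
Text content -- run
Rules for how text is split into run objects:
Chinese: runs are built or split at punctuation marks
English: runs are built or split by text style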
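Each document contains multiple paragraphs, each paragraph has multiple runs, and each run carries text, font, color, and font size.
Each document contains multiple tables, each table has multiple rows, each row contains multiple cells, and each cell contains multiple paragraphs.
When writing a Word document, whether you are adding a heading or a paragraph, the basic pattern is the same: add the object first, then add runs to it. The structure of a Word document includes Heading (titles), Normal (body text), and Caption (table)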
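3. Basic operations
3.1 Open a document (Document() with no argument starts a new, blank document)
from docx import Document
document = Document()
3.2 Add headings of different levels
document.add_heading('总标题', 0)    # document title
document.add_heading('一级标题', 1)  # level-1 heading
document.add_heading('二级标题', 2)  # level-2 heading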
3.3 Add text
paragraph = document.add_paragraph('文本内容')  # body text
3.4 Set the font size
from docx.shared import Pt
run = paragraph.add_run('设置字号、')  # run text: "set font size,"
run.font.size = Pt(32)
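3.5 Set the font
from docx.oxml.ns import qn
run = paragraph.add_run('设置中文字体、')  # run text: "set Chinese font,"
run.font.name = '宋体'  # SimSun
# East Asian text uses a separate font slot, so w:eastAsia must be set as well
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), '宋体')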
3.6 Set italic
run = paragraph.add_run('斜体、')  # run text: "italic,"
run.italic = True
3.7 Set bold
paragraph.add_run('粗体').bold = True  # run text: "bold"
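3.8 Append content (add_run can only append to the end of a paragraph)
import docx
doc = docx.Document()
p1 = doc.add_paragraph(text="xxxxxxxxxx")
p2 = doc.add_paragraph(text="yyyyyyyyyyyy")
p1.add_run(text='我是内容11111')  # appended to the end of p1
p2.add_run(text='我是内容222')    # appended to the end of p2
doc.save("D:/test2.docx")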
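4. Read a Word document
Note: opening the file sometimes fails with an error saying the file cannot be found.
Open test.docx in Word, type a few spaces and delete them, then run the code again; the error no longer appears.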
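One plausible cause, which is an assumption here and not something the steps above confirm, is that the file exists but is empty (zero bytes). python-docx reads a .docx as a ZIP package, so a file that is not a ZIP archive cannot be opened. The hypothetical pre-check below uses only the standard library; passing it is necessary but not sufficient for a file to be a valid Word document.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Hypothetical pre-check: True if path is an existing file that is a ZIP archive."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as folder:
    empty_path = os.path.join(folder, 'empty.docx')
    open(empty_path, 'wb').close()               # zero-byte file

    zipped_path = os.path.join(folder, 'zipped.docx')
    with zipfile.ZipFile(zipped_path, 'w') as archive:
        archive.writestr('placeholder.txt', 'not a real document')

    missing_path = os.path.join(folder, 'missing.docx')  # never created

    empty_ok = looks_like_docx(empty_path)
    zipped_ok = looks_like_docx(zipped_path)
    missing_ok = looks_like_docx(missing_path)

print(empty_ok, zipped_ok, missing_ok)
```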
docs = docx.Document('D:/test.docx')
print(docs)
<docx.document.Document object at 0x0000028782D571A0>
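docs = docx.Document('D:/test.docx')
paragraphs = docs.paragraphs
# Print the text of every paragraph
for paragraph in paragraphs:
    print(paragraph.text)
# Print each run of the fourth paragraph
runs = paragraphs[3].runs
for run in runs:
    # note: sep has no effect when print() receives a single argument
    print(run.text, sep="XXXXXX")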
总标题
一级标题
二级标题
文本内容设置字号、 粗体设置中文字体、 斜体、 粗体
print(paragraphs[3].runs)
[<docx.text.run.Run object at 0x0000028196CDAA50>, <docx.text.run.Run object at 0x0000028196CDAB70>, xxx]
If styles do not matter and you only need the text, a function like this works:
def gettext(filename):
    docs = docx.Document(filename)
    paragraphs = docs.paragraphs
    full = []
    for para in paragraphs:
        full.append(para.text)
    return '\n'.join(full)

print(gettext('D:/test.docx'))
5. Comprehensive test: writing a Word document
from docx import Document
from docx.shared import Pt
from docx.shared import Inches
from docx.oxml.ns import qn

document = Document()

# Title and headings
document.add_heading('MS WORD写入测试', 0)
document.add_heading('一级标题', 1)
document.add_heading('二级标题', 2)

# One paragraph made of runs with different formatting
paragraph = document.add_paragraph('我们在做文本测试!')
run = paragraph.add_run('设置字号、')
run.font.size = Pt(34)
run = paragraph.add_run('设置中文字体、')
run.font.name = '宋体'
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), '宋体')
run = paragraph.add_run('斜体、')
run.italic = True
paragraph.add_run('粗体').bold = True

# Quote, bulleted list, numbered list
document.add_paragraph('Intense quote', style='Intense Quote')
document.add_paragraph('无序列表元素1', style='List Bullet')
document.add_paragraph('无序列表元素3', style='List Bullet')
document.add_paragraph('有序列表元素1', style='List Number')
document.add_paragraph('有序列表元素3', style='List Number')

# Picture (abc.jpeg must exist in the working directory)
document.add_picture('abc.jpeg', width=Inches(1.35))

# Table: one header row plus three data rows
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for i in range(3):
    row_cells = table.add_row().cells
    row_cells[0].text = 'test' + str(i)
    row_cells[1].text = str(i)
    row_cells[2].text = 'desc' + str(i)

document.add_page_break()
document.save('测试.docx')
Resulting document: