python-docx Chinese developer docs: using python-docx to retrieve document content while preserving document structure

python-docx does not yet provide API support for this; interestingly, neither does the Microsoft Word API.

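However, you can work around it with the code below. Note that it is somewhat brittle because it relies on python-docx internals, which are subject to change, but I expect it to keep working for the foreseeable future:

#!/usr/bin/env python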

# encoding: utf-8

"""
Testing iter_block_items()
"""

from __future__ import (
    absolute_import, division, print_function, unicode_literals
)

from docx import Document
from docx.document import Document as _Document
from docx.oxml.text.paragraph import CT_P
from docx.oxml.table import CT_Tbl
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph


def iter_block_items(parent):
    """
    Generate a reference to each paragraph and table child within *parent*,
    in document order. Each returned value is an instance of either Table or
    Paragraph. *parent* would most commonly be a reference to a main
    Document object, but also works for a _Cell object, which itself can
    contain paragraphs and tables.
    """
    # Pick the XML element whose direct children are the block-level items.
    if isinstance(parent, _Document):
        parent_elm = parent.element.body
        # print(parent_elm.xml)
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not right")

    # Wrap each w:p / w:tbl child in its proxy class; other children are skipped.
    for child in parent_elm.iterchildren():
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)


document = Document('test.docx')
for block in iter_block_items(document):
    print('found one')
    # Table objects have no .text attribute, so print a placeholder label.
    print(block.text if isinstance(block, Paragraph) else '<table>')
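The same idea, walking the direct children of the body element and checking whether each one is a paragraph or a table, can also be sketched with only the standard library, which avoids depending on python-docx internals. This is a swapped-in illustration, not part of the original answer: the helper names `iter_body_blocks` and `paragraph_text` and the inline `SAMPLE_XML` are hypothetical, and the sketch only reports paragraph text and table row counts rather than returning python-docx objects.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _tag(name):
    """Return the Clark-notation tag for a w:-prefixed element name."""
    return "{%s}%s" % (W_NS, name)


def paragraph_text(p_elm):
    """Concatenate the text of every w:t descendant of a w:p element."""
    return "".join(t.text or "" for t in p_elm.iter(_tag("t")))


def iter_body_blocks(document_xml):
    """Yield ("paragraph", text) or ("table", row_count) in document order."""
    root = ET.fromstring(document_xml)
    body = root.find(_tag("body"))
    if body is None:
        raise ValueError("no w:body element found")
    for child in body:
        if child.tag == _tag("p"):
            yield "paragraph", paragraph_text(child)
        elif child.tag == _tag("tbl"):
            yield "table", len(child.findall(_tag("tr")))
        # Any other child (for example w:sectPr) is skipped.


# Hypothetical minimal document body: paragraph, table, paragraph.
SAMPLE_XML = (
    '<w:document xmlns:w="%s"><w:body>'
    '<w:p><w:r><w:t>Intro</w:t></w:r></w:p>'
    '<w:tbl><w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>'
    '<w:p><w:r><w:t>Outro</w:t></w:r></w:p>'
    '<w:sectPr/>'
    '</w:body></w:document>' % W_NS
)

blocks = list(iter_body_blocks(SAMPLE_XML))
for kind, value in blocks:
    print(kind, value)
```

To run this against a real file, the XML can be read from the .docx archive with `zipfile.ZipFile('test.docx').read('word/document.xml')` and passed to `iter_body_blocks`. The trade-off is that you get raw values instead of `Paragraph` and `Table` objects, so styles and cell access have to be handled by hand.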
