Reading tables from Word .docx files with Python

还是那个同伟伟

Category: Advanced Python. Tags: python, word, Word table parsing, python-docx

Article link: https://blog.csdn.net/wei18791957243/article/details/120174346

1. Install the required parsing package

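pip install python-docx

Only python-docx is needed. Do not also run pip install docx: the PyPI package named docx is a separate, older project that installs a module with the same name (docx) and can shadow python-docx, which breaks the import.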
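To check which of the two distributions is actually present in an environment, something like the following sketch may help. It relies only on the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, not part of python-docx.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# python-docx should report a version; docx should ideally report None.
print("python-docx:", installed_version("python-docx"))
print("docx:", installed_version("docx"))
```

If the second line prints a version, uninstalling docx and reinstalling python-docx is usually the simplest way to get a clean import.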

2. Usage, demonstrated with a code example

The function below parses the following table from a Word document:

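import docx  # this module is provided by the python-docx package


def parse_docx(file):
    word_docx = docx.Document(file)
    table = word_docx.tables[0]    # first table in the document
    type_list = []
    for i in range(2, len(table.rows)):    # start at the third row (index 2); rows 0 and 1 are skipped
        purpose = table.cell(i, 2).text  # usage: third column (index 2)
        total = table.cell(i, 3).text  # number of units: fourth column (index 3)
        area = table.cell(i, 4).text  # area: fifth column (index 4)
        type_list.append({
            "buildingType": purpose,
            "total": total,  # number of units
            "buildingArea": area + "平方米",  # "平方米" means square meters
        })
    documentNumber = table.cell(2, 0).text  # presale permit number: third row, first column
    address = table.cell(2, 1).text  # location: third row, second column
    pro_info = {
        "documentNumber": documentNumber,
        "address": address,
        "type": type_list
    }
    return pro_info
'''
Note: the first two columns contain merged cells, so every row covered by a merge returns the same text for those columns.
The projectName entry in the printout is not produced by the code shown above.
Printed result: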
{'address': '江山市贺村镇贺溪路与中心南街交汇处1-6号、11-17号、22-26号及10、30、33、34、35、37幢',
 'documentNumber': '江房售许字（2021）第ZJ00059号',
 'projectName': ('东旺贺悦小区1-6号、11-17号、22-26号及10、30、33、34、35、37幢',),
 'type': [{'buildingArea': '18502.16平方米',
           'buildingType': '成套住宅',
           'total': '208'},
          {'buildingArea': '838.06平方米', 'buildingType': '商业', 'total': '18'},
          {'buildingArea': '3694.70平方米', 'buildingType': '住宅', 'total': '18'},
          {'buildingArea': '平方米', 'buildingType': '', 'total': ''}]}
'''
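The last entry in the printed result comes from a blank row at the bottom of the table. A minimal sketch of one way to skip such rows is shown below. It assumes the same column layout as parse_docx (indices 2, 3 and 4); parse_rows and first_data_row are hypothetical names introduced for illustration, and the function only uses table.rows and table.cell(i, j).text, as the original code does.

```python
def parse_rows(table, first_data_row=2):
    """Collect usage, unit count and area per data row, skipping blank rows.

    Assumption: columns 2, 3 and 4 hold usage, number of units and area,
    as in parse_docx. Unlike parse_docx, cell text is stripped of
    surrounding whitespace before use.
    """
    type_list = []
    for i in range(first_data_row, len(table.rows)):
        purpose = table.cell(i, 2).text.strip()
        total = table.cell(i, 3).text.strip()
        area = table.cell(i, 4).text.strip()
        if not (purpose or total or area):
            continue  # all three cells are empty: treat the row as blank
        type_list.append({
            "buildingType": purpose,
            "total": total,
            "buildingArea": area + "平方米",
        })
    return type_list
```

Inside parse_docx, the loop could then be replaced with type_list = parse_rows(table), leaving the rest of the function unchanged.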
