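Problem description
When a Word file is read in Python with the docx library (python-docx) and the file contains a table with merged cells,
reading the table through docx looks like this: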
import docx

file = docx.Document(path)  # path: the .docx file to read
for table in file.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
The output is:
1-1
1-1
1-3
1-4
2-1
2-2
2-3
2-4
3-1
3-1
3-3
2-4
3-1
3-1
4-3
4-3
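A merged cell is reported once for every grid position it covers; for example, 1-1 appears twice.
If cell.text is changed inside the loop, the change is applied once per repetition, so the saved file shows the added text repeated: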
for table in file.tables:
    for row in table.rows:
        for cell in row.cells:
            cell.text = cell.text + 'test'  # runs once per repeated reference
file.save(path2)  # path2: the output file
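The duplication can be reproduced without a Word file. The sketch below is a toy model, not python-docx itself: `FakeCell` and `append_to_all` are hypothetical names, and it assumes that the repeated entries of a merged cell all refer to one underlying cell.

```python
class FakeCell:
    """Stand-in for a table cell; not part of python-docx."""

    def __init__(self, text):
        self.text = text


def append_to_all(cells, suffix):
    """Append suffix to every entry, exactly like the loop above."""
    for cell in cells:
        cell.text = cell.text + suffix
    return [cell.text for cell in cells]


# A row whose first two grid positions are one merged cell.
merged = FakeCell('1-1')
row_cells = (merged, merged, FakeCell('1-3'))

result = append_to_all(row_cells, 'test')
print(result)  # ['1-1testtest', '1-1testtest', '1-3test']
```

The merged cell is visited twice, so it receives the suffix twice, while the ordinary cell receives it once.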
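Solution
Printing each cell shows that the repeated entries of a merged cell, although listed more than once, share the same memory address: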
[printed list of cell objects; the entries belonging to a merged cell show the same address]
So, check whether a cell has already been seen before modifying cell.text.
Code:
cell_set = []  # cells already processed (a list, despite the name)
for table in file.tables:
    for row in table.rows:
        for cell in row.cells:
            if cell not in cell_set:  # skip repeated references to a merged cell
                cell_set.append(cell)
                cell.text = cell.text + 'test'
Execution result: repeated references to the same cell object are now modified only once.
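The list check above works only when the repeated entries really are the same Python object. That may not hold for every python-docx version or for vertically merged cells, because `row.cells` can build new wrapper objects on each access; treat this as an assumption to verify on your installation. A more defensive sketch compares the underlying XML element instead. It reads the private `_tc` attribute of the cell, which is an implementation detail and may change; `iter_unique_cells` and `append_suffix` are hypothetical helper names, not part of python-docx.

```python
def iter_unique_cells(table):
    """Yield each physical cell of `table` once, in reading order.

    Merged cells are de-duplicated by the identity of the underlying
    XML element, read from the private `_tc` attribute.
    """
    seen = {}  # id(element) -> element; holding the element keeps its id stable
    for row in table.rows:
        for cell in row.cells:
            tc = cell._tc  # private attribute (assumption: may change between versions)
            if id(tc) in seen:
                continue
            seen[id(tc)] = tc
            yield cell


def append_suffix(document, suffix):
    """Append `suffix` once to every physical table cell; return the cell count."""
    count = 0
    for table in document.tables:
        for cell in iter_unique_cells(table):
            cell.text = cell.text + suffix
            count += 1
    return count
```

With the document from the examples above, this would be called as `append_suffix(file, 'test')` followed by `file.save(path2)`.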