I want to extract data from a Word document with the .docx extension. The document contains a table, and I want to fetch the data from each row and column of that table.
Then I would like to process the data and insert it into an Excel file under the respective fields.
Can anyone please guide me on how to do this in Python?
I am using Python 3 on Windows 7 (and might also want to run this code on Windows Server 2003).
Any help will be much appreciated.
Thanks
Solution
Try something like:
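import win32com.client as w32c

# Start Word through COM (requires pywin32 and an installed copy of Word)
word = w32c.Dispatch("Word.Application")
word.Visible = True
doc = word.Documents.Open("C:\\docx_with_a_table.docx")
tables = doc.Tables
# Word collections are 1-based, so index them explicitly from 1 to Count
for t_cnt in range(1, tables.Count + 1):
    table = tables(t_cnt)
    for r_cnt in range(1, table.Rows.Count + 1):
        row = table.Rows(r_cnt)
        for c_cnt in range(1, row.Cells.Count + 1):
            cell = row.Cells(c_cnt)
            # Cell text ends with an end-of-cell marker ("\r\x07"); strip it
            print(cell.Range.Text.rstrip("\r\x07"))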
Pressing ALT+F11 and then F2 in a Word document opens the VBA editor's Object Browser, which lists the available VBA objects. The above procedure is better documented for Perl.
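If Word is not installed on the machine (for example on a server), the table can also be read without COM. This is a different technique from the automation shown above: a .docx file is a zip archive whose main part, word/document.xml, stores tables as w:tbl, w:tr and w:tc elements. The following is only a minimal sketch using the standard library; read_docx_tables is a hypothetical helper name, and it assumes simple tables without merged cells or content controls.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path):
    """Return every table in the .docx at `path` as a list of rows of cell strings.

    Hypothetical helper for illustration; assumes simple, unmerged tables.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    # iter() also finds tables nested inside cells; they appear as separate entries
    for tbl in root.iter(W_NS + "tbl"):
        rows = []
        for tr in tbl.findall(W_NS + "tr"):
            cells = []
            for tc in tr.findall(W_NS + "tc"):
                # Join the text runs of each paragraph; separate paragraphs with newlines
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W_NS + "t"))
                    for p in tc.findall(W_NS + "p")
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling read_docx_tables("C:\\docx_with_a_table.docx") should then give one list per table, each holding one list of cell strings per row, which can be processed before writing to Excel.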
Reading and writing Excel files is well supported by the Python 3 packages xlrd3 and xlwt3.
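If installing an extra package is not an option, a dependency-free fallback is to write the rows to a CSV file, which Excel opens directly. This is not the xlwt3 API, just a sketch with the standard csv module; write_rows_to_csv, sample_rows and the output file name are hypothetical and stand in for the data extracted from the Word table.

```python
import csv


def write_rows_to_csv(rows, path):
    """Write a list of rows (each a list of cell values) to a CSV file at `path`."""
    # newline="" lets the csv module control line endings;
    # utf-8-sig adds a byte-order mark that helps Excel detect the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        csv.writer(handle).writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the cell text extracted from Word
    sample_rows = [["Name", "Age"], ["Alice", "30"], ["Bob", "25"]]
    write_rows_to_csv(sample_rows, "table_data.csv")
```

The first row acts as the header, so each later value lands under its respective field when the file is opened in Excel.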