Python Word operations: working with Word documents from Python


There are two ways to do it:

  • using win32com

  • using docx (the python-docx package)

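1. Using the win32com extension package

Works on Windows only: it drives an installed copy of Microsoft Word through COM (win32com ships with the pywin32 package, installed with pip install pywin32).

Code: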

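```python
# coding=utf-8
from win32com.client import Dispatch, DispatchEx

word = Dispatch('Word.Application')  # open the Word application
# word = DispatchEx('Word.Application')  # start a separate Word process instead
word.Visible = 0  # run in the background, do not show the window
word.DisplayAlerts = 0  # suppress alerts

path = 'G:/WorkSpace/Python/tmp/test.docx'  # path to the Word file
# Open's Encoding argument expects an MsoEncoding number (not 'gbk') and is not needed for .docx
doc = word.Documents.Open(FileName=path)
# content = doc.Range(doc.Content.Start, doc.Content.End)
# content = doc.Range()

print('----------------')
print('Number of paragraphs:', doc.Paragraphs.Count)

# iterate over the paragraphs by index
for i in range(len(doc.Paragraphs)):
    para = doc.Paragraphs[i]
    print(para.Range.Text)
print('-------------------------')

# iterate over the paragraphs directly
for para in doc.Paragraphs:
    print(para.Range.Text)
    # print(para)  # (Python 2) only worked when the document text was entirely English

doc.Close()  # close the Word document
# word.Quit()  # quit the Word application
```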
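Each Paragraph.Range.Text printed this way still ends with Word's paragraph mark (`\r`), and text taken from a table cell also ends with the end-of-cell marker `\x07`. The sketch below is a hedged variant of the code above, not the author's: it wraps the steps in a function, strips those trailing markers, uses DispatchEx so that quitting does not close a Word window the user already has open, and always closes the document and quits Word. The names clean_word_text and read_paragraphs are hypothetical; it assumes Windows, Microsoft Word and pywin32.

```python
import os


def clean_word_text(raw):
    """Drop Word's trailing paragraph mark (\\r) and end-of-cell marker (\\x07)."""
    return raw.rstrip("\r\x07")


def read_paragraphs(path):
    """Return the text of every paragraph in a Word file (Windows + Word only)."""
    from win32com.client import DispatchEx  # imported lazily: requires pywin32

    word = DispatchEx("Word.Application")  # separate Word process, safe to Quit()
    word.Visible = False
    word.DisplayAlerts = 0
    try:
        # Word may not resolve a relative path against Python's working directory,
        # so pass an absolute one. Positional args: FileName, ConfirmConversions, ReadOnly.
        doc = word.Documents.Open(os.path.abspath(path), False, True)
        try:
            return [clean_word_text(p.Range.Text) for p in doc.Paragraphs]
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()  # only quits the process started by DispatchEx above
```

Usage on Windows: `for text in read_paragraphs('test.docx'): print(text)`.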

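2. Using the docx extension package (python-docx)

Advantage: it does not depend on the operating system, so it works cross-platform.

Installation:

pip install python-docx

Documentation: https://python-docx.readthedocs.io/en/latest/index.html

Code: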

```python
import docx


def read_docx(file_name):
    doc = docx.Document(file_name)
    # join the text of all paragraphs with newlines
    content = '\n'.join([para.text for para in doc.paragraphs])
    return content
```
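python-docx can read these files because a .docx is a ZIP package of XML parts, with the body text stored in word/document.xml. As a rough illustration only (this is not how python-docx is implemented, and it ignores tabs, line breaks and many other details), the same body-level paragraph text can be pulled out with the standard library alone. The function name read_docx_stdlib is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_stdlib(file_or_path):
    """Return body-level paragraph text joined by newlines (rough sketch)."""
    with zipfile.ZipFile(file_or_path) as package:  # a .docx is a ZIP archive
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(W + "body")
    paragraphs = []
    for p in body.findall(W + "p"):  # direct children of <w:body> only
        # a paragraph's text is split across runs; each run keeps it in <w:t>
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs)
```

Paragraphs inside tables are skipped here because only direct children of `<w:body>` are visited, which mirrors what `doc.paragraphs` returns.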

Creating tables

```python
# coding=utf-8
import docx

doc = docx.Document()
table = doc.add_table(rows=1, cols=3, style='Table Grid')  # create a table with borders
hdr_cells = table.rows[0].cells  # get all cells in row 0
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# add three rows of data
data_lines = 3
for i in range(data_lines):
    cells = table.add_row().cells
    cells[0].text = 'Name%s' % i
    cells[1].text = 'Id%s' % i
    cells[2].text = 'Desc%s' % i

# a second 2x4 table filled with 10, 20, ..., 80
rows = 2
cols = 4
table = doc.add_table(rows=rows, cols=cols)
val = 1
for i in range(rows):
    cells = table.rows[i].cells
    for j in range(cols):
        cells[j].text = str(val * 10)
        val += 1

doc.save('tmp.docx')
```
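The header-row-plus-data-rows pattern above generalizes to any list of records. A minimal sketch, assuming the records are dictionaries keyed by column name; records_to_rows and add_records_table are hypothetical helpers that use only add_table, rows[0].cells, add_row() and cell.text from the code above. The document is passed in, so the helpers themselves do not import python-docx.

```python
def records_to_rows(records, columns):
    """Turn a list of dicts into lists of cell strings, in column order."""
    return [[str(record.get(col, "")) for col in columns] for record in records]


def add_records_table(doc, records, columns, style="Table Grid"):
    """Append a bordered table to doc: one header row, then one row per record."""
    table = doc.add_table(rows=1, cols=len(columns), style=style)
    for cell, name in zip(table.rows[0].cells, columns):
        cell.text = name  # header row
    for values in records_to_rows(records, columns):
        for cell, value in zip(table.add_row().cells, values):
            cell.text = value  # missing keys become empty cells
    return table


# Usage (requires python-docx):
# import docx
# doc = docx.Document()
# add_records_table(doc, [{"Name": "Name0", "Id": 0, "Desc": "Desc0"}], ["Name", "Id", "Desc"])
# doc.save("tmp.docx")
```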

Reading tables

```python
# coding=utf-8
import docx

doc = docx.Document('tmp.docx')
for table in doc.tables:  # iterate over all tables
    print('----table------')
    for row in table.rows:  # iterate over all rows of the table
        # row_str = '\t'.join([cell.text for cell in row.cells])  # one row of data
        # print(row_str)
        for cell in row.cells:
            print(cell.text, end='\t')
        print()
```
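Instead of printing cell by cell, each table can be collected into a list of rows and, for example, written out as CSV. A hedged sketch: table_to_rows and rows_to_csv are hypothetical names, and the code relies only on the rows/cells/text attributes used above. For merged cells, python-docx may return the same cell once per grid column it spans, so merged text can appear repeated in the output.

```python
import csv
import io


def table_to_rows(table):
    """Collect a python-docx table into a list of rows of cell text."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def rows_to_csv(rows):
    """Serialize rows to a CSV string; csv quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Usage with the tmp.docx created earlier (requires python-docx):
# import docx
# doc = docx.Document('tmp.docx')
# for table in doc.tables:
#     print(rows_to_csv(table_to_rows(table)))
```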

Reference on styles: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html

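Reading the contents of word.docx with python-docx:

Install python-docx:

pip install python_docx

(Note: not pip install docx! The docx package also installs, but importing it always fails with an error about the missing exceptions module. It is an older, unrelated package written for Python 2.)

After that, you can read the text of a Word document with python-docx.

The code: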

```python
from docx import Document

path = "C:\\Users\\Administrator\\Desktop\\word.docx"
document = Document(path)
for paragraph in document.paragraphs:
    print(paragraph.text)
```
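Each paragraph also carries its style, so the loop above can be extended to pick out the headings. A hedged sketch, assuming the document uses Word's built-in heading styles, whose names in python-docx are 'Title' and 'Heading 1' to 'Heading 9' (the same names add_heading assigns); heading_level and heading_outline are hypothetical helpers that read only paragraph.text and paragraph.style.name.

```python
def heading_level(style_name):
    """Return 0 for 'Title', n for 'Heading n', or None for any other style."""
    if style_name == "Title":
        return 0
    if style_name.startswith("Heading "):
        suffix = style_name[len("Heading "):]
        if suffix.isdigit():
            return int(suffix)
    return None


def heading_outline(paragraphs):
    """Return (level, text) pairs for every heading-styled paragraph."""
    outline = []
    for paragraph in paragraphs:
        name = paragraph.style.name or ""  # style names can be missing
        level = heading_level(name)
        if level is not None:
            outline.append((level, paragraph.text))
    return outline


# Usage (requires python-docx):
# from docx import Document
# document = Document(path)
# for level, text in heading_outline(document.paragraphs):
#     print("  " * level + text)
```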

Run it and the text is printed.

I tried reading a .doc file with python-docx.

The code:

```python
import os
import docx

for filename in os.listdir(os.getcwd()):
    if filename.endswith('.doc'):
        print(filename[:-4])
        doc = docx.Document(filename[:-4] + ".docx")
        for para in doc.paragraphs:
            print(para.text)
```

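The result was an error: docx.opc.exceptions.PackageNotFoundError: Package not found. The .doc file still could not be recognized.

This is because "changing the file extension does not change how the file is encoded, so the text cannot be read; the .doc file has to be saved as a .docx file (Save As) before python-docx can read its content". A .doc is Word's legacy binary format, while a .docx is a ZIP package of XML parts, and python-docx only reads the latter.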
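The Save As step can be automated with the win32com approach from section 1, since Word itself can open a .doc and write a .docx. A hedged sketch for Windows with Word and pywin32 only: the constant 12 is Word's wdFormatXMLDocument file format, SaveAs2 assumes Word 2010 or later, and docx_path_for / convert_doc_to_docx are hypothetical helper names.

```python
import os

WD_FORMAT_XML_DOCUMENT = 12  # Word's wdFormatXMLDocument (.docx)


def docx_path_for(doc_path):
    """'reports/a.doc' -> 'reports/a.docx' (same folder, new extension)."""
    root, _ = os.path.splitext(doc_path)
    return root + ".docx"


def convert_doc_to_docx(doc_path):
    """Open a .doc in Word and save a real .docx next to it (Windows + Word only)."""
    from win32com.client import DispatchEx  # imported lazily: requires pywin32

    src = os.path.abspath(doc_path)  # give Word an absolute path
    dst = docx_path_for(src)
    word = DispatchEx("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0
    try:
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs2(dst, WD_FORMAT_XML_DOCUMENT)  # positional: FileName, FileFormat
        finally:
            doc.Close(False)  # close without further saving
    finally:
        word.Quit()
    return dst


# Usage, mirroring the loop above (python-docx can then open the converted file):
# import docx
# for filename in os.listdir(os.getcwd()):
#     if filename.endswith('.doc'):
#         doc = docx.Document(convert_doc_to_docx(filename))
```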

The Document object also has methods for adding headings, page breaks, paragraphs, pictures, sections and more; its help() output describes them as follows:

  • add_heading(self, text='', level=1): Return a heading paragraph newly added to the end of the document, containing *text* and having its paragraph style determined by *level*. If *level* is 0, the style is set to `Title`. If *level* is 1 (or omitted), `Heading 1` is used. Otherwise the style is set to `Heading {level}`. Raises ValueError if *level* is outside the range 0-9.

  • add_page_break(self): Return a paragraph newly added to the end of the document and containing only a page break.

  • add_paragraph(self, text='', style=None): Return a paragraph newly added to the end of the document, populated with *text* and having paragraph style *style*. *text* can contain tab (`\t`) characters, which are converted to the appropriate XML form for a tab. *text* can also include newline (`\n`) or carriage return (`\r`) characters, each of which is converted to a line break.

  • add_picture(self, image_path_or_stream, width=None, height=None): Return a new picture shape added in its own paragraph at the end of the document. The picture contains the image at *image_path_or_stream*, scaled based on *width* and *height*. If neither width nor height is specified, the picture appears at its native size. If only one is specified, it is used to compute a scaling factor that is then applied to the unspecified dimension, preserving the aspect ratio of the image. The native size of the picture is calculated using the dots-per-inch (dpi) value specified in the image file, defaulting to 72 dpi if no value is specified, as is often the case.

  • add_section(self, start_type=2): Return a Section object representing a new section added at the end of the document. The optional *start_type* argument must be a member of the WD_SECTION_START enumeration, and defaults to `WD_SECTION.NEW_PAGE` if not provided.

  • add_table(self, rows, cols, style=None): Add a table having row and column counts of *rows* and *cols* respectively and table style of *style*. *style* may be a table style object or a table style name. If *style* is None, the table inherits the default table style of the document.

  • save(self, path_or_stream): Save this document to *path_or_stream*, which can be either a path to a filesystem location (a string) or a file-like object.
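The methods listed above can be combined to build a whole document. A minimal sketch, assuming only the signatures quoted above plus docx.shared.Inches for the picture width; build_report, the section data and chart.png are hypothetical. The document object is passed in, so the function itself does not import python-docx.

```python
def build_report(doc, title, sections, picture=None):
    """Fill doc with a title, one heading + body per section, page breaks
    between sections, and optionally a picture at the end."""
    doc.add_heading(title, level=0)  # level 0 -> 'Title' style
    for index, (heading, body) in enumerate(sections):
        if index > 0:
            doc.add_page_break()  # start each later section on a new page
        doc.add_heading(heading, level=1)  # 'Heading 1'
        doc.add_paragraph(body)  # '\t' and '\n' in body become a tab / line break
    if picture is not None:
        path, width = picture
        doc.add_picture(path, width=width)  # height scales to keep the aspect ratio
    return doc


# Usage (requires python-docx and an existing image file):
# from docx import Document
# from docx.shared import Inches
# doc = build_report(Document(), "Weekly report",
#                    [("Summary", "All tasks done."), ("Next steps", "Plan\tv2")],
#                    picture=("chart.png", Inches(4)))
# doc.save("report.docx")
```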

python-docx has many other features that I am still learning; see the official documentation: https://python-docx.readthedocs.io/en/latest/user/quickstart.html

