Processing Word documents with python-docx

ccczxacxzxcz

Article link: https://blog.csdn.net/ruohua3kou/article/details/89809977

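While memorizing vocabulary recently, I compiled a list of word roots and affixes in a Word document, but the entries were in no particular order, so I wanted to sort them. A search turned up a library called python-docx; after skimming its documentation, I got the job done with a few of its basic APIs.
What actually took the most time was the change to sort and cmp in Python 3: the cmp parameter of sort was removed, so to keep using an old-style cmp function for custom ordering you have to wrap it with cmp_to_key from the functools library.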
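
As a minimal sketch of that change, the snippet below sorts a small list with an old-style comparison function wrapped in functools.cmp_to_key. The function name by_length_then_alpha and the sample words are made up for illustration and are not part of the original script.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Old-style comparison: negative if a sorts first, positive if b does, 0 if equal."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Same length: fall back to plain alphabetical order
    return (a > b) - (a < b)


words = ["cess", "ab", "bulg", "de", "co"]

# Python 2 accepted words.sort(cmp=by_length_then_alpha);
# Python 3 only accepts key=, so the comparison function is wrapped first
sorted_words = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(sorted_words)
```

The wrapper turns each element into an object whose comparison operators call the cmp function, which is why an existing comparison function can be reused unchanged.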

Here is an excerpt showing the format of the entries, sorted in the order prefixes, suffixes, other roots:

ab- = away, off; intensive	远离；脱离；加强语气
ad- (as-) (al-) = to, towards, intensive	朝着；靠近；加强语气
co- (com-) = together; intensive	一起；全；加强语气
de- = not; removal; down	表否定；除去；向下
-able = 构成adj；能…的，易…的
-age = 构成名词；表状态和行为等
-fy = 构成动词；使…
-ic = 构成形容词
-ure = 构成抽象名词
bev = to drink	喝
bulg = leather bag	皮包
ced = cess = to go	走，去
cis = to cut, to kill	砍，杀
concil = meeting	会议
crit = to judge	评论，判断

Implementation:

from docx import Document
from functools import cmp_to_key


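# Comparison function: prefixes first, then suffixes, then other roots
def cmp(a, b):
    # Compare only the first whitespace-separated token of each line;
    # an empty paragraph gives an empty token instead of raising IndexError
    a = (a.split() or [''])[0]
    b = (b.split() or [''])[0]
    # Prefixes (tokens ending with '-') come first
    if a.endswith('-') and not b.endswith('-'):
        return -1
    if not a.endswith('-') and b.endswith('-'):
        return 1
    # Suffixes (tokens starting with '-') come next
    if a.startswith('-') and not b.startswith('-'):
        return -1
    if not a.startswith('-') and b.startswith('-'):
        return 1
    # Same group: plain alphabetical order
    if a < b:
        return -1
    if a > b:
        return 1
    return 0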


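src = "C:\\Users\\11018\\Desktop\\词根词缀.docx"
doc = Document(src)

# Collect the text of every paragraph in the document
paragraphs = doc.paragraphs
lines = [p.text for p in paragraphs]

# Python 3's sort() has no cmp argument, so wrap cmp with cmp_to_key
lines.sort(key=cmp_to_key(cmp))

# Write the sorted text back paragraph by paragraph.
# Assigning to .text keeps the paragraph style but drops run-level
# formatting such as bold or italic.
for p, text in zip(paragraphs, lines):
    print(text)
    p.text = text

# Save over the original file
doc.save(src)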
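
For comparison, the same ordering can probably be expressed without cmp_to_key by using a plain key function that returns a tuple. This is an alternative sketch, not the approach used in the script above; the name sort_key and the shortened sample entries are assumptions for illustration, and it runs on a plain list so it does not need python-docx.

```python
def sort_key(line):
    """Return (group, token): group 0 = prefix, 1 = suffix, 2 = other root."""
    parts = line.split()
    token = parts[0] if parts else ""
    if token.endswith("-"):
        group = 0
    elif token.startswith("-"):
        group = 1
    else:
        group = 2
    return (group, token)


# A few entries in the same format as the excerpt, deliberately shuffled
entries = [
    "crit = to judge",
    "-fy = 构成动词；使…",
    "de- = not; removal; down",
    "bev = to drink",
    "-able = 构成adj；能…的，易…的",
    "ab- = away, off; intensive",
]

sorted_entries = sorted(entries, key=sort_key)
for line in sorted_entries:
    print(line)
```

Tuples compare element by element, so the group number decides the order first and the token only breaks ties within a group. For entries shaped like the excerpt this should match the cmp-based order.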