Python: Matching Text in a Word Document with Regular Expressions


If I could Tell Yang



Article link: https://blog.csdn.net/weixin_44177600/article/details/108353209


Using a regular expression to extract every six-digit area code and its place name from a Word document.
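To see what the pattern matches before involving Word at all, here is a minimal sketch that runs the same regular expression over an in-memory string. The sample text is hypothetical. \d{6} takes exactly six digits, and [\u4e00-\u9fff]+ takes the run of characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF) that immediately follows, so a code separated from its name by a space is skipped, and full-width punctuation such as "，" or "。" ends the name.

```python
import re

# Same pattern as the script: six digits, then one or more CJK Unified Ideographs.
# The raw string avoids the invalid-escape warning Python gives for "\d" in a
# normal string literal; re itself understands the \uXXXX escapes.
PATTERN = re.compile(r"\d{6}[\u4e00-\u9fff]+")

# Hypothetical sample paragraph; real documents will differ.
sample = "110000 北京市 110101东城区，110102西城区。"

matches = PATTERN.findall(sample)
print(matches)  # ['110101东城区', '110102西城区']

# Each match is split the same way the script does: first 6 chars, then the rest.
pairs = [(m[:6], m[6:]) for m in matches]
print(pairs)  # [('110101', '东城区'), ('110102', '西城区')]
```

Note that "110000 北京市" is not matched, because the space between the code and the name is outside the character class.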

1. The original document content:

[Image: screenshot of the original Word document]

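2. The expected result (one line per entry: the six-digit area code and the place name, separated by a comma):

[Image: screenshot of the expected output file]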
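The "code,name" output can also be produced with two capture groups instead of slicing with t[:6] and t[6:]. The minimal sketch below assumes a hypothetical helper, write_pairs, that takes plain paragraph strings and writes rows with the standard csv module. It opens the file with "w", so rerunning the script overwrites the file instead of appending duplicate rows as the 'a' mode does; that is a design choice, not the original author's behavior.

```python
import csv
import re

# Two capture groups: group 1 = six-digit area code, group 2 = place name.
PAIR = re.compile(r"(\d{6})([\u4e00-\u9fff]+)")


def write_pairs(paragraphs, out_path):
    """Write one 'code,name' row per match (hypothetical helper).

    paragraphs: an iterable of plain strings, e.g. the .text of each paragraph.
    Returns the number of rows written.
    """
    count = 0
    # newline="" is what the csv module expects; UTF-8 keeps Chinese names intact.
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        # lineterminator="\n" matches the script's output (csv defaults to "\r\n").
        writer = csv.writer(f, lineterminator="\n")
        for text in paragraphs:
            # With two groups in the pattern, findall yields (code, name) tuples.
            for code, name in PAIR.findall(text):
                writer.writerow([code, name])
                count += 1
    return count
```

With a python-docx Document object named doc, a call would look like write_pairs((p.text for p in doc.paragraphs), "id_area.txt").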

3. Source code (requires the python-docx package: pip install python-docx):

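from docx import Document  # provided by the python-docx package
import re

# Six-digit area code followed by one or more Chinese characters (the place name).
# Raw strings avoid invalid-escape warnings for "\d"; re handles the \uXXXX escapes.
pattern = re.compile(r"(\d{6})([\u4e00-\u9fff]+)")
# Alternative: code followed by anything except ASCII letters, digits and some punctuation
# pattern = re.compile(r"(\d{6})([^A-Za-z0-9!%\[\],。]+)")

doc = Document('./地区码.docx')  # source document
# Append to the output file as UTF-8 so Chinese names are written correctly on any OS
with open('id_area.txt', 'a', encoding='utf-8') as fo:
    for para in doc.paragraphs:  # read every body paragraph
        # With two groups, findall returns (area_code, place_name) tuples
        for code, name in pattern.findall(para.text):
            fo.write(code + ',' + name + '\n')  # comma between code and name, one entry per line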
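In python-docx, doc.paragraphs only returns the body paragraphs, so text inside tables is skipped; if the area codes sit in a table, doc.tables has to be walked as well. As a swapped-in alternative that needs no third-party package, the sketch below relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml, and reads every w:p paragraph (including those in table cells) with zipfile and xml.etree.ElementTree. The helper names docx_paragraph_texts and extract_pairs are hypothetical. It is a rough sketch: headers, footers and footnotes live in separate parts of the archive and are not read, and tabs or line breaks inside a paragraph are dropped rather than converted.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
PATTERN = re.compile(r"(\d{6})([\u4e00-\u9fff]+)")


def docx_paragraph_texts(path):
    """Yield the text of every w:p paragraph in word/document.xml,
    including paragraphs inside table cells (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the w:t pieces in order.
        yield "".join(t.text or "" for t in p.iter(W_NS + "t"))


def extract_pairs(path):
    """Return the (code, name) tuples found in all paragraphs of the document."""
    pairs = []
    for text in docx_paragraph_texts(path):
        pairs.extend(PATTERN.findall(text))
    return pairs
```

For the document in this post, extract_pairs('./地区码.docx') would return the pairs, which can then be written out as in the script.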