Automatic document filling with regular expressions in Python

scp_2032

Category: python. Tags: pycharm, regular expressions, python, strings

Article link: https://blog.csdn.net/scp_2032/article/details/106784003

First, the code and a screenshot of the document it processes.

import regex as re  # third-party "regex" package (pip install regex); the standard-library re would also work here

# Read the whole exam paper; encoding='utf-8' is required (see problem 1 below)
with open(r'G:\anaconda\Script\植物学试卷.txt', encoding='utf-8') as src:
    text = src.read()

# Chapter markers in order, using the formal numerals 壹 to 拾伍 ('陆' is the formal numeral for 6)
list1 = ['壹', '贰', '叁', '肆', '伍', '陆', '柒', '捌', '玖', '拾', '拾壹', '拾贰', '拾叁', '拾肆', '拾伍']


def gettext():
    """Cut the document into one file per chapter, e.g. 壹.txt holds everything from 壹 up to 贰."""
    for first, end in zip(list1, list1[1:]):
        text1 = re.findall('%s.*?%s' % (first, end), text, re.S)
        if not text1:
            continue  # this pair of markers was not found
        with open('%s.txt' % first, 'w', encoding='utf-8') as text2:
            text2.write(text1[0])  # findall returns a list, so write its first element


def replace_in_file(name, old, new):
    """Replace the first occurrence of old with new in name.txt, rewriting the file in place."""
    with open('%s.txt' % name, 'r+', encoding='utf-8') as f1:
        content = f1.read()
        f1.seek(0)     # back to the start of the file
        f1.truncate()  # delete the old contents
        # re.escape matches old literally; the lambda stops backslashes in new from being treated as escapes
        f1.write(re.sub(re.escape(old), lambda m: new, content, count=1))


def pick():
    """Fill each chapter's questions with the answers that follow them."""
    for name, answer in zip(list1, list1[1:]):
        with open('%s.txt' % name, encoding='utf-8') as src:
            text3 = src.read()
        pattern1 = '填空题.*?选择题'          # fill-in-the-blank section
        pattern4 = '名词解释.*?问答题'        # term-definition section
        pattern5 = '问答题.*?%s' % answer     # essay-question section
        pattern6 = '答案.*?%s' % answer       # answer half of the chapter
        pattern12 = '%s.*?答案' % name        # question half of the chapter
        pattern13 = '[\u4e00-\u9fa5]+'        # a run of Chinese characters
        try:
            g = re.findall(pattern12, text3, re.S)[0]                   # questions
            a = re.findall('_+', re.findall(pattern1, g, re.S)[0])      # blanks
            d = re.findall('问答题', g)   # heading that the term-definition answers are put in front of
            e = re.findall('答案', g)     # marker that the essay answers replace
            f = re.findall(pattern6, text3, re.S)[0]                    # answers
            a5 = re.findall(pattern13, re.findall(pattern1, f, re.S)[0])[1:]  # fill-in answers, minus the heading
            d2 = re.findall(pattern4, f, re.S)   # term-definition answers
            e2 = re.findall(pattern5, f, re.S)   # essay answers
        except IndexError:
            continue  # a section is missing from this chapter; leave it unchanged
        with open('%s.txt' % name, 'w', encoding='utf-8') as f1:
            f1.write(g)  # keep only the questions, then fill them in below
        # zip stops at the shorter list, so mismatched counts cannot raise IndexError
        for old, new in list(zip(a, a5)) + list(zip(d, d2)) + list(zip(e, e2)):
            replace_in_file(name, old, new)


if __name__ == '__main__':
    gettext()
    pick()

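The code has two parts. The first function, gettext, extracts each chapter from the full document; the second, pick, pairs the blanks in the questions with the answers and then uses re.sub to fill them in.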
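As a rough illustration of what pick does, here is a minimal in-memory sketch for one chapter's fill-in-the-blank section. The chapter layout (questions, then 答案, then the answers, then the next chapter marker) is inferred from the patterns above; the sample text and the helper fill_chapter are made up, it uses the standard-library re, and the term-definition and essay sections are left out.

```python
import re

# Hypothetical chapter: questions, then '答案', then the answers, then the next marker '贰'
CHAPTER = ('壹\n一、填空题\n1.细胞壁的主要成分是______。\n2.______是光合作用的场所。\n二、选择题\n'
           '答案\n一、填空题\n1.纤维素\n2.叶绿体\n二、选择题\n贰')


def fill_chapter(text, name, next_name):
    """Fill the blanks of one chapter with its fill-in answers (hypothetical helper)."""
    questions = re.search('%s.*?答案' % name, text, re.S).group(0)
    answers = re.search('答案.*?%s' % next_name, text, re.S).group(0)
    q_section = re.search('填空题.*?选择题', questions, re.S).group(0)
    a_section = re.search('填空题.*?选择题', answers, re.S).group(0)
    blanks = re.findall('_+', q_section)
    words = re.findall('[\u4e00-\u9fa5]+', a_section)[1:]  # drop the '填空题' heading
    for blank, word in zip(blanks, words):  # zip stops at the shorter list
        # re.escape makes the blank literal; a function replacement is inserted as-is, with no escape processing
        questions = re.sub(re.escape(blank), lambda m: word, questions, count=1)
    return questions


print(fill_chapter(CHAPTER, '壹', '贰'))
```

Each blank is replaced only once (count=1), so the blanks are consumed in order and the n-th blank receives the n-th answer.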

Problems I ran into along the way

1. File encoding

with open(r'G:\anaconda\Script\植物学试卷.txt', encoding='utf-8') as src:
    text = src.read()

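Without encoding='utf-8', this fails with an error saying the gbk codec cannot decode a byte at some position. open() then falls back to the system's default encoding, which on Chinese-locale Windows is GBK: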

UnicodeDecodeError: 'gbk' codec can't decode byte 0xba in position 8: illegal multibyte sequence

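Also pass encoding='utf-8' when writing and re-reading the output files later on, so that every file is encoded and decoded the same way; otherwise those steps fail as well.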
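A minimal sketch of the fix, with made-up sample text and hypothetical helper names: write and read with the same explicit encoding instead of the platform default. It also shows that UTF-8 bytes decoded as GBK do not give back the original Chinese text; depending on the bytes you get either garbled characters or a UnicodeDecodeError like the one above.

```python
import os
import tempfile

# Made-up sample in the style of the exam paper
SAMPLE = '壹 植物学试卷\n一、填空题 1.细胞由____构成'


def write_and_read(text, encoding='utf-8'):
    """Write text to a temporary file and read it back with the same explicit encoding."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)  # reopen by path below so the encoding can be set explicitly
    try:
        with open(path, 'w', encoding=encoding) as f:
            f.write(text)
        with open(path, encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


def decode_as_gbk(text):
    """Decode the UTF-8 bytes of text as GBK; return the result, or None if decoding fails."""
    try:
        return text.encode('utf-8').decode('gbk')
    except UnicodeDecodeError:
        return None


print(write_and_read(SAMPLE) == SAMPLE)  # True
print(decode_as_gbk(SAMPLE))             # garbled text or None, not the original
```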

2. Extracting each chapter

This uses non-greedy (lazy) matching:

pattern='%s.*?%s'%(first,end)
text1=re.findall(pattern,text,re.S)

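That is, give the start and end markers and let .*? match whatever lies between them; the ? makes the match stop at the first end marker instead of running on to the last one.
Don't forget the re.S flag at the end!
Without it, . does not match a newline, so the match cannot continue onto the next line.
Another problem shows up when writing the chapter to a file: the program raises a TypeError, because re.findall returns a list of matches rather than a string, so take the string out of the list (for example text1[0]) before writing it.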
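A minimal sketch of these three points on a made-up three-chapter string: the lazy .*? stops at the nearest end marker while the greedy .* runs to the last one, re.S is what lets . cross line breaks, and re.findall returns a list, so the string has to be taken out with [0]. The helper extract is hypothetical and uses the standard-library re.

```python
import re

# Made-up stand-in for the exam document: one marker per chapter, each on its own line
DOC = '壹\n植物细胞\n贰\n植物组织\n叁\n植物器官\n'


def extract(doc, pattern, flags=re.S):
    """Return the first match of pattern as a string, or None if there is none."""
    matches = re.findall(pattern, doc, flags)  # a list of strings, not a string
    return matches[0] if matches else None


lazy = extract(DOC, '壹.*?[贰叁]')       # stops at the nearest marker: '壹\n植物细胞\n贰'
greedy = extract(DOC, '壹.*[贰叁]')      # runs on to the last marker, 叁
no_dotall = extract(DOC, '壹.*?贰', 0)   # None: without re.S, . cannot match '\n'
```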

3. Some less common regular expressions

Matching a repeated character

pattern7='_+'

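This matches strings such as '______', which is how the blanks in the fill-in-the-blank questions appear in the document.
Matching a run of Chinese characters

pattern13='[\u4e00-\u9fa5]+'

The range \u4e00-\u9fa5 covers the 20,902 characters of the original CJK Unified Ideographs block, so the + matches one or more consecutive Chinese characters; punctuation, digits and spaces end a run.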
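To see what these two patterns return, here they are on made-up question and answer sections. '_+' picks up each blank whatever its length, while the character range splits the answers at every non-Chinese character. The first run is the 填空题 heading, and later headings such as 二 and 选择题 are picked up too, which is why the main code drops the first item with [1:] and why the blank and answer counts can differ.

```python
import re

# Made-up question and answer sections in the layout the article's patterns expect
QUESTION_SECTION = '填空题\n1.种子由______、___和胚组成。\n二、选择题'
ANSWER_SECTION = '填空题\n1.种皮 胚乳\n二、选择题'

blanks = re.findall('_+', QUESTION_SECTION)              # each run of underscores is one blank
runs = re.findall('[\u4e00-\u9fa5]+', ANSWER_SECTION)    # runs of Chinese characters
pairs = list(zip(blanks, runs[1:]))                      # drop the heading; zip stops at the shorter list
# pairs == [('______', '种皮'), ('___', '胚乳')]
```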

4. Modifying the file in place, updating its contents on each pass of a loop

with open('%s.txt' % name, 'r+', encoding='utf-8') as f1:
    f1.seek(0)
    f1.truncate()
    f1.write(g)
# zip pairs each blank with an answer and stops at the shorter list
for old, new in zip(a, a5):
    with open('%s.txt' % name, 'r+', encoding='utf-8') as f1:
        text = f1.read()
        f1.seek(0)
        f1.truncate()
        f1.write(re.sub(old, new, text, count=1))

f1.seek(0)
f1.truncate()

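This moves to the start of the file and deletes everything in it; writing the updated text afterwards means the file on disk is refreshed on every pass through the loop.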
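A self-contained sketch of the same in-place update on a temporary file. The helpers replace_first_in_place and demo and the sample sentence are hypothetical; str.replace is used because the text being replaced is literal. Each call reads the file, rewinds with seek(0), empties it with truncate(), and writes the new contents.

```python
import os
import tempfile


def replace_first_in_place(path, old, new):
    """Replace the first occurrence of old in the file at path, rewriting the file in place."""
    with open(path, 'r+', encoding='utf-8') as f:
        content = f.read()
        f.seek(0)      # rewind to the start of the file
        f.truncate()   # cut the file to zero length at the current position
        f.write(content.replace(old, new, 1))


def demo():
    """Fill two blanks in a temporary file, one call per blank, and return the final text."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding='utf-8') as f:
            f.write('1.____是细胞的控制中心。2.____是能量工厂。')
        for word in ['细胞核', '线粒体']:
            replace_first_in_place(path, '____', word)
        with open(path, encoding='utf-8') as f:
            return f.read()
    finally:
        os.remove(path)


print(demo())  # 1.细胞核是细胞的控制中心。2.线粒体是能量工厂。
```

Rewriting the whole file once per replacement mirrors the loop above; if the intermediate states on disk are not needed, reading once, doing every replacement on the string, and writing once gives the same result with less file I/O.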
