Extracting URLs from txt files with regular expressions

ayesawyer

Category: Python-related

Article link: https://blog.csdn.net/m0_38071863/article/details/102893374

Batch-read the txt files in a folder and extract the URLs they contain.

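    import os
    import re

    # Folder that holds the txt files to scan
    path = r'G:\python code\zhengze'

    # Collect the full path of every entry in the folder
    file_paths = []
    for filename in os.listdir(path):
        file_paths.append(os.path.join(path, filename))
    print(file_paths)

    # Run the regex over every file and collect all matches in `final`.
    # The list is created once, outside the loop, so matches from every file
    # are kept rather than only those from the last file.
    final = []
    for address in file_paths:
        # Read as bytes: the scraped files may contain bytes that are not valid UTF-8
        with open(address, 'rb') as file_object:
            lines = file_object.readlines()  # one list element per line
        for line in lines:
            # split() with no argument splits on whitespace and returns a list of tokens
            for token in line.split():
                try:
                    data = token.decode('utf-8')
                except UnicodeDecodeError:
                    # Skip tokens that cannot be decoded as UTF-8
                    continue
                # Only https:// links are matched; use r"https?://.*" to include http://
                match_obj = re.search(r"https://.*", data)
                if match_obj:
                    final.append(match_obj.group())

    # Write the extracted URLs to a new txt file, one per line
    with open(r'G:\python code\wangzhi1.txt', 'w', encoding='utf-8') as file_2:
        for url in final:
            # A URL may be followed by a double quote (e.g. from href="..."); cut it off there
            url = url.split('"', 1)[0]
            file_2.write(url + '\n')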
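
The matching step can also be written as a small function that is easy to try on sample input. The following is only a sketch under a few assumptions, not the method used above: the names `URL_PATTERN`, `extract_urls` and `extract_urls_from_bytes` are hypothetical, the pattern is one possible choice that stops at whitespace, quotes and angle brackets, and the `example.com` / `example.org` addresses are placeholder data. It decodes with `errors="replace"` so that undecodable bytes do not raise an exception.

```python
import re

# Assumed pattern (not from the original post): http or https, then everything
# up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text):
    """Return every URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


def extract_urls_from_bytes(raw):
    """Decode raw bytes leniently, then extract URLs."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return extract_urls(raw.decode("utf-8", errors="replace"))


# Placeholder sample data containing one invalid UTF-8 byte (\xff)
sample = b'<a href="https://example.com/a?b=1">x</a> see http://example.org/page \xff end'
print(extract_urls_from_bytes(sample))
# ['https://example.com/a?b=1', 'http://example.org/page']
```

Because the pattern excludes quote characters, the trailing `"` is never part of a match, so no separate trimming pass is needed. `findall` also returns every URL on a line instead of only the first one per token.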
