[Python Daily Practice] -- 0009: Find the links in an HTML file

Latest recommended article published on 2022-03-29 19:35:21

saya_wj


Category: python. Tags: python, html, python daily practice

Article link: https://blog.csdn.net/saya_wj/article/details/78315482


Problem source: https://github.com/Show-Me-the-Code/show-me-the-code
My GitHub repository: https://github.com/wjsaya/python_spider_learn/tree/master/python_daily
Personal blog: https://wjsaya.github.io
Problem 0009: Given an HTML file, find the links in it.

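Approach:

Open the HTML file;
Read its contents (the code reads the whole file in one call rather than line by line);
Use a regular expression to match links that start with a scheme such as http://.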
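
The matching step above can be tried on an in-memory string before touching any file. The following is a minimal sketch, not the author's script: the sample markup, the names `SAMPLE_HTML`, `URL_PATTERN` and `find_links`, and the exact pattern are all assumptions made for illustration.

```python
import re

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = '''
<a href="http://example.com/index.html">Home</a>
<a href='https://example.org/docs?page=2'>Docs</a>
<img src="ftp://files.example.net/logo.png">
<p>No link on this line.</p>
'''

# Assumed pattern: a scheme (http, https or ftp), then "://", then everything
# up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r'''(?:https?|ftp)://[^\s"'<>]+''')


def find_links(text):
    """Return every URL-like substring found in text, in document order."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    for link in find_links(SAMPLE_HTML):
        print(link)
```

Excluding quotes and angle brackets from the URL body keeps a match from running past the end of an attribute value when two attributes are written without a space between them.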

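Code:

#!/usr/bin/env python3
# coding: utf-8
# Author: wjsaya
# Problem 0009: given an HTML file, find the links in it.
import re
import os


def analyze(file_name):
    # Show the working directory, since file_name may be a relative path.
    print(os.getcwd())
    # Read the whole file at once; the context manager closes it afterwards.
    with open(file_name, 'r', encoding='utf-8') as f:
        content = f.read()
    # Match http, https and ftp URLs, stopping at the next whitespace,
    # quote or angle bracket so a match cannot run past the attribute value.
    pattern = r'''(?:https?|ftp)://[^\s"'<>]+'''
    for link in re.findall(pattern, content):
        print(link)


if __name__ == "__main__":
    html = "./test.html"
    analyze(html)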

[Figure: screenshot of the script's output]
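
A regular expression only finds absolute URLs that happen to appear in the text. As a possible alternative (a different technique from the one used in this post), the standard library's `html.parser` module can collect attribute values directly, which also picks up relative links. The sketch below is an assumption-laden illustration: `LinkCollector` and `extract_links` are hypothetical names, and restricting the search to `href` and `src` is a choice made here, not something the problem statement specifies.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when an
        # attribute is written without a value.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html_text):
    """Parse html_text and return the collected link values in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="/about.html">About</a> <img src="http://example.com/a.png">'
    print(extract_links(sample))  # ['/about.html', 'http://example.com/a.png']
```

To run it on a file, the text read from `./test.html` could be passed to `extract_links` in place of `sample`.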
