(Repost) Python: Finding All Links on a Web Page with a Regular Expression

Latest recommended article published on 2020-11-22 16:24:46

weixin_30507269

Tags: python

Original link: http://www.cnblogs.com/youthdream/p/3527787.html

Reposted from: http://www.linuxany.com/archives/596.html

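import re
import urllib.request

def test(html, rex):
    # Return the unique matches of rex in html, in order of first appearance.
    alist = []
    r = re.compile(rex)
    # findall() returns a list (possibly empty), never None, so no None check is needed.
    for found in r.findall(html):
        if found not in alist:
            alist.append(found)
    return alist

# Captures the href value only when href is the first attribute and is double-quoted.
rex = r'<a\s+href="(.*?)"'
page = urllib.request.urlopen('http://hi.baidu.com')
# read() returns bytes in Python 3; decode before matching against a str pattern.
html = page.read().decode('utf-8', errors='replace')
page.close()

print(test(html, rex))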
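
The pattern in the script only matches anchors where href comes right after `<a` and uses double quotes. As a rough sketch of one way to loosen that, the example below uses a more tolerant pattern and runs on an inline string, so it needs no network access. The names `LINK_RE`, `find_links`, and `sample_html`, and the example.com URLs, are made up for illustration and are not from the original post. It is still a heuristic: a regular expression cannot fully parse HTML.

```python
import re

# Hypothetical pattern (not from the original post): allows other attributes
# before href, single or double quotes, and upper-case tag/attribute names.
LINK_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def find_links(html):
    """Return the unique href values in html, in order of first appearance."""
    seen = set()
    links = []
    for match in LINK_RE.finditer(html):
        url = match.group(2)  # group 1 is the quote character, group 2 the URL
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up sample markup so the sketch runs offline.
sample_html = """
<a href="http://example.com/a">A</a>
<A class="nav" HREF='http://example.com/b'>B</A>
<a href="http://example.com/a">A again</a>
"""

print(find_links(sample_html))
```

Tracking seen URLs in a set keeps the duplicate check cheap on pages with many links, while the list preserves the order in which links first appear.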

Reposted from: https://www.cnblogs.com/youthdream/p/3527787.html
