Scraping the hyperlinks on a web page with Python

z小白

Category: python, web scraping. Tags: python, web scraping, BeautifulSoup, bs4

Article link: https://blog.csdn.net/zzc15806/article/details/85341923

Use BeautifulSoup to parse a web page, scrape its links and filter out the blog post links, then save the result to a txt file.

Parse the page with BeautifulSoup from bs4

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://blog.csdn.net/zzc15806/')  # fetch the page
bs = BeautifulSoup(html, 'html.parser')  # parse the HTML
hyperlink = bs.find_all('a')  # collect every <a> tag
for h in hyperlink:
    hh = h.get('href')  # None when the tag has no href attribute
    print(hh)

The output looks like this:

https://blog.csdn.net/zzc15806
javascript:void(0);
https://blog.csdn.net/zzc15806?orderby=UpdateTime
https://blog.csdn.net/zzc15806?orderby=ViewCount
https://blog.csdn.net/zzc15806/rss/list
https://blog.csdn.net/yoyo_liyy/article/details/82762601
https://blog.csdn.net/yoyo_liyy/article/details/82762601
https://blog.csdn.net/zzc15806/article/details/84996039
https://blog.csdn.net/zzc15806/article/details/84996039
https://blog.csdn.net/zzc15806/article/details/84975709
https://blog.csdn.net/zzc15806/article/details/84975709
https://blog.csdn.net/zzc15806/article/details/84975539
https://blog.csdn.net/zzc15806/article/details/84975539
https://blog.csdn.net/zzc15806/article/details/84975137
https://blog.csdn.net/zzc15806/article/details/84975137
https://blog.csdn.net/zzc15806/article/details/84974458
https://blog.csdn.net/zzc15806/article/details/84974458
https://blog.csdn.net/zzc15806/article/details/84973370
https://blog.csdn.net/zzc15806/article/details/84973370
https://blog.csdn.net/zzc15806/article/details/84972108
https://blog.csdn.net/zzc15806/article/details/84972108
https://blog.csdn.net/zzc15806/article/details/84971215
https://blog.csdn.net/zzc15806/article/details/84971215
https://blog.csdn.net/zzc15806/article/details/84875070
https://blog.csdn.net/zzc15806/article/details/84875070
https://blog.csdn.net/zzc15806/article/details/84779131
https://blog.csdn.net/zzc15
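The listing above mixes article links with navigation links and a `javascript:void(0);` placeholder, and each article appears twice. The filtering and saving steps mentioned in the summary are not shown in the text available here, so the following is only a minimal sketch of one way they could be done. The function names `filter_article_links` and `save_links`, the file name `links.txt`, and the regular expression (inferred from the URL shape in the output) are all assumptions for illustration, not the author's code.

```python
import re
from pathlib import Path

# Assumed URL shape of an article page, inferred from the output above.
ARTICLE_RE = re.compile(r'^https://blog\.csdn\.net/zzc15806/article/details/\d+$')


def filter_article_links(hrefs):
    """Keep article URLs only, dropping None, other links, and repeats."""
    seen = set()
    links = []
    for href in hrefs:
        # h.get('href') yields None for <a> tags without an href attribute
        if href is None or not ARTICLE_RE.match(href):
            continue
        if href not in seen:  # each article is linked twice on the page
            seen.add(href)
            links.append(href)
    return links


def save_links(links, path='links.txt'):
    """Write one URL per line to a UTF-8 text file (hypothetical file name)."""
    Path(path).write_text('\n'.join(links) + '\n', encoding='utf-8')


if __name__ == '__main__':
    # A few hrefs copied from the output above, plus a None for a tag without href.
    sample = [
        'https://blog.csdn.net/zzc15806',
        'javascript:void(0);',
        None,
        'https://blog.csdn.net/yoyo_liyy/article/details/82762601',
        'https://blog.csdn.net/zzc15806/article/details/84996039',
        'https://blog.csdn.net/zzc15806/article/details/84996039',
        'https://blog.csdn.net/zzc15806/article/details/84975709',
    ]
    articles = filter_article_links(sample)
    save_links(articles)
    print(articles)
```

With the scraping code above, the same functions could be fed the real data via `filter_article_links(h.get('href') for h in hyperlink)`. The pattern is deliberately strict and keeps only this blog's own articles; loosening the user name part to `[^/]+` would also keep article links from other blogs, such as the `yoyo_liyy` one in the output.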
