Scraping Baidu Baike

weixin_30656145

Tags: python

Original post: http://www.cnblogs.com/themost/p/6701757.html

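import re
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

URL = 'http://baike.baidu.com/view/284853.htm'

def main():
    # Download the raw HTML of the Baidu Baike entry
    with urllib.request.urlopen(URL) as resp:
        response = resp.read()
    soup = BeautifulSoup(response, 'html.parser')  # use Python's built-in parser
    # Treat every tag whose href contains "view" as a link to another entry
    for each in soup.find_all(href=re.compile('view')):
        # urljoin resolves relative hrefs against the page URL; plain string
        # concatenation would give 'http://baike.baidu.com//view/...' for '/view/...'
        print(each.text, '->', urljoin(URL, each['href']))

if __name__ == '__main__':
    main()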
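To try the link-extraction step offline (Baidu Baike's page layout and URL scheme may well have changed since this was written), here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser.HTMLParser. The class ViewLinkParser, the function extract_view_links and the sample HTML are hypothetical. One difference to keep in mind: soup.find_all(href=...) matches any tag with a matching href, while this sketch only looks at <a> tags.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://baike.baidu.com/view/284853.htm'  # page the script above fetches


class ViewLinkParser(HTMLParser):
    """Hypothetical helper: collect (text, absolute_url) for <a> tags whose href contains 'view'."""

    def __init__(self, base_url, pattern=re.compile('view')):
        super().__init__()
        self.base_url = base_url
        self.pattern = pattern
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if it matched
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href and self.pattern.search(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Resolve relative links against the page URL, as urljoin does above
            self.links.append((''.join(self._text), urljoin(self.base_url, self._href)))
            self._href = None


def extract_view_links(html, base_url=BASE_URL):
    parser = ViewLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = ('<p><a href="/view/1.htm">Python</a> <a href="/item/2">skip</a> '
          '<a href="http://baike.baidu.com/view/3.htm">Java</a></p>')
for text, url in extract_view_links(sample):
    print(text, '->', url)
# prints:
# Python -> http://baike.baidu.com/view/1.htm
# Java -> http://baike.baidu.com/view/3.htm
```

The "/item/2" link is skipped because its href does not contain "view", and the already-absolute Java link passes through urljoin unchanged.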
