Python Web Scraping
dawen1937
Python Web Scraping (1. Extracting Text with find and findAll)

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
    bsObj = BeautifulSoup(html)
    # Look up elements by their CSS style
    nameList = bsObj.findAll(…

Translated · 2016-12-29 16:11:45 · 28049 views · 1 comment
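Python Web Scraping (2. Getting a Page's External and Internal Links)

    from urllib.request import urlopen
    from urllib.parse import urlparse
    from bs4 import BeautifulSoup
    import re
    import datetime
    import random
    pages = set()
    # Seed with a float: Python 3.11+ no longer accepts a datetime object here
    random.seed(datetime.datetime.now().timestamp())
    # Get the page's internal links
    …

Original · 2016-12-29 16:17:49 · 1770 views · 0 comments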
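The excerpt above stops before the link-handling code, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes a link is "internal" when its host matches the host of the page it was found on, and it swaps BeautifulSoup for the standard-library `html.parser` so that it runs without extra packages. The names `LinkCollector` and `split_links`, the sample HTML, and the `page1.html`, `page2.html`, and `example.org` addresses are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html, page_url):
    """Return (internal, external) sets of absolute URLs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal, external = set(), set()
    for href in collector.hrefs:
        # Resolve relative links such as "/pages/page1.html" against the page URL.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        # Assumption: same host means internal, anything else is external.
        (internal if parts.netloc == site else external).add(absolute)
    return internal, external


# Hypothetical sample markup, used instead of a live request.
SAMPLE_HTML = """
<a href="/pages/page1.html">one</a>
<a href="http://www.pythonscraping.com/pages/page2.html">two</a>
<a href="https://example.org/about">elsewhere</a>
<a href="mailto:someone@example.org">mail</a>
"""

internal, external = split_links(SAMPLE_HTML, "http://www.pythonscraping.com/")
```

Comparing hosts rather than matching URL prefixes with a regular expression keeps the check short, but it treats subdomains as external sites, which may or may not be what a given crawl wants.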
Python Web Scraping (3. Downloading Files)
To download an image from a website, right-click it, choose Inspect Element, and find …

    from urllib.request import urlopen
    from urllib.request import urlretrieve
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com")
    bsObj = BeautifulSoup(html)
    ur…

Original · 2017-06-13 21:18:09 · 390 views · 0 comments
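The excerpt is cut off before the download step, so here is a rough sketch of how that step might look. It assumes the goal is to take the first `<img>` tag on the page, resolve its `src` against the page URL, and save the file with `urlretrieve`. It uses the standard-library `html.parser` in place of BeautifulSoup; `ImageFinder`, `find_image_url`, `download_image`, the sample markup, and the `img/logo.jpg` path are hypothetical and not taken from the original post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve


class ImageFinder(HTMLParser):
    """Remember the src attribute of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def find_image_url(html, page_url):
    """Return the absolute URL of the first image in html, or None."""
    finder = ImageFinder()
    finder.feed(html)
    return urljoin(page_url, finder.src) if finder.src else None


def download_image(html, page_url, filename):
    """Save the first image found in html to filename (needs network access)."""
    url = find_image_url(html, page_url)
    if url is None:
        raise ValueError("no <img> tag with a src attribute found")
    urlretrieve(url, filename)
    return url


# Hypothetical markup; only the URL resolution runs here, not the download.
SAMPLE_HTML = '<div><img src="img/logo.jpg" alt="logo"></div>'
image_url = find_image_url(SAMPLE_HTML, "http://www.pythonscraping.com")
```

With a real page, `download_image(page_html, "http://www.pythonscraping.com", "logo.jpg")` would write the file to the current directory, where `page_html` is the decoded response body.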