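This post scrapes only the cover images from Douban Books Top 250. Fellow book lovers, follow along. Here is what the result looks like once the crawl finishes:
The directory structure for this code is as follows:
Here is the code: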
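# -*- coding:utf-8 -*-
import os

import requests
from lxml import etree

# Assumption: Douban may reject the default requests User-Agent, so send a browser-like one.
HEADERS = {'User-Agent': 'Mozilla/5.0'}
SAVE_DIR = os.path.join('pic', 'books')


def spider(num):
    # Each list page shows 25 books; `start` is the offset of the first book on the page.
    url = 'https://book.douban.com/top250?start=' + str(num)
    html = requests.get(url, headers=HEADERS)
    selector = etree.HTML(html.text)
    # Cover images sit in <img> tags directly inside <a class="nbg"> links.
    pic_url = selector.xpath('//a[@class="nbg"]/img/@src')
    for each, src in enumerate(pic_url):
        pic = requests.get(src, headers=HEADERS)
        # The with-block closes the file even if the write fails.
        with open(os.path.join(SAVE_DIR, str(num + each) + '.jpg'), 'wb') as fp:
            fp.write(pic.content)
        print("Saved book %d successfully" % (num + each))


if __name__ == '__main__':
    # Create pic/books up front; open() does not create missing directories.
    os.makedirs(SAVE_DIR, exist_ok=True)
    for i in range(10):
        spider(num=i * 25)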
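If you want to check the XPath and the paging logic without hitting Douban at all, a small offline sketch like the one below can help. Note that `SAMPLE_HTML` is a made-up fragment that only imitates the shape the XPath expects (the real page markup may differ), and `extract_cover_urls` and `page_offsets` are helper names I introduced just for this illustration.

```python
from lxml import etree

# Hypothetical fragment imitating entries of the list page; the real markup may differ.
SAMPLE_HTML = '''
<table>
  <tr class="item">
    <td><a class="nbg" href="https://example.com/book/1">
      <img src="https://example.com/covers/1.jpg" width="90"/>
    </a></td>
  </tr>
  <tr class="item">
    <td><a class="nbg" href="https://example.com/book/2">
      <img src="https://example.com/covers/2.jpg" width="90"/>
    </a></td>
  </tr>
  <tr class="item">
    <td><a class="other" href="https://example.com/book/3">
      <img src="https://example.com/covers/3.jpg"/>
    </a></td>
  </tr>
</table>
'''


def extract_cover_urls(html_text):
    """Return the src of every <img> that is a direct child of <a class="nbg">."""
    selector = etree.HTML(html_text)
    return selector.xpath('//a[@class="nbg"]/img/@src')


def page_offsets(total=250, per_page=25):
    """Return the values passed as ?start= for each page: 0, 25, ..., 225."""
    return list(range(0, total, per_page))


if __name__ == '__main__':
    # The third link has class "other", so its image is skipped.
    print(extract_cover_urls(SAMPLE_HTML))
    print(page_offsets())
```

Only links whose class is exactly `nbg` match, so the third image in the sample is ignored. The offsets are the same values that `i * 25` produces in the main loop.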
The run in progress and the finished result: