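This post is mainly a record of my hands-on homework as a beginner. The hope is that with a little practice and review every day, my Python web-scraping skills will eventually reach a master level.
Without further ado, here is the code: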
#!/usr/bin/python3
# -*-coding:utf8-*-
# FileName:Douban_bbooks.py
# Author:alex
# Date:2018/02/07 18:42
import re
import requests
for page in range(0, 50):
    # Build the URL for each page; the start offset grows by 20 per page
    target_url = 'https://book.douban.com/tag/商业?start=' + str(page * 20)
    # Add request headers so the request looks like it comes from a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063'}
    # Send the HTTP request with requests and keep the page's HTML as text
    html = requests.get(target_url, headers=headers).text
    title_pattern = re.compile('<div.*?info"&g