System environment:
OS: Windows 10 Pro, 64-bit
Python: Anaconda2, Python 2.7
Python packages: requests, BeautifulSoup (bs4), os (standard library)
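Beginners learning web scraping usually start with static HTML pages, since they are not hard to scrape and are easy to get the hang of. When you run into a function you haven't seen before, look it up with a search engine (Baidu, in my case), work out what it does and what each parameter is for, then use it a few times and you'll have a rough feel for how it works (that's how I got started as a newbie). OK, enough chatter, here's the code: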
# -*- coding: utf-8 -*-
"""
Created on Thu Apr 26 18:09:20 2018
@author: zww
"""
import requests
from bs4 import BeautifulSoup
import os
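# Proxy used by the author. Note: requests picks a proxy by the URL's scheme,
# so this 'https' entry does not apply to the http:// URL below.
proxies = {'https': 'http://41.118.132.69:4433'}
# Send a browser-like User-Agent header with the request
hd = {'User-Agent': "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"}
url = 'http://q.10jqka.com.cn/thshy/'
req = requests.get(url, headers=hd, proxies=proxies)
#print req
# Parse the raw response bytes with Python's built-in HTML parser
bs = BeautifulSoup(req.content, 'html.parser')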
div_all=bs.find_all('div',attrs=