A simple Python crawler + data visualization: analyzing recent gaokao admission score lines

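The gaokao wrapped up yesterday. It has nothing to do with me, but I suddenly wanted to look at the admission score lines of the past few years. I searched online and found piles of numbers with no clear charts, which made them a real chore to read. So I dusted off the web scraping I learned a long time ago and ran an analysis of my own.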

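I opened this URL: http://kaoshi.edu.sina.com.cn/college/scorelist?tab=batch&wl=&local=14&batch=&syear=2018 and right-clicked the page, then chose Inspect -> Network -> Preview (see the screenshot below). This confirmed that the data on the page is not generated by JavaScript, so there is no need to bring in PhantomJS + Selenium. The plain requests module is enough.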

The data comes back directly in the response; it is not generated dynamically by JavaScript.
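The same check can be done outside the browser by looking for the table rows in the raw HTML. Below is a minimal sketch using only the standard library. The class name `tbl2tbody` is the one the crawler in this post selects on; `RowCounter`, `count_rows` and the two sample snippets are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> tags that carry a given class attribute."""

    def __init__(self, row_class):
        super().__init__()
        self.row_class = row_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated names (or be missing).
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and self.row_class in classes:
            self.count += 1


def count_rows(html, row_class="tbl2tbody"):
    """Return how many matching rows appear in the raw, unrendered HTML."""
    parser = RowCounter(row_class)
    parser.feed(html)
    return parser.count


# Hypothetical snippets standing in for two kinds of server response.
static_page = '<table><tr class="tbl2tbody"><td>2018</td></tr></table>'
js_page = '<div id="app"></div><script src="app.js"></script>'

print(count_rows(static_page))  # rows are already present in the HTML
print(count_rows(js_page))      # no rows until a browser runs the script
```

If the count for the real response body is greater than zero, a plain HTTP request is enough. If it is zero, the rows are probably filled in by a script and a headless browser would be needed.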

So the data on this site is easy to scrape with requests alone, and the crawler code is very simple:

    import requests
    from lxml import etree as ET  # assumption: ET in the original refers to lxml.etree


    class Spyder(object):
        def __init__(self, begin_year, last_year):
            # Initialise the URL template, request headers and year range
            self.url = "http://kaoshi.edu.sina.com.cn/college/scorelist?tab=batch&wl=&local=14&batch=&syear={}"
            self.head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"}
            self.begin_year = begin_year
            self.last_year = last_year

        def getNowPage(self, now_year):
            # Fetch all batch data for one year
            obj_list = []
            url = self.url.format(now_year)
            res = requests.request(url=url, headers=self.head, method="get").content
            eml = ET.HTML(res)
            # Every batch row on the current page
            eml_lis = eml.xpath('//tr[@class="tbl2tbody"]')
            # Wrap the first- and second-batch undergraduate rows in Obj instances
            for i in eml_lis:
                pc = i.xpath(".//td[4]/text()")[0]
                if pc in ("本科一批", "本科二批", "本科第一批", "本科第二批"):
                    obj_list.append(Obj(i.xpath(".//td[1]/text()")[0],
                                        i.xpath(".//td[3]/text()")[0],
                                        pc,
                                        i.xpath(".//td[5]/text()")[0]))
            return obj_list

        def run(self):
            # Collect the data for every year in the requested range
            # (<= instead of the original <, so that a single year also works)
            if self.begin_year <= self.last_year:
                result_object_list = []
                for i in range(self.begin_year, self.last_year + 1):
                    result_object_list.append(self.getNowPage(i))
                return result_object_list
            return None


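All it takes is a Spyder class with a few methods. Remember to send a User-Agent header with the request, then use XPath to pull out the data we want. It is best to wrap that data in a class of its own, which makes the visualization step easier later.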

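    class Obj(object):
        # Holds one score line: a single stream and batch for one year
        def __init__(self, year, type, pc, sorce):
            self.year = year    # year
            self.type = type    # science or liberal arts stream
            self.pc = pc        # undergraduate admission batch
            self.sorce = sorce  # cut-off score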
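The plotting step later relies on each year having exactly four values in a fixed order (science first batch, science second batch, liberal arts first batch, liberal arts second batch). Here is a minimal sketch of how that order could be enforced explicitly rather than inherited from the order of rows on the page. `Record`, `order_for_plot`, the stream labels in `STREAM_ORDER` and the scores are all assumptions made for illustration. Only the four batch names come from the crawler code.

```python
from dataclasses import dataclass

# Batch-name variants accepted by the crawler, mapped to a tier number.
BATCH_TIER = {"本科一批": 1, "本科第一批": 1, "本科二批": 2, "本科第二批": 2}
# Assumed stream labels (science, liberal arts); the real page text may differ.
STREAM_ORDER = {"理科": 0, "文科": 1}


@dataclass
class Record:
    year: int
    stream: str  # science or liberal arts
    batch: str   # admission batch as shown on the page
    score: int   # cut-off score, already converted to int


def order_for_plot(records):
    """Sort one year's records: science tier 1, science tier 2, arts tier 1, arts tier 2."""
    return sorted(records, key=lambda r: (STREAM_ORDER[r.stream], BATCH_TIER[r.batch]))


# Made-up scores in scrambled order, for illustration only.
rows = [
    Record(2018, "文科", "本科二批", 500),
    Record(2018, "理科", "本科第一批", 510),
    Record(2018, "文科", "本科一批", 560),
    Record(2018, "理科", "本科二批", 450),
]
print([r.score for r in order_for_plot(rows)])
```

Sorting on a (stream, tier) key also absorbs the two spellings of each batch name, so years that use "本科第一批" and years that use "本科一批" line up in the same bar position.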

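With all the data wrapped up, we can write the visualization code. Python has a very capable third-party library for this: matplotlib. Its plotting works much like MATLAB's. So import matplotlib, create a Draw class, and give it a method draw, called on the class itself, that holds the plotting logic. The code is again very simple:

    import matplotlib
    import matplotlib.pyplot as plt


    class Draw(object):
        @staticmethod
        def draw(datas):
            # Use a font with Chinese glyphs and keep the minus sign rendering correctly
            matplotlib.rcParams["font.sans-serif"] = ["SimHei"]
            matplotlib.rcParams["axes.unicode_minus"] = False
            plt.xlabel("年份")
            x_lable = []
            for i in range(0, len(datas)):
                x_lable.append(datas[i][0].year)
                sorce_h = [int(o.sorce) for o in datas[i]]
                print(sorce_h)
                # Four bars per year; each year's group is shifted by 2 on the x axis
                rect_n = plt.bar([y / 4 + i * 2 for y in range(1, 5)], sorce_h, width=0.2)
                # Write the score above each bar
                for rect in rect_n:
                    height = rect.get_height()
                    plt.text(rect.get_x() + rect.get_width() / 2, height + 1, str(height), ha="center", va="bottom")
            # One tick per year, labelled with that year
            plt.xticks([i * 2 + 0.5 for i in range(0, len(datas))], x_lable)
            plt.show()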
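The bar layout in `draw` comes down to a little arithmetic: bar `y` of year `i` sits at `y/4 + i*2`. The sketch below pulls that into two small functions so the numbers can be inspected without opening a plot window. `bar_positions` and `group_center` are hypothetical helper names, and their default parameters simply mirror the constants in the code above.

```python
def bar_positions(group_index, bars_per_group=4, spacing=2.0, step=0.25):
    """x positions for one year's bars: step, 2*step, ... shifted by group_index * spacing."""
    return [group_index * spacing + k * step for k in range(1, bars_per_group + 1)]


def group_center(group_index, bars_per_group=4, spacing=2.0, step=0.25):
    """Midpoint between the first and last bar of a group."""
    xs = bar_positions(group_index, bars_per_group, spacing, step)
    return (xs[0] + xs[-1]) / 2


print(bar_positions(0))  # first year's four bars
print(bar_positions(1))  # second year's bars, shifted right by `spacing`
print(group_center(0))   # compare with the i*2 + 0.5 tick offset used in draw
```

With these defaults the first group spans 0.25 to 1.0, so its midpoint is 0.625, while the tick in `draw` is placed at 0.5. The year label therefore sits slightly left of the centre of its bars. Using the midpoint for the ticks would centre it, if that matters for the chart.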

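With the methods written, all that is left is to call draw from the main function and pass in everything we scraped:

    def main():
        b = int(input("Enter the start year: "))
        e = int(input("Enter the end year: "))
        datas = Spyder(b, e).run()
        Draw.draw(datas)


    if __name__ == "__main__":
        main()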
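One thing to watch for in `main`: `run` returns None when the year range is invalid, and `draw` would then fail on `len(datas)`. A minimal sketch of validating the two inputs up front is shown here. `parse_year_range` is a hypothetical helper and is not part of the original script.

```python
def parse_year_range(begin_text, end_text):
    """Turn two input strings into an inclusive list of years, or raise ValueError."""
    begin, end = int(begin_text), int(end_text)  # int() raises ValueError on non-numeric text
    if begin > end:
        raise ValueError(f"start year {begin} is after end year {end}")
    return list(range(begin, end + 1))


print(parse_year_range("2011", "2018"))
```

Calling this before constructing the crawler means a bad range is reported as a clear error instead of surfacing later inside the plotting code.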

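With the code finished, let's run it. Suppose we want to analyze Hunan's gaokao admission score lines for 2011 to 2018: enter 2011, then 2018, and press Enter. The result looks like this: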

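Within each year, the bars from left to right are: science first batch, science second batch, liberal arts first batch, liberal arts second batch.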

After all that analysis, the result is not really good for anything, but combining a crawler with data visualization is genuinely fun.