Ctrip Train Schedule Scraper
Tools: PyCharm, Windows 10, Python 3.6.4
1. Requirements analysis
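Today we are going to scrape train schedule information from Ctrip, limited to direct services only. The information we need is as follows.
We take Kunshan to Suzhou as the example and want the information listed for that route. Viewing the page source shows that this information is not present in the HTML. Open the developer tools and click XHR: the data is all delivered by an asynchronous request.
The data is in JSON format, so it is easy to extract; all we have to do is fetch that response ourselves. We can see that it is a POST request.
The request parameters are fairly simple, so I will list them directly: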
value:
{"IsBus":false,"Filter":"0","Catalog":"","IsGaoTie":false,"IsDongChe":false,"CatalogName":"","DepartureCity":"kunshan","ArrivalCity":"suzhou","HubCity":"","DepartureCityName":"昆山","ArrivalCityName":"苏州","DepartureDate":"2019-03-06","DepartureDateReturn":"2019-03-08","ArrivalDate":"","TrainNumber":""}
Some of the parameters can be dropped. After trimming, the value looks like this:
{"IsBus":false,"Filter":"0","Catalog":"","IsGaoTie":false,"IsDongChe":false,"CatalogName":"","HubCity":"","DepartureCityName":"昆山","ArrivalCityName":"苏州","DepartureDate":"2019-03-06","DepartureDateReturn":"2019-03-08","ArrivalDate":"","TrainNumber":""}
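We can see that the parameters we need to change are mainly the two place names. By changing the place names we can fetch schedule information in bulk.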
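As a rough sketch of that idea, the trimmed parameters can be produced by a small function that only varies the two place names and the dates. The function names `build_payload`, `encode_form` and `fetch_schedule` are made up for illustration. The code assumes the JSON is sent as a form field called `value` (inferred from the `value:` label above) and that the reply is UTF-8 encoded JSON. The request URL is not given here, so it is left as an argument that you copy from the XHR entry in the developer tools.

```python
import json
import urllib.parse
import urllib.request


def build_payload(departure_name, arrival_name, departure_date, return_date):
    """Return the trimmed parameter dict with the route and dates filled in."""
    return {
        "IsBus": False,
        "Filter": "0",
        "Catalog": "",
        "IsGaoTie": False,
        "IsDongChe": False,
        "CatalogName": "",
        "HubCity": "",
        "DepartureCityName": departure_name,
        "ArrivalCityName": arrival_name,
        "DepartureDate": departure_date,
        "DepartureDateReturn": return_date,
        "ArrivalDate": "",
        "TrainNumber": "",
    }


def encode_form(payload):
    """Encode the payload as a form body with a single `value` field.

    Assumption: the site expects the JSON string under the key `value`.
    """
    value = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return urllib.parse.urlencode({"value": value}).encode("utf-8")


def fetch_schedule(url, payload, timeout=10):
    """POST the payload to `url` and parse the JSON reply.

    `url` is the XHR address seen in the developer tools (not listed here).
    Assumption: the response body is UTF-8 encoded JSON.
    """
    request = urllib.request.Request(url, data=encode_form(payload), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_payload("昆山", "苏州", "2019-03-06", "2019-03-08")` reproduces the trimmed parameters shown above. Fetching many routes then comes down to looping over pairs of place names and passing each payload to `fetch_schedule`.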
2. Data preparation
Because we want to fetch schedule information for the whole country,