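This article walks through an example of scraping train ticket information from 12306 with Python.
The main idea: fetch the page data through the query interface → find the patterns in that data → process it (mostly string handling) → extract the relevant information → output it.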
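The processing and extraction steps can be sketched roughly as below. This is only a minimal illustration, not the final implementation: `RAW` is a shortened stand-in for the interface response (the long encoded first field is replaced by the placeholder `TOKEN`), `extract_trains` is a hypothetical helper name, and the field positions are read off the sample response shown later in this article, so they are an assumption that may not hold if the interface changes.

```python
import json

# Shortened stand-in for the interface response. In a real response the first
# field of each record is a long encoded string; "TOKEN" is a placeholder.
RAW = json.dumps({
    "status": True,
    "data": {
        "result": [
            "TOKEN|预订|410000K6260S|K626|XAY|HAN|XAY|ZSY|08:33|10:20|01:47|Y"
        ],
        "map": {"ZSY": "柞水", "XAY": "西安"},
    },
})


def extract_trains(raw_text):
    """Parse the JSON text and pull a few fields out of each '|'-separated record."""
    data = json.loads(raw_text)["data"]
    names = data["map"]  # station code -> station name
    trains = []
    for record in data["result"]:
        fields = record.split("|")
        # Field positions are assumed from the sample response in this article.
        trains.append({
            "train": fields[3],
            "from": names.get(fields[6], fields[6]),
            "to": names.get(fields[7], fields[7]),
            "depart": fields[8],
            "arrive": fields[9],
            "duration": fields[10],
        })
    return trains


for t in extract_trains(RAW):
    print(t["train"], t["from"], t["to"], t["depart"], t["arrive"], t["duration"])
```

Falling back to the raw station code when it is missing from `map` keeps the output usable even if the response omits a name.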
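Two interfaces are involved:
1. https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.9006 (this page provides the code that corresponds to every station);
2. https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2018-06-18&leftTicketDTO.from_station=XAY&leftTicketDTO.to_station=ZSY&purpose_codes=ADULT (this page provides the information for the matching trains: train number, stations, times, ticket types, and so on). In this link, 2018-06-18, XAY, ZSY, and ADULT can all be changed; they stand for the departure date, the departure station, the arrival station, and an adult passenger (students are represented by 0X00). Because the page content depends on these parameters, three cases arise: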
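Before turning to those cases, here is a minimal sketch of how the query link could be assembled from its four parameters. `build_query_url` is a hypothetical helper name introduced for illustration; the parameter names and values come from the link above. The sketch only builds the URL and does not send a request, since the interface may require headers or cookies that this article has not covered at this point.

```python
from urllib.parse import urlencode

QUERY_URL = "https://kyfw.12306.cn/otn/leftTicket/query"


def build_query_url(date, from_code, to_code, purpose="ADULT"):
    """Build the ticket query link; a list of pairs keeps the parameter order fixed."""
    params = [
        ("leftTicketDTO.train_date", date),          # departure date, e.g. 2018-06-18
        ("leftTicketDTO.from_station", from_code),   # departure station code
        ("leftTicketDTO.to_station", to_code),       # arrival station code
        ("purpose_codes", purpose),                  # ADULT, or 0X00 for students
    ]
    return QUERY_URL + "?" + urlencode(params)


print(build_query_url("2018-06-18", "XAY", "ZSY"))
```

The parameters are passed as an ordered list rather than built by hand so that the link keeps the same parameter order as the example above.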
① The normal case:
{"validateMessagesShowId":"_validatorMessage","status":true,"httpstatus":200,"data":{"result":["TrzzTVI8faB5h%2Fl%2FbI7bI8kzLbyZkbSR1gUom0cuZ3hSUgPHzhUNvUCiEeTL0hRvvVd2EaebUhSo%0AfwwAVDSUC2DylMiWbx4Ymjz9IcTu0KcEJXFkvw%2FsdEMO4ePApiK%2BhejO5TzmpCS0HyO6ZkDNTsTX%0AQQzKalxJ0WrJ9i3%2BY5Z4PLKt%2Bj4Py4XGK3qruwD1rCj4tB%2FenF5QaaiVALz7yoPlF3BWeC6JWOxs%0AXtB%2FoWeG3A4NxkywoieelcjHTgo0bZTR6w%3D%3D|预订|410000K6260S|K626|XAY|HAN|XAY|ZSY|08:33|10:20|01:47|Y|zLCzbUTvWTf2xEVVJz0KoBnJivTWR3C5Vlb%2BhKByvpXbr%2FbcIQRGHqCoAeE%3D|20180618|3|Y2|01|03|0|0||||2|||有||有|有|||||10401030|1413|0","Oh%2BkGpiSKiINjNP0UqBfRkU1iJHmw4h34knFjzias4jsK6f9OVjz%2F7TS3qhYJmqisfNJRzB9%2F9yy%0A1k%2BCYvh6J2h%2FdjCI3zhV3CCiTEWnuJmQXRwDM9LbFZ583APNpJS%2B%2FZJL9wpbGyH8REMevM5bwXPq%0AF0pHnTfyJMO5SvY8DDTolp48Oqp5brZpPjjcYXlUzvUEJaDjkEQEDs1sZGIkRrZsMRfS4XQAJ0%2F6%0ASRWrXr613t0QYK40hY8sCdLAP0tnJkDKng%3D%3D|预订|41000K10320V|K1032|XAY|GIW|XAY|ZSY|18:30|20:20|01:50|Y|J4oPbdkn2DaSdC%2BgZ8ZdFV%2Bm9D2YV%2BjUtom8C9PH6PKsmZ8S2EshXr8eMgE%3D|20180618|3|Y2|01|03|0|0||||无|||有||有|有|||||10401030|1413|1"],"flag":"1","map":{"ZSY":"柞水","XAY":"西安"}},"messages":[],"validateMes