Where Did the Plane Go? The Crawler Edition

炼丹小白师

Last modified 2024-06-29 09:51:37

First published 2022-07-02 23:09:17

Article link: https://blog.csdn.net/qq_43241562/article/details/125578862

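Getting to Know Our Data Source

Note: this post only teaches the crawler part!
The site holds a large amount of flight data and, what's more, has no anti-scraping measures at all.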

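Crawler Xiao Guo's Getting-Started Table of Contents

Getting to Know Our Data Source
How Do We Find Where the Data Is?
The Crawler (Spider🕷)
Results
- Results of the data crawl
- Visualization and data analysis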

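How Do We Find Where the Data Is?

First, you need some basic knowledge of the web. This walkthrough uses the Chrome browser and PyCharm as the demonstration tools, so let's get started!
FlightAware link
Their web page really is nicely done.
Next, press F12 to open the developer tools and look for the request whose response is the flight data in JSON form.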
Copy its link!
Open that link in the browser to view the raw data.

Yes, this is the data source we are looking for.

The Crawler (Spider🕷)

A crawler obviously needs a head, so let's write some headers!

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/83.0.4103.61 Safari/537.36"
}

Next, call the get method of the requests module, and be sure to pass the headers object in, like this.

response = requests.get(url, headers=headers)
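
As a slightly more defensive variant of the call above, the sketch below adds a timeout and a status check before the body is parsed. The helper name `fetch_json` and the 10-second timeout are assumptions made for illustration; they are not part of the original code.

```python
import requests


def fetch_json(url, headers, timeout=10):
    """GET a URL and return its parsed JSON body (hypothetical helper)."""
    # A timeout keeps the call from hanging forever on a stalled connection.
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page.
    response.raise_for_status()
    return response.json()
```

Called as `fetch_json(url, headers)`, it returns the same parsed structure that `response.json()` would, but fails early when the server answers with an error.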

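After that, be sure to parse the response object as JSON with response.json(); otherwise the data cannot be extracted.
Then walk through the parsed JSON structure and pull out whichever fields you want.
We also need a function that fetches the token, using string matching, so that the data requested later is always up to date.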

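# Fetch a fresh token automatically
def get_token():
    url = "https://zh.flightaware.com/live/map"
    r = requests.post(url)  # send a POST request
    marker = '"VICINITY_TOKEN":"'
    start = r.text.rfind(marker)  # position of the last occurrence
    if start == -1:
        raise ValueError("VICINITY_TOKEN not found in the page")
    # The token is taken to be the 40 characters right after the marker
    start += len(marker)
    return r.text[start:start + 40]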
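
The string matching in `get_token` can also be written as a regular expression, which avoids hard-coded offsets. This is only a sketch: the helper name `extract_token` is hypothetical, the sample text is made up, and it assumes the token is the whole quoted value after `"VICINITY_TOKEN":` rather than a fixed 40 characters.

```python
import re

# Assumption: the token is the quoted string value that follows "VICINITY_TOKEN":
TOKEN_PATTERN = re.compile(r'"VICINITY_TOKEN":"([^"]+)"')


def extract_token(page_text):
    """Return the last VICINITY_TOKEN value in page_text, or None if absent."""
    matches = TOKEN_PATTERN.findall(page_text)
    # The original code uses rfind, i.e. the last occurrence, so mirror that here.
    return matches[-1] if matches else None


# Made-up page fragment, for illustration only
sample = 'var cfg = {"VICINITY_TOKEN":"abc123def456","other":1};'
print(extract_token(sample))  # abc123def456
```

Returning `None` when nothing matches lets the caller decide whether to retry or stop, instead of silently carrying on with an empty token.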

Now write the data-fetching function:

# Fetch the data and collect it into a list of rows
def dataSavedFunction(url):
    # Request the JSON document and parse it into lists/dicts
    response = requests.get(url, headers=headers)
    jsonResponse = response.json()
    # Walk through the parsed features and build one row per aircraft
    # print(jsonResponse)
    dataCollection = []
    for feature in jsonResponse["features"]:
        longitude = feature["geometry"]["coordinates"][0]
        latitude = feature["geometry"]["coordinates"][1]
        try:
            properties = feature["properties"]
            dataCollection.append([
                properties["flight_id"],
                longitude,
                latitude,
                properties["direction"],
                properties["groundspeed"],
                properties["landingTimes"]["estimated"],
                properties["altitude"]
            ])
        except KeyError:
            # A field is missing: keep the position and leave the rest blank
            dataCollection.append(["", longitude, latitude, "", "", "", ""])
    return dataCollection
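
To see what the try/except does with an incomplete record, the sketch below runs the same extraction on a made-up payload, with no network access. The field names are the ones used in the function above; the helper name `feature_to_row` and all sample values are invented for illustration.

```python
def feature_to_row(feature):
    """Turn one feature into a row, blank-filling when a field is missing."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    try:
        props = feature["properties"]
        return [
            props["flight_id"], lon, lat,
            props["direction"], props["groundspeed"],
            props["landingTimes"]["estimated"], props["altitude"],
        ]
    except KeyError:
        # Same fallback as the original: keep the position only
        return ["", lon, lat, "", "", "", ""]


# Invented sample data: one complete record, one with missing fields
sample = {"features": [
    {"geometry": {"coordinates": [116.4, 39.9]},
     "properties": {"flight_id": "TEST1", "direction": 90, "groundspeed": 450,
                    "landingTimes": {"estimated": 1656800000}, "altitude": 350}},
    {"geometry": {"coordinates": [121.5, 31.2]},
     "properties": {"flight_id": "TEST2"}},
]}

rows = [feature_to_row(f) for f in sample["features"]]
print(rows[0])  # ['TEST1', 116.4, 39.9, 90, 450, 1656800000, 350]
print(rows[1])  # ['', 121.5, 31.2, '', '', '', '']
```

Note that the second row loses its `flight_id` even though that field was present: a single missing key blanks every non-position column. Reading each field with `dict.get` would keep whatever is available, if that behavior is preferred.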

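Each row holds, in order: flight ID, longitude, latitude, heading, ground speed, estimated arrival time, and altitude.
What you do with the data from here is up to you; I won't limit your great ideas!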
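
One simple option is to write the rows to a CSV file for later analysis. The sketch below uses only the standard `csv` module; the function name `save_rows`, the English column names, and the file name in the usage note are assumptions, not part of the original post.

```python
import csv

# Assumed column names, in the same order as the rows built by the crawler
COLUMNS = ["flight_id", "longitude", "latitude", "direction",
           "groundspeed", "estimated_arrival", "altitude"]


def save_rows(rows, path):
    """Write a header line and the given rows to path as UTF-8 CSV."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

With the crawler function from this post, usage might look like `save_rows(dataSavedFunction(url), "flights.csv")`.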