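Foreword:
This is the most sobering dataset I have ever seen; behind almost every row lies a cost paid in lives and blood. This exploratory analysis does not presume to prove anything. It is only an exercise in data handling, so it will show the data as it is without much commentary. There is a saying: "Because we cherish peace, we look back on war." The same applies here: because we cherish life, we look back on air disasters. Today's safe flying was paid for with the lives of more than 100,000 innocent people. This is a tribute to those great pioneers.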
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Import the dataset
crash = pd.read_csv("./Airplane_Crashes_and_Fatalities_Since_1908.csv")
crash.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 13 columns):
Date            5268 non-null object
Time            3049 non-null object
Location        5248 non-null object
Operator        5250 non-null object
Flight #        1069 non-null object
Route           3562 non-null object
Type            5241 non-null object
Registration    4933 non-null object
cn/In           4040 non-null object
Aboard          5246 non-null float64
Fatalities      5256 non-null float64
Ground          5246 non-null float64
Summary         4878 non-null object
dtypes: float64(3), object(10)
memory usage: 535.1+ KB
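The non-null counts printed by `info()` can also be read the other way round, as the number of missing entries per column. The following is only a minimal sketch on a tiny hand-built frame (the name `toy` and its values are illustrative assumptions, not rows taken from the CSV); the same two calls should work unchanged on `crash`.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the real `crash` frame comes from the CSV.
# The gaps (None / NaN) are placed here on purpose.
toy = pd.DataFrame({
    "Date": ["03/06/1968", "03/08/1968", "03/09/1968"],
    "Time": ["8:00", None, "23:20"],
    "Aboard": [49.0, 14.0, np.nan],
})

# Number of missing entries per column, largest first.
missing = toy.isnull().sum().sort_values(ascending=False)

# Share of rows that are missing in each column (between 0 and 1).
missing_share = toy.isnull().mean()

print(missing)
print(missing_share)
```

Looking at the share rather than the raw count makes it easier to compare columns such as `Flight #` and `Time` at a glance.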
crash = crash.drop(["Summary","cn/In","Flight #","Route","Location"],axis=1)
crash.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 8 columns):
Date            5268 non-null object
Time            3049 non-null object
Operator        5250 non-null object
Type            5241 non-null object
Registration    4933 non-null object
Aboard          5246 non-null float64
Fatalities      5256 non-null float64
Ground          5246 non-null float64
dtypes: float64(3), object(5)
memory usage: 329.3+ KB
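As a side note on the `drop` call used above: `drop(labels, axis=1)` and `drop(columns=labels)` are two spellings of the same operation, and both return a new frame rather than modifying the original. A small sketch on a made-up one-row frame (`toy` and `unused` are hypothetical names used only for this illustration):

```python
import pandas as pd

# One-row toy frame for illustration only; not taken from the CSV.
toy = pd.DataFrame({
    "Date": ["03/06/1968"],
    "Route": [None],
    "Summary": [None],
    "Fatalities": [49.0],
})

unused = ["Summary", "Route"]

# Positional labels plus axis=1 (the form used in this post).
by_axis = toy.drop(unused, axis=1)

# Keyword form; arguably easier to read because the axis is implied.
by_keyword = toy.drop(columns=unused)

print(list(by_axis.columns))
print(by_axis.equals(by_keyword))
```

Because `drop` returns a copy, the result has to be assigned back (as in `crash = crash.drop(...)`) for the change to stick.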
print(crash[2200:2205])
            Date   Time                      Operator              Type  \
2200  03/06/1968   8:00     Military - U.S. Air Force  Fairchild C-123K
2201  03/08/1968  19:18                    Air Manila    Fairchild F-27
2202  03/09/1968  23:20   Military - French Air Force      Douglas DC6B
2203  03/19/1968  19:37     Viking Airways - Air Taxi        Cessna 182
2204  03/23/1968  13:00  Fortaire Aviation - Air Taxi       Brantly 305

      Registration  Aboard  Fatalities  Ground
2200       54-0590    49.0        49.0     0.0
2201       PI-C871    14.0        14.0     0.0
2202         43748    20.0        19.0     0.0
2203        N2623F     2.0         2.0     0.0
2204        N2224U     5.0         3.0     0.0
Casualty analysis
Ranking by fatalities
print(crash["Fatalities"].sum())
fatal_crash = crash[crash["Fatalities"].notnull()]
fatal_crash = fatal_crash.sort_values(by="Fataliti