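Project tasks:
- Analyze the number of air crashes per year
- Number of passengers aboard
- Number of survivors and fatalities
- Which operators have had the most crashes?
- Which aircraft types have had the most crashes?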
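As a rough sketch of how these tasks map onto pandas operations, the snippet below works on a tiny hand-made frame rather than the real dataset. The column names follow the dataset, the first row mirrors the first record of the data preview, and the remaining rows, along with the names `toy`, `Year`, and `Survived`, are made up for illustration only.

```python
import pandas as pd

# Illustrative rows only: the first mirrors the first record of the dataset,
# the other three are hypothetical.
toy = pd.DataFrame({
    'Date': ['09/17/1908', '07/12/1912', '08/06/1913', '09/09/1913'],
    'Operator': ['Military - U.S. Army', 'Military - U.S. Navy', 'Private', 'Private'],
    'Aboard': [2.0, 5.0, 1.0, 20.0],
    'Fatalities': [1.0, 5.0, 1.0, 14.0],
})

# Dates are stored as MM/DD/YYYY strings, so parse them before extracting the year.
toy['Year'] = pd.to_datetime(toy['Date'], format='%m/%d/%Y').dt.year
# Survivors are assumed here to be everyone aboard who is not counted as a fatality.
toy['Survived'] = toy['Aboard'] - toy['Fatalities']

crashes_per_year = toy.groupby('Year').size()
totals_per_year = toy.groupby('Year')[['Aboard', 'Fatalities', 'Survived']].sum()
top_operators = toy['Operator'].value_counts()

print(crashes_per_year)
print(totals_per_year)
print(top_operators)
```

The same `value_counts()` call on the `Type` column would rank aircraft types in the same way.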
# -*-coding: utf-8 -*-
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from bokeh.io import output_notebook, output_file, show
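# Note: bokeh.charts exists only in older Bokeh releases; it was later moved to the separate bkcharts package.
from bokeh.charts import Bar, TimeSeries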
from bokeh.layouts import column
from math import pi
- Inspect the dataset
data_path = './dataset/Airplane_Crashes_and_Fatalities_Since_1908.csv'
df_data = pd.read_csv(data_path)
print(u'Basic dataset information:')
# info() prints its report itself and returns None, so it is not wrapped in print
df_data.info()
Basic dataset information:
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 13 columns):
Date            5268 non-null object
Time            3049 non-null object
Location        5248 non-null object
Operator        5250 non-null object
Flight #        1069 non-null object
Route           3562 non-null object
Type            5241 non-null object
Registration    4933 non-null object
cn/In           4040 non-null object
Aboard          5246 non-null float64
Fatalities      5256 non-null float64
Ground          5246 non-null float64
Summary         4878 non-null object
dtypes: float64(3), object(10)
memory usage: 535.1+ KB
print(u'The dataset has %i rows and %i columns' % (df_data.shape[0], df_data.shape[1]))
The dataset has 5268 rows and 13 columns
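The non-null counts reported by `info()` show that several columns are incomplete. As a small worked example, the sketch below turns those counts into the fraction of missing values per column. The counts are copied from the output above; the names `TOTAL_ROWS`, `non_null`, and `missing_ratio` are illustrative and not part of the original notebook.

```python
# Non-null counts copied from the df_data.info() output.
TOTAL_ROWS = 5268
non_null = {
    'Date': 5268, 'Time': 3049, 'Location': 5248, 'Operator': 5250,
    'Flight #': 1069, 'Route': 3562, 'Type': 5241, 'Registration': 4933,
    'cn/In': 4040, 'Aboard': 5246, 'Fatalities': 5256, 'Ground': 5246,
    'Summary': 4878,
}

# Fraction of missing values in each column.
missing_ratio = {col: 1.0 - float(n) / TOTAL_ROWS for col, n in non_null.items()}

# List the columns from most to least incomplete.
for col, ratio in sorted(missing_ratio.items(), key=lambda kv: kv[1], reverse=True):
    print('%-12s %5.1f%% missing' % (col, ratio * 100))
```

On the loaded frame, `df_data.isnull().mean()` should give the same fractions directly. `Flight #` and `Time` are the sparsest columns, while the columns needed for the tasks above (`Date`, `Operator`, `Type`, `Aboard`, `Fatalities`) are almost complete.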
print(u'Data preview:')
df_data.head()
Data preview:
| | Date | Time | Location | Operator | Flight # | Route | Type | Registration | cn/In | Aboard | Fatalities | Ground | Summary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 09/17/1908 | 17:18 | Fort Myer, Virginia | Military - U.S. Army | NaN | Demonstration | Wright Flyer III | NaN | 1 | 2.0 | 1.0 | 0.0 | Durin |