US Traffic Accident Analysis (2017) (Practice Project 5)

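This project analyzes US traffic accidents in 2017 through data processing, visualization, and xgboost modeling to predict accident severity. The main findings are that accidents were most frequent in states such as California and Texas, during the morning and evening rush hours, and in clear weather. The xgboost model reached an accuracy of about 76%.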


1. Project summary

Project purpose: data analysis practice
Data source: Kaggle
Source code, dataset, and field descriptions (Baidu Cloud link):
Address: https://pan.baidu.com/s/1UD5HD69bNEsX2EkjaQ1IPg
Extraction code: 8gd8

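Analysis goals:

  • Basic exploratory analysis: which states have the most accidents, when accidents are most likely to happen, and what the weather was like at the time, with visualizations that describe the overall picture of US accidents in 2017.
  • Use xgboost to predict accident severity and examine which factors are most strongly related to it.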
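The first goal boils down to a few frequency counts. The sketch below illustrates the idea on a tiny hand-made table; the rows are invented for illustration, and only the column names (`State`, `Start_Time`, `Weather_Condition`) come from the dataset.

```python
import pandas as pd

# Toy records standing in for the real accident table (values are made up).
toy = pd.DataFrame({
    "State": ["CA", "TX", "CA", "FL", "CA", "TX"],
    "Start_Time": pd.to_datetime([
        "2017-01-03 08:15:00", "2017-01-03 17:40:00", "2017-02-10 08:05:00",
        "2017-03-22 13:30:00", "2017-04-01 17:55:00", "2017-05-09 08:45:00",
    ]),
    "Weather_Condition": ["Clear", "Clear", "Rain", "Clear", "Overcast", "Clear"],
})

# Accidents per state, most frequent first.
by_state = toy["State"].value_counts()

# Accidents per hour of day, ordered by hour.
by_hour = toy["Start_Time"].dt.hour.value_counts().sort_index()

# Accidents per weather condition, most frequent first.
by_weather = toy["Weather_Condition"].value_counts()

print(by_state)
print(by_hour)
print(by_weather)
```

On the full 2017 data the same three counts could feed bar or line charts directly.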
2. Data processing (for analysis only; preprocessing for modeling comes later)

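The original dataset (US_Accidents_Dec19.csv) has 49 columns and about 3 million rows covering traffic accidents from 2016 to 2019. Because of hardware and time constraints, only accidents from 2017 are analyzed (see the source files for details).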

# Extract the 2017 records
import pandas as pd
data = pd.read_csv('./US_Accidents_Dec19.csv')
datacopy = data.copy()
datacopy['Start_Time'] = pd.to_datetime(datacopy['Start_Time'])
# Vectorized accessor; same values as apply(lambda x: x.year)
datacopy['year'] = datacopy['Start_Time'].dt.year
data1 = datacopy[datacopy['year'] == 2017]
# The index is written too; it is read back later as the 'Unnamed: 0' column
data1.to_csv('./USaccident2017.csv')
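Since memory is the stated constraint, one alternative to loading the whole file is to read it in chunks and keep only the matching rows. This is a sketch of that option, not what the original post does; `filter_year_chunked` is a hypothetical helper, and the in-memory CSV is made-up data standing in for US_Accidents_Dec19.csv.

```python
import io

import pandas as pd


def filter_year_chunked(source, year, chunksize=100_000):
    """Return rows whose Start_Time falls in `year`, reading `source` in chunks."""
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        start = pd.to_datetime(chunk["Start_Time"])
        kept.append(chunk[start.dt.year == year])
    return pd.concat(kept, ignore_index=True)


# Tiny in-memory CSV with invented rows; a file path works the same way.
sample_csv = io.StringIO(
    "ID,Start_Time,State\n"
    "A-1,2016-12-31 23:50:00,CA\n"
    "A-2,2017-01-01 00:17:36,CA\n"
    "A-3,2017-06-15 08:00:00,TX\n"
    "A-4,2018-02-02 09:30:00,FL\n"
)
subset = filter_year_chunked(sample_csv, 2017, chunksize=2)
print(subset)
```

Only one chunk plus the rows kept so far are held in memory at a time, which is the point of the approach.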

The analysis starts from USaccident2017.csv.
Import the required packages:

import webbrowser

import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from pyecharts import options as opts
from pyecharts.charts import Page, Pie, Bar, Line, Scatter
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Jupyter/IPython magic: render matplotlib figures inline
%matplotlib inline

data = pd.read_csv('./USaccident2017.csv')
data.shape  # (717483, 51)
data.head()
Unnamed: 0 ID Source TMC Severity Start_Time End_Time Start_Lat Start_Lng End_Lat End_Lng Distance(mi) Description Number Street Side City County State Zipcode Country Timezone Airport_Code Weather_Timestamp Temperature(F) Wind_Chill(F) Humidity(%) Pressure(in) Visibility(mi) Wind_Direction Wind_Speed(mph) Precipitation(in) Weather_Condition Amenity Bump Crossing Give_Way Junction No_Exit Railway Roundabout Station Stop Traffic_Calming Traffic_Signal Turning_Loop Sunrise_Sunset Civil_Twilight Nautical_Twilight Astronomical_Twilight year
0 9206 A-9207 MapQuest 201.0 3 2017-01-01 00:17:36 2017-01-01 00:47:12 37.925392 -122.320595 NaN NaN 0.01 Accident on I-80 Westbound at Exit 15 Cutting ... NaN I-80 E R El Cerrito Contra Costa CA 94530 US US/Pacific KCCR 2017-01-01 00:53:00 44.1 40.8 79.0 29.91 10.0 WSW 5.8 NaN Partly Cloudy False False False False False False False False False False False True False Night Night Night Night 2017
1 9207 A-9208 MapQuest 201.0 3 2017-01-01 00:26:08 2017-01-01 01:16:06 37.878185 -122.307175 NaN NaN 0.01 Accident on I-580 Southbound at Exit 12 I-80 I... NaN I-580 W R Berkeley Alameda CA 94710 US US/Pacific KOAK 2017-01-01 00:53:00 51.1 NaN 83.0 29.97 10.0 West 11.5 NaN Overcast False False True False False False False False False False False False False Night Night Night Night 2017
2 9208 A-9209 MapQuest 201.0 2 2017-01-01 00:53:41 2017-01-01 01:22:35 38.014820 -121.640579 NaN NaN 0.00 Accident on Taylor Rd Southbound at Bethel Isl... 2998.0 Taylor Ln R Oakley Contra Costa CA 94561 US US/Pacific KCCR 2017-01-01 00:53:00 44.1 40.8 79.0 29.91 10.0 WSW 5.8 NaN Partly Cloudy False False False False False False False False False False False False False Night Night Night Night 2017
3 9209 A-9210 MapQuest 241.0 3 2017-01-01 01:18:51 2017-01-01 01:48:01 37.912056 -122.323982 NaN NaN 0.01 Lane blocked and queueing traffic due to accid... NaN Bayview Ave R Richmond Contra Costa CA 94804 US US/Pacific KCCR 2017-01-01 01:11:00 44.1 42.5 82.0 29.95 9.0 SW 3.5 NaN Mostly Cloudy False False False False False False False False False False False False False Night Night Night Night 2017
4 9210 A-9211 MapQuest 222.0 3 2017-01-01 01:20:12 2017-01-01 01:49:47 37.925392 -122.320595 NaN NaN 0.01 Queueing traffic due to accident on I-80 Westb... NaN I-80 E R El Cerrito Contra Costa CA 94530 US US/Pacific KCCR 2017-01-01 01:11:00 44.1 42.5 82.0 29.95 9.0 SW 3.5 NaN Mostly Cloudy False False False False False False False False False False False True False Night Night Night Night 2017

Field descriptions

https://www.jianshu.com/p/9e597dc8ae71

# Show only the columns that contain missing values
null_counts = data.isnull().sum()
null_counts[null_counts != 0]
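Raw null counts are easier to act on next to the share of rows they represent. The sketch below shows one way to build such a summary; the toy frame and the `summary` layout are assumptions for illustration, not output from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (values are made up).
toy = pd.DataFrame({
    "City": ["Berkeley", None, "Oakley", "Richmond"],
    "Temperature(F)": [44.1, np.nan, np.nan, 51.1],
    "Severity": [3, 3, 2, 3],
})

null_counts = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": null_counts,
    "percent": (null_counts / len(toy) * 100).round(2),
})

# Keep only columns that actually have gaps, worst first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

A column missing in a large share of rows is a candidate for dropping, while a small share is usually filled instead.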


# Handle missing values
# Drop columns that do not affect the analysis or are not analyzed
deletelist = ['Unnamed: 0', 'ID', 'TMC', 'End_Lat', 'End_Lng', 'Airport_Code', 'Weather_Timestamp', 'Wind_Chill(F)',
              'Civil_Twilight', 'Nautical_Twilight',
              'Astronomical_Twilight', 'year', 'Number']
data1 = data.drop(deletelist, axis=1)
# Drop rows that have missing values in these columns
data1 = data1.dropna(axis=0, subset=['City', 'Zipcode', 'Timezone', 'Sunrise_Sunset'])
# Fill temperature, humidity, pressure, and visibility with the column mean
data1['Temperature(F)'] = data1['Temperature(F)'].fillna(data1['Temperature(F)'].mean())
data1['Humidity(%)'] = data1['Humidity(%)'].fillna(data1['Humidity(%)'].mean())
data1['Pressure(in)'] = data1['Pressure(in)'].fillna(data1['Pressure(in)'].mean())
data1['Visibility(mi)'] = data1['Visibility(mi)'].fillna(data1['Visibility(mi)'].mean())
# Fill wind speed by nearest-neighbor interpolation
data1['Wind_Speed(mph)'] = data1['Wind_Speed(mph)'].interpolate(method='nearest', order=4)
# Fill weather condition and wind direction with the mode
data1['Weather_Condition'] = data1['Weather_Condition'].fillna(data1['Weather_Condition'].
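The listing breaks off mid-statement in the source. The preceding comment says the weather condition and wind direction are filled with the mode, so here is a minimal sketch of mode-based filling on toy data. The frame is made up, and taking `mode()[0]` as the fill value is an assumption about how the original line continues.

```python
import pandas as pd

# Toy frame with gaps in two categorical columns (values are made up).
toy = pd.DataFrame({
    "Weather_Condition": ["Clear", None, "Clear", "Overcast", None],
    "Wind_Direction": ["SW", "WSW", None, "SW", "SW"],
})

for column in ["Weather_Condition", "Wind_Direction"]:
    # mode() ignores missing values and returns the most frequent value(s);
    # [0] picks the first one if there is a tie.
    most_common = toy[column].mode()[0]
    toy[column] = toy[column].fillna(most_common)

print(toy)
```

Mode filling keeps the column categorical, at the cost of inflating the share of the most common category.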