Bicycle Rental Data Analysis and Visualization: Bicycle Accident Analysis

Introduction: Business Problem

This report will try to analyze the best location and time for cycling activities. Specifically, this project will target stakeholders interested in cycling activities such as individual cyclists, cycling communities, and companies/sponsors/event organizers of cycling activities.

During the pandemic, many people have taken up cycling as their form of exercise. Cycling is considered one of the safest ways to exercise because it involves minimal contact with other people. However, many accidents involve cyclists, and cyclists need protection and a sense of security while on the road. Cycling is not only a means of transportation but also a sport and a recreational activity.

When an accident occurs, car drivers are still protected by the car frame and car safety technology. As a result, their risk of injury or death is relatively low compared with that of cyclists. Cyclists are protected only by a helmet, which covers only the head, so in an accident their bodies, feet, and hands are exposed to injury.

This project will help the Seattle Department of Transportation (SDOT) place appropriate traffic signs for cyclists in accident-prone areas. It will also help cycling communities such as Cascade Bicycle Club, COGS (Cyclists Of Greater Seattle), Brake the Cycle, etc. find the right route and time to hold a cycling event.

Cycling communities such as Cascade Bicycle Club hold many events. This club hosts several major riding events every year, including Chilly Hilly, Seattle Bike-n-Brews, Ride for Major Taylor, Flying Wheels Summer Century, Woodinville Wine Ride, Seattle Night Ride, the Red-Bell 100, Seattle to Portland (STP), Ride from Seattle to Vancouver and Party (RSVP), Ride Around Washington (RAW), High Pass Challenge (HPC), and Kitsap Color Classic (KCC) (Wikipedia). This project can help companies, sponsors, and event organizers create safe cycling events for all participants.

Figure: Open dataset ‘Data-Colllisions.csv’

Data

Based on the definition of our problem, the features (columns) that will influence our analysis are:

  1. Location: Longitude (X), Latitude (Y), Address Type (ADDRTYPE)

  2. Severity: A code that corresponds to the severity of the collision (SEVERITYCODE), a detailed description of the severity of the collision (SEVERITYDESC)

  3. Person Count: Total number of people involved (PERSONCOUNT), number of bicycles involved in the collision (PEDCYLCOUNT)

  4. Date: The date and time of the incident (INCDTTM)

  5. Condition: Description of the weather conditions (WEATHER), condition of the road (ROADCOND), light conditions during the collision (LIGHTCOND)

Conditional Selection: Only Show Collisions Involving Bicycles

As explained in the problem statement above, this project focuses on cyclists, so we limit the data to collisions that involve at least one bicycle. In other words, the PEDCYLCOUNT column must be greater than zero.

Figure: Conditional selection
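In pandas, this filter is a single boolean mask. The sketch below is only illustrative. It builds a few hypothetical rows instead of reading the real CSV, and it uses only column names that appear in this article.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset. With the actual file you
# would start from something like: df = pd.read_csv("Data-Colllisions.csv")
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2, 1],
    "PERSONCOUNT": [2, 3, 2, 2, 4],
    "PEDCYLCOUNT": [0, 1, 0, 1, 2],
})

# Boolean mask: keep only collisions with at least one bicycle involved
bike_df = df[df["PEDCYLCOUNT"] > 0]

# (rows, columns) of the filtered DataFrame
print(bike_df.shape)  # (3, 3)
```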

DataFrame Shape

The dataset used has 5,484 rows and 38 columns. Not all of these columns will be used; they are selected according to the data description above.

DataFrame Data Types and Missing Values

Below, we check the data type of each column, along with the number and percentage of missing values in each column.

Figure: Checking data types
Figure: Checking missing values
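Here is a minimal sketch of these two checks in pandas, run on a small hypothetical DataFrame rather than the real dataset. `dtypes` lists each column's type. `isnull().sum()` divided by the number of rows gives the percentage of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few gaps (not the real data)
df = pd.DataFrame({
    "WEATHER": ["Clear", "Raining", np.nan, "Clear"],
    "ROADCOND": ["Dry", "Wet", "Dry", np.nan],
    "PEDCYLCOUNT": [1, 1, 2, 1],
})

# Data type of each column (text columns show up as object or string)
print(df.dtypes)

# Number and percentage of missing values per column
missing = df.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": missing / len(df) * 100,
})
print(summary)
```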

Methodology

1. Feature Selection

Not every feature in the dataset is needed for this project, so only the columns relevant to the problem are displayed and analyzed.

2. Handling Missing Values

Missing values can distort both the analysis and the prediction results, so they must be handled, either by deleting the affected rows or by filling them in. If there are not too many missing values, deleting them is the simpler option.
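Both options can be sketched in pandas as follows. The sample rows are hypothetical. The placeholder label `"Unknown"` used for filling is an assumption and is not necessarily what the original project used.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (not the real data)
df = pd.DataFrame({
    "WEATHER": ["Clear", np.nan, "Raining", "Overcast"],
    "LIGHTCOND": ["Daylight", "Dusk", np.nan, "Daylight"],
})

# Share of rows with at least one missing value, to judge whether deleting is acceptable
share_missing = df.isnull().any(axis=1).mean()
print(f"{share_missing:.0%} of rows have a missing value")  # 50% of rows have a missing value

# Option 1: delete incomplete rows
dropped = df.dropna().reset_index(drop=True)

# Option 2: fill gaps with a placeholder category ("Unknown" is an assumed label)
filled = df.fillna("Unknown")
```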

3. Handling Duplicate Values

Duplicate rows also distort the analysis and prediction results, because the same record is counted more than once. First, we count the duplicate rows in the dataset. Then we remove them to make the dataset cleaner.
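The two steps map onto `DataFrame.duplicated` and `DataFrame.drop_duplicates`. This sketch uses hypothetical rows and treats two rows as duplicates only when every column matches.

```python
import pandas as pd

# Hypothetical sample in which the first two rows are identical
df = pd.DataFrame({
    "INCDTTM": ["1/5/2019 8:00:00 AM", "1/5/2019 8:00:00 AM", "2/7/2019 5:30:00 PM"],
    "WEATHER": ["Clear", "Clear", "Raining"],
})

# Step 1: count rows that repeat an earlier row across all columns
n_duplicates = df.duplicated().sum()
print("duplicate rows:", n_duplicates)  # duplicate rows: 1

# Step 2: drop them, keeping the first occurrence (the pandas default)
dedup = df.drop_duplicates().reset_index(drop=True)
```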

4. Convert ‘INCDTTM’ Column to Datetime Type

The ‘INCDTTM’ column needs to be converted to the datetime type. Once it is a datetime, we can extract the hour, day, month, and year of each collision, which lets us analyze the data in more depth.
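The sketch below does the conversion with `pd.to_datetime` and then extracts the components with the `.dt` accessor. The sample strings and the `format` pattern are assumptions about how INCDTTM values are written, so they may need adjusting for the real data.

```python
import pandas as pd

# Hypothetical INCDTTM strings; the format pattern is an assumed layout
df = pd.DataFrame({"INCDTTM": ["1/5/2019 8:10:00 AM", "7/19/2018 5:45:00 PM"]})

# Convert to datetime64; errors="coerce" turns unparseable strings into NaT
df["INCDTTM"] = pd.to_datetime(
    df["INCDTTM"], format="%m/%d/%Y %I:%M:%S %p", errors="coerce"
)

# Extract the time components used later in the EDA
df["hour"] = df["INCDTTM"].dt.hour
df["day"] = df["INCDTTM"].dt.day_name()
df["month"] = df["INCDTTM"].dt.month
df["year"] = df["INCDTTM"].dt.year
print(df)
```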

5. Exploratory Data Analysis (EDA)

After cleaning the data, we can run the exploratory data analysis. The analysis framework follows the problem we have defined, namely finding the best time and location for cycling activities.

First, the data is explored by time-related features such as the hour, day, month, and year, together with the weather. These features are visualized to give an overview of the best time to hold a cycling event. Second, we look at the conditions that describe the best place to hold a cycling event. The features visualized here are light conditions, road conditions, and address types.
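Each of these views starts as a count of collisions per category, which is then plotted. The sketch below computes counts per hour, weather, and light condition on hypothetical rows. The plotting step appears only as a comment, because the article does not say which charting library it uses.

```python
import pandas as pd

# Hypothetical cleaned sample (not the real data)
df = pd.DataFrame({
    "INCDTTM": pd.to_datetime(["2019-01-05 08:10", "2019-01-06 08:40",
                               "2019-01-07 17:20", "2019-01-08 08:05"]),
    "WEATHER": ["Clear", "Clear", "Raining", "Overcast"],
    "LIGHTCOND": ["Daylight", "Daylight", "Dusk", "Daylight"],
})

# Time view: number of collisions per hour of day, ordered by hour
by_hour = df["INCDTTM"].dt.hour.value_counts().sort_index()

# Condition views: number of collisions per weather and light condition
by_weather = df["WEATHER"].value_counts()
by_light = df["LIGHTCOND"].value_counts()

print(by_hour, by_weather, by_light, sep="\n\n")
# With matplotlib installed, each Series could be charted, e.g. by_hour.plot(kind="bar")
```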

6. Model Building

The machine learning model used in this project is logistic regression, chosen for two reasons. First, the target variable is binary (whether or not the collision is an injury collision). Second, logistic regression produces probabilities, which we need to find the time and place conditions most likely to lead to an injury collision. Before building the model, the data will be encoded with a one-hot encoder and split into training and testing data.
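The modeling step could look roughly like the scikit-learn sketch below. It makes three assumptions. The toy data is hypothetical. SEVERITYCODE 2 is taken to mean an injury collision. The encoder sits inside a pipeline, so it is fitted after the split on training rows only, which differs slightly from the encode-then-split order described above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; SEVERITYCODE 2 is assumed to mean injury collision
df = pd.DataFrame({
    "LIGHTCOND": ["Daylight", "Dusk", "Dark"] * 8,
    "ROADCOND": ["Dry", "Wet"] * 12,
    "WEATHER": ["Clear", "Raining", "Overcast", "Clear"] * 6,
    "SEVERITYCODE": [1, 2, 1, 1, 2, 2] * 4,
})

X = df[["LIGHTCOND", "ROADCOND", "WEATHER"]]
y = (df["SEVERITYCODE"] == 2).astype(int)  # 1 = injury collision

# Hold out 25% of rows for testing, keeping the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# One-hot encode the categorical features, then fit logistic regression
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Probability of an injury collision for one hypothetical set of conditions
scenario = pd.DataFrame([{"LIGHTCOND": "Dark", "ROADCOND": "Wet", "WEATHER": "Raining"}])
p_injury = model.predict_proba(scenario)[0, 1]
print(f"P(injury | dark, wet, raining) = {p_injury:.2f}")
```

`predict_proba` returns one column per class, ordered as in `model.classes_`. Column 1 is therefore the estimated probability of an injury collision.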

Analysis

Feature Selection

We keep only the features that help solve the problem defined above.
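Selecting the features comes down to indexing the DataFrame with a list of column names. The list below is assembled from the Data section and may differ from the one used in the original project. The input frame is hypothetical.

```python
import pandas as pd

# Columns listed in the Data section; the original project's exact list may differ
FEATURES = [
    "X", "Y", "ADDRTYPE",                 # location
    "SEVERITYCODE", "SEVERITYDESC",       # severity
    "PERSONCOUNT", "PEDCYLCOUNT",         # person count
    "INCDTTM",                            # date and time
    "WEATHER", "ROADCOND", "LIGHTCOND",   # conditions
]

def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy holding only the analysis columns."""
    return df[FEATURES].copy()

# Hypothetical one-row frame with an extra column that should be dropped
raw = pd.DataFrame([{**dict.fromkeys(FEATURES, 0), "UNUSED_COL": "x"}])
selected = select_features(raw)
print(selected.columns.tolist())
```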