CitiBike is New York City’s famous bike rental company and the largest in the USA. It launched in May 2013 and has become an essential part of the city’s transportation network. It makes commuting fun, efficient, and affordable, not to mention healthy and good for the environment.
I obtained the June 2013 CitiBike rider data from Kaggle. I will walk you through a complete exploratory data analysis, answering questions such as:
- Where do CitiBikers ride?
- When do they ride?
- How far do they go?
- Which stations are most popular?
- What days of the week are most rides taken on?
- And many more
Key learning:
I have used many parameters to tweak the plotting functions of Matplotlib and Seaborn, so this article is a good way to learn them in practice.
Note:
This article is best viewed on a larger screen, such as a tablet or desktop. If anything is difficult to follow, you can post your queries in the comment section. I will also link my Kaggle notebook at the end of this article.
Let’s get started
Importing necessary libraries and reading data.
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Setting plot style to seaborn
# (Matplotlib 3.6+ renamed this style to 'seaborn-v0_8')
plt.style.use('seaborn')

# Reading data
df = pd.read_csv('../input/citibike-system-data/201306-citibike-tripdata.csv')
df.head()
![CitiBike dataset](https://i-blog.csdnimg.cn/blog_migrate/8b6202ce2f542d757cb8621f5d58762e.png)
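Since several of the questions above are about when people ride, it can help to parse the timestamp columns while reading the file. The snippet below is a minimal sketch on a tiny in-memory stand-in for the CSV; the column names (`starttime`, `stoptime`, and so on) and the station names are assumptions for illustration, so check them against the output of `df.head()` before reusing it.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV. Column and station names are
# assumptions for illustration, not taken from the actual file.
sample_csv = io.StringIO(
    "tripduration,starttime,stoptime,start station name\n"
    "695,2013-06-01 00:00:01,2013-06-01 00:11:36,Example St A\n"
    "693,2013-06-01 00:00:08,2013-06-01 00:11:41,Example St B\n"
)

# parse_dates converts the listed columns to datetime64 while reading
sample = pd.read_csv(sample_csv, parse_dates=["starttime", "stoptime"])

# With real datetimes, the .dt accessor gives weekday names, hours, etc.
weekday = sample["starttime"].dt.day_name()
hour = sample["starttime"].dt.hour
print(sample.dtypes)
```

The same `parse_dates` argument could be passed to the `pd.read_csv` call on the full file, assuming its timestamp columns use these names.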
Let’s get some more information on the data.
df.info()
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/3ec8742dad0557a9fa9daf177cd0179b.png)
#sum of missing values in each column
df.isna().sum()
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/fe21b3237dc04627c077e7cbd358ade1.png)
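Raw missing-value counts are easier to judge next to the share of rows they represent. The following is a small sketch on a toy DataFrame, not the real data; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the trip data (hypothetical columns and values)
toy = pd.DataFrame({
    "tripduration": [695, 693, 2059, 123],
    "end station name": ["A", None, "C", "D"],
    "birth year": [1983.0, np.nan, np.nan, 1990.0],
})

# Count of missing values per column, as in df.isna().sum()
missing = toy.isna().sum()

# The mean of a boolean mask is the fraction of True values
missing_pct = (toy.isna().mean() * 100).round(1)

# Keep only columns that actually have gaps, worst first
summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

Replacing `toy` with `df` would give the same summary for the full dataset.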
We have a whopping 577,703 row