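Analyze bike share system data from three major US cities: Chicago, New York City, and Washington, D.C. Write a script that takes raw input from the user and creates an interactive experience in the terminal to present these statistics.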
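Dataset description:
The data covers the first half of 2017 for all three cities. All three data files contain the following six columns:
Start Time (e.g., 2017-01-01 00:07:57)
End Time (e.g., 2017-01-01 00:20:53)
Trip Duration (e.g., 776 seconds)
Start Station (e.g., Broadway & Barry Ave)
End Station (e.g., Sedgwick St & North Ave)
User Type (Subscriber/Registered or Customer/Casual)
The Chicago and New York City files also contain the following two columns:
Gender
Birth Year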
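A quick way to get a feel for this layout is to build a one-row DataFrame from the example values listed above and check which optional columns it lacks. This is a minimal sketch; the sample row and the helper name missing_optional_columns are illustrative assumptions, not part of the project files. It also confirms that the example Trip Duration of 776 seconds equals End Time minus Start Time.

```python
import pandas as pd

# One illustrative row built from the example values in the column list
# (hypothetical sample data; the real CSV files contain many rows).
sample = pd.DataFrame({
    'Start Time': ['2017-01-01 00:07:57'],
    'End Time': ['2017-01-01 00:20:53'],
    'Trip Duration': [776],
    'Start Station': ['Broadway & Barry Ave'],
    'End Station': ['Sedgwick St & North Ave'],
    'User Type': ['Subscriber'],
})

# Columns every city file has, and the two extra columns only Chicago and NYC have.
COMMON_COLUMNS = ['Start Time', 'End Time', 'Trip Duration',
                  'Start Station', 'End Station', 'User Type']
OPTIONAL_COLUMNS = ['Gender', 'Birth Year']

def missing_optional_columns(df):
    """Return the optional columns that a city's DataFrame lacks (hypothetical helper)."""
    return [col for col in OPTIONAL_COLUMNS if col not in df.columns]

# The sample row has the Washington layout, so both optional columns are missing.
print(missing_optional_columns(sample))  # ['Gender', 'Birth Year']

# Sanity check: End Time minus Start Time matches the documented Trip Duration.
elapsed = pd.to_datetime(sample['End Time']) - pd.to_datetime(sample['Start Time'])
print(elapsed.dt.total_seconds().iloc[0])  # 776.0
```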
Import modules:
import time
import numpy as np
import pandas as pd
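# Map each city name the user can type to its data file.
# The keys must match the city names accepted in filter(), otherwise the lookup raises a KeyError.
CITY_NAME = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}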
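Input filters: city, month, and day of week.
(Only one city can be analyzed per query, because the script loads a single city's CSV file each time.)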
def filter():
    # Ask the user for the city, month, and day of week to analyze.
    # Note: this name shadows Python's built-in filter(); a name such as get_filters() would avoid that.
    city = input_mod('Please enter a city to analyze US bikeshare data: chicago, new york city or washington:\n',
                     'Error! Please enter a correct city:\n',
                     ['chicago', 'new york city', 'washington'])
    month = input_mod('Please enter a month to analyze US bikeshare data: all, january, february, ..., june:\n',
                      'Error! Please enter a correct month:\n',
                      ['all', 'january', 'february', 'march', 'april', 'may', 'june'])
    day = input_mod('Please enter a day of week to analyze US bikeshare data: all, monday, tuesday, ..., sunday:\n',
                    'Error! Please enter a correct day of week:\n',
                    ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'])
    return city, month, day
Define a function that reads user input and checks it for errors:
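def input_mod(input_print, input_error, enter_list):
    # Read the input and convert it to lowercase (strip() also removes stray spaces)
    ret = input(input_print).strip().lower()
    # Keep printing the error prompt until the input is one of the allowed values
    while ret not in enter_list:
        ret = input(input_error).strip().lower()
    return ret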
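Because input_mod calls input() directly, it is awkward to test. One option, sketched below under the assumption that a testable variant is wanted, is to pass the reading function in as a parameter. The name input_mod_testable and its read parameter are hypothetical; the validation loop is the same as in input_mod.

```python
def input_mod_testable(prompt, error_prompt, valid, read=input):
    """Hypothetical variant of input_mod: `read` defaults to the built-in input,
    but a test can pass any function that takes a prompt and returns a string."""
    answer = read(prompt).strip().lower()
    # Re-prompt until the normalized answer is one of the allowed values
    while answer not in valid:
        answer = read(error_prompt).strip().lower()
    return answer

# Simulate a user who first mistypes the month, then enters 'June' with extra spaces.
answers = iter(['junee', '  June '])
month = input_mod_testable('Month: ', 'Error! Please enter a correct month: ',
                           ['all', 'january', 'february', 'march', 'april', 'may', 'june'],
                           read=lambda prompt: next(answers))
print(month)  # june
```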
Select the rows that match the chosen filters:
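def load_file(city, month, day):
    # Load the data set for the chosen city; the file name is looked up in the global CITY_NAME dictionary
    df = pd.read_csv(CITY_NAME[city])
    # Convert Start Time to datetime, then extract the month and the day of week
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    # dt.weekday_name was removed in pandas 1.0; dt.day_name() returns names such as 'Monday'
    df['day_of_week'] = df['Start Time'].dt.day_name()
    # If the month is not 'all' (all months), filter by month
    if month != 'all':
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1  # list.index is 0-based, so add 1 to get the month number
        df = df[df['month'] == month]
    # If the day is not 'all' (all days of the week), filter by day name
    if day != 'all':
        df = df[df['day_of_week'] == day.title()]
    return df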
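The same filtering steps can be tried without the CSV files by applying them to a small in-memory DataFrame. The sketch below assumes a hypothetical helper filter_trips that mirrors load_file minus the read_csv call; the three start times are made up, with 2017-01-01 a Sunday and 2017-01-02 and 2017-02-06 both Mondays.

```python
import pandas as pd

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june']

def filter_trips(df, month, day):
    """Hypothetical helper: the filtering part of load_file, applied to a DataFrame in memory."""
    df = df.copy()  # avoid modifying the caller's DataFrame
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()
    if month != 'all':
        # list.index gives the 0-based position, so add 1 to get the calendar month
        df = df[df['month'] == MONTHS.index(month) + 1]
    if day != 'all':
        df = df[df['day_of_week'] == day.title()]
    return df

trips = pd.DataFrame({'Start Time': ['2017-01-01 00:07:57',
                                     '2017-01-02 08:15:00',
                                     '2017-02-06 17:30:00']})
result = filter_trips(trips, 'january', 'monday')
print(len(result))  # 1 (only the 2017-01-02 trip)
```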
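Start time analysis:
1. Which month is most common in Start Time?
2. Which day of the week (e.g., Monday, Tuesday) is most common in Start Time?
3. Which hour of the day is most common in Start Time?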
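def Start_Time(df):
    # mode() returns a Series (several values can tie), so [0] takes the first most common value
    popular_month = df['month'].mode()[0]
    print('popular month:', popular_month)
    popular_day = df['day_of_week'].mode()[0]
    print('popular day of week:', popular_day)
    # Extract the hour of day from Start Time
    df['hour'] = df['Start Time'].dt.hour
    popular_hour = df['hour'].mode()[0]
    print('popular hour:', popular_hour)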
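To see why the statistics above index mode() with [0], the sketch below runs it on a few made-up start times in which two hours tie. It also shows one way, assumed here rather than taken from the original script, to turn the month number into a readable name with the standard calendar module.

```python
import calendar
import pandas as pd

# Hypothetical start times; in the script these come from the filtered DataFrame.
start = pd.to_datetime(pd.Series(['2017-06-05 08:10:00', '2017-06-05 08:40:00',
                                  '2017-06-06 17:05:00', '2017-06-07 17:55:00',
                                  '2017-05-01 09:00:00']))
hours = start.dt.hour

# mode() returns a Series because several values can tie for most frequent;
# here 8 and 17 both appear twice, and the result is sorted, so [0] picks 8.
print(hours.mode().tolist())  # [8, 17]
print(hours.mode()[0])        # 8

# Convert the most common month number into a readable name.
print(calendar.month_name[int(start.dt.month.mode()[0])])  # June
```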
Trip duration analysis:
1. What is the total trip duration (Trip Duration)?
2. What is the mean trip duration?
def Trip_Duration(df):
    # Trip Duration is measured in seconds
    total_trip = df['Trip Duration'].sum()
    print('total trip duration (seconds):', total_trip)
    mean_trip = df['Trip Duration'].mean()
    print('mean trip duration (seconds):', mean_trip)
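Totals in seconds quickly become hard to read. A small formatting helper, hypothetical and not part of the original script, can split them into hours, minutes, and seconds with divmod; for the example trip, 776 seconds is 12 × 60 + 56, i.e. 12 minutes 56 seconds.

```python
def format_duration(seconds):
    """Hypothetical helper: turn a number of seconds into an 'Xh Ym Zs' string."""
    minutes, secs = divmod(int(round(seconds)), 60)  # whole minutes and leftover seconds
    hours, minutes = divmod(minutes, 60)             # whole hours and leftover minutes
    return f'{hours}h {minutes}m {secs}s'

print(format_duration(776))     # 0h 12m 56s
print(format_duration(3725.4))  # 1h 2m 5s
```

In Trip_Duration, the total could then be printed as format_duration(total_trip).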
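Station analysis:
1. Which start station (Start Station) is the most popular, and which end station (End Station) is the most popular?
2. Which trip is the most popular (i.e., which combination of start station and end station occurs most often)?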
def station(df):
    # mode()[0] picks the first of the most common values
    popular_start_station = df['Start Station'].mode()[0]
    print('popular start station:', popular_start_station)
    popular_end_station = df['End Station'].mode()[0]
    print('popular end station:', popular_end_station)
    # Join start and end station into one string per trip, then find the most common trip
    popular_combine_trip = (df['Start Station'] + ' --> ' + df['End Station']).mode()[0]
    print('most frequent combination of start station and end station trip is:\n', popular_combine_trip)
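An alternative to concatenating the two station names into one string is to group by both columns and count each pair. This is a sketch with made-up station names A, B, and C; groupby(...).size() counts rows per (start, end) pair and idxmax() returns the pair with the highest count as a tuple.

```python
import pandas as pd

# Hypothetical trips with made-up station names
trips = pd.DataFrame({
    'Start Station': ['A', 'A', 'B', 'A'],
    'End Station':   ['B', 'B', 'A', 'C'],
})

# Count each (start, end) pair without building a combined string column.
pair_counts = trips.groupby(['Start Station', 'End Station']).size()
start, end = pair_counts.idxmax()
print(start, '-->', end, pair_counts.max())  # A --> B 2
```

Keeping the pair as a tuple avoids any ambiguity if a station name itself contains the separator string.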
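User analysis:
1. How many users are there of each user type (User Type)?
2. How many users are there of each gender (Gender)? (The Washington file has no such column.)
3. What are the earliest, most recent, and most common birth years (Birth Year)? (The Washington file has no such column.)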
def user(df):
    # Number of users of each User Type
    user_type = df['User Type'].value_counts()
    print('user type:\n', user_type)
    # The Washington file has no Gender or Birth Year column, so a missing column must be handled.
    # Either a try statement or an if statement works; Gender uses the first, Birth Year the second.
    # Method 1: try statement. Catch only KeyError (what a missing column raises), so that
    # real bugs such as a misspelled variable name are not silently swallowed by a bare except.
    try:
        gender = df['Gender'].value_counts()
        print('user gender:\n', gender)
    except KeyError:
        print('No gender data for this city.')
    # Method 2: if statement ('Birth Year' in df checks the column names)
    if 'Birth Year' in df:
        earliest_year = df['Birth Year'].min()
        most_recent_year = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]
        print('earliest year:', earliest_year)
        print('most recent year:', most_recent_year)
        print('most common year:', most_common_year)
    else:
        print('No birth year data for this city.')
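A third option for a possibly missing column is DataFrame.get, which returns None (or a supplied default) instead of raising KeyError. The sketch below assumes a hypothetical helper birth_year_stats and two made-up DataFrames; dropna() is used because Birth Year loads as a float column with NaN where values are missing.

```python
import pandas as pd

def birth_year_stats(df):
    """Hypothetical helper: (earliest, most recent, most common) birth year, or None if absent."""
    years = df.get('Birth Year')  # DataFrame.get returns None when the column is missing
    if years is None:
        return None
    years = years.dropna()        # ignore riders with no recorded birth year
    return int(years.min()), int(years.max()), int(years.mode()[0])

chicago_like = pd.DataFrame({'Birth Year': [1985.0, 1990.0, None, 1990.0, 1962.0]})
washington_like = pd.DataFrame({'User Type': ['Subscriber', 'Customer']})
print(birth_year_stats(chicago_like))     # (1962, 1990, 1990)
print(birth_year_stats(washington_like))  # None
```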
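When a Python file is run as a script, its top-level (unindented) statements are executed from top to bottom. The def statements only define functions; a function body does not run until the function is called. The analysis therefore starts at if __name__ == "__main__":, which is true when the file is run directly, and which calls main().
Inside main() there is a while True loop. It keeps repeating until the if statement at its end breaks out of it, which happens when the user answers anything other than yes.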
def main():
    while True:
        city, month, day = filter()
        df = load_file(city, month, day)
        Start_Time(df)
        Trip_Duration(df)
        station(df)
        user(df)
        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break

if __name__ == "__main__":
    main()
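The restart logic of main() can be checked on its own with a stripped-down loop. In this sketch, run_session and its read parameter are hypothetical stand-ins: instead of reading filters and printing statistics, each pass just counts a round, and the loop ends on any answer other than yes.

```python
def run_session(read=input):
    """Hypothetical stand-in for main(): count how many analysis rounds run."""
    rounds = 0
    while True:
        rounds += 1  # in main(), this is where filters are read and statistics printed
        restart = read('\nWould you like to restart? Enter yes or no.\n')
        # Any answer other than 'yes' (case-insensitive) breaks out of the loop
        if restart.strip().lower() != 'yes':
            break
    return rounds

replies = iter(['Yes', 'no'])
print(run_session(read=lambda prompt: next(replies)))  # 2
```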