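US Bikeshare Data Analysis

This project analyzes bike-share system data for three major US cities: Chicago, New York City, and Washington, DC. The script takes raw input from the user and builds an interactive experience in the terminal to present the resulting statistics.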

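About the datasets:

The data covers the first half of 2017 for all three cities. All three data files contain the following six columns:

Start Time (e.g. 2017-01-01 00:07:57)
End Time (e.g. 2017-01-01 00:20:53)
Trip Duration (in seconds, e.g. 776)
Start Station (e.g. Broadway & Barry Ave)
End Station (e.g. Sedgwick St & North Ave)
User Type (Subscriber/Registered or Customer/Casual)

The Chicago and New York City files also contain the following two columns:
Gender
Birth Year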

Import the modules:

import time
import numpy as np
import pandas as pd

# Map each city name (as typed by the user) to its data file
CITY_NAME = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}

Get the filter criteria: city, month, and day of week

(The script loads one city's file per pass, so the three cities cannot be viewed at the same time.)

def filter():
    # Note: this name shadows Python's built-in filter() inside this script
    city = input_mod('Please enter a city to analyse US bikeshare data: chicago, new york city or washington:\n',
                     'Error! Please enter a correct city:\n',
                     ['chicago', 'new york city', 'washington'])

    month = input_mod('Please enter a month to analyse the US bikeshare data: all, january, february, ... , june:\n',
                      'Error! Please enter a correct month:\n',
                      ['all', 'january', 'february', 'march', 'april', 'may', 'june'])

    day = input_mod('Please enter a day of week to analyse the US bikeshare data: all, monday, tuesday, ... sunday:\n',
                    'Error! Please enter a correct day of week:\n',
                    ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'])

    return city, month, day

Build a function that reads the input and validates it:

def input_mod(input_print, input_error, enter_list):
    # Convert the input to lower case
    ret = input(input_print).lower()
    # If the input is not in the list of allowed values, show the error prompt and ask again
    while ret not in enter_list:
        ret = input(input_error).lower()

    return ret
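
The validation loop above can also be exercised without typing at a terminal. The following is a minimal sketch, assuming a hypothetical helper named ask_choice that takes the reading function as a parameter (the read parameter and the sample replies are illustrative and not part of the original script):

```python
from typing import Callable, Iterable


def ask_choice(prompt: str, error_prompt: str, choices: Iterable[str],
               read: Callable[[str], str] = input) -> str:
    """Keep asking until the reply (trimmed, lower-cased) is one of `choices`."""
    allowed = set(choices)
    reply = read(prompt).strip().lower()
    while reply not in allowed:
        reply = read(error_prompt).strip().lower()
    return reply


# Scripted replies stand in for a user typing at the terminal (made-up values).
replies = iter(['Boston', '  CHICAGO '])
city = ask_choice('City? ', 'Try again: ',
                  ['chicago', 'new york city', 'washington'],
                  read=lambda _prompt: next(replies))
print(city)  # chicago
```

Passing the reader as an argument keeps the default behaviour (the built-in input) while making the loop easy to check with canned replies.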

Select the data that matches the criteria:

def load_file(city, month, day):

    # Load the dataset for the chosen city. Look up the file name in the global CITY_NAME dictionary
    df = pd.read_csv(CITY_NAME[city])

    # Extract the month and the day of week from the start time
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()  # e.g. 'Monday'

    # If the month is not 'all' (every month), filter by month
    if month != 'all':
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1  # list index plus 1 gives the month number
        df = df[df['month'] == month]

    # If the day is not 'all' (every day of the week), filter by day name
    if day != 'all':
        df = df[df['day_of_week'] == day.title()]

    return df
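
To see the month and day filters at work without the real city files, here is a small sketch on a few hand-written rows. The sample rows and variable names are made up for illustration; only the column names come from the dataset description.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a real city file.
SAMPLE_CSV = io.StringIO(
    "Start Time,Trip Duration\n"
    "2017-01-02 08:00:00,600\n"   # a Monday in January
    "2017-01-03 09:30:00,300\n"   # a Tuesday in January
    "2017-02-06 17:15:00,900\n"   # a Monday in February
)

df = pd.read_csv(SAMPLE_CSV, parse_dates=['Start Time'])
df['month'] = df['Start Time'].dt.month
df['day_of_week'] = df['Start Time'].dt.day_name()

months = ['january', 'february', 'march', 'april', 'may', 'june']
month, day = 'january', 'monday'

# Combine both conditions into one boolean mask.
mask = (df['month'] == months.index(month) + 1) & (df['day_of_week'] == day.title())
filtered = df[mask]
print(filtered)
```

Only the first row is both in January and on a Monday, so it is the only one that survives the filter.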

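Start time analysis:

1. Which month is the most common in the Start Time column?

2. Which day of the week (e.g. Monday, Tuesday) is the most common start day?

3. Which hour of the day is the most common start hour?

def Start_Time(df):

    # mode() returns a Series (there may be ties), so take its first value
    popular_month = df['month'].mode()[0]
    print('popular month: ', popular_month)

    popular_day = df['day_of_week'].mode()[0]
    print('popular day of week: ', popular_day)

    df['hour'] = df['Start Time'].dt.hour
    popular_hour = df['hour'].mode()[0]
    print('popular hour: ', popular_hour)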
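
One detail worth knowing about mode(): it returns a pandas Series rather than a single value, because several values can tie for most common. A short sketch with made-up start hours:

```python
import pandas as pd

hours = pd.Series([8, 8, 17, 17, 12])  # made-up start hours containing a tie

modes = hours.mode()          # all values that share the highest count, sorted
print(modes.tolist())         # [8, 17]

first = modes.iloc[0]         # pick one value when a single answer is wanted
counts = hours.value_counts()
print(first, counts.loc[first])
```

Taking the first element is a simple convention; value_counts() is useful when the number of occurrences should be reported as well.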
    

Trip duration analysis:

1. What is the total trip duration (Trip Duration)?

2. What is the average trip duration?

def Trip_Duration(df):

    # Trip Duration is recorded in seconds
    total_trip = df['Trip Duration'].sum()
    print('total trip duration: ', total_trip)

    mean_trip = df['Trip Duration'].mean()
    print('mean trip duration: ', mean_trip)
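
Since Trip Duration is stored in seconds, the totals can be hard to read. As an optional extra, a hypothetical helper (format_duration is not part of the original script) could convert seconds into hours, minutes and seconds:

```python
def format_duration(seconds: float) -> str:
    """Turn a number of seconds into an 'Hh MMm SSs' string."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f'{hours}h {minutes:02d}m {secs:02d}s'


print(format_duration(776))  # 0h 12m 56s
```

The value 776 is the example trip duration from the dataset description: 12 minutes and 56 seconds.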
    

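Station analysis:

1. Which start station (Start Station) is the most popular, and which end station (End Station) is the most popular?

2. Which trip is the most popular (that is, which combination of start station and end station occurs most often)?

def station(df):
    popular_start_station = df['Start Station'].mode()[0]
    print('popular start station: ', popular_start_station)

    popular_end_station = df['End Station'].mode()[0]
    print('popular end station: ', popular_end_station)

    # Join the two station names into one string per trip, then find the most common string
    popular_combine_trip = (df['Start Station'] + ' --> ' + df['End Station']).mode()[0]
    print('most frequent combination of start station and end station trip is: \n', popular_combine_trip)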
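
An alternative way to find the most popular trip is to group by both station columns instead of joining the names into one string. This is only a sketch of that swapped-in technique, using made-up station names:

```python
import pandas as pd

# Made-up trips; only the column names come from the dataset description.
trips = pd.DataFrame({
    'Start Station': ['A', 'A', 'B', 'A'],
    'End Station':   ['B', 'B', 'A', 'C'],
})

# Count how often each (start, end) pair occurs.
pair_counts = trips.groupby(['Start Station', 'End Station']).size()

# idxmax() returns the index label of the largest count, here a (start, end) tuple.
start, end = pair_counts.idxmax()
print(start, '-->', end, 'occurs', pair_counts.max(), 'times')
```

Grouping keeps the two stations as separate values and gives the trip count for free, at the cost of slightly more code than the string approach.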
    

User analysis:

1. How many users are there of each user type (User Type)?

2. How many users are there of each gender (Gender)? (Washington does not have this column.)

3. Which birth year (Birth Year) is the earliest, which is the most recent, and which is the most common? (Washington does not have this column.)

def user(df):
    user_type = df['User Type'].value_counts()
    print('user type:\n', user_type)

    # If df has no Gender or Birth Year column, the missing columns must be handled.
    # Either a try statement or an if statement works; use one of the two.
    # Option 1: try statement
    try:
        gender = df['Gender'].value_counts()
        print('user gender:\n', gender)

        earliest_year = df['Birth Year'].min()
        most_recent_year = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]

        print('earliest year: ', earliest_year)
        print('most recent year: ', most_recent_year)
        print('most common year: ', most_common_year)

    except KeyError:
        # Accessing a missing column raises KeyError
        print('No gender or birth year data for this city.')


# Option 2: if statements, written as a separate function so the statistics are not printed twice
def user_if(df):
    user_type = df['User Type'].value_counts()
    print('user type:\n', user_type)

    if 'Gender' in df:
        gender = df['Gender'].value_counts()
        print('user gender:\n', gender)

    if 'Birth Year' in df:
        earliest_year = df['Birth Year'].min()
        most_recent_year = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]

        print('earliest year: ', earliest_year)
        print('most recent year: ', most_recent_year)
        print('most common year: ', most_common_year)
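
The column check can also be wrapped in a small function that returns the values instead of printing them, which makes the missing-column case easy to verify. This is a sketch under assumptions: birth_year_summary and both sample frames are hypothetical and not part of the original script.

```python
import pandas as pd


def birth_year_summary(df):
    """Return (earliest, most recent, most common) birth year, or None if unavailable."""
    if 'Birth Year' not in df.columns:
        return None
    years = df['Birth Year'].dropna()   # ignore rows with no birth year recorded
    if years.empty:
        return None
    return int(years.min()), int(years.max()), int(years.mode().iloc[0])


# Made-up frames: one shaped like Chicago/New York City, one like Washington.
with_years = pd.DataFrame({'Birth Year': [1985.0, 1992.0, None, 1992.0]})
without_years = pd.DataFrame({'User Type': ['Subscriber']})

print(birth_year_summary(with_years))
print(birth_year_summary(without_years))
```

Returning None for a missing column lets the caller decide what message, if any, to show the user.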
    

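Python runs the top-level (unindented) statements of a script from top to bottom. A def or class statement only defines the function or class; its body does not run until it is called. The real work therefore starts at if __name__ == "__main__":, which calls main() when the file is run as a script.

Inside main() there is a while loop whose condition is True, so it keeps running until the if statement at the bottom of the loop breaks out of it.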

def main():
    while True:
        city, month, day = filter()
        df = load_file(city, month, day)

        Start_Time(df)
        Trip_Duration(df)
        station(df)
        user(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break

if __name__ == "__main__":
    main()
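
The execution order described above can be observed with a tiny standalone sketch. The events list and the messages are made up for illustration; the point is only the order in which lines run.

```python
events = []

events.append('top-level statement 1')   # runs as soon as the file is executed


def main():
    # This body runs only when main() is called, not when it is defined.
    events.append('inside main')


events.append('top-level statement 2')   # runs right after the def statement

if __name__ == '__main__':
    # True only when the file is run directly, not when it is imported.
    main()
    print(events)
```

Both top-level statements are recorded before anything inside main(), even though main() is defined between them.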

           