US Bikeshare Data Analysis

NeverthelessEnd

Tags: python

Article link: https://blog.csdn.net/sinat_41177607/article/details/106269598

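This project analyzes bikeshare system data for three major US cities: Chicago, New York City, and Washington, DC. The goal is a script that takes raw user input and creates an interactive experience in the terminal to present the resulting statistics.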

About the dataset:

The data covers the first half of 2017 for the three cities. All three data files contain the following six columns:

Start Time (e.g. 2017-01-01 00:07:57)
End Time (e.g. 2017-01-01 00:20:53)
Trip Duration (e.g. 776 seconds)
Start Station (e.g. Broadway & Barry Ave)
End Station (e.g. Sedgwick St & North Ave)
User Type (Subscriber/Registered or Customer/Casual)

The Chicago and New York City files also contain these two columns:
Gender
Birth Year
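
To make the column layout concrete, here is a minimal sketch that parses one in-memory row built from the example values listed above. The `sample_csv` string is hypothetical (it is not copied from the real files), and the station names are English renderings of the examples.

```python
import io
import pandas as pd

# Hypothetical one-row sample built from the example values above
sample_csv = io.StringIO(
    "Start Time,End Time,Trip Duration,Start Station,End Station,User Type\n"
    "2017-01-01 00:07:57,2017-01-01 00:20:53,776,"
    "Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber\n"
)

# parse_dates turns the two time columns into datetime64 values
sample = pd.read_csv(sample_csv, parse_dates=["Start Time", "End Time"])

# Trip Duration should equal End Time minus Start Time, in seconds
elapsed = (sample["End Time"] - sample["Start Time"]).dt.total_seconds()
print(sample.dtypes)
print(elapsed.iloc[0])
```

For this row the computed difference matches the 776 seconds given as the Trip Duration example.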

Import the modules:

import time
import numpy as np
import pandas as pd

# Map each city name (as typed by the user) to its data file
CITY_NAME = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}

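Enter the filter conditions: city, month, and day of week

(The script analyzes one city at a time: each city's data lives in its own file, and only one file is loaded per pass.)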

# Note: this name shadows Python's built-in filter(); it is kept because main() calls it
def filter():
    city = input_mod('Please enter a city to analyse US bikeshare data: chicago, new york city or washington: \n',
                     'Error! Please enter a correct city:\n',
                     ['chicago', 'new york city', 'washington'])

    month = input_mod('Please enter a month to analyse the US bikeshare data: all, january, february, ... , june: \n',
                      'Error! Please enter a correct month:\n',
                      ['all', 'january', 'february', 'march', 'april', 'may', 'june'])

    day = input_mod('Please enter a day of week to analyse the US bikeshare data: all, monday, tuesday, ... sunday: \n',
                    'Error! Please enter a correct day of week:\n',
                    ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'])

    return city, month, day

Define a helper that reads input and validates it:

def input_mod(input_print, input_error, enter_list):
    # Convert the input to lowercase
    ret = input(input_print).lower()
    # If the input is not in the list of allowed values, show the error prompt and ask again
    while ret not in enter_list:
        ret = input(input_error).lower()

    return ret
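
The retry loop can be exercised without a keyboard by passing in the function that reads the input. The following is only a sketch: the `ask` helper and its `reader` parameter are hypothetical names that do not appear in the original script, and the added `strip()` call is an assumption about how stray spaces should be treated.

```python
def ask(prompt, error_prompt, allowed, reader=input):
    """Keep asking until the normalized answer is one of `allowed`."""
    answer = reader(prompt).strip().lower()
    while answer not in allowed:
        answer = reader(error_prompt).strip().lower()
    return answer


# Simulate a user who mistypes twice before entering a valid city
fake_answers = iter(["Chicogo", "boston", "  New York City "])
city = ask(
    "City: ",
    "Error! Please enter a correct city: ",
    ["chicago", "new york city", "washington"],
    reader=lambda _prompt: next(fake_answers),
)
print(city)
```

Injecting the reader keeps the default behavior (`input`) unchanged while making the validation logic easy to test.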

Select the rows that match the filters:

def load_file(city, month, day):

    # Load the dataset for the chosen city; the global CITY_NAME dictionary maps the city name to its file name
    df = pd.read_csv(CITY_NAME[city])

    # Extract the month and the day of week from the start time
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    # .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is its replacement
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # If the month is not 'all', filter by month
    if month != 'all':
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1  # list index + 1 gives the month number
        df = df[df['month'] == month]

    # If the day is not 'all', filter by day-of-week name
    if day != 'all':
        df = df[df['day_of_week'] == day.title()]

    return df
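
The month and day filters can be tried on a small in-memory table before running them on the full files. A minimal sketch, assuming made-up start times (the `trips` frame below is hypothetical, not real trip data):

```python
import pandas as pd

# Hypothetical start times; 2017-01-02 and 2017-01-09 are Mondays
trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2017-01-02 08:00:00",
        "2017-01-09 09:30:00",
        "2017-01-03 10:15:00",
        "2017-02-06 11:45:00",
    ])
})
trips["month"] = trips["Start Time"].dt.month
trips["day_of_week"] = trips["Start Time"].dt.day_name()

months = ["january", "february", "march", "april", "may", "june"]
month_number = months.index("january") + 1  # list index 0 maps to month 1

# Combine both conditions; "monday".title() gives "Monday", matching day_name()
january_mondays = trips[
    (trips["month"] == month_number) & (trips["day_of_week"] == "monday".title())
]
print(january_mondays)
```

Only the first two rows survive: the third is a Tuesday, and the fourth is a Monday in February.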

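Start time analysis:

1. Which month is most common in the Start Time column?

2. Which day of the week (e.g. Monday, Tuesday) is most common among the start times?

3. Which hour of the day is most common among the start times?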

def Start_Time(df):

    # mode() returns a Series (there may be ties); [0] takes the first value
    popular_month = df['month'].mode()[0]
    print('popular month: ', popular_month)

    popular_day = df['day_of_week'].mode()[0]
    print('popular day of week: ', popular_day)

    # Extract the hour without adding a column to the filtered frame
    popular_hour = df['Start Time'].dt.hour.mode()[0]
    print('popular hour: ', popular_hour)
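
One detail about `mode()` is worth seeing in isolation: it returns a Series rather than a single value, because several values can tie for most frequent. A small sketch with made-up hours:

```python
import pandas as pd

# Hypothetical start hours in which 8 and 17 tie for most frequent
hours = pd.Series([8, 17, 8, 17, 12])

modes = hours.mode()            # a Series of all tied values, sorted ascending
most_common = modes[0]          # the first (smallest) of the tied values
counts = hours.value_counts()   # how often each hour occurs
print(modes.tolist(), most_common, counts.loc[most_common])
```

Printing the whole result of `mode()` also shows the index and dtype, so taking element `[0]` or inspecting `value_counts()` usually gives cleaner terminal output.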

Trip duration analysis:

1. What is the total trip duration (Trip Duration)?

2. What is the average trip duration?

def Trip_Duration(df):

    # Trip Duration is recorded in seconds
    total_trip = df['Trip Duration'].sum()
    print('total trip duration: ', total_trip)

    mean_trip = df['Trip Duration'].mean()
    print('mean trip duration: ', mean_trip)
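
Because the durations are in seconds, the totals can be hard to read. One possible way to present them, sketched here with a hypothetical `format_duration` helper that is not part of the original script, is the standard library's `timedelta`:

```python
from datetime import timedelta


def format_duration(seconds):
    """Render a number of seconds as 'H:MM:SS', with a day count when needed."""
    return str(timedelta(seconds=int(seconds)))


print(format_duration(776))     # 0:12:56, the 776-second example trip
print(format_duration(90061))   # 1 day, 1:01:01
```

Fractions of a second are truncated by `int()`, which seems acceptable for a summary line but is a design choice rather than a requirement.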

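Station analysis:

1. Which start station (Start Station) is most popular, and which end station (End Station) is most popular?

2. Which trip is most popular (that is, which combination of start station and end station occurs most often)?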

def station(df):
    popular_start_station = df['Start Station'].mode()[0]
    print('popular start station: ', popular_start_station)

    popular_end_station = df['End Station'].mode()[0]
    print('popular end station: ', popular_end_station)

    # Join the two station names into one string per trip, then take the most common string
    popular_combine_trip = (df['Start Station'] + '-->' + df['End Station']).mode()[0]
    print('most frequent combination of start station and end station trip is: \n', popular_combine_trip)
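
String concatenation works, but the same question can also be answered by counting (start, end) pairs directly. This is an alternative technique, not the approach used in the script above; the station names in the sketch are placeholders.

```python
import pandas as pd

# Hypothetical trips between placeholder stations A, B and C
trips = pd.DataFrame({
    "Start Station": ["A", "A", "B", "A", "A"],
    "End Station":   ["B", "B", "A", "C", "B"],
})

# Count the rows in each (start, end) group
pair_counts = trips.groupby(["Start Station", "End Station"]).size()

# idxmax() returns the index label of the largest count, here a (start, end) tuple
start, end = pair_counts.idxmax()
print(start, end, pair_counts.max())
```

Grouping keeps the two stations as separate fields, so the result does not depend on the choice of separator string and the trip count comes along for free.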

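User analysis:

1. How many users are there of each user type (User Type)?

2. How many users are there of each gender (Gender)? (Washington has no such column.)

3. What are the earliest, the most recent, and the most common birth years (Birth Year)? (Washington has no such column.)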

# If df has no Gender or Birth Year column (Washington), the missing columns must be handled.
# Either a try statement or if statements will do; keep only one of the two definitions below.

# Method 1: try statement
def user(df):
    user_type = df['User Type'].value_counts()
    print('user type: ', user_type)

    try:
        gender = df['Gender'].value_counts()
        print('user gender: ', gender)

        earliest_year = df['Birth Year'].min()
        most_recent_year = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]

        print('earliest year: ', earliest_year)
        print('most recent year: ', most_recent_year)
        print('most common year: ', most_common_year)

    except KeyError:
        # A missing column raises KeyError; catch only that, not every exception
        print('No gender or birth year data for this city.')

# Method 2: if statements (alternative definition of the same function)
def user(df):
    user_type = df['User Type'].value_counts()
    print('user type: ', user_type)

    # 'name' in df checks whether df has a column with that name
    if 'Gender' in df:
        gender = df['Gender'].value_counts()
        print('user gender: ', gender)

    if 'Birth Year' in df:
        earliest_year = df['Birth Year'].min()
        most_recent_year = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]

        print('earliest year: ', earliest_year)
        print('most recent year: ', most_recent_year)
        print('most common year: ', most_common_year)
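
The column check can also be isolated in a small function that returns its results instead of printing them, which makes the Washington case easy to verify. A sketch under stated assumptions: `describe_birth_years` and both sample frames are hypothetical, and converting the years to `int` assumes the column has at least one non-missing value.

```python
import pandas as pd


def describe_birth_years(df):
    """Return (earliest, most recent, most common) birth year, or None if the column is absent."""
    if "Birth Year" not in df.columns:
        return None
    years = df["Birth Year"].dropna()   # missing birth years are stored as NaN
    return int(years.min()), int(years.max()), int(years.mode()[0])


# Hypothetical frames: one with the optional column, one without (like Washington)
with_years = pd.DataFrame({"Birth Year": [1985.0, 1992.0, None, 1992.0, 1970.0]})
without_years = pd.DataFrame({"User Type": ["Subscriber", "Customer"]})

print(describe_birth_years(with_years))     # (1970, 1992, 1992)
print(describe_birth_years(without_years))  # None
```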

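Python runs a file from top to bottom, executing each top-level (unindented) statement as it reaches it. A def or class statement only defines the function or class; its body does not run until it is called. In practice, the script's work therefore starts at if __name__ == "__main__":, which calls main().

main() consists of a while True loop. The loop keeps repeating until the if statement at the bottom breaks out of it, which happens when the user answers anything other than yes to the restart prompt.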

def main():
    while True:
        city, month, day = filter()
        df = load_file(city, month, day)

        Start_Time(df)
        Trip_Duration(df)
        station(df)
        user(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break

if __name__ == "__main__":
    main()
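
The effect of the `__name__` check can be seen with a stripped-down file. This is a minimal sketch; the file name `bikeshare.py` mentioned in the comments is an assumption, since the article does not name the script.

```python
def main():
    return "main() ran"


# __name__ is "__main__" only when the file is executed directly
# (for example `python bikeshare.py`); when the file is imported,
# __name__ is the module name instead and the block below is skipped.
print(__name__)
if __name__ == "__main__":
    print(main())
```

This is why the functions defined earlier in the file can be imported and reused elsewhere without starting the interactive loop.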