Processing the 2017 ChinaVis Challenge 2 data along the time dimension with pandas

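I won't describe the 2017 ChinaVis Challenge 2 dataset here; see my previous blog post for details. Previously I computed some statistics by internet cafe ID (that is, along the geographic dimension). Here I use time as the dimension instead, covering October 1 to December 31, 2016, at three granularities: day, hour, and minute.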
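
To make the three granularities concrete, here is a minimal sketch of how one timestamp collapses into a day, hour, or minute key. It assumes that ONLINETIME is a 14-digit integer of the form YYYYMMDDHHMMSS; the helper name `truncate_key` and the sample value are made up for illustration.

```python
# Assumption: ONLINETIME is a 14-digit integer of the form YYYYMMDDHHMMSS.
def truncate_key(online_time: int, granularity: str) -> str:
    """Drop trailing digits so the timestamp collapses to a day, hour, or minute key."""
    divisors = {"day": 10**6, "hour": 10**4, "min": 10**2}
    # Floor division truncates; it never rounds up into the next bucket.
    return str(online_time // divisors[granularity])

sample = 20161001083059  # hypothetical record: 2016-10-01 08:30:59
day_key = truncate_key(sample, "day")    # '20161001'
hour_key = truncate_key(sample, "hour")  # '2016100108'
min_key = truncate_key(sample, "min")    # '201610010830'
```

Every record that shares a key falls into the same bucket, so counting records per key gives the count per day, hour, or minute.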

The code is as follows:

import pandas as pd
from datetime import datetime
import time

# Renamed from `time` so the class no longer shadows the imported time module.
class TimeStats:
    # Load the records and drop the columns that are not needed here.
    da = pd.read_csv('C:\\Users\\Administrator\\Desktop\\2017\\hydata_swjl_0.csv', sep=',', low_memory=False, header=0)
    da = da.drop(['PERSONID', 'XB', 'CUSTOMERNAME'], axis=1)
    # One timestamp per minute; end at 23:59 so that all of December 31 is covered.
    t_series = pd.date_range(start=datetime(2016, 10, 1), end=datetime(2016, 12, 31, 23, 59), freq='min')
    # Convert to struct_time so the values can be formatted with time.strftime below.
    time_local = [t.timetuple() for t in t_series]

    # Day granularity
    t1 = [time.strftime('%Y%m%d', x) for x in time_local]
    # One row per calendar day in the range
    day_time = pd.DataFrame({'day_time': t1}).drop_duplicates().reset_index(drop=True)
    # Assuming ONLINETIME has the form YYYYMMDDHHMMSS, dropping the last six digits leaves YYYYMMDD.
    da['day_time'] = (da['ONLINETIME'].astype(float) // 1000000).astype('int64').astype(str)

    # Hour granularity
    t2 = [time.strftime('%Y%m%d%H', x) for x in time_local]
    # One row per hour in the range
    hour_time = pd.DataFrame({'hour_time': t2}).drop_duplicates().reset_index(drop=True)
    # Assuming ONLINETIME has the form YYYYMMDDHHMMSS, dropping the last four digits leaves YYYYMMDDHH.
    da['hour_time'] = (da['ONLINETIME'].astype(float) // 10000).astype('int64').astype(str)

    # Minute granularity
    t3 = [time.strftime('%Y%m%d%H%M', x) for x in time_local]
    # One row per minute in the range
    min_time = pd.DataFrame({'min_time': t3})
    # Floor division truncates the seconds; rounding would push seconds above 50 into the next minute.
    # int64 is explicit because a 12-digit key does not fit in a 32-bit integer.
    da['min_time'] = (da['ONLINETIME'].astype(float) // 100).astype('int64').astype(str)


    # Count the number of users going online each day (one count per record in da)
    M = da['day_time'].value_counts()
    df_m = pd.DataFrame({'day_time': M.index, 'day_freq': M.values})

    # Left-join onto the full list of days so that days without records get a count of 0
    merged1 = pd.merge(day_time, df_m, how='left', on='day_time')
    merged1['day_freq'] = merged1['day_freq'].fillna(0).astype(int)
    merged1.to_csv('C:\\Users\\Administrator\\Desktop\\2017\\day_time.csv', index=False, sep=',')

    # Count the number of users going online each hour (one count per record in da)
    M = da['hour_time'].value_counts()
    df_m = pd.DataFrame({'hour_time': M.index, 'hour_freq': M.values})

    # Left-join onto the full list of hours so that hours without records get a count of 0
    merged2 = pd.merge(hour_time, df_m, how='left', on='hour_time')
    merged2['hour_freq'] = merged2['hour_freq'].fillna(0).astype(int)
    merged2.to_csv('C:\\Users\\Administrator\\Desktop\\2017\\hour_time.csv', index=False, sep=',')


    # Count the number of users going online each minute (one count per record in da)
    M = da['min_time'].value_counts()
    df_m = pd.DataFrame({'min_time': M.index, 'min_freq': M.values})

    # Left-join onto the full list of minutes so that minutes without records get a count of 0
    merged3 = pd.merge(min_time, df_m, how='left', on='min_time')
    merged3['min_freq'] = merged3['min_freq'].fillna(0).astype(int)
    merged3.to_csv('C:\\Users\\Administrator\\Desktop\\2017\\min_time.csv', index=False, sep=',')
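
For comparison, the same per-bucket counts can be sketched with pandas' own datetime tools instead of string keys. This is an alternative, not the method used in the script above. It assumes that ONLINETIME has the form YYYYMMDDHHMMSS; the sample records and the helper name `count_per_bucket` are made up for illustration.

```python
import pandas as pd

# Made-up sample records; the real values would come from hydata_swjl_0.csv.
records = pd.DataFrame({"ONLINETIME": [20161001083059, 20161001083501, 20161002120000]})

# Parse the assumed YYYYMMDDHHMMSS integers into real timestamps.
ts = pd.to_datetime(records["ONLINETIME"].astype(str), format="%Y%m%d%H%M%S")

def count_per_bucket(timestamps, freq, start, end):
    """Count rows per time bucket, keeping empty buckets with a count of 0."""
    buckets = timestamps.dt.floor(freq)  # truncate each timestamp to its bucket
    full_range = pd.date_range(start=start, end=end, freq=freq)
    return buckets.value_counts().reindex(full_range, fill_value=0)

day_counts = count_per_bucket(ts, "D", "2016-10-01", "2016-10-03")
hour_counts = count_per_bucket(ts, "h", "2016-10-01 00:00", "2016-10-03 23:00")
```

Flooring real timestamps avoids the manual digit arithmetic, and `reindex` with `fill_value=0` plays the role of the left merge followed by `fillna(0)`.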

Results:

1. Day granularity:

2. Hour granularity:

3. Minute granularity:
