Code repository: Pandas_时间处理 (Pandas time handling)

Time Handling in Pandas

  • The three basic data types for time series:
    1. Timestamp (a point in time)
    2. Period (a span of time)
    3. Timedelta (a duration)
import numpy as np
import pandas as pd
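
As a quick orientation, the three types listed above can each be constructed directly. This is a minimal sketch; the variable names and the sample dates are illustrative assumptions, not taken from the original notebook.

```python
import pandas as pd

ts = pd.Timestamp('2021-02-03 13:00')    # a single point in time
per = pd.Period('2021-02', freq='M')     # a span: the whole of February 2021
delta = pd.Timedelta(days=1, hours=2)    # a duration

shifted = ts + delta                     # Timestamp + Timedelta gives a Timestamp
print(shifted)                           # 2021-02-04 15:00:00
print(per.start_time)                    # 2021-02-01 00:00:00
print(per.end_time.date())               # 2021-02-28
```

A Period knows its own start and end, which is what distinguishes it from a Timestamp with the same label.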

Timestamps

  • A single timestamp: Timestamp
  • A sequence of timestamps: DatetimeIndex
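
The distinction between the two can be seen by passing a scalar and a list to pd.to_datetime. A small sketch with made-up dates:

```python
import pandas as pd

single = pd.to_datetime('2021-02-03')                 # scalar input
many = pd.to_datetime(['2021-02-03', '2021-02-04'])   # list input

print(type(single).__name__)   # Timestamp
print(type(many).__name__)     # DatetimeIndex
```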

Numpy

date = np.array('2021-02-03', dtype=np.datetime64)

# Date strings that can be parsed are generally written as year-month-day.
# NumPy's time type is np.datetime64.
date
array('2021-02-03', dtype='datetime64[D]')
np.arange(12)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
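# Adding an integer array to a datetime64[D] value broadcasts:
# each integer is interpreted as a number of days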
date + np.arange(12)
array(['2021-02-03', '2021-02-04', '2021-02-05', '2021-02-06',
       '2021-02-07', '2021-02-08', '2021-02-09', '2021-02-10',
       '2021-02-11', '2021-02-12', '2021-02-13', '2021-02-14'],
      dtype='datetime64[D]')
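
The integers above are implicitly treated as days because the date has unit D. When a different step is wanted, the unit can be made explicit with np.timedelta64. A minimal sketch; the weekly step is only an illustrative choice:

```python
import numpy as np

date = np.datetime64('2021-02-03')                        # unit inferred as days
by_days = date + np.arange(3)                             # integers read as days
by_weeks = date + np.arange(3) * np.timedelta64(1, 'W')   # explicit weekly step

print(by_days)
print(by_weeks)
```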
date = np.datetime64('2020-12-21T13:00', 'ms')

# 'T' separates the date from the time of day.
# The second argument sets the unit. Unit codes are case-sensitive:
# Y M W D h m s ms us ns ps
date
numpy.datetime64('2020-12-21T13:00:00.000')
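
The unit can also be changed after construction with astype. The sketch below assumes the same value as above; converting to a coarser unit truncates the finer part.

```python
import numpy as np

stamp = np.datetime64('2020-12-21T13:00', 'ms')
as_day = stamp.astype('datetime64[D]')      # drops the time of day
as_minute = stamp.astype('datetime64[m]')   # keeps hours and minutes

print(as_day)      # 2020-12-21
print(as_minute)   # 2020-12-21T13:00
```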

Pandas

Converting between strings and time types

# string -> time
pd.to_datetime('10/11/12')
Timestamp('2012-10-11 00:00:00')
pd.to_datetime('10/11/12', yearfirst=True)

# dayfirst=True: read the first field as the day
# yearfirst=True: read the first field as the year
Timestamp('2010-11-12 00:00:00')
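
For comparison, here is the same ambiguous string parsed with the default settings and with dayfirst=True. This is a small sketch; for real data an explicit format is usually safer than relying on these flags.

```python
import pandas as pd

s = '10/11/12'
default = pd.to_datetime(s)                    # month/day/year
day_first = pd.to_datetime(s, dayfirst=True)   # day/month/year

print(default)     # 2012-10-11 00:00:00
print(day_first)   # 2012-11-10 00:00:00
```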
date = pd.to_datetime('1th July 2020 13:12:00')
date
Timestamp('2020-07-01 13:12:00')
# Use format to handle irregular date strings
pd.to_datetime('018-02-03', format='0%y-%m-%d')
Timestamp('2018-02-03 00:00:00')
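
When a column mixes valid and invalid strings, errors='coerce' turns the unparseable entries into NaT instead of raising. A minimal sketch with made-up input values:

```python
import pandas as pd

raw = pd.Series(['2018-02-03', 'not a date', '2018-02-30'])
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed)   # the second and third entries become NaT
```

The third entry matches the pattern but is not a real calendar date, so it is coerced as well.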
# time -> string
date.strftime('%Y-%m-%d %H:%M:%S')
'2020-07-01 13:12:00'
Format codes (directive, meaning, examples):

%a    Weekday as locale's abbreviated name. Examples: Sun, Mon, ..., Sat (en_US)
%A    Weekday as locale's full name. Examples: Sunday, Monday, ..., Saturday (en_US)
%w    Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. Examples: 0, 1, ..., 6
%d    Day of the month as a zero-padded decimal number. Examples: 01, 02, ..., 31
%B    Month as locale's full name. Examples: January, February, ..., December (en_US); Januar, Februar, ..., Dezember (de_DE)
%m    Month as a zero-padded decimal number. Examples: 01, 02, ..., 12
%y    Year without century as a zero-padded decimal number. Examples: 00, 01, ..., 99
%Y    Year with century as a decimal number. Examples: 0001, 0002, ..., 2013, 2014, ..., 9998, 9999
%H    Hour (24-hour clock) as a zero-padded decimal number. Examples: 00, 01, ..., 23
%p    Locale's equivalent of either AM or PM. Examples: AM, PM (en_US)
%M    Minute as a zero-padded decimal number. Examples: 00, 01, ..., 59
%S    Second as a zero-padded decimal number. Examples: 00, 01, ..., 59
%f    Microsecond as a decimal number, zero-padded on the left. Examples: 000000, 000001, ..., 999999
%Z    Time zone name (empty string if the object is naive). Examples: (empty), UTC, GMT
%x    Locale's appropriate date representation. Examples: 08/16/88 (None); 08/16/1988 (en_US); 16.08.1988 (de_DE)
%X    Locale's appropriate time representation. Examples: 21:30:00 (en_US); 21:30:00 (de_DE)
%%    A literal '%' character.
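
The same codes work in both directions: strftime formats a Timestamp, and the format argument of pd.to_datetime parses a string. A short round-trip sketch using only locale-independent codes; the day/month/year layout is an arbitrary choice for illustration.

```python
import pandas as pd

date = pd.Timestamp('2020-07-01 13:12:00')
text = date.strftime('%d/%m/%Y %H:%M')                  # Timestamp -> string
back = pd.to_datetime(text, format='%d/%m/%Y %H:%M')    # string -> Timestamp

print(text)   # 01/07/2020 13:12
print(back)   # 2020-07-01 13:12:00
```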

Extracting time information

# Time information, part 1
# Note: date is a Timestamp
print(
    '\n1:',date.year,
    '\n2:',date.month,
    '\n3:',date.week, # same as date.weekofyear
    '\n4:',date.day,
    '\n5:',date.hour,
    '\n6:',date.minute,
    '\n7:',date.second,
    '\n8:',date.microsecond,
    '\n9:',date.nanosecond,
    '\n10:',date.day_name(),
    '\n11:',date.dayofweek, # Monday = 0, Sunday = 6
    '\n12:',date.dayofyear,
    '\n13:',date.weekofyear,
    '\n14:',date.days_in_month, # number of days in the month
    '\n15:',date.weekday(), # same as dayofweek
)
1: 2020 
2: 7 
3: 27 
4: 1 
5: 13 
6: 12 
7: 0 
8: 0 
9: 0 
10: Wednesday 
11: 2 
12: 183 
13: 27 
14: 31 
15: 2
# Time information, part 2
print(
    date.is_leap_year, # leap year
    date.is_month_start,
    date.is_month_end,
    date.is_quarter_start, # quarter
    date.is_quarter_end,
    date.is_year_start,
    date.is_year_end,
    date.quarter, # quarter of the year, 1 to 4
)
True True False True False False False 3
# Time information, part 3
# Convert to a Series so that apply can be used
date_series = pd.Series(date)
date_series
0   2020-07-01 13:12:00
dtype: datetime64[ns]

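Applying functions to time information

# A Series of dates (a sequence of timestamps, or a whole DataFrame column)
# exposes the attributes above through the .dt accessor

# Flag weekends (dayofweek: Saturday = 5, Sunday = 6)
print(date_series.dt.dayofweek.apply(lambda x: 1 if x >= 5 else 0),
      # Flag hours from 13 through 15
      date_series.dt.hour.apply(lambda x: 1 if x >= 13 and x <= 15 else 0),
      sep = '\n'
     )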
0    0
dtype: int64
0    1
dtype: int64
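
The same flags can usually be computed without apply, using vectorized comparisons on the .dt attributes. This is a sketch under the assumption that Saturday and Sunday (dayofweek 5 and 6) count as the weekend; the sample timestamps are made up.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    '2020-07-01 13:12',   # Wednesday
    '2020-07-04 09:00',   # Saturday
    '2020-07-05 15:30',   # Sunday
]))

is_weekend = (stamps.dt.dayofweek >= 5).astype(int)
in_window = stamps.dt.hour.between(13, 15).astype(int)   # bounds are inclusive

print(is_weekend.tolist())   # [0, 1, 1]
print(in_window.tolist())    # [1, 0, 1]
```

On large columns the vectorized form avoids a Python-level call per row.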
# Build a DatetimeIndex
date_index = pd.DatetimeIndex(['2010-01-04', '2010-01-05'])
data = pd.Series([1, 2], index=date_index)
data
2010-01-04    1
2010-01-05    2
dtype: int64
data['2010']
2010-01-04    1
2010-01-05    2
dtype: int64
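
Selecting by a partial date string, as with '2010' above, also works at month granularity and for ranges. A minimal sketch using .loc on a hypothetical 60-day series:

```python
import pandas as pd

idx = pd.date_range('2010-01-01', periods=60, freq='D')
data = pd.Series(range(60), index=idx)

january = data.loc['2010-01']                     # every row in January 2010
first_week = data.loc['2010-01-01':'2010-01-07']  # both endpoints are included

print(len(january))      # 31
print(len(first_week))   # 7
```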

Parsing datetimes while reading a file

  • If parsing needs any extra handling, parse the column separately after loading
test_tsv = pd.read_csv('./test_tsv.tsv', usecols=['customer_id', 'review_date'], sep = '\t')
test_tsv
       customer_id review_date
0         40626522   8/31/2015
1         16290022   8/31/2015
2         10216509   8/31/2015
3           114040   8/31/2015
4         27971579   8/31/2015
...            ...         ...
18934     21573136   5/24/2004
18935     19606706    4/4/2004
18936     25764155    4/4/2004
18937     28162301   12/2/2003
18938     40173284   4/27/2003

18939 rows × 2 columns

test_tsv = pd.read_csv('./test_tsv.tsv', usecols=['customer_id', 'review_date'], \
                       parse_dates=['review_date'], sep = '\t')[['review_date', 'customer_id']]
test_tsv.sort_values(by='review_date', inplace=True)
test_tsv.reset_index(inplace=True, drop=True)
test_tsv
      review_date  customer_id
0      2003-04-27     40173284
1      2003-12-02     28162301
2      2004-04-04     19606706
3      2004-04-04     25764155
4      2004-05-24     21573136
...           ...          ...
18934  2015-08-31      2761934
18935  2015-08-31     12996130
18936  2015-08-31      9603909
18937  2015-08-31     15312194
18938  2015-08-31     40626522

18939 rows × 2 columns
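
When the dates need extra handling, the column can be read as plain text and converted afterwards with an explicit format. The sketch below uses a small in-memory sample in place of test_tsv.tsv; the two rows mirror values shown above, and the format string is an assumption about how the file writes its dates.

```python
import io
import pandas as pd

# In-memory stand-in for the TSV file (illustrative sample only)
tsv = io.StringIO(
    "customer_id\treview_date\n"
    "40626522\t8/31/2015\n"
    "40173284\t4/27/2003\n"
)

df = pd.read_csv(tsv, sep='\t')    # review_date is still a string column here
df['review_date'] = pd.to_datetime(df['review_date'], format='%m/%d/%Y')
df = df.sort_values('review_date').reset_index(drop=True)

print(df)
```

Parsing in a separate step makes it easy to pass format or errors='coerce' and to inspect the raw strings if something fails.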
