Machine Learning: Scientific Data Packages (6) Time Event Log
Background
- Use dida365.com as the data source
- Export the data
- Analyze the data
Reading the data
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
In [3]:
import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = ['Arial Unicode MS']  # set the default font (one with CJK glyphs)
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from showing as a box in saved figures
In [9]:
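def _parser_date(dsstr):
    # Keep only the calendar date of each due-date string
    return pd.Timestamp(dsstr).date()
# header=3: the column names are on row index 3 of the exported file
# (date_parser is deprecated as of pandas 2.0)
data = pd.read_csv('dida365.csv', header=3, index_col='Due Date', parse_dates=True, date_parser=_parser_date)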
data
Out[9]:
List Name Title Content Is Checklist Reminder Repeat Priority Status Completed Time Order Timezone Is All Day
Due Date
2016-05-24 自我成长 [编程] javascript exercism [1h] NaN N NaN NaN 0 2 2016-05-25T14:15:10+0000 -235295488344064 Asia/Shanghai True
2016-05-23 自我成长 [编程] javascript exercism [0.5h] NaN N NaN NaN 0 2 2016-05-24T15:59:08+0000 -234195976716288 Asia/Shanghai True
2016-05-23 自我成长 [编程] clojure ring request [2h] 阅读 ring.util.request 源码\r N NaN NaN 0 2 2016-05-24T15:58:56+0000 -233096465088512 Asia/Shanghai True
2016-05-22 自我成长 [编程] clojure ring 入门 [30m] NaN N NaN NaN 0 2 2016-05-23T15:03:24+0000 -231996953460736 Asia/Shanghai True
2016-05-14 自我成长 [探索发现] vim 查找替换及正则表达式 [1h] NaN N NaN NaN 0 2 2016-05-15T16:05:51+0000 -222101348810752 Asia/Shanghai True
2016-05-14 自我成长 [编程] Clojure 程序设计 exercism [5h] NaN N NaN NaN 0 2 2016-05-15T16:05:21+0000 -221001837182976 Asia/Shanghai True
2016-05-13 自我成长 [探索发现] 使用 github 进行程序员招聘 [1h] N NaN NaN 0 2 2016-05-14T16:47:33+0000 -219902325555200 Asia/Shanghai True
2016-05-13 自我成长 [编程] Clojure 程序设计 exercism [3h] NaN N NaN NaN 0 2 2016-05-14T16:46:49+0000 -218802813927424 Asia/Shanghai True
2016-05-12 自我成长 [编程] Clojure 程序设计 exercism [3h] NaN N NaN NaN 0 2 2016-05-13T16:19:12+0000 -217703302299648 Asia/Shanghai True
2016-05-11 自我成长 [编程] Clojure 程序设计 exercism [2h] NaN N NaN NaN 0 2 2016-05-13T16:18:55+0000 -216603790671872 Asia/Shanghai True
2016-05-10 自我成长 [编程] Clojure 程序设计 exercism [4h] NaN N NaN NaN 0 2 2016-05-11T16:04:40+0000 -215504279044096 Asia/Shanghai True
2016-05-09 自我成长 [编程] Clojure 程序设计 exercism [4h] NaN N NaN NaN 0 2 2016-05-11T16:04:31+0000 -214404767416320 Asia/Shanghai True
2016-05-08 自我成长 [编程] Clojure 程序设计 exercism [4h] NaN N NaN NaN 0 2 2016-05-09T15:32:29+0000 -213305255788544 Asia/Shanghai True
2016-05-07 自我成长 [编程] Clojure 程序设计 exercism [4h] NaN N NaN NaN 0 2 2016-05-09T15:32:14+0000 -212205744160768 Asia/Shanghai True
2016-05-06 自我成长 [编程] Clojure 程序设计 [4h] NaN N NaN NaN 0 2 2016-05-07T17:53:44+0000 -211106232532992 Asia/Shanghai True
2016-05-05 自我成长 [编程] Clojure 程序设计 [4h] NaN N NaN NaN 0 2 2016-05-07T17:53:24+0000 -210556476719104 Asia/Shanghai True
2016-05-03 自我成长 [编程] 函数式编程 [2h] NaN N NaN NaN 0 2 2016-05-05T17:13:52+0000 -210006720905216 Asia/Shanghai True
2016-05-04 自我成长 [编程] 函数式编程 [1h] NaN N NaN NaN 0 2 2016-05-05T17:13:16+0000 -209456965091328 Asia/Shanghai True
2016-05-04 自我成长 [编程] pandas 数据可视化录音剪辑 [2h] NaN N NaN NaN 0 2 2016-05-05T17:12:43+0000 -208907209277440 Asia/Shanghai True
2016-05-02 自我成长 [编程] 函数式编程 [1h] NaN N NaN NaN 0 2 2016-05-03T15:58:08+0000 -208357453463552 Asia/Shanghai True
2016-05-02 自我成长 [编程] Clojure 程序设计 [1h] 《Functional Programming Patterns in Scala and ... N NaN NaN 0 2 2016-05-03T15:56:51+0000 -207257941835776 Asia/Shanghai True
2016-05-01 自我成长 [编程] Clojure 程序设计 [3h] NaN N NaN NaN 0 2 2016-05-03T06:16:08+0000 -206158430208000 Asia/Shanghai True
... ... ... ... ... ... ... ... ... ... ... ... ...
2015-12-15 自我成长 [阅读] 《把时间当作朋友》[1h] NaN N NaN NaN 0 2 2015-12-20T02:56:26+0000 -35184372088832 Asia/Shanghai True
2015-12-14 自我成长 [阅读] 《把时间当作朋友》[1h] NaN N NaN NaN 0 2 2015-12-20T02:56:05+0000 -34084860461056 Asia/Shanghai True
2015-12-11 自我成长 [阅读] 《把时间当作朋友》[4h] NaN N NaN NaN 0 2 2015-12-20T02:55:38+0000 -32985348833280 Asia/Shanghai True
2015-12-18 自我成长 [写作] 基于协同过滤算法的推荐系统 [1.5h] NaN N NaN NaN 0 2 2015-12-19T15:50:32+0000 -31885837205504 Asia/Shanghai True
2015-12-18 自我成长 [机器学习] spleen_pycon2015 05-Validation [1h] NaN N NaN NaN 0 2 2015-12-19T14:06:18+0000 -26388279066624 Asia/Shanghai True
2015-12-18 自我成长 [编程] urllib 介绍(续)使用 pycharm 重录 [3h] NaN N NaN NaN 0 2 2015-12-18T16:59:43+0000 -25288767438848 Asia/Shanghai True
2015-12-17 自我成长 [机器学习] sklean_pycon2015 05-Validation [0.5h] NaN N NaN NaN 0 2 2015-12-18T09:00:29+0000 -24739011624960 Asia/Shanghai True
2015-12-16 自我成长 [编程] urllib 介绍使用 pycharm 重录 [3h] NaN N NaN NaN 0 2 2015-12-18T08:10:27+0000 -23639499997184 Asia/Shanghai True
2015-12-04 自我成长 [阅读]《把时间当朋友》[3h] NaN N NaN NaN 0 2 2015-12-11T06:47:01+0000 -3298534883328 Asia/Shanghai True
2015-12-02 自我成长 [写作]《支持向量机 SVM 算法》[3h] NaN N NaN NaN 0 2 2015-12-11T06:46:09+0000 -2199023255552 Asia/Shanghai True
2015-12-06 自我成长 [写作] 《支持向量机核函数》[4h] NaN N NaN NaN 0 2 2015-12-11T06:45:38+0000 -1099511627776 Asia/Shanghai True
2015-12-07 自我成长 [写作] 《K 均值算法》[1h] NaN N NaN NaN 0 2 2015-12-11T06:44:43+0000 0 Asia/Shanghai True
2015-12-09 自我成长 [写作] 培训文档 《使用 sublime + plantuml 画图》[4h] NaN N NaN NaN 0 2 2015-12-11T05:02:10+0000 -4398046511104 Asia/Shanghai True
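To see what header=3 and index_col do without the real export, here is a minimal sketch on an in-memory file. The three preamble lines and the sample rows are made up for illustration; the actual layout of the dida365 export is an assumption.

```python
import io
import pandas as pd

# Hypothetical miniature export: three preamble lines, then the header row.
raw = io.StringIO(
    "preamble line 1\n"
    "preamble line 2\n"
    "preamble line 3\n"
    "Due Date,Title,Status\n"
    "2016-05-24,[编程] javascript exercism [1h],2\n"
    "2016-05-23,[编程] clojure ring request [2h],2\n"
    "2016-05-22,[编程] not finished yet [1h],0\n"
)

# header=3 skips the preamble; parse_dates=True parses the index column.
sample = pd.read_csv(raw, header=3, index_col='Due Date', parse_dates=True)
print(sample)
```

With ISO-formatted dates the index comes back as a DatetimeIndex, so no custom parser is needed in this sketch.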
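Data cleaning
- We only care about events that are completed or achieved, i.e. events with Status != 0
- Only the List Name and Title fields are needed
df = data[data.Status != 0][['List Name', 'Title']].copy()  # copy so columns can be added later without a SettingWithCopyWarning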
List Name Title
Due Date
2016-05-24 自我成长 [编程] javascript exercism [1h]
2016-05-23 自我成长 [编程] javascript exercism [0.5h]
2016-05-23 自我成长 [编程] clojure ring request [2h]
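The same filter-then-select step can be written as a single .loc call. This is a sketch on a hand-built frame; the rows and the Status values are invented for illustration.

```python
import pandas as pd

# Made-up rows with the same column names as the export.
events = pd.DataFrame(
    {
        'List Name': ['自我成长', '自我成长', '自我成长'],
        'Title': ['[编程] a [1h]', '[阅读] b [2h]', '[写作] c [3h]'],
        'Status': [2, 0, 2],
        'Priority': [0, 0, 0],
    },
    index=pd.to_datetime(['2016-05-24', '2016-05-23', '2016-05-22']),
)
events.index.name = 'Due Date'

# Boolean mask for the rows, list of labels for the columns.
done = events.loc[events['Status'] != 0, ['List Name', 'Title']].copy()
print(done)
```

Selecting rows and columns in one .loc call avoids chained indexing, and .copy() makes the result safe to modify.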
Data parsing
Use regular expressions to extract the tag text and the time spent from each title
import re

def parse_tag(value):
    # The tag is the text inside a leading [...]; fall back to '其他' ("other")
    m = re.match(r'^(\[(.*?)\])?.*$', value)
    if m and m.group(2):
        return m.group(2)
    else:
        return '其他'

def parse_duration(value):
    # The duration is the trailing [...], e.g. [1h], [0.5h] or [30m]; the result is in hours
    m = re.match(r'^.+?\[(.*?)([hm]?)\]$', value)
    if m:
        dur = 0
        try:
            dur = float(m.group(1))
        except ValueError as e:
            print('parse duration error: \n%s' % e)
        if m.group(2) == 'm':
            dur = dur / 60.0
        return dur
    else:
        return 0

titles = df['Title']
df['Tag'] = titles.map(parse_tag)
df['Duration'] = titles.map(parse_duration)
df.head()
Out[14]:
List Name Title Tag Duration
Due Date
2016-05-24 自我成长 [编程] javascript exercism [1h] 编程 1.0
2016-05-23 自我成长 [编程] javascript exercism [0.5h] 编程 0.5
2016-05-23 自我成长 [编程] clojure ring request [2h] 编程 2.0
2016-05-22 自我成长 [编程] clojure ring 入门 [30m] 编程 0.5
2016-05-22 自我成长 [探索发现] 体验 iMac 开发环境 [3h] 探索发现 3.0
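As an alternative sketch, not the approach used above, the tag and the duration can be pulled out with one pattern and named groups. TITLE_RE and parse_title are hypothetical names, and the pattern assumes durations are plain numbers followed by an optional h or m.

```python
import re

TITLE_RE = re.compile(
    r'^(?:\[(?P<tag>[^\]]*)\])?'                        # optional leading [tag]
    r'.*?'                                              # free text
    r'(?:\[(?P<num>\d+(?:\.\d+)?)(?P<unit>[hm]?)\])?$'  # optional trailing [1h], [30m]
)

def parse_title(title):
    """Return (tag, hours); missing parts become '其他' ("other") and 0.0."""
    m = TITLE_RE.match(title)
    if m is None:
        return '其他', 0.0
    tag = m.group('tag') or '其他'
    hours = float(m.group('num')) if m.group('num') else 0.0
    if m.group('unit') == 'm':
        hours /= 60.0  # minutes to hours
    return tag, hours

print(parse_title('[编程] clojure ring 入门 [30m]'))
print(parse_title('[阅读]《把时间当朋友》[3h]'))
print(parse_title('no tag here'))
```

A stricter number pattern means a malformed duration such as [abc] is simply treated as missing instead of raising inside float().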
Number of entries: .count()
Start date: .index.min()
End date: .index.max()
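A minimal sketch of these three calls on a tiny, made-up frame with a date index:

```python
import pandas as pd

# Three invented entries; only the index and one numeric column matter here.
log = pd.DataFrame(
    {'Duration': [1.0, 0.5, 2.0]},
    index=pd.to_datetime(['2016-05-24', '2016-05-23', '2016-05-22']),
)

n_entries = log['Duration'].count()  # number of non-missing values
first_day = log.index.min()          # earliest timestamp in the index
last_day = log.index.max()           # latest timestamp in the index
print(n_entries, first_day.date(), last_day.date())
```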
Data analysis
Time overview
How much time do I invest in myself per day on average? -> total time / total number of days
In [20]:
end_date=df.index.max().date()
end_date
Out[20]:
datetime.date(2016, 5, 24)
In [21]:
start_date=df.index.min().date()
start_date
Out[21]:
datetime.date(2015, 12, 2)
In [27]:
df.Duration.sum()/(end_date-start_date).days
Out[27]:
2.771264367816092
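The figure can be checked by hand with the standard library alone. The total of 482.2 hours is taken here as the sum of the per-tag totals reported in Out[28] further down.

```python
from datetime import date

start = date(2015, 12, 2)
end = date(2016, 5, 24)
days = (end - start).days  # difference in days; the end date itself is not counted

# Per-tag totals from Out[28], in hours.
total_hours = 49.0 + 54.5 + 33.5 + 50.8 + 243.4 + 51.0
average = total_hours / days

print(days, round(total_hours, 1), round(average, 4))
# prints: 174 482.2 2.7713
```

Counting both endpoints would give 175 days and a slightly lower average.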
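Effort allocation
In [28]:
# Sum only the numeric Duration column (recent pandas would otherwise try to sum the text columns too)
# Tags: 写作 = writing, 探索发现 = exploration, 机器学习 = machine learning, 电影 = movies, 编程 = programming, 阅读 = reading
df.groupby('Tag')[['Duration']].sum()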
Out[28]:
Duration
Tag
写作 49.0
探索发现 54.5
机器学习 33.5
电影 50.8
编程 243.4
阅读 51.0
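To read the totals as proportions, a short sketch that turns the numbers from Out[28] into percentage shares (the variable names are only illustrative):

```python
import pandas as pd

# Totals copied from Out[28], in hours.
totals = pd.Series(
    {'写作': 49.0, '探索发现': 54.5, '机器学习': 33.5,
     '电影': 50.8, '编程': 243.4, '阅读': 51.0},
    name='Duration',
)

# Share of each tag in the overall total, largest first.
share = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(share)
```

Programming (编程) accounts for roughly half of all logged hours.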
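Focus
The ability to keep learning a single skill over a long period
programming = df[df['Tag'] == '编程']  # 编程 = programming
# resample(..., how='sum') is no longer supported; call .sum() on monthly ('MS') bins instead
programming[['Duration']].resample('MS').sum().to_period(freq='M').plot(kind='bar', figsize=(8, 8), fontsize=16)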
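The monthly roll-up behind the bar chart can be tried on a tiny, made-up series. This sketch uses the .resample(...).sum() form on month-start ('MS') bins and leaves out the plotting.

```python
import pandas as pd

# Invented hours on four days, with nothing logged in April.
hours = pd.Series(
    [1.0, 2.0, 4.0, 0.5],
    index=pd.to_datetime(['2016-03-30', '2016-03-31', '2016-05-01', '2016-05-02']),
    name='Duration',
)

# One bin per calendar month, then label each bin by its month period.
monthly = hours.resample('MS').sum().to_period('M')
print(monthly)
```

Months without any entries still appear, with a total of 0, which keeps the bars evenly spaced along the time axis.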
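Effort allocation over time
# Total hours per day and tag
df2 = df.reset_index().groupby(['Due Date', 'Tag'])[['Duration']].sum()
# One row per day, one column per tag
df3 = df2.reset_index().pivot(index='Due Date', columns='Tag', values='Duration')
# Add the days with no entries, then fill the gaps with 0 (fillna returns a new frame, so keep the result)
df4 = df3.reindex(pd.date_range(start_date, end_date)).fillna(0)
df4.plot(kind='bar', stacked=True, figsize=(16, 8))
df4.resample('MS').sum().to_period(freq='M').plot(kind='bar', figsize=(8, 8), stacked=True)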
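The reshape from a long log to a day-by-tag table is easier to follow on a few invented rows. The sketch below groups, pivots, and then reindexes over a full date range so that days without entries show up as zeros; plotting is left out.

```python
import pandas as pd

# Made-up log: two tags, nothing recorded on 2016-05-02.
log = pd.DataFrame({
    'Due Date': pd.to_datetime(['2016-05-01', '2016-05-01', '2016-05-01', '2016-05-03']),
    'Tag': ['编程', '编程', '阅读', '编程'],
    'Duration': [1.0, 2.0, 0.5, 4.0],
})

# Hours per (day, tag), then one column per tag.
per_day = log.groupby(['Due Date', 'Tag'])['Duration'].sum().reset_index()
wide = per_day.pivot(index='Due Date', columns='Tag', values='Duration')

# Insert the missing days and replace every NaN with 0.
full = wide.reindex(pd.date_range('2016-05-01', '2016-05-03')).fillna(0)
print(full)
```

Filling after the reindex matters: reindexing adds new all-NaN rows, so a fillna done earlier would not cover them.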