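The data comes from Alibaba Cloud Tianchi and covers user activity on the Taobao app from November 18 to December 18, 2014.
Data processing
Import the required packages and set the seaborn plotting style: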
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
Use open to peek at the raw format of the data:
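filename = 'tianchi_mobile_recommend_train_user.csv'
with open(filename) as f:
    # Print the header line and the first four records
    for _ in range(5):
        # strip() returns a new string, so its result must be kept
        line = f.readline().strip()
        print(line)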
user_id,item_id,behavior_type,user_geohash,item_category,time
98047837,232431562,1,,4245,2014-12-06 02
97726136,383583590,1,,5894,2014-12-09 20
98607707,64749712,1,,2883,2014-12-18 11
98662432,320593836,1,96nn52n,6562,2014-12-06 10
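The same preview can also be obtained with pandas by limiting the number of rows read. The following is a minimal sketch, not part of the original analysis: it uses an in-memory copy of the rows printed above (via io.StringIO) so that it runs without the CSV file, and the names sample and preview are illustrative. In practice the file path would be passed to read_csv instead.

```python
import io

import pandas as pd

# Rows copied from the preview above; replace `sample` with the CSV path in practice.
sample = io.StringIO(
    "user_id,item_id,behavior_type,user_geohash,item_category,time\n"
    "98047837,232431562,1,,4245,2014-12-06 02\n"
    "97726136,383583590,1,,5894,2014-12-09 20\n"
    "98607707,64749712,1,,2883,2014-12-18 11\n"
    "98662432,320593836,1,96nn52n,6562,2014-12-06 10\n"
)

# nrows limits how many data rows are parsed, so a large file is not fully loaded.
preview = pd.read_csv(sample, nrows=4)
print(preview)

# Empty user_geohash fields are parsed as missing values (NaN).
print(preview["user_geohash"].isna().sum())
```

Reading only a few rows this way also shows how pandas will treat empty fields such as user_geohash before the full 12 million rows are loaded.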
Use read_csv to load the data:
data = pd.read_csv(filename, sep=',')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12256906 entries, 0 to 12256905
Data columns (total 6 columns):
# Column Dtype
--- ------ -----
0 user_id int64
1 item_id int64
2 behavior_type int64
3 user_geohash object
4 item_category int64
5 time object
dtypes: int64(4), object(2)
memory usage: 561.1+ MB
data.head()
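The info() output shows that the time column is stored as object (plain strings such as 2014-12-06 02). A hedged sketch of how such strings could be parsed into datetimes follows. It works on a small stand-in DataFrame built from the timestamps shown earlier, and the date and hour column names are assumptions for illustration, not something the original text defines.

```python
import pandas as pd

# Stand-in frame using timestamps from the preview; the real `data` has the same format.
df = pd.DataFrame({"time": ["2014-12-06 02", "2014-12-09 20", "2014-12-18 11"]})

# The strings hold a date and an hour, so the format is given explicitly.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H")

# Hypothetical helper columns for later grouping by day or by hour.
df["date"] = df["time"].dt.date
df["hour"] = df["time"].dt.hour

print(df.dtypes)
print(df)
```

Passing an explicit format avoids per-row format inference, which may matter on a table of this size.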
Replace the values of behavior_type with the corresponding behavior names:
# Remap the behavior_type column
behavior_mapping = {