I. Association Rules
Use movie viewing records to find associated movies for recommendation.
1. Movie table
import pandas as pd
df_movie=pd.read_csv('E:/Jupyter workspace/python_for_data_science/Data/movies.csv')
df_movie.head()
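Build a dictionary for easy lookup
# movieId -> title dictionary
movie_dic = {}
for rec in df_movie.iterrows():
    # iterrows() yields (index, row) pairs, so rec[1] is the row
    movie_dic[rec[1].movieId] = rec[1].title
# Equivalent one-liner:
#movie_dic=df_movie.set_index('movieId')['title'].to_dict()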
2. Viewing records table
df = pd.read_csv('E:/Jupyter workspace/python_for_data_science/Data/ratings.csv')
df.head()
Keep only records from 2012 onward
df = df[df['timestamp'] >= 1325376000]
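The cutoff 1325376000 is easier to trust once it is tied to a calendar date. The sketch below uses only the standard library and assumes the `timestamp` column holds seconds since the Unix epoch in UTC (the source does not state the unit, so treat that as an assumption).

```python
from datetime import datetime, timezone

CUTOFF = 1325376000  # the value used in the filter

# Interpret the cutoff as seconds since the Unix epoch, in UTC (assumed unit).
cutoff_dt = datetime.fromtimestamp(CUTOFF, tz=timezone.utc)
print(cutoff_dt.isoformat())

# Going the other way: derive the cutoff from a readable date.
derived = int(datetime(2012, 1, 1, tzinfo=timezone.utc).timestamp())
print(derived == CUTOFF)
```

Deriving the number from a date this way makes it straightforward to move the cutoff to another year.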
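3. Group by user, collect each user's movies into a list (this yields a Series), then convert the Series to a list
use_movie=df.groupby('userId')['movieId'].apply(list)  # group by userId and collect each user's movieIds
use_movie
Convert the Series to a list
transactions=use_movie.values.tolist()
# Equivalent:
#transactions=[ele for ele in use_movie]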
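To make the shape of `transactions` concrete, here is a plain-Python sketch of the same grouping on a handful of made-up (userId, movieId) pairs. The data and the names `rows`, `by_user`, and `toy_transactions` are hypothetical; the point is only that the result is one list of movie IDs per user.

```python
from collections import defaultdict

# Hypothetical (userId, movieId) pairs standing in for rows of the ratings table.
rows = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 20), (3, 30)]

# Collect each user's movies in the order they appear.
by_user = defaultdict(list)
for user_id, movie_id in rows:
    by_user[user_id].append(movie_id)

# One list of movie IDs per user, ordered by userId.
toy_transactions = [by_user[u] for u in sorted(by_user)]
print(toy_transactions)
```

Each inner list plays the role of one "basket" for the rule-mining step.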
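4. Apply the Apriori algorithm
from apyori import apriori
# Set the thresholds and build the association rules.
# Note: apyori appears to read only min_support, min_confidence, min_lift and max_length, so min_length may have no effect.
rules=apriori(transactions,min_support=0.2,min_confidence=0.5,min_lift=3,min_length=2)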
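The three thresholds are easier to choose once the measures behind them are spelled out. The following is a minimal sketch of the standard definitions of support, confidence, and lift on made-up transactions; the helper names and the data are hypothetical and are not part of apyori.

```python
# Hypothetical toy transactions; items are movie IDs.
toy = [
    {1, 2, 3},
    {1, 2},
    {1, 3},
    {2, 3},
    {1, 2, 4},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Estimated P(add | base): support(base and add) / support(base)."""
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence divided by the support of the consequent."""
    return confidence(base, add, transactions) / support(add, transactions)

s = support({1, 2}, toy)
c = confidence({1}, {2}, toy)
l = lift({1}, {2}, toy)
print(round(s, 4), round(c, 4), round(l, 4))
```

A lift above 1 means the two sides co-occur more often than independence would predict, which is why a threshold such as `min_lift=3` keeps only strongly associated items.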
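5. Materialize the rules as a list and extract their contents
results=list(rules)  # apriori returns a generator; consume it into a list
for rec in results:
    # print([item for item in rec.items])  # raw movie IDs
    print([movie_dic.get(item) for item in rec.items])  # movie titles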
II. Frequent Pattern Mining
from pymining import itemmining
# Set parameters
fp_input=itemmining.get_fptree(transactions)
report =itemmining.fpgrowth(fp_input,min_support=30,pruning=True)
# Extract the results: keep itemsets with at least 6 movies
for ele in report:
    if len(ele)>=6:
        # print([item for item in ele])  # raw movie IDs
        print([movie_dic.get(item) for item in ele])
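Note that `min_support=30` here reads as an absolute number of transactions, whereas the Apriori call earlier used a fraction (0.2). The sketch below illustrates an absolute-count threshold with a brute-force enumeration on made-up data. It is not FP-growth and not pymining's implementation; the function `frequent_itemsets` and its `max_size` parameter are hypothetical, and the approach is only practical for toy inputs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions.
toy = [[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 4]]

def frequent_itemsets(transactions, min_count, max_size=3):
    """Count every itemset up to max_size; keep those seen at least min_count times."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: n for s, n in counts.items() if n >= min_count}

found = frequent_itemsets(toy, min_count=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    # Dividing by the number of transactions gives the relative support.
    print(sorted(itemset), count, count / len(toy))
```

FP-growth reaches the same kind of result without enumerating every combination, which is what makes it usable on the full ratings data.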
III. Market Basket Case Study
import pandas as pd
df=pd.read_csv('E:/Jupyter workspace/python_for_data_science/Data/Market_Basket.csv',header=None)
df.head()
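Set parameters and create rules
# Convert each row to a list of strings; empty cells become the string 'nan' and stay in the list as items
trans=[r.values.tolist() for i,r in df.astype('str').iterrows()]
from apyori import apriori
# Set the thresholds and build the association rules (apyori appears to ignore min_length)
rules=apriori(trans,min_support=0.003,min_confidence=0.2,min_lift=3,min_length=2)
results=list(rules)  # consume the generator into a list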
Print rule details
for rec in results:
    # Only the first ordered statistic of each record is printed
    left_hands = rec.ordered_statistics[0].items_base
    right_hands = rec.ordered_statistics[0].items_add
    l = ';'.join([item for item in left_hands])
    r = ';'.join([item for item in right_hands])
    print('{} => {}'.format(l,r))
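A record can carry more than one rule in `ordered_statistics`, and each entry also has a confidence and a lift. The sketch below shows one way to list them all. It uses stand-in namedtuples that mirror the field names used in this section rather than importing apyori, and the item names and numbers are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the field names used in this section; not the apyori classes.
OrderedStatistic = namedtuple('OrderedStatistic', 'items_base items_add confidence lift')
RelationRecord = namedtuple('RelationRecord', 'items support ordered_statistics')

# Made-up record with two directions of the same association.
record = RelationRecord(
    items=frozenset({'pasta', 'shrimp'}),
    support=0.005,
    ordered_statistics=[
        OrderedStatistic(frozenset({'pasta'}), frozenset({'shrimp'}), 0.32, 4.5),
        OrderedStatistic(frozenset({'shrimp'}), frozenset({'pasta'}), 0.07, 4.5),
    ],
)

def format_rules(rec):
    """Return one 'left => right' line per ordered statistic in the record."""
    lines = []
    for stat in rec.ordered_statistics:
        left = ';'.join(sorted(stat.items_base))
        right = ';'.join(sorted(stat.items_add))
        lines.append('{} => {} (confidence={:.2f}, lift={:.2f})'.format(
            left, right, stat.confidence, stat.lift))
    return lines

for line in format_rules(record):
    print(line)
```

Sorting the items before joining keeps the printed order stable, since frozensets have no guaranteed order.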
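Enumerate pairwise combinations
import itertools  # needed for combinations
itemsets=[]
for rec in results:
    # Every unordered pair of items within one itemset becomes an edge
    for ele in itertools.combinations(rec.items,2):
        itemsets.append(ele)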
df2=pd.DataFrame(itemsets)
df2.columns=['Source','Target']
df2['Type']='undirected'
df2.to_csv('G:\\temp files\\trans.csv')
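The pairwise step turns each itemset into the edges of a small complete graph. A minimal standard-library sketch of the same idea follows, writing to an in-memory buffer instead of a file. The itemsets and the names `itemsets_found`, `edges`, and `buffer` are hypothetical.

```python
import csv
import io
from itertools import combinations

# Hypothetical itemsets standing in for rec.items of each rule.
itemsets_found = [('a', 'b', 'c'), ('b', 'd')]

# An itemset of n items yields n*(n-1)/2 unordered pairs.
edges = []
for items in itemsets_found:
    for source, target in combinations(sorted(items), 2):
        edges.append((source, target, 'undirected'))

# Write the edge list with the same three column names as the DataFrame.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['Source', 'Target', 'Type'])
writer.writerows(edges)
print(buffer.getvalue())
```

The same pair can appear more than once when several itemsets share items, so deduplicating or counting repeats as an edge weight may be worth considering.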