Table_2: Users' shopping records at brick-and-mortar stores before Dec. 2015. (ijcai2016_koubei_train)
User_id, Merchant_id, Location_id, Time_Stamp
Table_3: Merchant information. (ijcai2016_merchant_info)
Merchant_id, Budget (budget constraints imposed on the merchant), Location_id_list
Table_4: Prediction result. (ijcai2016_koubei_test)
User_id, Location_id, Merchant_id_list
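As a rough illustration of how these tables map onto Python structures, the sketch below parses a few made-up rows in the column order listed above. The sample values and the helper names `parse_train` and `parse_merchant_info` are hypothetical, and treating Budget as an integer is an assumption; the colon separator inside Location_id_list is taken from the script in this post.

```python
import csv
import io

# Made-up sample rows, in the column order of the tables above.
TRAIN_SAMPLE = "u1,m1,l1,20151101\nu1,m2,l1,20151102\nu2,m1,l2,20151103\n"
MERCHANT_SAMPLE = "m1,100,l1:l2\nm2,50,l1\n"


def parse_train(text):
    """Yield (user_id, merchant_id, location_id, time_stamp) tuples."""
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            yield tuple(row)


def parse_merchant_info(text):
    """Return {merchant_id: (budget, [location_id, ...])}."""
    info = {}
    for merchant_id, budget, locations in csv.reader(io.StringIO(text)):
        # Assumption: Budget is an integer; locations are colon-separated.
        info[merchant_id] = (int(budget), locations.split(':'))
    return info


records = list(parse_train(TRAIN_SAMPLE))
merchants = parse_merchant_info(MERCHANT_SAMPLE)
print(records[0])
print(merchants['m1'])
```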
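First, build the following dictionaries:
{user: [merchants]}
{merchant: [users]}
{merchant: popularity} (popularity is measured by the number of user visits)
{district: {merchant: popularity}} (to simplify the recommendation step later)
A merchant-to-merchant similarity matrix (actually stored as a dictionary)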
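To make the last item concrete, here is a minimal sketch of a co-occurrence similarity computed on a toy {user: merchants} dictionary. It assumes the similarity between merchants i and j is the number of users who visited both, divided by sqrt(N[i] * N[j]), where N[i] counts the distinct visitors of merchant i. The function name `merchant_similarity` and the toy data are illustrative only.

```python
import math
from collections import defaultdict
from itertools import permutations


def merchant_similarity(user_merchant):
    """Return W where W[i][j] = C[i][j] / sqrt(N[i] * N[j])."""
    co_visits = defaultdict(lambda: defaultdict(int))  # C[i][j]
    visitors = defaultdict(int)                        # N[i]
    for merchants in user_merchant.values():
        for m in merchants:
            visitors[m] += 1
        # Every ordered pair of distinct merchants visited by the same user
        for i, j in permutations(merchants, 2):
            co_visits[i][j] += 1
    return {
        i: {j: c / math.sqrt(visitors[i] * visitors[j]) for j, c in row.items()}
        for i, row in co_visits.items()
    }


# Toy data: m1 has 3 visitors, m2 has 2, and 2 users visited both.
toy = {'u1': {'m1', 'm2'}, 'u2': {'m1', 'm2', 'm3'}, 'u3': {'m1'}}
W = merchant_similarity(toy)
print(round(W['m1']['m2'], 3))  # 2 / sqrt(3 * 2) -> 0.816
```

Dividing by sqrt(N[i] * N[j]) keeps very popular merchants from looking similar to everything just because many users pass through them.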
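1. If the user's district contains fewer than 10 merchants, recommend all of them, sorted by popularity from high to low.
2. Cold start: if the user has never used Koubei before, i.e. has no merchant history, recommend the ten most popular merchants in the user's district (sorted by popularity).
3. If the user has a merchant history, look up the similar merchants and their similarity scores for each merchant the user has visited, give every visited merchant equal weight, sum the scores that fall on the same candidate merchant, sort the candidates by total similarity in descending order, and recommend the top ten.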
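The three rules can be sketched as a single function. This is a simplified illustration under stated assumptions, not the script itself: `recommend`, its parameters, and the toy data are hypothetical, and `k` stands in for the cut-off of ten. The last step, which tops up the list with popular district merchants when fewer than `k` similar merchants are found, follows the fill-in loop at the end of the script in this post.

```python
def recommend(user, district, user_merchant, district_merchant, W, k=10):
    """Return up to k merchant ids for `user` at `district`."""
    # District merchants ordered by popularity, most popular first
    by_popularity = sorted(district_merchant.get(district, {}).items(),
                           key=lambda t: t[1], reverse=True)
    popular = [m for m, _ in by_popularity]

    # Rule 1: small district (k or fewer merchants), recommend everything
    if len(popular) <= k:
        return popular

    # Rule 2: cold start, the user has no history
    if user not in user_merchant:
        return popular[:k]

    # Rule 3: sum similarities over all visited merchants, equal weights
    scores = {}
    for visited in user_merchant[user]:
        for other, sim in W.get(visited, {}).items():
            scores[other] = scores.get(other, 0.0) + sim
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]

    # Top up with popular district merchants if fewer than k were found
    for m in popular:
        if len(ranked) >= k:
            break
        if m not in ranked:
            ranked.append(m)
    return ranked


# Toy data (made up); k=2 keeps the example small.
district_merchant = {'l1': {'m1': 5, 'm2': 3, 'm3': 1}, 'l2': {'m4': 2}}
user_merchant = {'u1': {'m1'}}
W = {'m1': {'m3': 0.9}}

print(recommend('u9', 'l2', user_merchant, district_merchant, W, k=2))  # ['m4']
print(recommend('u9', 'l1', user_merchant, district_merchant, W, k=2))  # ['m1', 'm2']
print(recommend('u1', 'l1', user_merchant, district_merchant, W, k=2))  # ['m3', 'm1']
```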
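# -*- coding: utf-8 -*-
__author__ = 'Bai'
import csv
import math
import os

os.chdir('c:/Bai/datasets/tianchi_ijcai2016')
# Read the three input tables into lists of lines
with open('ijcai2016_merchant_info') as f1:
    context1 = f1.readlines()
with open('ijcai2016_koubei_train') as f2:
    context2 = f2.readlines()
with open('ijcai2016_koubei_test') as f3:
    context3 = f3.readlines()
# Output file for the predictions (Python 3: the csv module expects newline='')
csvfile = open('forcast.csv', 'w', newline='')
writer = csv.writer(csvfile)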
# Build the dictionaries {user: [merchants]}, {merchant: [users]} and {merchant: popularity}
user_merchant = {}
merchant_user = {}
merchant_popular = {}
for x in context2:
    x = x.strip().split(',')
    if x[0] not in user_merchant:
        user_merchant[x[0]] = set()
    user_merchant[x[0]].add(x[1])
    if x[1] not in merchant_user:
        merchant_user[x[1]] = set()
        merchant_popular[x[1]] = 0
    merchant_user[x[1]].add(x[0])
    merchant_popular[x[1]] += 1  # one visit per training record
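# Build the dictionary {district: {merchant: popularity}}
district_merchant = {}
for x in context1:
    x = x.strip().split(',')
    y = x[2].split(':')  # Location_id_list is colon-separated
    for j in y:
        if j not in district_merchant:
            district_merchant[j] = {}
        # Merchants with no training records get popularity 0
        district_merchant[j][x[0]] = merchant_popular.get(x[0], 0)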
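# Build the merchant-to-merchant similarity matrix
C = {}  # C[i][j]: number of users who visited both merchant i and merchant j
N = {}  # N[i]: number of distinct users who visited merchant i
for user in user_merchant:
    for i in user_merchant[user]:
        if i not in N:
            N[i] = 0
        N[i] += 1
        if i not in C:
            C[i] = {}
        for j in user_merchant[user]:
            if i == j:
                continue
            if j not in C[i]:
                C[i][j] = 0
            C[i][j] += 1
W = {}  # W[i][j]: similarity C[i][j] / sqrt(N[i] * N[j])
for i in C:
    if i not in W:
        W[i] = {}
    for j in C[i]:
        W[i][j] = C[i][j] / math.sqrt(N[i] * N[j])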
for i in context3:
    i = i.strip().split(',')
    # Merchants in the user's district, sorted by popularity (most popular first)
    s = sorted(district_merchant.get(i[1], {}).items(), key=lambda t: t[1], reverse=True)
    # Case 1: the district has 10 or fewer merchants, so recommend all of them
    if len(s) <= 10:
        rec = [m for m, _ in s]
    # Case 2 (cold start): the user has no Koubei merchant history, so recommend
    # the ten most popular merchants in the district, most popular first
    elif i[0] not in user_merchant:
        rec = [m for m, _ in s[:10]]
    # Case 3: accumulate similarity over every merchant the user has visited
    else:
        sim_merchant = {}
        for u in user_merchant[i[0]]:
            for v in W[u]:
                if v not in sim_merchant:
                    sim_merchant[v] = 0
                sim_merchant[v] += W[u][v]
        s2 = sorted(sim_merchant.items(), key=lambda t: t[1], reverse=True)
        rec = [m for m, _ in s2[:10]]
        # Fewer than ten similar merchants: fill the remaining slots with
        # popular district merchants that are not already recommended
        e = set(rec)
        for m, _ in s:
            if len(rec) == 10:
                break
            if m not in e:
                rec.append(m)
    # Output row: User_id, Location_id, colon-separated Merchant_id_list
    i.append(':'.join(rec))
    writer.writerow(i)
csvfile.close()