Understanding gradient descent and stochastic gradient descent, with a small code sample for a movie recommender system (part 2)

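This is the second half of the post with this title. The browser editor does not seem able to cache that much content, so it kept crashing once the draft reached a certain length, which nearly drove me mad.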


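For the last part, the teacher gave us a dataset of 800,000 rows to process on our own. Reusing the code above would have been enough, but I also drew a 3D plot of my own.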


import numpy as np
import pandas as pd
# Three dimensions: x is the item, y is the rating, z is the number of people who rated the item
# x axis
x = np.array(sorted(set(Y4.item)))  # item ids, which can be read as the movie names (sorted, because a set has no guaranteed order)


# y axis
rating_mean = Y4.rating.mean()
# Y4 is the original dataframe (question 3 already used the name Y, so it is renamed here to keep the two questions independent)
Y4.rating -= rating_mean
# Initial rating estimate for every item
y = pd.DataFrame(np.linspace(rating_mean, rating_mean, x.shape[0]+1))  # one extra row, because row 0 is dropped below
#gY.index=x
y = y.drop(0)  # dropping row 0 leaves the index 1..n, so no for loop is needed to rebuild the index


# There are 943 users, with ids 1..943
users = np.array(range(1,944))
def movie_stochastic_gradient(Y4, y):
    gy = pd.DataFrame(np.zeros(y.shape), index=y.index)
    random_user = users[np.random.randint(users.shape[0])]  # randint excludes the upper bound, so this can pick any of the users
    items = list(set(Y4.item[Y4.user==random_user]))
    #items = list(Y4.item[Y4.user==1])
    #print(items)
    # Get all the ratings from this user (some rows are repeated)
    Y4_newform = Y4[Y4.user==random_user]  # rows that belong to this random user only, so the ratings are easy to look up
    for item in items:
        rating = list(set(Y4_newform.rating[Y4_newform.item==item]))[0]  # the same item and rating are repeated several times in this table
        #print(y[item])
        
        gy.loc[item, 0] += 2*(y.loc[item, 0] - rating)  # gradient of (estimate - rating)**2; .loc avoids chained assignment
    return gy


learning_rate = 0.01
iterations = 100
for i in range(iterations):
    gy = movie_stochastic_gradient(Y4, y)
    print('^_^ We have iterated', i, 'times.')
    y -= learning_rate*gy
    #print(y)

# z axis: the number of users who rated each film
z = np.zeros(x.shape)  # z[k] is the number of times the film with id k+1 was rated (ids are assumed to run from 1 to n)
for item in Y4.item:
    z[item-1] += 1


# Show the 3D plot
import pylab as py
import mpl_toolkits.mplot3d.axes3d as p3 
fig = py.figure()
ax = p3.Axes3D(fig)
ax.scatter(x, y[0].values, z)  # pass the rating column as a 1-D array, so all three inputs have the same shape
ax.set_xlabel('film')
ax.set_ylabel('rating')
ax.set_zlabel('users_num')
fig.add_axes(ax)
py.show()
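
The rating counts on the z axis can also be computed without a Python loop. The following is only a minimal sketch on a made-up table: `toy`, `items` and `counts` are hypothetical names, and only an `item` column like the one in Y4 is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the `item` column matters here.
toy = pd.DataFrame({"item": [1, 2, 1, 3, 2, 1]})

# Sorted item ids, playing the role of x.
items = np.array(sorted(toy["item"].unique()))

# value_counts counts the rows per item id;
# reindex puts the counts in the same order as `items`.
counts = toy["item"].value_counts().reindex(items, fill_value=0).to_numpy()
```

Unlike indexing with `item-1`, this version does not rely on the item ids running from 1 to n without gaps.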


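I mainly picked three variables: the name of the movie, the number of times the movie was rated, and the movie's rating after it has been adjusted by the users' ratings.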
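
To make the "rating adjusted by the users" more concrete, here is a minimal sketch of a single stochastic step on a made-up table. It assumes the squared error (estimate - rating)**2 per rated item, whose derivative is 2*(estimate - rating), which is the same gradient as in the code above. The names `toy`, `sgd_step` and `estimate` are hypothetical, and the user is fixed instead of random so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy ratings table with the same columns as Y4.
toy = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3],
    "item":   [1, 2, 1, 3, 2],
    "rating": [5.0, 3.0, 4.0, 2.0, 1.0],
})

def sgd_step(ratings, estimate, user, learning_rate):
    """Move the estimates of the items rated by `user` towards that
    user's ratings, using the gradient of (estimate - rating)**2."""
    rows = ratings[ratings["user"] == user].drop_duplicates("item")
    rated = rows.set_index("item")["rating"]
    # Items this user did not rate keep a zero gradient.
    grad = pd.Series(0.0, index=estimate.index)
    grad.loc[rated.index] = 2 * (estimate.loc[rated.index] - rated)
    return estimate - learning_rate * grad

# Start every item estimate at 0 and take one step for user 1.
estimate = pd.Series(0.0, index=sorted(toy["item"].unique()))
estimate = sgd_step(toy, estimate, user=1, learning_rate=0.1)
```

After this step only the two items rated by user 1 have moved: item 1 goes to 0.1 * 2 * 5 = 1.0, item 2 goes to 0.1 * 2 * 3 = 0.6, and item 3 stays at 0.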

The final result looks roughly like this:

