Machine Learning: Recommender System Programming Exercise


1. Regularized cost function

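Y: the user rating matrix
R: indicator matrix marking whether a user has rated a movie (1 if rated, 0 otherwise)
With n movies and m users, Y and R are both n×m matrices
X: the movie feature matrix
theta: the user preference matrix
param: X and theta serialized (flattened) into a single vector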

import numpy as np


def serialize(X, theta):
    """Serialize the two matrices into one 1-D vector."""
    # X (movie, feature), (1682, 10): movie features
    # theta (user, feature), (943, 10): user preference
    return np.concatenate((X.ravel(), theta.ravel()))


def deserialize(param, n_movie, n_user, n_features):
    """Deserialize: split the 1-D vector back into X and theta."""
    return param[:n_movie * n_features].reshape(n_movie, n_features), param[n_movie * n_features:].reshape(n_user, n_features)
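
As a quick sanity check, the round trip through serialize and deserialize should give back the original matrices. The sketch below restates both helpers so it runs on its own; the toy sizes (4 movies, 3 users, 2 features) are made up for illustration and are not from the dataset.

```python
import numpy as np


def serialize(X, theta):
    """Flatten X and theta into one 1-D parameter vector."""
    return np.concatenate((X.ravel(), theta.ravel()))


def deserialize(param, n_movie, n_user, n_features):
    """Split the 1-D vector back into X and theta."""
    split = n_movie * n_features
    X = param[:split].reshape(n_movie, n_features)
    theta = param[split:].reshape(n_user, n_features)
    return X, theta


# Toy sizes (hypothetical): 4 movies, 3 users, 2 features
X = np.arange(8, dtype=float).reshape(4, 2)
theta = np.arange(6, dtype=float).reshape(3, 2) + 100.0

param = serialize(X, theta)
X_back, theta_back = deserialize(param, n_movie=4, n_user=3, n_features=2)

print(param.shape)                         # (14,)
print(np.array_equal(X, X_back))           # True
print(np.array_equal(theta, theta_back))   # True
```

The flat vector is needed because optimizers such as scipy.optimize.minimize work on a single 1-D parameter array.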

def cost(param, Y, R, n_features):
    """compute cost for every r(i, j)=1
    Args:
        param: serialized X, theta
        Y (movie, user), (1682, 943): (movie, user) rating
        R (movie, user), (1682, 943): (movie, user) has rating
    """
    # theta (user, feature), (943, 10): user preference
    # X (movie, feature), (1682, 10): movie features
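    n_movie, n_user = Y.shape  # number of movies and users, read off Y
    # deserialize to recover the X and theta matrices
    X, theta = deserialize(param, n_movie, n_user, n_features)
    # prediction error, kept only where a rating exists (R == 1)
    inner = np.multiply(X @ theta.T - Y, R)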

    return np.power(inner, 2).sum() / 2
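
To see what the multiplication by R does, here is a small worked example. The numbers are invented for illustration (2 movies, 2 users, 2 features), and the two cost expressions are written inline rather than calling cost, so the block runs on its own.

```python
import numpy as np

# Toy data (hypothetical): 2 movies, 2 users, 2 features
X = np.array([[1., 0.],
              [0., 1.]])          # movie features, one row per movie
theta = np.array([[2., 1.],
                  [0., 3.]])      # user preferences, one row per user
Y = np.array([[4., 0.],
              [0., 5.]])          # ratings; 0 means "not rated"
R = np.array([[1., 0.],
              [0., 1.]])          # 1 where a rating exists

pred = X @ theta.T                # predicted ratings, shape (movie, user)
masked_cost = np.power(np.multiply(pred - Y, R), 2).sum() / 2
unmasked_cost = np.power(pred - Y, 2).sum() / 2

print(masked_cost)    # 4.0
print(unmasked_cost)  # 4.5
```

Without the mask, the prediction of 1 for an unrated entry is penalized as if the true rating were 0, which is why the unmasked value is larger.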

Regularized cost function:

def regularized_cost(param, Y, R, n_features, l=1):
    reg_term = np.power(param, 2).sum() * (l / 2)

    return cost(param, Y, R, n_features) + reg_term
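
Written out as a formula, the code above appears to compute the objective below. This is a transcription of the code, not a quote from the original post; the symbol K for the number of features and λ for the parameter l are notational assumptions, while n (movies) and m (users) follow the definitions at the top.

```latex
J(X, \Theta) =
  \frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
    \left( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \right)^{2}
  + \frac{\lambda}{2} \sum_{i=1}^{n} \sum_{k=1}^{K} \left( x^{(i)}_{k} \right)^{2}
  + \frac{\lambda}{2} \sum_{j=1}^{m} \sum_{k=1}^{K} \left( \theta^{(j)}_{k} \right)^{2}
```

Because param is just X and theta concatenated, the single term np.power(param, 2).sum() covers both regularization sums at once.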

2. Regularized gradient function


def gradient(param, Y, R, n_features):
    # theta (user, feature), (943, 10): user preference
    # X (movie, feature), (1682, 10): movie features
    n_movies, n_user = Y.shape
    X, theta = deserialize(param, n_movies, n_user, n_features)

    inner = np.multiply(X @ theta.T - Y, R)  # (1682, 943)

    # X_grad (1682, 10)
    X_grad = inner @ theta

    # theta_grad (943, 10)
    theta_grad = inner.T @ X

    # roll them together and return
    return serialize(X_grad, theta_grad)
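
The two matrix products in gradient can be read off from the partial derivatives of the unregularized cost. The formulas below are a sketch derived from the code; E is an assumed name for the masked error matrix that the code calls inner, and ∘ denotes the elementwise product.

```latex
\frac{\partial J}{\partial x^{(i)}_{k}}
  = \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \right) \theta^{(j)}_{k},
\qquad
\frac{\partial J}{\partial \theta^{(j)}_{k}}
  = \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \right) x^{(i)}_{k}

E = (X \Theta^{\top} - Y) \circ R,
\qquad
\nabla_{X} J = E\,\Theta,
\qquad
\nabla_{\Theta} J = E^{\top} X
```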

Regularized gradient function:

def regularized_gradient(param, Y, R, n_features, l=1):
    grad = gradient(param, Y, R, n_features)
    reg_term = l * param

    return grad + reg_term
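
Before handing the gradient to an optimizer, it can be worth comparing it against a finite-difference estimate. The sketch below is one way to do that; it restates compact versions of the functions above so it runs on its own, and numerical_gradient, the toy sizes, and the random data are assumptions made for the check, not part of the original exercise code.

```python
import numpy as np


def deserialize(param, n_movie, n_user, n_features):
    split = n_movie * n_features
    return (param[:split].reshape(n_movie, n_features),
            param[split:].reshape(n_user, n_features))


def regularized_cost(param, Y, R, n_features, l=1):
    X, theta = deserialize(param, *Y.shape, n_features)
    err = (X @ theta.T - Y) * R
    return (err ** 2).sum() / 2 + (l / 2) * (param ** 2).sum()


def regularized_gradient(param, Y, R, n_features, l=1):
    X, theta = deserialize(param, *Y.shape, n_features)
    err = (X @ theta.T - Y) * R
    grad = np.concatenate(((err @ theta).ravel(), (err.T @ X).ravel()))
    return grad + l * param


def numerical_gradient(f, param, eps=1e-5):
    """Central-difference approximation of the gradient of f at param."""
    grad = np.zeros_like(param)
    for k in range(param.size):
        step = np.zeros_like(param)
        step[k] = eps
        grad[k] = (f(param + step) - f(param - step)) / (2 * eps)
    return grad


# Small random problem (hypothetical sizes) so the loop stays cheap
rng = np.random.default_rng(0)
n_movie, n_user, n_features = 5, 4, 3
R = (rng.random((n_movie, n_user)) > 0.3).astype(float)
Y = rng.integers(1, 6, size=(n_movie, n_user)).astype(float) * R
param = rng.standard_normal((n_movie + n_user) * n_features)

analytic = regularized_gradient(param, Y, R, n_features, l=1.5)
numeric = numerical_gradient(
    lambda p: regularized_cost(p, Y, R, n_features, l=1.5), param)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be very small if the gradient is correct
```

The check is only practical on small problems, since it evaluates the cost twice per parameter.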

3. Recommender system

Data initialization:

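movies = Y.shape[0]  # 1682
users = Y.shape[1]  # 944 (one more than the 943 in the docstrings above, presumably an extra user column added to Y)
features = 10  # number of movie features
learning_rate = 10.  # NOTE: despite the name, this is passed below as the regularization strength l
# Randomly initialize X and theta, uniform on [0, 1)
X = np.random.random(size=(movies, features))
theta = np.random.random(size=(users, features))
params = serialize(X, theta)
# Mean-normalize Y by subtracting the global mean (taken over all entries, unrated ones included)
Y_norm = Y - Y.mean()
Y_norm.mean()  # sanity check: should be approximately 0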
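
Note that Y.mean() averages over every entry of Y, including the unrated ones. A commonly used variant, which is not what the code above does, subtracts each movie's mean computed over its rated entries only. The sketch below shows that variant under the assumption that R marks rated entries with 1; normalize_ratings and the toy matrix are hypothetical.

```python
import numpy as np


def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    counts = R.sum(axis=1)                  # number of ratings per movie
    sums = (Y * R).sum(axis=1)              # sum of ratings per movie
    # Movies with no ratings get mean 0 instead of a division by zero
    Y_mean = np.divide(sums, counts,
                       out=np.zeros_like(sums, dtype=float),
                       where=counts > 0)
    Y_norm = (Y - Y_mean[:, None]) * R      # unrated entries stay at 0
    return Y_norm, Y_mean


# Toy ratings (hypothetical): 3 movies, 3 users, 0 means "not rated"
Y = np.array([[5., 4., 0.],
              [0., 0., 0.],
              [1., 0., 3.]])
R = (Y > 0).astype(float)

Y_norm, Y_mean = normalize_ratings(Y, R)
print(Y_mean)   # [4.5 0.  2. ]
print(Y_norm)
```

With this variant, the per-movie means in Y_mean would be added back to the predictions instead of the single value Y.mean().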

Training:

from scipy.optimize import minimize

fmin = minimize(fun=regularized_cost, x0=params, args=(Y_norm, R, features, learning_rate), 
                method='TNC', jac=regularized_gradient)

Deserialize the trained parameters to recover X and theta:

X_trained, theta_trained = deserialize(fmin.x, movies, users, features)

Use the trained parameters to produce movie recommendations:

prediction = X_trained @ theta_trained.T
my_preds = prediction[:, 0] + Y.mean()
idx = np.argsort(my_preds)[::-1]  # Descending order
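
The sorted index array can then be used to list the highest-scoring movies. The sketch below uses made-up stand-ins for my_preds and for the title list (movie_titles is a hypothetical name; the post does not show how titles are loaded), so it runs without the trained model.

```python
import numpy as np

# Hypothetical stand-ins for the trained prediction column and the title list
my_preds = np.array([3.2, 4.8, 1.1, 4.1, 2.7])
movie_titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]

idx = np.argsort(my_preds)[::-1]   # indices sorted by predicted rating, descending
top_k = 3
top = [(movie_titles[i], float(my_preds[i])) for i in idx[:top_k]]

for title, score in top:
    print(f"{score:.1f}  {title}")
# 4.8  Movie B
# 4.1  Movie D
# 3.2  Movie A
```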