- Dataset: this article uses the MovieLens ml-100k.zip archive
- This article is a translation; link to the original:
Let’s begin
1. Dataset overview
# Columns in the u.user file: user_id, age, sex, occupation, zip_code
# Columns in the u.data file: user_id, movie_id, rating, unix_timestamp
# Leading columns in the u.item file: movie_id, title, release_date, video_release_date, imdb_url
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('u.user', sep='|', names=u_cols, encoding='latin-1')
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=r_cols, encoding='latin-1')
m_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
movies = pd.read_csv('u.item', sep=