Python for Data Analysis: Movie Data

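The code in the original book was written for Python 2, and some of the methods it uses are now deprecated and fail under Python 3. The complete, revised code that runs under Python 3 is as follows: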

import pandas as pd

# The .dat files use '::' as a separator; a multi-character separator needs engine='python'.
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('E:/python/geany_workspace/pydata-book-2nd-edition/datasets/movielens/users.dat', sep='::', engine='python', header=None, names=unames)

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('E:/python/geany_workspace/pydata-book-2nd-edition/datasets/movielens/ratings.dat', sep='::', engine='python', header=None, names=rnames)

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('E:/python/geany_workspace/pydata-book-2nd-edition/datasets/movielens/movies.dat', sep='::', engine='python', header=None, names=mnames)

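# Merge the three tables; pd.merge joins on the overlapping column names (user_id, then movie_id).
data = pd.merge(pd.merge(ratings, users), movies)
data[:5]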

# Mean rating of each movie, split by gender.
mean_ratings = data.pivot_table('rating', index=['title'], columns=['gender'], aggfunc='mean')
mean_ratings[:5]

# Number of ratings per title.
ratings_by_title = data.groupby('title').size()
ratings_by_title[:10]

# Keep only movies with at least 250 ratings.
active_titles = ratings_by_title.index[ratings_by_title >= 250]
mean_ratings = mean_ratings.loc[active_titles]
mean_ratings[:5]

# Movies rated highest by female viewers.
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
top_female_ratings[:10]

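# Rating gap between female and male viewers; the largest values are movies women rated higher.
mean_ratings['rating_diff'] = mean_ratings['F'] - mean_ratings['M']
sort_by_diff = mean_ratings.sort_values(by='rating_diff', ascending=False)
# Reverse the order to see the movies men rated higher than women.
sort_by_diff[::-1][:5]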

# Standard deviation of ratings per title: the movies viewers disagree on most.
rating_std_by_title = data.groupby('title')['rating'].std()
rating_std_by_title.loc[active_titles].sort_values(ascending=False)[:10]
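
The post does not list which calls had to change, so the following is only a sketch of the likely ones, run on a tiny made-up table instead of the MovieLens files. It assumes the fixes were the usual pandas API renames: the `rows`/`cols` keywords of `pivot_table` became `index`/`columns`, the removed `.ix` indexer is replaced by `.loc`, and `sort_index(by=...)` and `Series.order()` are replaced by `sort_values`. The table contents and the threshold of 2 are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Made-up stand-in for the merged MovieLens table (illustrative values only).
data = pd.DataFrame({
    "title":  ["A", "A", "A", "B", "B", "C"],
    "gender": ["F", "M", "M", "F", "M", "F"],
    "rating": [5, 3, 4, 2, 4, 1],
})

# Current keywords are index/columns; old pandas used rows/cols.
mean_ratings = data.pivot_table("rating", index="title", columns="gender", aggfunc="mean")

# Keep titles with at least 2 ratings (the post uses 250 on the real data).
ratings_by_title = data.groupby("title").size()
active_titles = ratings_by_title.index[ratings_by_title >= 2]

# .loc replaces the removed .ix indexer.
mean_ratings = mean_ratings.loc[active_titles]

# sort_values replaces sort_index(by=...) on DataFrames and order() on Series.
top_female = mean_ratings.sort_values(by="F", ascending=False)
rating_diff = (mean_ratings["F"] - mean_ratings["M"]).sort_values(ascending=False)

print(top_female)
print(rating_diff)
```

Because the sketch builds its own DataFrame, it runs without the dataset or the hard-coded `E:/` paths, which makes it a quick way to check that an installed pandas version accepts these calls.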