pandas notes: analyzing how movie preferences differ between men and women

pandas

Movie preference analysis

Merging multiple DataFrames

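  • pd.merge(ratings, users): joins the two tables on the column names they share
  • pd.concat([ratings, users, movies], axis=1): places the tables side by side without merging shared columns
  • mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
    computes the mean rating per title and gender; rows are ordered by the index
  • ratings_by_title = data.groupby('title').size(): counts the rows (ratings) for each title
  • active_titles = ratings_by_title.index[ratings_by_title >= 250]
    keeps the index labels for which the boolean condition is True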
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.options.display.max_rows = 10

# sep='::' is longer than one character, so pandas treats it as a regular expression.
# The C parser does not support that; passing engine='python' explicitly avoids the
# ParserWarning about falling back to the 'python' engine.
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table(r'C:\Users\Administrator\Desktop\网络模型\ml-1m\users.dat', sep='::', header=None, names=unames, engine='python')
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table(r'C:\Users\Administrator\Desktop\网络模型\ml-1m\ratings.dat', sep='::', header=None, names=rnames, engine='python')
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(r'C:\Users\Administrator\Desktop\网络模型\ml-1m\movies.dat', sep='::', header=None, names=mnames, engine='python')
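To try the '::' separator without the ml-1m files, here is a small sketch that parses an in-memory string instead. The three rows are the ones shown in the users[:3] output; `raw` and `users_demo` are names made up for this illustration.

```python
import io
import pandas as pd

# Three rows in the 'user_id::gender::age::occupation::zip' layout,
# held in a string instead of a file (illustration only).
raw = "1::F::1::10::48067\n2::M::56::16::70072\n3::M::25::15::55117\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A separator longer than one character is interpreted as a regular
# expression, which only the Python parsing engine supports.
users_demo = pd.read_csv(io.StringIO(raw), sep='::', header=None,
                         names=unames, engine='python')
print(users_demo)
```

Because engine='python' is passed explicitly, no ParserWarning is raised.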
users[:3]
   user_id gender  age  occupation    zip
0        1      F    1          10  48067
1        2      M   56          16  70072
2        3      M   25          15  55117
ratings[:3]
   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
movies[:3]
   movie_id                    title                        genres
0         1         Toy Story (1995)   Animation|Children's|Comedy
1         2           Jumanji (1995)  Adventure|Children's|Fantasy
2         3  Grumpier Old Men (1995)                Comedy|Romance
data = pd.merge(pd.merge(ratings, users), movies)   # join on the shared column names
pd.concat([ratings, users, movies], axis=1)[:3]     # side by side; shared columns are not merged
   user_id  movie_id  rating  timestamp  user_id gender   age  occupation    zip  movie_id                    title                        genres
0        1      1193       5  978300760      1.0      F   1.0        10.0  48067       1.0         Toy Story (1995)   Animation|Children's|Comedy
1        1       661       3  978302109      2.0      M  56.0        16.0  70072       2.0           Jumanji (1995)  Adventure|Children's|Fantasy
2        1       914       3  978301968      3.0      M  25.0        15.0  55117       3.0  Grumpier Old Men (1995)                Comedy|Romance
data[:3]
   user_id  movie_id  rating  timestamp gender  age  occupation    zip                                   title genres
0        1      1193       5  978300760      F    1          10  48067  One Flew Over the Cuckoo's Nest (1975)  Drama
1        2      1193       5  978298413      M   56          16  70072  One Flew Over the Cuckoo's Nest (1975)  Drama
2       12      1193       4  978220179      M   25          12  32793  One Flew Over the Cuckoo's Nest (1975)  Drama
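The difference between the two calls is easier to see on tiny tables. A minimal sketch with made-up data (`ratings_demo` and `users_demo` are hypothetical names, not the ml-1m tables):

```python
import pandas as pd

ratings_demo = pd.DataFrame({'user_id': [1, 2, 2],
                             'movie_id': [10, 10, 20],
                             'rating': [5, 3, 4]})
users_demo = pd.DataFrame({'user_id': [1, 2],
                           'gender': ['F', 'M']})

# merge: with no 'on' argument, pandas joins on the columns both frames
# share (here 'user_id'), using an inner join by default.
merged = pd.merge(ratings_demo, users_demo)

# concat with axis=1: rows are aligned by index label and the frames are
# placed side by side, so 'user_id' appears twice and the shorter frame
# is padded with NaN.
side_by_side = pd.concat([ratings_demo, users_demo], axis=1)

print(merged)
print(side_by_side)
```

In `merged` every rating row picks up the gender of its user. In `side_by_side` row 2 has NaN in the user columns, which is also why the integer columns show up as floats (1.0, 56.0) in the concat output above.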
mean_ratings=data.pivot_table('rating',index='title',columns='gender',aggfunc='mean')
mean_ratings[:5]
gender                                F         M
title
$1,000,000 Duck (1971)         3.375000  2.761905
'Night Mother (1986)           3.388889  3.352941
'Til There Was You (1997)      2.675676  2.733333
'burbs, The (1989)             2.793478  2.962085
...And Justice for All (1979)  3.828571  3.689024
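One way to read the pivot_table call: it groups by title and gender, averages the ratings, and then moves gender into the columns. The sketch below checks that on made-up data (`data_demo` is hypothetical).

```python
import pandas as pd

data_demo = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'F', 'M', 'F', 'M'],
    'rating': [4, 2, 5, 3, 1],
})

# Mean rating per title (rows) and gender (columns).
via_pivot = data_demo.pivot_table(values='rating', index='title',
                                  columns='gender', aggfunc='mean')

# The same table built step by step: group, average, then unstack the
# inner index level ('gender') into columns.
via_groupby = data_demo.groupby(['title', 'gender'])['rating'].mean().unstack()

print(via_pivot)
print(via_groupby)
```

Both frames hold the same values here; pivot_table is simply the shorter spelling.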
ratings_by_title = data.groupby('title').size()
ratings_by_title[:4]
title
$1,000,000 Duck (1971)        37
'Night Mother (1986)          70
'Til There Was You (1997)     52
'burbs, The (1989)           303
dtype: int64
active_titles = ratings_by_title.index[ratings_by_title >= 250]
pd.Series(active_titles)[:5]
0                   'burbs, The (1989)
1    10 Things I Hate About You (1999)
2                101 Dalmatians (1961)
3                101 Dalmatians (1996)
4                  12 Angry Men (1957)
Name: title, dtype: object
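The filtering step is plain boolean indexing on the index of the counts. A small sketch with made-up titles and a threshold of 2 instead of 250 (`data_demo`, `counts`, `mask`, and `active` are illustrative names):

```python
import pandas as pd

data_demo = pd.DataFrame({'title': ['A', 'A', 'A', 'B', 'C', 'C']})

# Number of rows per title; the result is a Series indexed by title.
counts = data_demo.groupby('title').size()

# Boolean Series aligned with counts: True where the title has enough rows.
mask = counts >= 2

# Keep only the index labels where the mask is True.
active = counts.index[mask]

# Equivalent spelling: filter the Series first, then take its index.
active_alt = counts[mask].index

print(list(active))
```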
mean_ratings = mean_ratings.loc[active_titles]
mean_ratings[:5]
gender                                    F         M
title
'burbs, The (1989)                 2.793478  2.962085
10 Things I Hate About You (1999)  3.646552  3.311966
101 Dalmatians (1961)              3.791444  3.500000
101 Dalmatians (1996)              3.240000  2.911215
12 Angry Men (1957)                4.184397  4.328421
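# Sort by the female mean rating, highest first (ascending=True would sort lowest first).
# The 'different' column in the output comes from the step that follows; the output
# was apparently captured after that column had been added.
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
top_female_ratings[:4]
gender                                                         F         M  different
title
Close Shave, A (1995)                                   4.644444  4.473795  -0.170650
Wrong Trousers, The (1993)                              4.588235  4.478261  -0.109974
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)           4.572650  4.464589  -0.108060
Wallace & Gromit: The Best of Aardman Animation (1996)  4.563107  4.385075  -0.178032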
mean_ratings['different'] = mean_ratings['M'] - mean_ratings['F']   # rating gap: male mean minus female mean
sorted_by_diff = mean_ratings.sort_values(by='different')
sorted_by_diff[:4]  # largest disagreement among titles that women rated higher
gender                            F         M  different
title
Dirty Dancing (1987)       3.790378  2.959596  -0.830782
Jumpin' Jack Flash (1986)  3.254717  2.578358  -0.676359
Grease (1978)              3.975265  3.367041  -0.608224
Little Women (1994)        3.870588  3.321739  -0.548849
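Because the gap is computed as M minus F, the most negative values are the titles women preferred, and reversing the sorted frame gives the titles men preferred. A sketch on made-up means (`mean_demo` and the film names are hypothetical):

```python
import pandas as pd

mean_demo = pd.DataFrame(
    {'F': [3.8, 3.0, 4.2], 'M': [3.0, 3.6, 4.1]},
    index=pd.Index(['Film X', 'Film Y', 'Film Z'], name='title'),
)

# Negative gap: women rated the title higher. Positive gap: men did.
mean_demo['different'] = mean_demo['M'] - mean_demo['F']

# Ascending sort puts the titles preferred by women first.
sorted_demo = mean_demo.sort_values(by='different')
preferred_by_women = sorted_demo.index[0]

# Reversing the row order puts the titles preferred by men first.
preferred_by_men = sorted_demo[::-1].index[0]

print(preferred_by_women, preferred_by_men)
```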
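rating_std_by_title = data.groupby('title')['rating'].std()   # standard deviation of the ratings per title
rating_std_by_title = rating_std_by_title.loc[active_titles]  # keep only titles with at least 250 ratings
rating_std_by_title.sort_values(ascending=False)[:5]   # largest spread first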
title
Dumb & Dumber (1994)                     1.321333
Blair Witch Project, The (1999)          1.316368
Natural Born Killers (1994)              1.307198
Tank Girl (1995)                         1.277695
Rocky Horror Picture Show, The (1975)    1.260177
Name: rating, dtype: float64
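As an alternative to computing the counts and the standard deviation in separate steps, both can come from one groupby. This is only a sketch on made-up data with a threshold of 2 instead of 250; `data_demo`, `stats`, `popular`, and `most_divisive` are illustrative names, not part of the analysis above.

```python
import pandas as pd

data_demo = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B', 'C'],
    'rating': [1, 3, 5, 4, 4, 2],
})

# One pass per title: number of ratings and their sample standard
# deviation (pandas uses ddof=1, so a single rating gives NaN).
stats = data_demo.groupby('title')['rating'].agg(['count', 'std'])

# Drop titles with too few ratings, then sort by spread, largest first.
popular = stats[stats['count'] >= 2]
most_divisive = popular.sort_values(by='std', ascending=False)

print(most_divisive)
```

A high standard deviation means the ratings for a title are spread out, that is, viewers disagree about it.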