内容:
以Pandas和借阅数据完成以下查询
1)查询也借阅了1103310用户至少4本所借图书的其他用户
2)查询借阅量最大的前10位用户及每人借的最多的图书
3)查询被同一用户在一年中能借阅多次的图书名称
4)查询所有年度借阅量最大的五部图书在各个年份借书的总量
lending.dat数据是老师给的
(1)
import pandas as pd
df=pd.read_csv('lending.dat')
books=df[df['uid']==1103310]['uid'].unique()
users=df['uid'].value_counts()[df['uid'].value_counts()>=4].index
result=df[(df['uid'].isin(users))&(df['bid'].isin(books))&(df['uid']!=1103310)]['uid'].unique()
print(result)
(2)
import pandas as pd
df=pd.read_csv('lending.dat')
#sort by users
user_counts=df.groupby('uid').size()
#get top 10 users
top10_users=user_counts.sort_values(ascending=False)[:10].index
#get the books everyone borrowed
result=[]
for user in top10_users:
book=df[records['uid']==user]['title'].value_counts().index[0]
result.append((user,book))
print(result)
(3)
import pandas as pd
df = pd.read_csv('lending.dat')
# 添加年份列
df['year'] = pd.to_datetime(df['date']).dt.year
# 按用户和图书进行分组,并找到同一用户在同一年中借阅多次的图书
result = df.groupby(['uid', 'title', 'year']).size()[df.groupby(['uid', 'title', 'year']).size() >= 2].reset_index()[['uid','title', 'year']]
#
#这边用不到的其实
# # 获取图书名称
# book_names = df[['bid', 'title']].drop_duplicates()
# result = pd.merge(result, book_names, on='bid')['title'].unique()
print(result)
(4)
import pandas as pd
df=pd.read_csv('lending.dat')
# 添加年份列
df['year'] = pd.to_datetime(df['date']).dt.year
# 获取最受欢迎的五部图书
top5_books = df['title'].value_counts().head().index
# 按图书和年份进行分组统计借书量
result = df[df['title'].isin(top5_books)].groupby(['title', 'year']).size()
print(result)
与文章无关:
昨天python上机课取消了,晚上本来要去老师办公室汇报来着,结果还被放了鸽子。早知道就去找朋友玩了