Chapter mind map
Problems and exercises for this chapter
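- [Exercise 1] Given a dataset of scripts from the TV series *Game of Thrones*, answer the following questions:
(a) How many distinct characters appear across the whole dataset?
import pandas as pd
# df is assumed to already hold the script dataset (columns include 'Name' and 'Sentence')
print(df['Name'].nunique())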
# 564
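`nunique` ignores missing values unless told otherwise, which matters if some rows have no speaker. A minimal check on a hypothetical toy column (not the real script data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in: three distinct names plus one missing value
toy = pd.DataFrame({'Name': ['arya stark', 'jon snow', 'arya stark', np.nan, 'sansa stark']})

distinct = toy['Name'].nunique()                       # NaN excluded by default (dropna=True)
distinct_with_nan = toy['Name'].nunique(dropna=False)  # NaN counted as its own value
print(distinct, distinct_with_nan)  # → 3 4
```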
(b) Counting by cells (i.e., simply treating each cell as one line of dialogue), who speaks the most lines?
print(df['Name'].value_counts().index[0])
# 'tyrion lannister'
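`value_counts` returns frequencies sorted in descending order, so `index[0]` is the most frequent label; `idxmax` gives the same answer without relying on that ordering. A small sketch on hypothetical toy names:

```python
import pandas as pd

# Hypothetical toy speaker column
toy = pd.DataFrame({'Name': ['tyrion lannister', 'cersei lannister', 'tyrion lannister',
                             'jon snow', 'tyrion lannister', 'jon snow']})

counts = toy['Name'].value_counts()  # sorted by frequency, highest first
top_by_index = counts.index[0]       # first label = most frequent name
top_by_idxmax = counts.idxmax()      # label of the maximum count
print(top_by_index, top_by_idxmax)
```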
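(c) Counting by words, who speaks the most words?
# Word count of each line (whitespace-separated tokens), sorted so each speaker's lines are contiguous
df_words = df.assign(Words=df['Sentence'].apply(lambda x: len(x.split()))).sort_values(by='Name')
print(df_words.head())
# Running total of words within each speaker's block; it restarts whenever Name changes
L_count = []
N_words = list(zip(df_words['Name'], df_words['Words']))
last = None
for name, words in N_words:
    if name == last:
        L_count.append(L_count[-1] + words)  # same speaker: keep accumulating
    else:
        L_count.append(words)                # first line of a new speaker
    last = name
df_words['Count'] = L_count
# The largest running total sits on the last row of the speaker with the most words
print(df_words['Name'][df_words['Count'].idxmax()])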
# 'tyrion lannister'
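The running-total loop works, but the same quantity can also be obtained by grouping the per-line word counts by `Name` and summing them. This is a swapped-in technique, not the approach used above; the sketch below runs it on a hypothetical toy DataFrame with the same `Name` and `Sentence` columns. Unlike `x.split()` inside `apply`, `.str.split()` turns a missing sentence into NaN, which `sum` then skips.

```python
import pandas as pd

# Hypothetical toy script lines
toy = pd.DataFrame({
    'Name': ['jon snow', 'tyrion lannister', 'jon snow', 'tyrion lannister'],
    'Sentence': ['winter is coming', 'i drink and i know things',
                 'i know', 'a lannister always pays his debts'],
})

# Number of whitespace-separated words in each line
words = toy['Sentence'].str.split().str.len()
# Total words per speaker, then the speaker with the largest total
totals = words.groupby(toy['Name']).sum()
top_speaker = totals.idxmax()
print(totals)
print(top_speaker)
```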
- [Exercise 2] Given a dataset of Kobe Bryant's shots, answer the following questions:
(a) Which combination of action_type and combined_shot_type occurs most often?
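import pandas as pd
df = pd.read_csv('data/Kobe_data.csv', index_col='shot_id')
# Pair the two columns row by row and count each (action_type, combined_shot_type) tuple
df1 = pd.Series(list(zip(df['action_type'], df['combined_shot_type']))).value_counts().index[0]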
print(df1)
# ('Jump Shot', 'Jump Shot')
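Zipping the columns into tuples is one way to count combinations; `DataFrame.value_counts` (available since pandas 1.1) counts distinct rows directly and returns a MultiIndex, so its first label is already a tuple. A minimal sketch on hypothetical toy shots, not the real Kobe data:

```python
import pandas as pd

# Hypothetical toy shot types
toy = pd.DataFrame({
    'action_type': ['Jump Shot', 'Layup Shot', 'Jump Shot', 'Dunk Shot', 'Jump Shot'],
    'combined_shot_type': ['Jump Shot', 'Layup', 'Jump Shot', 'Dunk', 'Jump Shot'],
})

# Each distinct (action_type, combined_shot_type) row is counted, most frequent first
combo_counts = toy[['action_type', 'combined_shot_type']].value_counts()
top_combo = combo_counts.index[0]  # a tuple, because the result has a MultiIndex
print(top_combo)  # → ('Jump Shot', 'Jump Shot')
```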
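(b) Among all recorded game_id values, which opponent did Kobe face most often (counting each game once)?
# Unique (game_id, opponent) pairs, so each game is counted only once
pairs = pd.Series(list(zip(df['game_id'], df['opponent']))).unique()
# Keep the opponent from each pair and find the most frequent one
opponents = [opp for _, opp in pairs]
df2_1 = pd.Series(opponents).value_counts().index[0]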
print(df2_1)
# 'SAS'
Alternatively:
# Keep one (game_id, opponent) row per game, then count games per opponent
df2_2 = df[['game_id', 'opponent']].drop_duplicates(keep='first')
df2_2 = df2_2['opponent'].value_counts().index[0]
print(df2_2)
# 'SAS'
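Deduplicating by game matters because the raw data has one row per shot: counting rows would favor opponents against whom many shots were taken, not opponents faced in many games. A sketch of the grouped form, `groupby(...).nunique()`, on a hypothetical toy shot log (not the real data) shows how the two counts can disagree:

```python
import pandas as pd

# Hypothetical toy shot log: one game with many shots vs. two games with one shot each
toy = pd.DataFrame({
    'game_id': [1, 1, 1, 1, 2, 3],
    'opponent': ['POR', 'POR', 'POR', 'POR', 'SAS', 'SAS'],
})

# Distinct games per opponent: each game counted once
games_per_opponent = toy.groupby('opponent')['game_id'].nunique()
top_by_games = games_per_opponent.idxmax()
# Raw row counts weight each game by its number of shots
top_by_rows = toy['opponent'].value_counts().idxmax()
print(top_by_games, top_by_rows)  # → SAS POR
```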