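[Exercise 1] Given a dataset containing the script of the American TV series Game of Thrones, answer the following questions:
(a) How many characters appear in the data in total?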
import pandas as pd
import numpy as np
df = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Game_of_Thrones_Script.csv')
df.head()
| | Release Date | Season | Episode | Episode Title | Name | Sentence |
---|---|---|---|---|---|---|
0 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | What do you expect? They're savages. One lot s... |
1 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | I've never seen wildlings do a thing like this... |
2 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | How close did you get? |
3 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | Close as any man would. |
4 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | gared | We should head back to the wall. |
df.describe()
| | Release Date | Season | Episode | Episode Title | Name | Sentence |
---|---|---|---|---|---|---|
count | 23911 | 23911 | 23911 | 23911 | 23911 | 23911 |
unique | 73 | 8 | 10 | 73 | 564 | 22300 |
top | 2017/8/13 | Season 2 | Episode 5 | Eastwatch | tyrion lannister | No. |
freq | 505 | 3914 | 3083 | 505 | 1760 | 103 |
df['Name'].nunique()  # number of unique values in the Name column
564
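To make the behaviour of `nunique` concrete, here is a minimal sketch on a small hypothetical frame (the rows in `toy` are invented for illustration and are not the real dataset). It also checks that the result agrees with the `unique` row that `describe` reports for a text column. Note that `nunique` skips missing values by default (`dropna=True`).

```python
import pandas as pd

# Hypothetical toy data, standing in for the real script file.
toy = pd.DataFrame({
    'Name': ['will', 'gared', 'will', 'waymar royce'],
    'Sentence': ['a b', 'c', 'd e f', 'g'],
})

n_names = toy['Name'].nunique()   # number of distinct speaker names
summary = toy.describe()          # text columns: count / unique / top / freq

print(n_names)
print(summary.loc['unique', 'Name'])
```

On the real data the same call returns 564, matching the `unique` entry for `Name` in the `describe` table above.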
(b) Counting by cell (that is, simply treating each cell as one line of dialogue), who spoke the most lines?
df['Name'].value_counts()
tyrion lannister 1760
jon snow 1133
daenerys targaryen 1048
cersei lannister 1005
jaime lannister 945
...
janos slunt 1
steward of house stark 1
archmaester 1
night watch stable boy 1
bryndel 1
Name: Name, Length: 564, dtype: int64
df['Name'].value_counts().index[0]
'tyrion lannister'
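Taking `.index[0]` works because `value_counts` sorts in descending order by default. As a hedged alternative, `idxmax` states the intent directly and does not depend on the sort order. The sketch below uses a hypothetical toy frame, not the real data.

```python
import pandas as pd

# Hypothetical toy data, standing in for the real script file.
toy = pd.DataFrame({'Name': ['will', 'gared', 'will', 'waymar royce']})

counts = toy['Name'].value_counts()   # sorted descending by default
top_by_index = counts.index[0]        # relies on that default ordering
top_by_idxmax = counts.idxmax()       # label of the largest count

print(top_by_index, top_by_idxmax)
```

Both expressions return the same label here. On ties, which name comes first may differ between the two approaches.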
(c) Counting by words, who spoke the most words?
# apply(lambda x: len(x.split())) counts how many words each sentence contains
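The word-counting idea in the comment above can be sketched on a toy frame. The three sentences are borrowed from the output shown earlier, but the frame itself is hypothetical, and the `groupby` total at the end is only one assumed way to aggregate the per-cell counts by speaker.

```python
import pandas as pd

# Hypothetical toy data built from a few sentences shown earlier.
toy = pd.DataFrame({
    'Name': ['will', 'gared', 'will'],
    'Sentence': [
        'Close as any man would.',
        'We should head back to the wall.',
        'No.',
    ],
})

# Count whitespace-separated tokens in each cell.
toy_words = toy.assign(Words=toy['Sentence'].apply(lambda x: len(x.split())))

# Assumed aggregation step: total the word counts per speaker.
words_per_name = toy_words.groupby('Name')['Words'].sum()

print(words_per_name.idxmax())
```

Splitting on whitespace treats punctuation as part of the adjacent word, so `'No.'` counts as one word.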
df_words = df.assign(Words=df['Sentence'].apply(lambda x:len(x.s