Problem: given this set of movie data, how should we process it if we want to count how many movies fall into each genre?
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
file_path = "/**/day05/code/IMDB-Movie-Data.csv"
df1 = pd.read_csv(file_path)
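# Split each movie's comma-separated genre string into a list of genres
temp_list = df1["Genre"].str.split(",").tolist()
# Flatten the nested lists and deduplicate to get every distinct genre
Genre_list = list(set([i for j in temp_list for i in j]))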
# Build an all-zero DataFrame with one row per movie and one column per genre
zero_df = pd.DataFrame(np.zeros(shape=(df1.shape[0], len(Genre_list)), dtype=int), columns=Genre_list)
# For each movie (row i), set the columns of the genres it belongs to to 1
for i in range(df1.shape[0]):
    zero_df.loc[i, temp_list[i]] = 1
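# Sum each column to get the number of movies per genre
genre_sum = zero_df.sum(axis=0)
# Sort ascending so the bars go from the rarest to the most common genre
genre_count = genre_sum.sort_values()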
# Plot a bar chart of the counts
_x = genre_count.index
_y = genre_count.values
plt.figure(figsize=(20, 8), dpi=80)
plt.bar(_x, _y)
plt.show()
The result is as follows:
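For comparison, the same 0/1 indicator table can be built without the explicit loop. The sketch below is only an alternative to the approach above, not a replacement for it: it relies on pandas' `Series.str.get_dummies`, and both the three-row `sample` DataFrame and the `count_genres` helper are made up for illustration and are not part of the original data or code.

```python
import pandas as pd

# Hypothetical sample rows standing in for the "Genre" column of the CSV.
sample = pd.DataFrame({"Genre": ["Action,Sci-Fi", "Action,Comedy", "Comedy"]})


def count_genres(df, column="Genre", sep=","):
    """Count how many rows mention each genre in a delimiter-separated column."""
    # str.get_dummies splits on `sep` and returns one 0/1 column per genre,
    # which is the same table the zero-DataFrame loop fills in by hand.
    indicators = df[column].str.get_dummies(sep=sep)
    # Column sums give the per-genre counts; sort ascending for plotting.
    return indicators.sum(axis=0).sort_values()


genre_count = count_genres(sample)
print(genre_count)
```

With the real file, `count_genres(df1)` should give a Series that can be passed to `plt.bar` in the same way as `genre_count` above, assuming the `Genre` column contains no missing values or stray whitespace around the commas.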