Kaggle Data Analysis Study Notes (2): Netflix Shows and Movies

Latest recommended article published on 2022-08-24 11:11:32

qq_46454022


Tags: movie duration, TV show season distribution, content ratings, category analysis

Original link: https://www.kaggle.com/zzw1997/netflix-shows-and-movies-exploratory-analysis/edit/run/58200250


Distribution of movie durations

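1. Handling missing values: df.fillna(0, inplace=True) replaces missing values with 0. With inplace=True the original object is modified directly instead of a modified copy being returned.
.fillna(method='ffill') fills each gap with the previous non-missing value; method='bfill' fills it with the next non-missing value. In pandas 2.1 and later the method argument is deprecated in favor of .ffill() and .bfill().
limit=2 caps how many missing values are filled (with ffill/bfill, at most 2 consecutive ones).
2. .dtype shows the data type; .astype(np.float64) converts it.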
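
The notes above can be tried out on a few rows. This is a minimal sketch on made-up data, not the Netflix dataset: the DataFrame `df` and its values are hypothetical, and only the column name `duration` is borrowed from the code that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; three consecutive gaps make the effect of limit visible.
df = pd.DataFrame({"duration": [90.0, np.nan, np.nan, np.nan, 120.0]})

# Replace every missing value with a constant (returns a new Series).
filled_zero = df["duration"].fillna(0)

# Forward fill: copy the previous non-missing value, at most 2 times in a row.
filled_ffill = df["duration"].ffill(limit=2)

# Backward fill: copy the next non-missing value into each gap.
filled_bfill = df["duration"].bfill()

# Inspect the data type, then convert it.
as_int = filled_zero.astype(np.int64)
print(df["duration"].dtype, as_int.dtype)
```

With `limit=2`, the third consecutive gap stays missing after the forward fill, which is why a constant fill is often applied afterwards.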

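import plotly.figure_factory as ff

# d2 is assumed to be the movie subset, with 'duration' already reduced to a number of minutes.
# Missing durations are filled with 0.0 so the column can be cast to float.
x1 = d2['duration'].fillna(0.0).astype(float)
# Histogram with a fitted normal curve; 'a' is the group label shown in the legend.
fig = ff.create_distplot([x1], ['a'], bin_size=0.7, curve_type='normal', colors=["#6ad49b"])
fig.update_layout(title_text='Distplot with Normal Distribution')
fig.show()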


TV shows with many seasons

import plotly.graph_objects as go

# d1 is assumed to be the TV-show subset with a 'season_count' column.
col = 'season_count'
# Count shows per number of seasons. rename_axis + reset_index(name=...) yields the
# columns [col, 'count'] on both pandas 1.x and 2.x (the old rename breaks on 2.x).
vc1 = d1[col].value_counts().rename_axis(col).reset_index(name="count")
print(vc1)
vc1['percent'] = 100 * vc1['count'] / vc1['count'].sum()
vc1 = vc1.sort_values(col)

trace1 = go.Bar(x=vc1[col], y=vc1["count"], name="TV Shows", marker=dict(color="#a678de"))
data = [trace1]
layout = go.Layout(title="Seasons", legend=dict(x=0.1, y=1.1, orientation="h"))
fig = go.Figure(data, layout=layout)
fig.show()
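
To see what the counting step produces before it is plotted, here is a small sketch on made-up data. The DataFrame `toy` and its values are hypothetical stand-ins for the TV-show subset; only the column name `season_count` comes from the code above.

```python
import pandas as pd

# Hypothetical toy data standing in for the TV-show subset d1.
toy = pd.DataFrame({"season_count": [1, 1, 2, 1, 3, 2]})

col = "season_count"
# One row per distinct value, with its frequency in a 'count' column.
vc = toy[col].value_counts().rename_axis(col).reset_index(name="count")
# Share of each value as a percentage of all rows.
vc["percent"] = 100 * vc["count"] / vc["count"].sum()
# Sort by the value itself so the bars appear in season order.
vc = vc.sort_values(col).reset_index(drop=True)
print(vc)
```

If `season_count` is stored as strings, `sort_values` orders it alphabetically ("10" before "2"), so converting the column to integers first may be needed for a sensible x-axis.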


Content ratings


import plotly.graph_objects as go

col = "rating"

# Rating counts for TV shows (d1); works on pandas 1.x and 2.x.
vc1 = d1[col].value_counts().rename_axis(col).reset_index(name="count")
vc1['percent'] = 100 * vc1['count'] / vc1['count'].sum()
vc1 = vc1.sort_values(col)

# Rating counts for movies (d2).
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)

trace1 = go.Bar(x=vc1[col], y=vc1["count"], name="TV Shows", marker=dict(color="#a678de"))
trace2 = go.Bar(x=vc2[col], y=vc2["count"], name="Movies", marker=dict(color="#6ad49b"))
data = [trace1, trace2]
# The original title was copied from another chart; it is renamed to match this one.
layout = go.Layout(title="Content ratings", legend=dict(x=0.1, y=1.1, orientation="h"))
fig = go.Figure(data, layout=layout)
fig.show()


What are the top categories?

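[-1]: the last element, similar to end in MATLAB;
[:-1]: every element except the last one;
[::-1]: all elements, in reverse order;
[n::-1]: the elements from index n back to index 0, that is, the first n+1 elements in reverse order.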
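
The four slicing forms can be checked on a short list. This is a minimal sketch; the list `items` is made up for illustration.

```python
items = ["a", "b", "c", "d", "e"]

last = items[-1]                     # the last element
all_but_last = items[:-1]            # everything except the last element
reversed_all = items[::-1]           # the whole list, reversed
first_three_reversed = items[2::-1]  # indices 2, 1, 0

print(last, all_but_last, reversed_all, first_three_reversed)
```

Slicing returns a new list each time, so `items` itself is left unchanged.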

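from collections import Counter
import plotly.graph_objects as go

col = "listed_in"
# Each row holds a comma-separated list of categories; join all rows and split them apart.
categories = ", ".join(d2[col].dropna()).split(", ")
counter_list = Counter(categories).most_common(50)
# Reverse so that the most frequent category ends up at the top of the horizontal chart.
labels = [name for name, _ in counter_list][::-1]
values = [count for _, count in counter_list][::-1]
# d2 is the movie subset (see the ratings chart), so the trace is labeled "Movies".
trace1 = go.Bar(y=labels, x=values, orientation="h", name="Movies", marker=dict(color="#a678de"))

data = [trace1]
# The original title was copied from another chart; it is renamed to match this one.
layout = go.Layout(title="Top categories", legend=dict(x=0.1, y=1.1, orientation="h"))
fig = go.Figure(data, layout=layout)
fig.show()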
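
The join, split, and count steps are easier to follow on a few rows. This is a sketch on made-up strings that imitate the comma-separated style of the listed_in column; the values in `listed_in` below are hypothetical.

```python
from collections import Counter

# Hypothetical category strings, one per title.
listed_in = [
    "Dramas, International Movies",
    "Comedies",
    "Dramas, Comedies",
    "Dramas",
]

# Join all rows into one string, then split it back into single categories.
categories = ", ".join(listed_in).split(", ")

# Keep the 2 most frequent categories as (name, count) pairs, most frequent first.
top = Counter(categories).most_common(2)

# Reverse both lists so the most frequent category is drawn last,
# which places it at the top of a horizontal bar chart.
labels = [name for name, _ in top][::-1]
values = [count for _, count in top][::-1]

print(labels, values)
```

This approach relies on every separator being exactly a comma followed by one space; rows with different spacing would need a more tolerant split.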

