2022.04.11

ifuudoudou

Tags: database

Article link: https://blog.csdn.net/ifuudoudou/article/details/124077363

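Lesson 41: Data indexing exercises

MultiIndex: hierarchical (compound) index

You can replace an index by assigning a new list of labels directly, as in xx.index = [...]. The list must have one label per row.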
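The direct assignment above can be sketched as follows. The frame and its labels are made up for illustration; they are not the data from the lesson.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])

# Assigning to .index relabels the rows in place;
# the new list must contain exactly one label per row
df.index = ["m", "n"]

labels = list(df.index)
values = df["x"].tolist()
print(df)
```

Only the labels change; the data in each row stays where it was.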

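You can also use the reindex function.

The final row, f, is all NaN because the original data has no row f; reindex creates it as a new, empty row.

In effect, reindex picks the two requested rows out of the original data: row a exists, so it keeps its values; row f does not exist, so it has none.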
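A minimal sketch of that behavior, assuming a small made-up frame with rows a, b, and c (the lesson's actual data is not shown in these notes):

```python
import pandas as pd

# Hypothetical frame with rows a, b, c
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# Ask for an existing label ("a") and a missing one ("f")
picked = df.reindex(["a", "f"])

a_value = picked.loc["a", "x"]             # kept from the original row
f_is_nan = pd.isna(picked.loc["f", "x"])   # row f is new, so it is NaN
print(picked)
```

Unlike assigning to .index, reindex returns a new object and leaves df unchanged.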

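set_index turns a column into the index. By default, that column is then removed from the DataFrame's columns.

To use a column as the index and still keep it as a regular column, pass drop=False.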
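The difference between the default and drop=False can be seen in a short sketch. The column names key and val are placeholders chosen for this example.

```python
import pandas as pd

# Hypothetical frame; "key" and "val" are placeholder column names
df = pd.DataFrame({"key": ["a", "b"], "val": [10, 20]})

# Default (drop=True): "key" moves into the index and leaves the columns
dropped = df.set_index("key")

# drop=False: "key" becomes the index and also stays as a column
kept = df.set_index("key", drop=False)

print(list(dropped.columns))
print(list(kept.columns))
```

Both calls return a new DataFrame; df itself is not modified.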

An index also has a unique() method, which tells us that index labels are allowed to repeat.
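As a small illustration of repeated labels, here is a sketch with a made-up Series whose index contains "a" twice:

```python
import pandas as pd

# Hypothetical Series with a repeated index label
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

is_unique = s.index.is_unique          # False, because "a" appears twice
distinct = list(s.index.unique())      # the distinct labels, in order of appearance
rows_for_a = s.loc["a"]                # selecting a repeated label returns every match

print(distinct)
print(rows_for_a)
```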

Setting two columns of a DataFrame as the index produces a MultiIndex.

Can three columns be used as well?

Yes, they can.
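A sketch of both cases, using a made-up frame (the column names a, b, c, d are assumptions for this example):

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": ["x", "x", "y", "y"],
    "c": ["one", "two", "one", "two"],
    "d": [10, 20, 30, 40],
})

# Passing a list of columns to set_index builds a MultiIndex
two = df.set_index(["b", "c"])
three = df.set_index(["a", "b", "c"])

print(two.index.nlevels, three.index.nlevels)
```

The order of the list determines the level order: the first column becomes the outermost level.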

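Lesson 42: Grouping and aggregation exercises and summary

At this point, c is still a Series.

So how do we select values from it?

Next, redefine d.

How do we select values starting from the label "one", which now sits on the inner level?

Use the swaplevel() function to get the values under the index label "one".

swaplevel() lets us select starting from the inner level.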
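The notes do not show how d was defined, so the following is only a sketch under an assumption: d is a Series with a two-level index whose inner level holds "one" and "two".

```python
import pandas as pd

# Hypothetical data; the real definition of d is not shown in the notes
df = pd.DataFrame({
    "a": [0, 1, 2, 3],
    "c": ["one", "one", "two", "two"],
    "d": ["h", "j", "k", "l"],
})

# Outer level is column d, inner level is column c
d = df.set_index(["d", "c"])["a"]

# swaplevel() exchanges the two levels, so "one" is now on the outer level
one_values = d.swaplevel()["one"]
print(one_values)
```

After the swap, selecting "one" returns the matching values indexed by the remaining level.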

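Now, for the DataFrame b, how do we get the rows for "one"?

Here we need loc, selecting from the outer level inward.

With swaplevel, we first swap the level order and then select.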
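Both approaches can be sketched as below. The definition of b is an assumption (a DataFrame indexed by columns c and d, with c on the outer level), since the notes do not include it.

```python
import pandas as pd

# Hypothetical data; the real definition of b is not shown in the notes
df = pd.DataFrame({
    "a": [0, 1, 2, 3],
    "b": [7, 6, 5, 4],
    "c": ["one", "one", "two", "two"],
    "d": ["h", "j", "k", "l"],
})
b = df.set_index(["c", "d"])        # outer level c, inner level d

# loc works from the outer level inward
one_rows = b.loc["one"]             # all rows under "one", indexed by d
h_row = b.loc["one"].loc["h"]       # then narrow down to the inner label "h"

# To start from the inner level, swap the levels first, then select
swapped_h = b.swaplevel().loc["h"]

print(one_rows)
print(swapped_h)
```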

import pandas as pd
from matplotlib import pyplot as plt

file_path=" "
df=pd.read_csv(file_path)

# Use matplotlib to show the 10 countries with the most stores
# Prepare the data: count stores per country, sort descending, keep the top 10
detail=df.groupby(by="Country").count()["Brand"].sort_values(ascending=False)[:10]

_x=detail.index
_y=detail.values

# Draw the bar chart
plt.figure(figsize=(20,8),dpi=80)

plt.bar(range(len(_x)),_y)
plt.xticks(range(len(_x)),_x)

plt.show()
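To see what the data preparation step produces without the real CSV file, here is a sketch on a tiny made-up table. Only the column names Brand and Country come from the code above; the rows are invented.

```python
import pandas as pd

# Invented rows; only the column names match the code above
df = pd.DataFrame({
    "Brand": ["S", "S", "S", "S", "S", "S"],
    "Country": ["US", "US", "US", "CN", "CN", "JP"],
})

# Count rows per country, sort from largest to smallest, keep the top 2
top = df.groupby(by="Country").count()["Brand"].sort_values(ascending=False)[:2]

print(top)
```

The result is a Series whose index holds the country codes and whose values hold the counts, which is why the plotting code reads the x labels from detail.index and the bar heights from detail.values.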

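import pandas as pd
from matplotlib import pyplot as plt

file_path=" "
df=pd.read_csv(file_path)
# Keep only the stores in China; the country code must be a quoted string
df=df[df["Country"]=="CN"]


# Use matplotlib to show the 10 cities in China with the most stores
# Prepare the data. After filtering to one country, grouping by "Country"
# would leave a single bar, so group by city instead
# (assumes the dataset has a "City" column)
detail=df.groupby(by="City").count()["Brand"].sort_values(ascending=False)[:10]

_x=detail.index
_y=detail.values

# Draw the bar chart
plt.figure(figsize=(20,8),dpi=80)

plt.bar(range(len(_x)),_y,width=0.3,color="orange")
plt.xticks(range(len(_x)),_x)

plt.show()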

The tick labels are not rendered correctly because no font that supports Chinese characters has been specified.

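import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import font_manager

# Raw string so the backslashes in the Windows path are not treated as escapes
my_font=font_manager.FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc")
file_path=" "
df=pd.read_csv(file_path)
# Keep only the stores in China; the country code must be a quoted string
df=df[df["Country"]=="CN"]


# Use matplotlib to show the 25 cities in China with the most stores
# Prepare the data (assumes the dataset has a "City" column)
detail=df.groupby(by="City").count()["Brand"].sort_values(ascending=False)[:25]

_x=detail.index
_y=detail.values

# Draw the bar chart
plt.figure(figsize=(20,8),dpi=80)

plt.bar(range(len(_x)),_y,width=0.3,color="orange")
# Pass the font so the Chinese labels are displayed
plt.xticks(range(len(_x)),_x,fontproperties=my_font)

plt.show()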

What if we want a horizontal bar chart instead?

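import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import font_manager

# Raw string so the backslashes in the Windows path are not treated as escapes
my_font=font_manager.FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc")
file_path=" "
df=pd.read_csv(file_path)
# Keep only the stores in China; the country code must be a quoted string
df=df[df["Country"]=="CN"]


# Use matplotlib to show the 25 cities in China with the most stores
# Prepare the data (assumes the dataset has a "City" column)
detail=df.groupby(by="City").count()["Brand"].sort_values(ascending=False)[:25]

_x=detail.index
_y=detail.values

# Draw the horizontal bar chart
plt.figure(figsize=(20,10),dpi=80)

#plt.bar(range(len(_x)),_y,width=0.3,color="orange")
plt.barh(range(len(_x)),_y,height=0.3,color="orange")
# With barh the categories run along the y axis, so label the y ticks
plt.yticks(range(len(_x)),_x,fontproperties=my_font)

plt.show()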

import pandas as pd
from matplotlib import pyplot as plt

file_path=" "

df=pd.read_csv(file_path)

# Inspect the data first
#print(df.head(2))
#print(df.info())

#detail=df[pd.notnull(df["original_publication_year"])]

#group=detail.groupby(by="original_publication_year").count()["title"]

# Average rating of books by publication year
# Drop the rows where original_publication_year is NaN
detail=df[pd.notnull(df["original_publication_year"])]

# groupby alone returns a SeriesGroupBy object; nothing is aggregated yet
grouped=detail["average_rating"].groupby(by=detail["original_publication_year"])

print(grouped)

import pandas as pd
from matplotlib import pyplot as plt

file_path=" "

df=pd.read_csv(file_path)

# Inspect the data first
#print(df.head(2))
#print(df.info())

#detail=df[pd.notnull(df["original_publication_year"])]

#group=detail.groupby(by="original_publication_year").count()["title"]

# Average rating of books by publication year
# Drop the rows where original_publication_year is NaN
detail=df[pd.notnull(df["original_publication_year"])]

# Aggregate with mean() so the result is a Series with an index and values
grouped=detail["average_rating"].groupby(by=detail["original_publication_year"]).mean()

print(grouped)

_x=grouped.index
_y=grouped.values

# Draw the line chart
plt.figure(figsize=(20,8),dpi=80)
plt.plot(range(len(_x)),_y)

# Label every 10th year and rotate the labels so they do not overlap
plt.xticks(list(range(len(_x)))[::10],_x[::10].astype(int),rotation=90)

plt.show()
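The grouping step can be checked on a few made-up rows before running it on the real file. The column names match the code above; the values are invented.

```python
import pandas as pd

# Invented rows; the last one has no publication year
df = pd.DataFrame({
    "original_publication_year": [2000.0, 2000.0, 2001.0, float("nan")],
    "average_rating": [4.0, 3.0, 5.0, 1.0],
})

# Drop rows with a missing year, then average the ratings per year
detail = df[pd.notnull(df["original_publication_year"])]
grouped = detail["average_rating"].groupby(by=detail["original_publication_year"]).mean()

print(grouped)
```

Without the call to mean(), grouped would be a SeriesGroupBy object, which has no .index or .values to plot.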
