1. Grouping
groupby() splits the data into groups according to the key or keys passed to it.
import pandas as pd
oly = pd.read_csv('olympics.csv', skiprows=4)
group2 = oly.groupby(['Edition', 'NOC'])  # group by multiple keys
b = group2.get_group((1912, 'FRA'))  # select the group with Edition=1912 and NOC='FRA'; returns a DataFrame
print(b)
Output (truncated):
      City       Edition  Sport       Discipline  \
2067  Stockholm  1912     Athletics   Athletics
2068  Stockholm  1912     Athletics   Athletics
2069  Stockholm  1912     Athletics   Athletics
2070  Stockholm  1912     Athletics   Athletics
2073  Stockholm  1912     Athletics   Athletics
2160  Stockholm  1912     Equestrian  Eventing
...
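Since olympics.csv is not included here, the same multi-key grouping can be tried on a tiny stand-in table. This is only a sketch: the `medals` DataFrame and its values are hypothetical and made up for illustration, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for olympics.csv (values invented for illustration).
medals = pd.DataFrame({
    "Edition": [1912, 1912, 1912, 1920],
    "NOC": ["FRA", "FRA", "SWE", "FRA"],
    "Sport": ["Athletics", "Equestrian", "Athletics", "Fencing"],
})

# Group by two keys at once.
grouped = medals.groupby(["Edition", "NOC"])

# With several keys, each group is identified by a tuple of key values.
fra_1912 = grouped.get_group((1912, "FRA"))
print(fra_1912)

# size() counts the rows in each group; the result is indexed by (Edition, NOC).
group_sizes = grouped.size()
print(group_sizes)
```

Note that the key passed to get_group() must be a tuple whose order matches the order of the keys given to groupby().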
2. Pivot tables (essentially a grouped computation)
- Equivalent to grouping the data and then applying a function to the selected data columns
import pandas as pd
import numpy as np
df = pd.read_excel("sales-funnel.xlsx")
print(df)
# Use "Name" as the index and aggregate each numeric column; the default aggregation is mean
print(pd.pivot_table(df,index=["Name"]))
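To make the "grouped computation" idea concrete, here is a minimal sketch on a hypothetical table (the `sales` DataFrame and its values are made up and are not the contents of sales-funnel.xlsx). It compares the default mean aggregation with an explicit sum and with the equivalent groupby() call.

```python
import pandas as pd

# Hypothetical data standing in for the sales spreadsheet.
sales = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Bob"],
    "Quantity": [1, 3, 2, 4],
    "Price": [10.0, 30.0, 20.0, 40.0],
})

# Default aggregation: the mean of each numeric column per Name.
by_mean = pd.pivot_table(sales, index=["Name"])

# An explicit aggregation function and an explicit choice of value columns.
by_sum = pd.pivot_table(sales, index=["Name"],
                        values=["Quantity", "Price"], aggfunc="sum")

# The same means computed with groupby, showing that a pivot table is a grouped aggregation.
by_groupby = sales.groupby("Name")[["Quantity", "Price"]].mean()

print(by_mean)
print(by_sum)
```

When the table also contains non-numeric columns, listing the columns to aggregate in `values` is a safe way to keep them out of the computation.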
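3. Combining (append, insert)
- append and insert are commonly used to append rows, append columns, and insert columns
(1) append (appends rows; it cannot insert a row at an arbitrary position)
Note: 1. append is a method of both Series and DataFrame; it adds data row-wise (it cannot concatenate horizontally).
2. append cannot insert rows. To insert a row, split the data into blocks and then append them back together around the new row.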
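# df1 is assumed to be an existing DataFrame with columns A, B, C, D (such as the one built in the exercise answer).
s3 = [['a8','b8','c8','d8'],['a9','b9','c9','d9'],['a10','b10','c10','d10'],['a11','b11','c11','d11']]
df3 = pd.DataFrame(s3, index=[8,9,10,11], columns=['A','B','C','D'])
# Note: DataFrame.append was removed in pandas 2.0; there, use pd.concat([df1, df3], ignore_index=True).
result1 = df1.append(df3, ignore_index=True)
result1  # ignore_index=True discards the original index labels and renumbers the rows 0, 1, 2, ...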
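Because DataFrame.append is no longer available in pandas 2.0 and later, the following sketch uses pd.concat for the same row-append, and also shows the "split, then recombine" approach to row insertion described in the notes above. The names `top`, `new_row`, and `position` are illustrative only.

```python
import pandas as pd

top = pd.DataFrame([["a0", "b0"], ["a1", "b1"], ["a2", "b2"]], columns=["A", "B"])
new_row = pd.DataFrame([["x", "y"]], columns=["A", "B"])

# Row append: stack new_row under top and renumber the index 0, 1, 2, ...
appended = pd.concat([top, new_row], ignore_index=True)

# Row insertion at position 1: split the frame, then concatenate the pieces around the new row.
position = 1
inserted = pd.concat(
    [top.iloc[:position], new_row, top.iloc[position:]],
    ignore_index=True,
)

print(appended)
print(inserted)
```

Both calls return a new DataFrame and leave `top` unchanged.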
(2) insert (column insertion; inserts only one column per call)
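my_array = np.arange(20).reshape((4,5))
my_data = pd.DataFrame(my_array, columns=['A','B','C','D','E'])
my_data['F'] = pd.Series([89,90,1001,100])  # append a column at the end (a Series is aligned on the index)
my_data['g'] = [2,3,15,28]  # append another column from a plain list
insert_col = np.array([89,1,9,3])
my_data.insert(4, 'insert1', insert_col)  # insert an array as the column at position 4
my_data.insert(2, 'insert2', 1000)  # insert a scalar, repeated for every row, at position 2
my_data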
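Result:
    A   B  insert2   C   D  insert1   E     F   g
0   0   1     1000   2   3       89   4    89   2
1   5   6     1000   7   8        1   9    90   3
2  10  11     1000  12  13        9  14  1001  15
3  15  16     1000  17  18        3  19   100  28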
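The way insert() shifts the existing columns is easier to see on a smaller frame. The sketch below uses hypothetical names (`frame`, `new1`, `new2`) and inspects the column order after each kind of insertion.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6).reshape((2, 3)), columns=["A", "B", "C"])

# insert() modifies the frame in place and returns None.
frame.insert(1, "new1", [10, 20])  # array-like: one value per row
frame.insert(0, "new2", 1000)      # scalar: repeated for every row

# Columns at or after the insertion position move one place to the right.
columns = list(frame.columns)
print(columns)
print(frame)
```

By default insert() raises a ValueError if the column name already exists in the frame, so each call adds exactly one new, uniquely named column.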
4. Exercise (don't look at the answer yet! See whether you have really mastered the material)
Reference answer:
import pandas as pd
list1 = [['a0','b0','c0','d0'],['a1','b1','c1','d1'],['a2','b2','c2','d2'],['a3','b3','c3','d3']]
list2 = [['a4','b4','c4','d4'],['a5','b5','c5','d5'],['a6','b6','c6','d6'],['a7','b7','c7','d7']]
df1 = pd.DataFrame(list1, columns=['A','B','C','D'])
df2 = pd.DataFrame(list2, columns=['A','B','C','D'])
# Append df2's rows below df1 and renumber the index 0..7.
# Note: DataFrame.append was removed in pandas 2.0; there, use pd.concat([df1, df2], ignore_index=True).
result = df1.append(df2, ignore_index=True)
result