Multi-table operations
- The concat function
import pandas as pd
import numpy
dictionary1 = {"A":["A0","A1","A2","A3"],"B":["B0","B1","B2","B3"],"C":["C0","C1","C2","C3"],"D":["D0","D1","D2","D3"]}
df1 = pd.DataFrame(data=dictionary1,index=[0,1,2,3])
dictionary2 = {"A":["A4","A5","A6","A7"],"B":["B4","B5","B6","B7"],"C":["C4","C5","C6","C7"],"D":["D4","D5","D6","D7"]}
df2 = pd.DataFrame(data=dictionary2,index=[4,5,6,7])
dictionary3 = {"A":["A8","A9","A10","A11"],"B":["B8","B9","B10","B11"],"C":["C8","C9","C10","C11"],"D":["D8","D9","D10","D11"]}
df3 = pd.DataFrame(data=dictionary3,index=[8,9,10,11])
# concat combines multiple DataFrame objects; by default (axis=0) it stacks them row-wise, one below the other
pd.concat(objs=[df1,df2,df3])
pd.concat(objs=[df1,df2,df3],axis=1,ignore_index=True)
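concat can combine several tables at once. By default (axis=0) it stacks them vertically, appending rows; to place them side by side as additional columns, set axis=1. Where a label exists in one table but not in another, the missing cells are filled with missing values (NaN). ignore_index=True discards the original labels along the concatenation axis and assigns new ones 0, 1, 2, ...; with axis=1, as in the second call above, this replaces the column names with integers.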
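The effect of axis, the NaN filling, and ignore_index is easier to see on tables whose labels only partly overlap. The following is a minimal sketch; the frames `top` and `bottom` and their values are made up for illustration and are not part of the notes above.

```python
import pandas as pd

# Two small illustrative frames: they share column "A" and row label 1 only.
top = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
bottom = pd.DataFrame({"A": ["A2", "A3"], "C": ["C2", "C3"]}, index=[1, 2])

# Default axis=0: rows are stacked; columns are the union A, B, C,
# and cells with no source value become NaN. Row labels are kept as-is.
stacked = pd.concat([top, bottom])

# axis=1: frames are placed side by side and aligned on the row labels;
# rows present in only one frame get NaN in the other frame's columns.
side_by_side = pd.concat([top, bottom], axis=1)

# ignore_index=True with axis=0: the row labels are replaced by 0..n-1.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked)
print(side_by_side)
print(renumbered)
```

Note that `stacked` keeps the duplicate row label 1, which is often the reason to pass ignore_index=True when stacking rows.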
- The merge function
left = {"A":["A0","A1","A2","A3"],"B":["B0","B1","B2","B3"],"Key0":["K0","K0","K1","K2"],"Key1":["K0","K1","K0","K1"]}
right = {"C":["C0","C1","C2","C3"],"D":["D0","D1","D2","D3"],"Key0":["K0","K1","K1","K2"],"Key1":["K0","K0","K0","K0"]}
left = pd.DataFrame(data=left)
right = pd.DataFrame(data=right)
pd.merge(left=left,right=right,how="inner",on="Key1",suffixes=("_left","_right"))
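merge combines two tables. Unlike concat, which accepts any number of tables, merge joins exactly two at a time, but it is more flexible because it matches rows by key values, much like a SQL join.
left specifies the left table.
right specifies the right table.
how specifies the join type: "inner" (the default), "left", "right", "outer" or "cross". For example, "left" keeps every row of the left table, while "inner" keeps only rows whose key appears in both tables.
on specifies the column (or list of columns) used as the join key; it must exist in both tables.
suffixes specifies the suffixes appended to non-key columns that have the same name in both tables. In the call above, Key0 exists in both tables and is not the join key, so the result contains Key0_left and Key0_right.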
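To compare the join types, here is a minimal sketch on two small tables. The names `orders_left`, `orders_right` and the column names `Key` and `V` are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Illustrative tables: keys K0 and K1 appear in both, K2 only on the left,
# K3 only on the right. Both tables also have a non-key column named "V".
orders_left = pd.DataFrame({"Key": ["K0", "K1", "K2"], "V": ["L0", "L1", "L2"]})
orders_right = pd.DataFrame({"Key": ["K0", "K1", "K3"], "V": ["R0", "R1", "R3"]})

# Inner join (the default): only keys found in both tables survive.
inner = pd.merge(left=orders_left, right=orders_right, how="inner",
                 on="Key", suffixes=("_left", "_right"))

# Left join: every row of the left table is kept; K2 has no match,
# so its V_right cell is NaN.
left_join = pd.merge(left=orders_left, right=orders_right, how="left",
                     on="Key", suffixes=("_left", "_right"))

# Outer join: the union of keys from both tables.
outer = pd.merge(left=orders_left, right=orders_right, how="outer",
                 on="Key", suffixes=("_left", "_right"))

print(inner)
print(left_join)
print(outer)
```

In all three results the overlapping column V is split into V_left and V_right, while the join key Key appears once.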