Main functions: concat, append, merge
I. The concat function
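pd.concat(objs, axis=0, join='outer', join_axes=None,
ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)
Note: join_axes was removed in pandas 1.0; on newer versions, reindex the concatenated result instead.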
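Parameters:
1. objs: a sequence (list) of Series, DataFrame, or Panel objects (Panel exists only in older pandas versions).
2. axis: the axis to concatenate along; 0 stacks rows, 1 places the objects side by side as columns.
3. join: how to handle labels on the other axis, either 'inner' (intersection) or 'outer' (union).
4. ignore_index
Default: ignore_index=False
If True, the original row/column labels along the concatenation axis are discarded and replaced with the default integer index starting from 0.
5. keys
Default: keys=None. Adds an outer level of labels identifying which original df each row/column came from.
6. levels
Default: levels=None. Explicitly specifies the set of values to use for the levels of the resulting hierarchical index.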
Code and comments:
import numpy as np
import pandas as pd
################ concat #################
df1 = pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'])
df2 = pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
df3 = pd.DataFrame(np.ones((3,4))*3,columns=['a','b','c','d'])
# Concatenate
res = pd.concat([df1,df2,df3],axis=0) # the original indexes are kept, so 0, 1, 2 repeats
print(res)
res2 = pd.concat([df1,df2,df3],axis=0,ignore_index=True) # rebuild the index as 0..8
print(res2)
df1 = pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'],index=[1,2,3])
df2 = pd.DataFrame(np.ones((3,4))*2,columns=['b','c','d','e'],index=[2,3,4])
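# res = pd.concat([df1,df2]) # positions with no data are filled with NaN
res2 = pd.concat([df1,df2],join='inner',ignore_index=True) # keep only the shared columns b, c, d
# print(res)
print(res2)
# join_axes was removed in pandas 1.0; reindex chooses which index to align on
res3 = pd.concat([df1,df2],axis=1).reindex(df1.index) # align on df1's index
res4 = pd.concat([df1,df2],axis=1).reindex(df2.index) # align on df2's index
print(res3)
print(res4)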
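The keys parameter described above is not exercised by the code in this section, so here is a minimal sketch of it. The labels 'first' and 'second' and the small 2x2 frames are made up for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)) * 1, columns=['a', 'b'])
df2 = pd.DataFrame(np.ones((2, 2)) * 2, columns=['a', 'b'])

# keys adds an outer index level that names the source of each block of rows
# ('first' and 'second' are illustrative labels, not anything pandas requires)
res_keys = pd.concat([df1, df2], keys=['first', 'second'])
print(res_keys)

# Select only the rows that came from df2 through the outer level
from_df2 = res_keys.loc['second']
print(from_df2)

# With axis=1 the keys label the column blocks instead of the row blocks
res_cols = pd.concat([df1, df2], axis=1, keys=['first', 'second'])
print(res_cols.columns.tolist())
```

Because the source label lives in the index, the original row numbers are preserved underneath it, which is the main difference from ignore_index=True.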
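II. The append function
append(self, other, ignore_index=False, verify_integrity=False)
Concatenates DataFrames vertically (row-wise); it has no axis parameter. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code in this section requires an older pandas version.
Usage: df1.append([df2,df3],ignore_index=True)
The ignore_index parameter:
When True, the existing index labels are discarded and a new index is generated automatically.
Code and comments: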
import numpy as np
import pandas as pd
################# append #################
df1 = pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'])
df2 = pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
df3 = pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
res1 = df1.append(df2,ignore_index=True)
res2 = df1.append([df2,df3],ignore_index=True)
# res = df1.append([df2,df3])
print(res1)
print(res2)
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
res3 = df1.append(s1,ignore_index=True) # append a Series as a new row
print(res3)
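On pandas 2.0 and later, where DataFrame.append no longer exists, the same results can be obtained with pd.concat. The following is a sketch of one way to do it, not the only one; converting the Series with to_frame().T is just one option for turning it into a single row.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Counterpart of df1.append(df2, ignore_index=True)
res_df = pd.concat([df1, df2], ignore_index=True)
print(res_df)

# Counterpart of df1.append(s1, ignore_index=True):
# turn the Series into a one-row DataFrame first, then concatenate
res_row = pd.concat([df1, s1.to_frame().T], ignore_index=True)
print(res_row)
```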
III. The merge function
pd.merge(left, right, how='inner', on=None, left_on=None,
right_on=None, left_index=False, right_index=False, sort=False,
suffixes=('_x', '_y'), copy=True, indicator=False,
validate=None)
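1. on
The column(s) to use as the join key.
2. how
Allowed values: 'inner', 'outer', 'left', 'right'. Default: how='inner'
'inner': keeps only rows whose key values appear in both DataFrames.
'outer': keeps all key values from both sides; for keys present on only one side, the columns from the other side are filled with NaN.
'left': keeps the key values of the left DataFrame; columns from the right with no match are filled with NaN.
'right': keeps the key values of the right DataFrame; columns from the left with no match are filled with NaN.
3. indicator
Default: indicator=False, which does not show where each row came from. Set it to True to add a _merge column whose value is left_only, right_only, or both.
4. suffixes
Suffixes appended to overlapping column names from the two DataFrames so they can be told apart, while the data from both columns is kept.
Code and comments: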
import numpy as np
import pandas as pd
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
print(left)
print(right)
res = pd.merge(left, right, on='key')
# pd.merge(data1, data2, on='key')
print(res)
#######################################################################################################
# consider two keys
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
print(left)
print(right)
res = pd.merge(left, right, on=['key1', 'key2'], how='inner') # how='inner' is the default
# how can be 'left', 'right', 'outer', or 'inner'
res = pd.merge(left, right, on=['key1', 'key2'], how='left')
print(res)
#######################################################################################################
# indicator
df1 = pd.DataFrame({'col1':[0,1], 'col_left':['a','b']})
df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)
# give the indicator a custom name
res = pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')
print(res)
#######################################################################################################
# index-based merge and suffixes
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])
print(left)
print(right)
# left_index and right_index: merge on the index instead of a column
res = pd.merge(left, right, left_index=True, right_index=True, how='outer')
print(res)
res = pd.merge(left, right, left_index=True, right_index=True, how='inner')
print(res)
# handle overlapping column names with suffixes
boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})
res = pd.merge(boys, girls, on='k', suffixes=['_boy', '_girl'], how='outer')
print(res)
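One practical use of the indicator column is filtering rows by where they came from. The sketch below reuses the df1/df2 data from the indicator example; the variable names left_only and both are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

# indicator=True adds a '_merge' column: left_only, right_only, or both
res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)
print(res)

# Rows whose key exists only in df1
left_only = res[res['_merge'] == 'left_only']
print(left_only)

# Rows whose key was matched on both sides
both = res[res['_merge'] == 'both']
print(both)
```

Keeping only the left_only rows gives the rows of df1 that have no match in df2, which the how parameter alone cannot express.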
Parts of this post draw on the link below; its author explains the topic in great detail.
Link: https://www.cnblogs.com/guxh/p/9451532.html