DataFrame Concatenation and Merge Operations

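# DataFrame concatenation
pd.concat   DataFrame.append
pd.concat works much like np.concatenate, but takes a few additional parameters:

objs: the sequence of DataFrames (or Series) to concatenate
axis=0: the axis to concatenate along; 0 stacks the rows, 1 places the frames side by side
keys: one label per input, used to build a hierarchical index on the concatenation axis
join = 'outer' / 'inner': how labels on the other axis are handled. 'outer' keeps the union of all labels, matching or not, and fills the gaps with NaN; 'inner' keeps only the labels shared by every input and drops the rest
ignore_index = False: when True, the original labels on the concatenation axis are discarded and replaced with 0..n-1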
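The keys and ignore_index parameters listed above are not exercised in the examples that follow, so here is a minimal sketch of what they do. The frames `left` and `right` and their values are made up for illustration only.

```python
import pandas as pd

# Two small illustrative frames (hypothetical data, not from the article)
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# keys adds an outer index level that records which input each row came from
labeled = pd.concat([left, right], axis=0, keys=["left", "right"])

# Rows of one input can then be selected by its key
only_right = labeled.loc["right"]

# ignore_index=True throws away the original row labels and renumbers from 0
renumbered = pd.concat([left, right], axis=0, ignore_index=True)

print(labeled)
print(renumbered)
```

Without ignore_index=True, the row labels 0 and 1 would each appear twice in the stacked result, which is what the vertical concatenation output later in this article shows.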
import numpy as np
import pandas as pd
from pandas import DataFrame
df1 = DataFrame(data = np.random.randint(0,100,size=(5,3)),columns = ['A','B','C'])
df2 = DataFrame(data = np.random.randint(0,100,size=(5,3)),columns = ['A','D','C'])
df1
    A   B   C
0  21  88  16
1  17  43  51
2  59  50   4
3  25  77  19
4  59  69   6
df2
    A   D   C
0   9  73  90
1  89  20  73
2  27  49  67
3  40  10  56
4  64  38   8
pd.concat((df1,df2),axis=1)
    A   B   C   A   D   C
0  21  88  16   9  73  90
1  17  43  51  89  20  73
2  59  50   4  27  49  67
3  25  77  19  40  10  56
4  59  69   6  64  38   8
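# In a vertical concatenation the columns may not all match: join='inner' drops the columns that do not match, while join='outer' keeps them and fills the gaps with NaN
pd.concat((df1,df2),axis=0,join = 'inner')

# -- To keep all of the data, use an outer join: join='outer'
# pd.concat((df1,df2),axis=0,join = 'outer')  gives the same result as  df1.append(df2)  (DataFrame.append was removed in pandas 2.0; use pd.concat instead)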
    A   C
0  21  16
1  17  51
2  59   4
3  25  19
4  59   6
0   9  90
1  89  73
2  27  67
3  40  56
4  64   8
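To make the difference between the two join modes concrete, the sketch below runs both on frames shaped like df1 and df2. It uses a seeded generator so the result is reproducible; the seed and the variable names `outer` and `inner` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df1 = pd.DataFrame(rng.integers(0, 100, size=(5, 3)), columns=["A", "B", "C"])
df2 = pd.DataFrame(rng.integers(0, 100, size=(5, 3)), columns=["A", "D", "C"])

# Outer join: union of the columns; B is NaN for df2's rows, D for df1's rows
outer = pd.concat((df1, df2), axis=0, join="outer")

# Inner join: only the columns both frames share (A and C) survive
inner = pd.concat((df1, df2), axis=0, join="inner")

print(outer.isna().sum())    # number of missing values per column
print(list(inner.columns))
```

The outer result keeps every value at the cost of NaN cells (and the affected integer columns become float), while the inner result has no missing values but silently loses columns B and D.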
###  DataFrame merge operations
"""
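merge differs from concat in that merge combines rows based on one or more shared columns.
When pd.merge() is called without on, the columns whose names appear in both DataFrames are automatically used as the merge key.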

"""
df1 = DataFrame({'employee':['Bob','Jake','Lisa'],'group':['Accounting','Engineering','Engineering'],})
df1
  employee        group
0      Bob   Accounting
1     Jake  Engineering
2     Lisa  Engineering
df2 = DataFrame({'employee':['Lisa','Bob','Jake'],'hire_date':[2004,2008,2012]})
df2
  employee  hire_date
0     Lisa       2004
1      Bob       2008
2     Jake       2012
pd.merge(df1,df2,on='employee')
  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004
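Because employee is the only column name the two frames share, leaving out on should give the same result as naming it explicitly. A minimal sketch that checks this (the names `implicit` and `explicit` are introduced here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]})
df2 = pd.DataFrame({"employee": ["Lisa", "Bob", "Jake"],
                    "hire_date": [2004, 2008, 2012]})

# No `on`: pandas uses the intersection of the column names as the key
implicit = pd.merge(df1, df2)

# Same merge with the key spelled out
explicit = pd.merge(df1, df2, on="employee")

print(implicit.equals(explicit))
print(implicit)
```

Naming the key explicitly is usually the safer habit, since adding a second shared column to either frame would otherwise change the merge key without any warning.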
df3 = DataFrame({'employee':['Jake','Lisa'],
                 'group':['Accounting','Engineering'],
                 'hire_date':[2004,2016]})
df3

  employee        group  hire_date
0     Jake   Accounting       2004
1     Lisa  Engineering       2016
df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],
                 'supervisor':['Carly','Guido','Steve']})
df4
         group supervisor
0   Accounting      Carly
1  Engineering      Guido
2  Engineering      Steve
pd.merge(df3,df4)
  employee        group  hire_date supervisor
0     Jake   Accounting       2004      Carly
1     Lisa  Engineering       2016      Guido
2     Lisa  Engineering       2016      Steve
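Lisa appears twice in the result above because 'Engineering' occurs twice in df4, which makes this a one-to-many merge. If row duplication like this would be a bug in your data, the validate parameter of pd.merge can check the relationship for you. The sketch below is an illustration added here, not part of the original walkthrough; the flag `one_to_one_ok` is a made-up name.

```python
import pandas as pd
from pandas.errors import MergeError

df3 = pd.DataFrame({"employee": ["Jake", "Lisa"],
                    "group": ["Accounting", "Engineering"],
                    "hire_date": [2004, 2016]})
df4 = pd.DataFrame({"group": ["Accounting", "Engineering", "Engineering"],
                    "supervisor": ["Carly", "Guido", "Steve"]})

# Keys are unique in df3 but repeated in df4, so one_to_many passes
merged = pd.merge(df3, df4, validate="one_to_many")

# Requiring a one-to-one relationship raises MergeError instead
try:
    pd.merge(df3, df4, validate="one_to_one")
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False

print(len(merged), one_to_one_ok)
```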
df5 = DataFrame({'group':['Accounting','Engineering','HR'],
                 'supervisor':['Carly','Guido','Steve']})
df5
         group supervisor
0   Accounting      Carly
1  Engineering      Guido
2           HR      Steve
pd.merge(df1,df5)
  employee        group supervisor
0      Bob   Accounting      Carly
1     Jake  Engineering      Guido
2     Lisa  Engineering      Guido
pd.merge(df1,df5,how='outer')
  employee        group supervisor
0      Bob   Accounting      Carly
1     Jake  Engineering      Guido
2     Lisa  Engineering      Guido
3      NaN           HR      Steve
pd.merge(df1,df5,how='left')  # 'outer': keep all rows from both tables; 'left': keep every row of the left table; 'right': keep every row of the right table
  employee        group supervisor
0      Bob   Accounting      Carly
1     Jake  Engineering      Guido
2     Lisa  Engineering      Guido
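The how='right' case is described above but not run. The sketch below runs it and also uses indicator=True, a pd.merge option not covered in this article, to show where each row of an outer merge came from. The names `right_join` and `flagged` are introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]})
df5 = pd.DataFrame({"group": ["Accounting", "Engineering", "HR"],
                    "supervisor": ["Carly", "Guido", "Steve"]})

# Right join: every row of df5 is kept, so HR appears with employee = NaN
right_join = pd.merge(df1, df5, how="right")

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
flagged = pd.merge(df1, df5, how="outer", indicator=True)

print(right_join)
print(flagged["_merge"].value_counts())
```

Here the left and inner results happen to be identical because every group in df1 also exists in df5; the HR row only shows up once the right table's rows are preserved.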
df2 = DataFrame({'employee':['Jack','Bob','Jack'],
                 'group':['Accounting','sell','CEO'],
                'hire_date':[2003,2007,2012]
                })
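# Note: the outputs from here on imply that df1 was redefined, with employee, group and hire_date columns, in a cell that is not shown
pd.merge(df1,df2)  # the two frames now share more than one column, so every shared column is used as a merge key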
  employee       group  hire_date
0     Jack  Accounting       2003
pd.merge(df1,df2,on='group')  # name the merge key explicitly; other shared columns get _x / _y suffixes
  employee_x       group  hire_date_x employee_y  hire_date_y
0       Jack  Accounting         2003       Jack         2003
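The _x and _y suffixes are only defaults; the suffixes parameter of pd.merge lets you choose more readable ones. In the sketch below the frame `a` is hypothetical stand-in data (the article does not show the df1 used at this point), while `b` copies the df2 defined above.

```python
import pandas as pd

# Hypothetical left frame, assumed for illustration only
a = pd.DataFrame({"employee": ["Jack", "Bob"],
                  "group": ["Accounting", "Engineering"],
                  "hire_date": [2003, 2007]})
# Same values as the df2 defined above
b = pd.DataFrame({"employee": ["Jack", "Bob", "Jack"],
                  "group": ["Accounting", "sell", "CEO"],
                  "hire_date": [2003, 2007, 2012]})

# Merge on group only; the other shared columns are disambiguated by suffix
merged = pd.merge(a, b, on="group", suffixes=("_left", "_right"))

print(merged.columns.tolist())
print(merged)
```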
df5 = DataFrame({'name':['Lisa','Bob','Bill'],
                'hire_date':[1998,2016,2007]})
df5
   name  hire_date
0  Lisa       1998
1   Bob       2016
2  Bill       2007
pd.merge(df1,df5,left_on='employee',right_on='name')
  employee        group  hire_date_x name  hire_date_y
0      Bob  Engineering         2007  Bob         2016
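When the key columns have different names, left_on and right_on keep both of them in the result, so one is redundant. A minimal sketch of merging and then dropping the duplicate key; the frame `staff` reuses the first df1 from this article and the name `tidy` is introduced for illustration.

```python
import pandas as pd

staff = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                      "group": ["Accounting", "Engineering", "Engineering"]})
df5 = pd.DataFrame({"name": ["Lisa", "Bob", "Bill"],
                    "hire_date": [1998, 2016, 2007]})

# Match staff.employee against df5.name; both key columns are kept
merged = pd.merge(staff, df5, left_on="employee", right_on="name")

# The name column repeats employee, so it can be dropped
tidy = merged.drop(columns="name")

print(tidy)
```

Only Bob and Lisa appear in both frames, so the default inner merge returns two rows; Jake and Bill are dropped.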
