DataFrame的merge操作
引入相关库
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
merge操作
创建两个DataFrame
df1=DataFrame({'key':['X','Y','Z','X'],'data_set_1':[1,2,3,4]})
df1
key | data_set_1 | |
---|---|---|
0 | X | 1 |
1 | Y | 2 |
2 | Z | 3 |
3 | X | 4 |
df2=DataFrame({'key':['X','B','C'],'data_set_2':[4,5,6]})
df2
key | data_set_2 | |
---|---|---|
0 | X | 4 |
1 | B | 5 |
2 | C | 6 |
merge操作为取出df1和df2同时具有的‘X’(相同名称的columns)
pd.merge(df1,df2)
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1 | 4 |
1 | X | 4 | 4 |
on方法
on方法默认为None ,如果on使用on“key”,返回结果同上一样,同样会去找两边都有的‘key’
pd.merge(df1,df2,on='key')
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1 | 4 |
1 | X | 4 | 4 |
使用on“data_set_1”会报错,因为df2没有data_set_1
pd.merge(df1,df2,on='data_set_1')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-12-4bbe5b9d0e02> in <module>
----> 1 pd.merge(df1,df2,on='data_set_1')
I:\anaconda\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
84 copy=copy,
85 indicator=indicator,
---> 86 validate=validate,
87 )
88 return op.get_result()
I:\anaconda\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
625 self.right_join_keys,
626 self.join_names,
--> 627 ) = self._get_merge_keys()
628
629 # validate the merge keys dtypes. We may need to coerce
I:\anaconda\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
981 if not is_rkey(rk):
982 if rk is not None:
--> 983 right_keys.append(right._get_label_or_level_values(rk))
984 else:
985 # work-around for merge_asof(right_index=True)
I:\anaconda\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1690 values = self.axes[axis].get_level_values(key)._values
1691 else:
-> 1692 raise KeyError(key)
1693
1694 # Check for duplicates
KeyError: 'data_set_1'
how方法
how默认的方法为“inner”,结果同上一样
pd.merge(df1,df2,on='key',how='inner')
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1 | 4 |
1 | X | 4 | 4 |
使用left方法,会保留df1所有的columns和row,df2会过来使用NaN做一个补全
pd.merge(df1,df2,on='key',how='left')
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1 | 4.0 |
1 | Y | 2 | NaN |
2 | Z | 3 | NaN |
3 | X | 4 | 4.0 |
使用right方法和left一样,不过是以df2为基准
pd.merge(df1,df2,on='key',how='right')
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1.0 | 4 |
1 | X | 4.0 | 4 |
2 | B | NaN | 5 |
3 | C | NaN | 6 |
使用outer方法,会保留df1和df2所有的columns和values,并会使用NaN做一个补全
pd.merge(df1,df2,on='key',how='outer')
key | data_set_1 | data_set_2 | |
---|---|---|---|
0 | X | 1.0 | 4.0 |
1 | X | 4.0 | 4.0 |
2 | Y | 2.0 | NaN |
3 | Z | 3.0 | NaN |
4 | B | NaN | 5.0 |
5 | C | NaN | 6.0 |
url链接
可以打开url查看merge的更详细的说明
import webbrowser
webbrowser.open('https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html')