[ Pandas version: 1.0.1 ]
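6. Combining Datasets: Concat and Append
Combining data from different sources includes:
- Simple concatenation of two different datasets
- Database-style join and merge operations for datasets with overlapping fields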
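# Imports used by the examples in this section
import numpy as np
import pandas as pd

# Helper that builds a simple DataFrame whose cells combine column name and index label
def make_df(cols, ind):
    """Quickly make a simple DataFrame."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

# Example DataFrame
make_df('ABC', range(3))
#     A   B   C
# 0  A0  B0  C0
# 1  A1  B1  C1
# 2  A2  B2  C2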
(1) Concatenating NumPy arrays: np.concatenate()
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
np.concatenate([x, y, z])
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
x = [[1, 2], [3, 4]]
np.concatenate([x, x], axis=1)
# array([[1, 2, 1, 2],
#        [3, 4, 3, 4]])
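The axis argument decides which dimension grows, and every other dimension has to match. The sketch below is a small illustration of that rule; the array names a and b are made up for this example and do not appear in the original notes.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# axis=0 appends rows: column counts must match (2 and 2)
rows = np.concatenate([a, b], axis=0)      # shape (3, 2)

# axis=1 appends columns: row counts must match, so b is transposed to (2, 1)
cols = np.concatenate([a, b.T], axis=1)    # shape (2, 3)

# Without the transpose the row counts differ (2 vs 1) and NumPy raises ValueError
try:
    np.concatenate([a, b], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(rows.shape, cols.shape, mismatch_raised)
```

This is the same shape rule that pd.concat relaxes: pandas aligns on labels instead of requiring the other axis to match exactly.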
(2) Simple concatenation with pd.concat
pd.concat() accepts more parameters than np.concatenate() and is correspondingly more flexible.
# pandas.concat — pandas 1.0.3 documentation
pandas.concat(objs: Union[Iterable[Union[DataFrame, Series]], Mapping[Optional[Hashable], Union[DataFrame, Series]]], axis=0, join='outer', ignore_index: bool = False, keys=None, levels=None, names=None, verify_integrity: bool = False, sort: bool = False, copy: bool = True) → Union[DataFrame, Series]
Parameters:
objs: a sequence or mapping of Series or DataFrame objects
If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised.
axis: {0/'index', 1/'columns'}, default 0
The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'
How to handle indexes on other axis (or axes).
ignore_index: bool, default False
If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join.
keys: sequence, default None
If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level.
levels: list of sequences, default None
Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
names: list, default None
Names for the levels in the resulting hierarchical index.
verify_integrity: bool, default False
Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.
sort: bool, default False
Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
copy: bool, default True
If False, do not copy data unnecessarily.
Returns: object, type of objs
When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.
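To make the parameter list above more concrete, here is a minimal sketch that exercises join, ignore_index, keys and verify_integrity, and checks the return types described under "Returns". It reuses the make_df helper from earlier in these notes; the frames df1 and df2, the keys 'x' and 'y', and the Series s1 and s2 are made up for illustration.

```python
import pandas as pd

def make_df(cols, ind):
    """Quickly make a simple DataFrame."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

df1 = make_df('AB', [0, 1])   # columns A, B
df2 = make_df('BC', [0, 1])   # columns B, C, same index labels as df1

# join='outer' (default): union of columns, missing cells become NaN
outer = pd.concat([df1, df2], sort=False)

# join='inner': only the columns shared by every input (here just B)
inner = pd.concat([df1, df2], join='inner')

# ignore_index=True: discard the original labels and renumber 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True, sort=False)

# keys: label each input, producing a hierarchical (MultiIndex) row index
keyed = pd.concat([df1, df2], keys=['x', 'y'], sort=False)

# verify_integrity=True: raise instead of silently keeping duplicate labels
try:
    pd.concat([df1, df2], verify_integrity=True, sort=False)
    duplicate_error = None
except ValueError as err:
    duplicate_error = str(err)

# Return type: all-Series input along axis=0 gives a Series, axis=1 gives a DataFrame
s1 = pd.Series(['a', 'b'], index=[0, 1])
s2 = pd.Series(['c', 'd'], index=[2, 3])
stacked = pd.concat([s1, s2])
side_by_side = pd.concat([s1, s2], axis=1)

print(list(outer.columns), list(inner.columns), list(renumbered.index))
print(type(stacked).__name__, type(side_by_side).__name__)
```

Since df1 and df2 share the index labels 0 and 1, the plain outer concatenation keeps the duplicates; ignore_index, keys and verify_integrity are three different ways of dealing with that.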
# One-dimensional concatenation
ser1