Understanding pd.concat, and how to resolve the error ValueError: Shape of passed values is ...

Article link: https://blog.csdn.net/ljr_123/article/details/106091542

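This article looks at the pd.concat function in the Python pandas library: how it joins multiple pandas objects along rows or columns, why duplicate values in the alignment index raise ValueError: Shape of passed values is ..., and what the ignore_index parameter does.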

1. pd.concat: joins multiple pandas objects along the row index or the column index

Syntax: pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True) (for details, see the official pandas.concat documentation)
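
Before the article's own examples, here is a minimal sketch of what the `axis` and `join` parameters in the signature above do. The frames `left` and `right` are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample frames: they share column "B" and the row index 0, 1.
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# axis=0 (default): stack rows; columns are aligned by label.
outer_rows = pd.concat([left, right], axis=0, join="outer")  # union of columns
inner_rows = pd.concat([left, right], axis=0, join="inner")  # shared columns only

# axis=1: place the frames side by side; rows are aligned by index label.
side_by_side = pd.concat([left, right], axis=1)

print(outer_rows)
print(inner_rows)
print(side_by_side)
```

With join='outer', cells that have no counterpart in the other frame are filled with NaN; with join='inner', only the labels present in every object are kept.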

import pandas as pd

f = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
# f
#    A  B
# 0  1  2
# 1  3  4

f1 = pd.concat([f, pd.DataFrame({"A": 20, "B": 40}, index=[0])])  # row-wise concatenation (axis=0, the default)
# f1                      # f            # pd.DataFrame({"A": 20, "B": 40}, index=[0])
#     A   B               #    A  B      #      A   B     
# 0   1   2               # 0  1  2      #  0  20  40
# 1   3   4               # 1  3  4  
# 0  20  40     

f2 = pd.concat([f1, pd.DataFrame([9], index=[0], columns=["A"])])  # row-wise concatenation; the missing B value becomes NaN
# f2                      # f1            # pd.DataFrame([9], index=[0], columns=["A"])
#     A     B             #     A   B     #    A
# 0   1   2.0             # 0   1   2     # 0  9
# 1   3   4.0             # 1   3   4   
# 0  20  40.0             # 0  20  40
# 0   9   NaN     


"""
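Note: if any of the pandas objects being joined has duplicate values in the index used for alignment, an error is raised,
	  unless the object is being joined with itself (identical indexes need no realignment).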
"""
pd.concat([f1, f1], axis=1)  # join f1 with itself side by side (axis=1)
# Out[72]:                  # f1        
#     A   B   A   B         #     A   B 
# 0   1   2   1   2         # 0   1   2 
# 1   3   4   3   4         # 1   3   4 
# 0  20  40  20  40         # 0  20  40 

"""
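Both cases below raise ValueError: Shape of passed values is ...
because one of the DataFrames being joined has duplicate values in the index used for alignment.
(Newer pandas releases may instead report InvalidIndexError: Reindexing only valid with uniquely valued Index objects.)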
"""
# Case 1: ValueError: Shape of passed values is (5, 3), indices imply (3, 3); the index of f1 has duplicates
pd.concat([f1, pd.DataFrame([[9], [90]], index=[0, 1])], axis=1)  
# f1                 #  pd.DataFrame([[9], [90]], index=[0, 1])
#     A   B          #     0      
# 0   1   2          # 0   9      
# 1   3   4          # 1  90        
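# 0  20  40

"""The correct way to join them is shown below"""
# reset_index: replaces the index with a sequence starting at 0. With drop=True the old index is discarded instead of being kept as a column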
pd.concat([f1.reset_index(drop=True), pd.DataFrame([[9], [90]], index=[0, 1])], axis=1)  
# Out[75]:                   # f1.reset_index(drop=True)       #  pd.DataFrame([[9], [90]], index=[0, 1])
#     A   B     0            #     A   B                       #     0      
# 0   1   2   9.0            # 0   1   2                       # 0   9      
# 1   3   4  90.0            # 1   3   4                       # 1  90        
# 2  20  40   NaN            # 2  20  40     


# Case 2: ValueError: Shape of passed values is (6, 3), indices imply (4, 3); the index of the second object has duplicates
pd.concat([f1.reset_index(drop=True), pd.DataFrame([[9], [90]], index=[0, 0])], axis=1)  
# f1.reset_index(drop=True)       #  pd.DataFrame([[9], [90]], index=[0, 0])
#     A   B                       #     0      
# 0   1   2                       # 0   9      
# 1   3   4                       # 0  90        
# 2  20  40     
"""
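Summary:
	The errors above show that:
	1) when joining horizontally (side by side, axis=1), the DataFrames are matched on their row index;
	2) when joining vertically (stacked, axis=0), the DataFrames are matched on their column index.
	
	So if any object has duplicate values in the index used for matching, an error is raised.
	This is the same idea as merge; only the join key differs:
	concat uses the index as the join key, while merge uses the specified columns.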
"""

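The duplicate-index problem described above can be detected before calling pd.concat. The sketch below rebuilds the article's f1 by hand, checks `Index.is_unique`, and then applies the reset_index fix. The variable names `other`, `has_duplicates`, `failed` and `fixed` are illustrative, and the broad `except Exception` is deliberate because the exact exception type and message depend on the pandas version.

```python
import pandas as pd

# Same content as the article's f1: the row label 0 appears twice.
f1 = pd.DataFrame({"A": [1, 3, 20], "B": [2, 4, 40]}, index=[0, 1, 0])
other = pd.DataFrame([[9], [90]], index=[0, 1])

# For axis=1 the row index is the alignment key, so check it first.
has_duplicates = not f1.index.is_unique
print("f1 has duplicate row labels:", has_duplicates)

try:
    pd.concat([f1, other], axis=1)
    failed = False
except Exception as exc:  # type and message vary across pandas versions
    failed = True
    print(type(exc).__name__, exc)

# The fix used in the article: give f1 a unique 0..n-1 index first.
fixed = pd.concat([f1.reset_index(drop=True), other], axis=1)
print(fixed)
```

Note that reset_index(drop=True) matches rows purely by position afterwards, so it is only appropriate when the old labels carry no meaning for the join.
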
2. Understanding the ignore_index parameter

f = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=[1, 5])
# f
#    A  B
# 1  1  2
# 5  3  4

pd.concat([f, pd.Series(20, index=[5])], axis=1)
# Out[84]:                 # f            # pd.Series(20, index=[5])
#    A  B     0            #    A  B      # 5    20
# 1  1  2   NaN            # 1  1  2 
# 5  3  4  20.0            # 5  3  4 

# ignore_index=True: the labels along the concatenation axis are reset to 0, 1, 2, ...; here axis=1, so the column labels are reset
pd.concat([f, pd.Series(20, index=[5])], axis=1, ignore_index=True)
# Out[85]:                 # f            # pd.Series(20, index=[5])
#    0  1     2            #    A  B      # 5    20
# 1  1  2   NaN            # 1  1  2 
# 5  3  4  20.0            # 5  3  4
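
The example above resets the column labels because the join runs along axis=1. As a complementary sketch, the same parameter with the default axis=0 resets the row labels instead. The frame `extra` is a made-up addition for illustration.

```python
import pandas as pd

f = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"), index=[1, 5])
extra = pd.DataFrame({"A": [20], "B": [40]}, index=[5])  # hypothetical extra row

# Default: the original row labels are kept, so the label 5 appears twice.
kept = pd.concat([f, extra])

# ignore_index=True with axis=0: row labels are replaced by 0, 1, 2.
renumbered = pd.concat([f, extra], ignore_index=True)

print(kept)
print(renumbered)
```

ignore_index only relabels the axis being concatenated along. It does not change how the other axis is aligned, so it is not a substitute for reset_index when the alignment index itself contains duplicates.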