tolist() and to_list() in pandas

魏晋小子

Last modified: 2024-04-17 21:31:25


Category: pandas. Tags: pandas, list, data structures

First published: 2024-04-17 21:30:17

Article link: https://blog.csdn.net/weixin_43684951/article/details/137840431


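This article describes the pandas tolist() and to_list() methods and how they apply to DataFrame, Series, Index, and multi-level index objects. It also compares them with numpy's tolist(), focusing on the element types of the returned list and on when each method applies.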


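When working with pandas, you sometimes need to convert pandas data into a plain Python list. pandas provides two methods for this: tolist() and to_list().
In practice, the two can be treated as interchangeable.
This article also covers numpy's tolist() method, whose main feature is that it works on arrays of any number of dimensions.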

1. `tolist()`

	pandas.api.extensions.ExtensionArray.tolist()
		
		Return a list of the values.
 		
 		These are each a scalar type, which is a Python scalar (for str, int, float) or 
 		a pandas scalar (for Timestamp/Timedelta/Interval/Period)

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]

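The text above is the official documentation's description and example for tolist(). Two points follow from it:

This entry documents the method as part of the pandas extension-array interface, as its qualified name pandas.api.extensions.ExtensionArray shows.
The method returns a list whose elements are either Python scalars or pandas scalars. (In earlier pandas versions, the elements were numpy or pandas types.)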
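To make the "Python scalar or pandas scalar" distinction concrete, here is a minimal sketch. The variable names `ints` and `stamps` are illustrative and not part of the original article.

```python
import pandas as pd

# Plain integer data comes back as built-in Python ints.
ints = pd.Series([1, 2, 3]).tolist()

# Datetime data comes back as pandas Timestamp scalars.
stamps = pd.Series(pd.to_datetime(["2024-04-17", "2024-04-18"])).tolist()

print(type(ints[0]))
print(type(stamps[0]))
```

So the element type of the result depends on the dtype of the data being converted, not on the method itself.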

The examples below show how tolist() is used.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["x", "y", "z"]
)

1.1. Cannot be called directly on a `DataFrame`

df.tolist()
# AttributeError: 'DataFrame' object has no attribute 'tolist'
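Since a DataFrame has no tolist(), one possible workaround is to go through numpy, or to convert column by column. The sketch below is one way to do it, not the only one. The names `rows` and `cols` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["x", "y", "z"],
)

# Row-wise: convert to a 2-D numpy array, then use numpy's tolist().
rows = df.to_numpy().tolist()

# Column-wise: each column is a Series, which does support tolist().
cols = {name: df[name].tolist() for name in df.columns}

print(rows)
print(cols)
```

Note that the row-wise route drops the index and column labels, while the column-wise route keeps the column names as dictionary keys.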

1.2. On the `index` and `columns` attributes

index_tolist = df.index.tolist()
print(index_tolist)
print(type(index_tolist))
print(type(index_tolist[0]))

# ['x', 'y', 'z']
# <class 'list'>
# <class 'str'>

columns_tolist = df.columns.tolist()
print(columns_tolist)
print(type(columns_tolist))
print(type(columns_tolist[0]))

# ['A', 'B']
# <class 'list'>
# <class 'str'>

1.3. On row and column data

row_tolist = df.iloc[0].tolist()
print(row_tolist)
print(type(row_tolist))
print(type(row_tolist[0]))

# [1, 4]
# <class 'list'>
# <class 'int'>

col_tolist = df["A"].tolist()
print(col_tolist)
print(type(col_tolist))
print(type(col_tolist[0]))

# [1, 2, 3]
# <class 'list'>
# <class 'int'>

Both df.iloc[0] and df["A"] are Series objects, so these examples also show how tolist() is used on a Series.
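One caveat when taking a row: a row Series has a single dtype, so a row that spans columns of different numeric types is upcast before tolist() runs. The following sketch uses a hypothetical `mixed` DataFrame that does not appear in the original article.

```python
import pandas as pd

# Column "A" holds ints, column "B" holds floats.
mixed = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})

# The row is a float64 Series, so the int from "A" becomes a float.
row = mixed.iloc[0].tolist()

# The column keeps its own integer dtype.
col = mixed["A"].tolist()

print(row, type(row[0]))
print(col, type(col[0]))
```

In the article's `df`, both columns are integers, which is why the row example there returns ints.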

1.4. On a multi-level index

index_df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)

mul_index = pd.MultiIndex.from_frame(index_df)
mul_df = pd.DataFrame(np.random.randn(4, 3), index=mul_index)

                     0         1         2
first second                              
bar   one    -0.625643  0.533483  0.066657
      two    -1.759180  1.116185  0.264087
foo   one    -0.773947 -1.649559  1.865090
      two     1.200301 -3.090575 -1.464554

mul_index_tolist = mul_df.index.tolist()
print(mul_index_tolist)
print(type(mul_index_tolist))
print(type(mul_index_tolist[0]))
print(type(mul_index_tolist[0][0]))

# [('bar', 'one'), ('bar', 'two'), ('foo', 'one'), ('foo', 'two')]
# <class 'list'>
# <class 'tuple'>
# <class 'str'>
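If a flat list per level is more convenient than a list of tuples, `MultiIndex.get_level_values()` returns a regular Index that can then be converted. This is a small sketch built on the same `index_df` as above. The names `first_level`, `second_level`, and `pairs` are illustrative.

```python
import pandas as pd

index_df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)
mul_index = pd.MultiIndex.from_frame(index_df)

# One flat list per level, selected by level name.
first_level = mul_index.get_level_values("first").tolist()
second_level = mul_index.get_level_values("second").tolist()

# The whole index as a list of tuples, as in the example above.
pairs = mul_index.tolist()

print(first_level)
print(second_level)
```

Zipping the per-level lists back together reproduces the list of tuples.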

2. `to_list()`

	pandas.Index.to_list()
	pandas.Series.to_list()
		
		Return a list of the values.
 		
 		These are each a scalar type, which is a Python scalar (for str, int, float) or 
 		a pandas scalar (for Timestamp/Timedelta/Interval/Period)

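The documentation for to_list() is word-for-word identical to that of tolist(). The difference lies in where each name is documented: tolist() appears under pandas.api.extensions.ExtensionArray, while to_list() is listed as a method of Index and Series. As the examples in section 1 show, Index and Series accept tolist() as well.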
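A quick way to check that the two spellings agree is to call both on the same object and compare the results. This is only a spot check on one hypothetical Series `s`, not a proof for every dtype.

```python
import pandas as pd

s = pd.Series([1.5, 2.5], index=["x", "y"])

# Compare both spellings on the Series values and on its Index.
same_values = s.tolist() == s.to_list()
same_index = s.index.tolist() == s.index.to_list()

print(same_values, same_index)
```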

2.1. Cannot be called directly on a `DataFrame`

df.to_list()
# AttributeError: 'DataFrame' object has no attribute 'to_list'

2.2. On the `index` and `columns` attributes

index_to_list = df.index.to_list()
print(index_to_list)
print(type(index_to_list))
print(type(index_to_list[0]))

# ['x', 'y', 'z']
# <class 'list'>
# <class 'str'>

columns_to_list = df.columns.to_list()
print(columns_to_list)
print(type(columns_to_list))
print(type(columns_to_list[0]))

# ['A', 'B']
# <class 'list'>
# <class 'str'>

2.3. On row and column data

row_to_list = df.iloc[0].to_list()
print(row_to_list)
print(type(row_to_list))
print(type(row_to_list[0]))

# [1, 4]
# <class 'list'>
# <class 'int'>

col_to_list = df["A"].to_list()
print(col_to_list)
print(type(col_to_list))
print(type(col_to_list[0]))

# [1, 2, 3]
# <class 'list'>
# <class 'int'>

Both df.iloc[0] and df["A"] are Series objects, so these examples also show how to_list() is used on a Series.

2.4. On a multi-level index

index_df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)

mul_index = pd.MultiIndex.from_frame(index_df)
mul_df = pd.DataFrame(np.random.randn(4, 3), index=mul_index)

                     0         1         2
first second                              
bar   one    -0.625643  0.533483  0.066657
      two    -1.759180  1.116185  0.264087
foo   one    -0.773947 -1.649559  1.865090
      two     1.200301 -3.090575 -1.464554

mul_index_to_list = mul_df.index.to_list()

print(mul_index_to_list)
print(type(mul_index_to_list))
print(type(mul_index_to_list[0]))
print(type(mul_index_to_list[0][0]))

# [('bar', 'one'), ('bar', 'two'), ('foo', 'one'), ('foo', 'two')]
# <class 'list'>
# <class 'tuple'>
# <class 'str'>

3. `tolist()` in `numpy`

numpy.ndarray.tolist()

	Return the array as an a.ndim-levels deep nested list of Python scalars.

	Return a copy of the array data as a (nested) Python list. 
	Data items are converted to the nearest compatible builtin Python type, via the item function.
	If a.ndim is 0, then since the depth of the nested list is 0, it will not be a list at all, but a simple Python scalar.

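The numpy documentation for tolist() stresses two points:

Every element of the resulting list is a Python built-in type.
It can convert a numpy.ndarray of 0, 1, 2, or more dimensions. The pandas tolist() and to_list() methods cannot do this, because they are only available on one-dimensional objects such as Series and Index.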

a = np.uint32([1, 2])
a_list = list(a)
a_list    # [1, 2]
type(a_list[0])   # <class 'numpy.uint32'>

a_tolist = a.tolist()
a_tolist   # [1, 2]
type(a_tolist[0])  # <class 'int'>

a = np.array([[1, 2], [3, 4]])
list(a)   # [array([1, 2]), array([3, 4])]
a.tolist()   # [[1, 2], [3, 4]]

a = np.array(1)
# list(a)
# Traceback (most recent call last):
#  ...
# TypeError: iteration over a 0-d array
a.tolist()    # 1
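The "a.ndim-levels deep" wording in the documentation can be checked directly. The sketch below uses a hypothetical helper, `nesting_depth`, which is not a numpy function, to count how many list levels tolist() produces for arrays of different shapes.

```python
import numpy as np


def nesting_depth(obj):
    """Count how many levels of list wrap the first element."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        obj = obj[0]
    return depth


# Map each array's ndim to the nesting depth of its tolist() result.
depths = {}
for shape in [(), (3,), (2, 3), (2, 2, 2)]:
    a = np.zeros(shape, dtype=np.float32)
    depths[a.ndim] = nesting_depth(a.tolist())

# The innermost items are built-in Python floats, not numpy.float32.
leaf = np.zeros((2, 2, 2), dtype=np.float32).tolist()[0][0][0]

print(depths)
print(type(leaf))
```

For every shape tried here, the nesting depth equals `ndim`, and the 0-d case yields a bare scalar rather than a list.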
