1. Using the pandas data analysis library

Latest recommended article published 2023-06-17 00:57:17

AIHUBEI


Category column: pandas data analysis library. Tags: python, data analysis, data mining

Article link: https://blog.csdn.net/AIHUBEI/article/details/104820028

Using pandas

author by xiaoyao

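NumPy provides convenient array processing, but it lacks the quick tools needed for data processing and analysis. pandas is built on NumPy and provides many higher-level data processing features.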

import pandas as pd
import numpy as np
# pd.set_option("display.show_dimensions", False)
# pd.set_option("display.float_format", "{:4.2g}".format)

Pandas: a convenient data analysis library

import pandas as pd
pd.__version__

'0.25.3'

Data objects in pandas

`Series` objects

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
print("Index:", s.index)
print("Values array:", s.values)

Index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Values array: [1 2 3 4 5]

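Series is the most basic object in pandas. It implements the __array__() interface of NumPy's ndarray, so NumPy array functions can be applied to a Series object directly. A Series supports element access both by integer position and by index label.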
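As a small illustration of the paragraph above, the sketch below applies a NumPy function to a Series and then reads elements by position and by label. The name s_demo and its values are made up for this example; iloc and loc are the explicit positional and label accessors.

```python
import numpy as np
import pandas as pd

# s_demo is an illustrative Series, not one used elsewhere in this article.
s_demo = pd.Series([1, 4, 9, 16], index=["a", "b", "c", "d"])

# A NumPy ufunc applied to a Series returns a Series that keeps the same index.
roots = np.sqrt(s_demo)
print(roots)

# Explicit accessors: iloc for integer positions, loc for index labels.
print(s_demo.iloc[2])   # third element, selected by position
print(s_demo.loc["d"])  # element labelled "d"
```

Using iloc and loc makes it unambiguous whether a number is meant as a position or as a label.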

a = pd.Series([2,3,4,5,6],['a','b','c','e','f'])
print(a)

a    2
b    3
c    4
e    5
f    6
dtype: int64

print(s)
print('Length:',len(s))

a    1
b    2
c    3
d    4
e    5
dtype: int64
Length: 5

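# Recent pandas versions deprecate integer-position access through s[2] on a label-indexed Series; s.iloc[2] is the explicit form
print("Positional index s[2]:", s[2])
print("Label index s['d']:", s['d'])

Positional index s[2]: 3
Label index s['d']: 4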

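# Note that slices use a colon, not a comma.
# Slicing by position excludes the end position, while slicing by index label includes the end label. Mind the difference.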
print(s[1:3])
print(s['b':'d'])

b    2
c    3
dtype: int64
b    2
c    3
d    4
dtype: int64

print(s[1:3],s['b':'d'])

b    2
c    3
dtype: int64 b    2
c    3
d    4
dtype: int64

# Pass the positions or index labels to look up as a list
print(s[[1,3,2]],s[['b','d','c']])

b    2
d    4
c    3
dtype: int64 b    2
d    4
c    3
dtype: int64

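# items() returns an iterator over the (index label, value) pairs; list() turns it into a list of pairs.
# (iteritems() is the older name for the same method and was removed in pandas 2.0.)
list(s.items())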

[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]

# Printing the result directly shows a zip object
print(a.items())
# Convert it with list() to see the pairs
list(a.items())

<zip object at 0x00000263B36743C8>

[('a', 2), ('b', 3), ('c', 4), ('e', 5), ('f', 6)]

# s2 = pd.Series([20,30,40,50,60], index=["b","c","d","e","f"])
# %C 5 s; s2; s+s2

     s                s2               s+s2     
------------     ------------     --------------
a    1           b    20          a    nan      
b    2           c    30          b     22      
c    3           d    40          c     33      
d    4           e    50          d     44      
e    5           f    60          e     55      
dtype: int64     dtype: int64     f    nan      
                                  dtype: float64
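The table above comes from a display magic that is not available in a plain notebook. A minimal sketch that reproduces the same comparison with ordinary code follows; the names s_left, s_right, total, and filled are chosen for this example only.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (same values as in the table above).
s_left = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s_right = pd.Series([20, 30, 40, 50, 60], index=["b", "c", "d", "e", "f"])

# Addition aligns on index labels; labels present on only one side give NaN,
# which is why the result dtype becomes float64.
total = s_left + s_right
print(total)

# add() with fill_value treats a label missing on one side as 0 instead.
filled = s_left.add(s_right, fill_value=0)
print(filled)
```

Alignment by label, rather than by position, is what distinguishes Series arithmetic from plain NumPy array arithmetic.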

s2 = pd.Series([20,30,30,40,50], index=['a','b','c','d','e'])
print(list(s2.items()))
print('*'*50)  # print a separator line
print('s2-Series:{}.'.format(s2))
print('*'*50)  # print a separator line
print('s2+s-Series',s2+s)
print('*'*50)  # print a separator line
print('s-Series,s2_series,s2+s-Series{}{}{}.'.format(s,s2,s2+s))

[('a', 20), ('b', 30), ('c', 30), ('d', 40), ('e', 50)]
**************************************************
s2-Series:a    20
b    30
c    30
d    40
e    50
dtype: int64.
**************************************************
s2+s-Series a    21
b    32
c    33
d    44
e    55
dtype: int64
**************************************************
s-Series,s2_series,s2+s-Seriesa    1
b    2
c    3
d    4
e    5
dtype: int64a    20
b    30
c    30
d    40
e    50
dtype: int64a    21
b    32
c    33
d    44
e    55
dtype: int64.

`DataFrame` objects

Components of a `DataFrame`

The DataFrame object (a data table) is the most commonly used data object in pandas.

%pwd  # show the current working directory

# int, str, sequence of int / str, or False, default ``None``
# Column(s) to use as the row labels of the ``DataFrame``, either given as
# string name or column index. If a sequence of int / str is given, a MultiIndex is used.
df_soil = pd.read_csv("./data/Soils-simple.csv", index_col=[0, 1], parse_dates=["Date"])
df_soil.columns.name = "Measures"

df_soil.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (0-10, Depression) to (10-30, Top)
Data columns (total 6 columns):
pH        6 non-null float64
Dens      6 non-null float64
Ca        6 non-null float64
Conduc    6 non-null float64
Date      6 non-null datetime64[ns]
Name      6 non-null object
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 450.0+ bytes

print(type(df_soil))  # the type is DataFrame

<class 'pandas.core.frame.DataFrame'>

print(df_soil.dtypes)

Measures
pH               float64
Dens             float64
Ca               float64
Conduc           float64
Date      datetime64[ns]
Name              object
dtype: object

print(df_soil.shape)

(6, 6)

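# A DataFrame object has a row index and a column index, and its data can be accessed through index labels.

The index attribute holds the row index and the columns attribute holds the column index.

# column index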
print (df_soil.columns)
print (df_soil.columns.name)

Index(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype='object', name='Measures')
Measures

print(df_soil)

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

# row index
print (df_soil.index)
print (df_soil.index.names)

MultiIndex([( '0-10', 'Depression'),
            ( '0-10',      'Slope'),
            ( '0-10',        'Top'),
            ('10-30', 'Depression'),
            ('10-30',      'Slope'),
            ('10-30',        'Top')],
           names=['Depth', 'Contour'])
['Depth', 'Contour']

print(df_soil["pH"],"\n",df_soil[["Dens", "Ca"]])

Depth  Contour   
0-10   Depression    5.3525
       Slope         5.5075
       Top           5.3325
10-30  Depression    4.8800
       Slope         5.2825
       Top           4.8500
Name: pH, dtype: float64 
 Measures            Dens       Ca
Depth Contour                    
0-10  Depression  0.9775  10.6850
      Slope       1.0500  12.2475
      Top         1.0025  13.3850
10-30 Depression  1.3575   7.5475
      Slope       1.3475   9.5150
      Top         1.3325  10.2375

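# Like a two-dimensional array, a DataFrame object has two axes: axis 0 is the vertical axis (rows) and axis 1 is the horizontal axis (columns). When a method or function has an axis or orient parameter,

# that parameter accepts the integers 0 and 1, or "index" and "columns", for the vertical and horizontal directions respectively.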
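To make the two spellings of the axis argument concrete, here is a short sketch on a tiny frame. df_axis and its values are invented for illustration.

```python
import pandas as pd

# df_axis is an illustrative frame with made-up values.
df_axis = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]},
                       index=["r0", "r1", "r2"])

# axis=0 (or "index") runs down the rows, giving one result per column.
col_sums = df_axis.sum(axis=0)
# axis=1 (or "columns") runs across the columns, giving one result per row.
row_sums = df_axis.sum(axis="columns")

print(col_sums)
print(row_sums)
```

The integer and string forms are interchangeable, so df_axis.sum(axis="index") returns the same result as col_sums.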

# loc selects rows by row index label. A single matching row is returned as a Series object; multiple rows are returned as a DataFrame object.
print('{}{}.'.format(df_soil.loc["0-10", "Top"],df_soil.loc["10-30"]))
# 或者
# print(df_soil.loc["0-10","Top"],df_soil.loc["10-30"])
print('*'*50)
print(type(df_soil.loc["0-10","Top"]))
print(type(df_soil.loc["10-30"]))

Measures
pH                     5.3325
Dens                   1.0025
Ca                     13.385
Conduc                 1.3725
Date      2015-05-21 00:00:00
Name                      Roy
Name: (0-10, Top), dtype: objectMeasures        pH    Dens       Ca  Conduc       Date   Name
Contour                                                      
Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana.
**************************************************
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

print(df_soil.loc["0-10","Top"])

Measures
pH                     5.3325
Dens                   1.0025
Ca                     13.385
Conduc                 1.3725
Date      2015-05-21 00:00:00
Name                      Roy
Name: (0-10, Top), dtype: object
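Because the CSV file used above may not be at hand, the following sketch rebuilds a small frame with the same kind of two-level row index and repeats the two loc lookups. The frame df_demo and its numbers are made-up sample values, not the soil data.

```python
import pandas as pd

# A two-level row index similar in shape to df_soil's (values are invented).
idx = pd.MultiIndex.from_product(
    [["0-10", "10-30"], ["Depression", "Slope"]],
    names=["Depth", "Contour"],
)
df_demo = pd.DataFrame(
    {"pH": [5.4, 5.5, 4.9, 5.3], "Dens": [1.0, 1.1, 1.4, 1.3]},
    index=idx,
)

# A full (Depth, Contour) key selects exactly one row, returned as a Series.
one_row = df_demo.loc[("0-10", "Slope")]
print(type(one_row).__name__)

# A key for the first level only selects several rows, returned as a
# DataFrame whose remaining row index is the Contour level.
sub = df_demo.loc["10-30"]
print(sub)
```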

# values converts the DataFrame object into an array. Because the columns have different data types here, the result is an array whose element type is object.
df_soil.values.dtype

dtype('O')

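Converting in-memory data to `DataFrame` objects

Calling DataFrame() converts data in several formats into a DataFrame object. It takes three main arguments: data, index, and columns. data can be a two-dimensional array, a nested list that can be converted to a two-dimensional array, or a dictionary.

# 1 First generate a two-dimensional array of shape (4, 2) with values from 0 to 9, then pass index and columns to DataFrame() to set the row and column indexes.

# 2 Convert a dictionary into a DataFrame object

# 3 Convert a structured array into a DataFrame object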

df1 = pd.DataFrame(np.random.randint(0, 10, (4, 2)), #❶
                   index=["A", "B", "C", "D"], 
                   columns=["a", "b"])

df2 = pd.DataFrame({"a":[1, 2, 3, 4], "b":[5, 6, 7, 8]},  #❷
                   index=["A", "B", "C", "D"])

arr = np.array([("item1", 1), ("item2", 2), ("item3", 3), ("item4", 4)], 
               dtype=[("name", "S10"), ("count", int)])  # "S10": byte strings of up to 10 bytes

df3 = pd.DataFrame(arr) #❸

print("Type of df1: {}.".format(type(df1)))
print("Type of df2: {}.".format(type(df2)))
print("Type of df3: {}.".format(type(df3)))

print("*"*50)
print(df1)
print("*"*50)
print(df2)
print("*"*50)
print(df3)

Type of df1: <class 'pandas.core.frame.DataFrame'>.
Type of df2: <class 'pandas.core.frame.DataFrame'>.
Type of df3: <class 'pandas.core.frame.DataFrame'>.
**************************************************
   a  b
A  8  1
B  3  5
C  3  1
D  9  8
**************************************************
   a  b
A  1  5
B  2  6
C  3  7
D  4  8
**************************************************
       name  count
0  b'item1'      1
1  b'item2'      2
2  b'item3'      3
3  b'item4'      4

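# Methods whose names start with from_ convert data in a specific format into a DataFrame object. from_dict() converts a dictionary into a DataFrame object; its orient parameter specifies the direction

# that the dictionary keys map to. The default is "columns", meaning the keys become the column index, so each dictionary value corresponds to one column. With orient="index", each dictionary value corresponds to

# one row.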

dict1 = {"a":[1, 2, 3], "b":[4, 5, 6]}
dict2 = {"a":{"A":1, "B":2}, "b":{"A":3, "C":4}}  # nested dictionary
df1 = pd.DataFrame.from_dict(dict1, orient="index")
df2 = pd.DataFrame.from_dict(dict1, orient="columns")

# Missing entries in the nested dictionary become NaN; each dictionary value still corresponds to one row
df3 = pd.DataFrame.from_dict(dict2, orient="index")
df4 = pd.DataFrame.from_dict(dict2, orient="columns")

print(df1)
print("*"*50)
print(df2)
print("*"*50)
print(df3)
print("*"*50)
print(df4)

   0  1  2
a  1  2  3
b  4  5  6
**************************************************
   a  b
0  1  4
1  2  5
2  3  6
**************************************************
   A    B    C
a  1  2.0  NaN
b  3  NaN  4.0
**************************************************
     a    b
A  1.0  3.0
B  2.0  NaN
C  NaN  4.0

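# from_items() converts a sequence of (key, value) pairs into a DataFrame object, where each value can be a list, array, or Series of one-dimensional data. When the orient parameter is "index",

# the column index has to be specified through columns.

Note: this was run under Python 3, and calling from_items() directly produces the warnings shown below.

As the warnings suggest, it is better to use from_dict(dict(items), …).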

D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:3: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), …) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
This is separate from the ipykernel package so we can avoid doing imports until
D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:4: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), …) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
after removing the cwd from sys.path.

# dict1 = {"a":[1, 2, 3], "b":[4, 5, 6]}
items = dict1.items()
df1 = pd.DataFrame.from_dict(dict(items), orient="index", columns=["A", "B", "C"])
df2 = pd.DataFrame.from_dict(dict(items), orient="columns")

print(df1)
print("*"*50)
print(df2)

   A  B  C
a  1  2  3
b  4  5  6
**************************************************
   a  b
0  1  4
1  2  5
2  3  6

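Converting `DataFrame` objects to other data formats

The to_dict() method converts a DataFrame object into a dictionary; its orient parameter determines the type of the dictionary's elements: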

print("df2 as a list of dicts:", df2.to_dict(orient="records"))  # orient="records": list of dictionaries, one per row
print("*"*50)
print("df2 as a dict of lists:", df2.to_dict(orient="list"))  # orient="list": dictionary of lists, one per column
print("*"*50)
print("df2 as a nested dict:", df2.to_dict(orient="dict"))  # orient="dict": nested dictionary

df2 as a list of dicts: [{'a': 1, 'b': 4}, {'a': 2, 'b': 5}, {'a': 3, 'b': 6}]
**************************************************
df2 as a dict of lists: {'a': [1, 2, 3], 'b': [4, 5, 6]}
**************************************************
df2 as a nested dict: {'a': {0: 1, 1: 2, 2: 3}, 'b': {0: 4, 1: 5, 2: 6}}

# to_records() converts a DataFrame object into a structured array. When the index parameter is True (the default), the returned array includes the row index data.

print (df2.to_records().dtype)
print (df2.to_records(index=False).dtype)

(numpy.record, [('index', '<i8'), ('a', '<i8'), ('b', '<i8')])
(numpy.record, [('a', '<i8'), ('b', '<i8')])

print(df2.to_records())

[(0, 1, 4) (1, 2, 5) (2, 3, 6)]

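`Index` objects

Index objects are read-only.

An Index object stores the index label data and can quickly find the position that corresponds to a label. Its values attribute returns the array that holds the labels.

index = df_soil.columns  # columns returns the column index of df_soil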
print(index.values)
index.values

['pH' 'Dens' 'Ca' 'Conduc' 'Date' 'Name']





array(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype=object)

# Print df_soil again here for easy reference
print(df_soil)

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

print(index)

Index(['pH', 'Dens', 'Ca', 'Conduc', 'Date', 'Name'], dtype='object', name='Measures')

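print(index[[1, 3]])  # positions are zero-based; the list [1, 3] selects the labels at positions 1 and 3
print(index[index > 'c'])  # labels that compare greater than 'c' as strings; uppercase letters sort before lowercase ones, so only 'pH' qualifies
print(index[1::2])  # every second label, starting from position 1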

Index(['Dens', 'Conduc'], dtype='object', name='Measures')
Index(['pH'], dtype='object', name='Measures')
Index(['Dens', 'Conduc', 'Name'], dtype='object', name='Measures')

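Index objects also provide a dictionary-like mapping from labels to their positions.
– index.get_loc() returns the position of a single label
– index.get_indexer() returns the positions of a group of labels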

print(type(index))

<class 'pandas.core.indexes.base.Index'>

print (index.get_loc('Ca'))
print (index.get_indexer(['Dens', 'Conduc', 'nothing'])) # a label that does not exist maps to -1

2
[ 1  3 -1]

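Index() can be called directly to create an Index object, which can then be passed to the index or columns parameter of DataFrame(). Because Index objects are immutable, several data objects can safely reference the same Index object as their index.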

index = pd.Index(["A", "B", "C", "D", "E"], name="level")
s1 = pd.Series([1, 2, 3, 4, 5], index=index)
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 5], "b":[6, 7, 8, 9, 10]}, index=index)
print (s1.index is df1.index)

True
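The immutability mentioned above can be checked directly. In this sketch, idx_demo is an illustrative name; assigning to an element is expected to raise TypeError, and changes are instead made by building a new Index.

```python
import pandas as pd

# idx_demo is an illustrative Index, separate from the ones used above.
idx_demo = pd.Index(["A", "B", "C"], name="level")

# In-place element assignment is rejected because Index objects are immutable.
try:
    idx_demo[0] = "Z"
    mutated = True
except TypeError as exc:
    mutated = False
    print("TypeError:", exc)

# Deriving a modified index creates a new object and leaves the original intact.
lowered = idx_demo.map(lambda label: label.lower())
print(list(idx_demo), list(lowered))
```

This is the property that makes it safe for a Series and a DataFrame to share one Index object.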

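`MultiIndex` objects

MultiIndex represents a multi-level index and inherits from Index; each multi-level label is represented as a tuple. get_loc() and get_indexer() still return the position of a single label or of several labels.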

mindex = df_soil.index
print (mindex[1])
print (mindex.get_loc(("0-10", "Slope")))
print (mindex.get_indexer([("10-30", "Top"), ("0-10", "Depression"), "nothing"]))

('0-10', 'Slope')
1
[ 5  0 -1]

print(mindex)  # mindex is a multi-level index

MultiIndex([( '0-10', 'Depression'),
            ( '0-10',      'Slope'),
            ( '0-10',        'Top'),
            ('10-30', 'Depression'),
            ('10-30',      'Slope'),
            ('10-30',        'Top')],
           names=['Depth', 'Contour'])

# Internally, a multi-level index does not store the tuples directly; it uses several Index objects to store the labels of each level

print (mindex.levels[0])
print (mindex.levels[1])

Index(['0-10', '10-30'], dtype='object', name='Depth')
Index(['Depression', 'Slope', 'Top'], dtype='object', name='Contour')

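# Several integer arrays store the positions of these labels within each level:

Using mindex.labels directly produces the warnings below; .codes is recommended instead.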

D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.
"""Entry point for launching an IPython kernel.
D:\installation\anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.

print (mindex.codes[0])
print (mindex.codes[1])

[0 0 0 1 1 1]
[0 1 2 0 1 2]

level0, level1 = mindex.levels
label0, label1 = mindex.codes
zip(level0[label0], level1[label1])

<zip at 0x263b60a99c8>
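The zip object above is not shown expanded. As a sketch of what it contains, the example below builds a small MultiIndex (mi, with illustrative labels) and rebuilds its tuples from levels and codes; it assumes a pandas version that provides .codes (0.24 or later).

```python
import pandas as pd

# mi is an illustrative MultiIndex; the labels are chosen for this example.
mi = pd.MultiIndex.from_arrays(
    [["0-10", "0-10", "10-30"], ["Slope", "Top", "Slope"]],
    names=["Depth", "Contour"],
)

# levels hold the unique labels of each level; codes hold, for every row,
# the integer position of its label within the corresponding level.
level0, level1 = mi.levels
code0, code1 = mi.codes

# Indexing each level with its codes and zipping the results gives back the tuples.
rebuilt = list(zip(level0[code0], level1[code1]))
print(rebuilt)
```

Storing small integer codes instead of repeated label tuples is what keeps a multi-level index compact.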

# Passing a list of tuples to Index() automatically creates a MultiIndex object.

mult = pd.Index([("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")], name=["class1", "class2"])
mult

MultiIndex([('A', 'x'),
            ('A', 'y'),
            ('B', 'x'),
            ('B', 'y')],
           names=['class1', 'class2'])

print(mult[0])

('A', 'x')

print(mult.levels[0])

Index(['A', 'B'], dtype='object', name='class1')

mult.codes[0]

FrozenNDArray([0, 0, 1, 1], dtype='int8')

# The from_ methods create a MultiIndex object from specific data structures

class1 = ["A", "A", "B", "B"]
class2 = ["x", "y", "x", "y"]
pd.MultiIndex.from_arrays([class1, class2], names=["class1", "class2"])

MultiIndex([('A', 'x'),
            ('A', 'y'),
            ('B', 'x'),
            ('B', 'y')],
           names=['class1', 'class2'])

# from_product creates a MultiIndex object from the Cartesian product of several collections.

midx = pd.MultiIndex.from_product([["A", "B", "C"], ["x", "y"]], 
                           names=["class1", "class2"])
df1 = pd.DataFrame(np.random.randint(0, 10, (6, 6)), columns=midx, index=midx)

df1

class1         A     B     C   
class2         x  y  x  y  x  y
class1 class2                  
A      x       5  6  5  5  3  1
       y       5  7  0  2  3  0
B      x       7  6  2  9  0  7
       y       7  4  2  1  6  2
C      x       4  8  0  2  4  8
       y       3  5  9  7  4  8

Common function parameters

df_soil

Measures              pH    Dens       Ca  Conduc       Date   Name
Depth Contour                                                      
0-10  Depression  5.3525  0.9775  10.6850  1.4725 2015-05-26   Lois
      Slope       5.5075  1.0500  12.2475  2.0500 2015-04-30    Roy
      Top         5.3325  1.0025  13.3850  1.3725 2015-05-21    Roy
10-30 Depression  4.8800  1.3575   7.5475  5.4800 2015-03-21   Lois
      Slope       5.2825  1.3475   9.5150  4.9100 2015-02-06  Diana
      Top         4.8500  1.3325  10.2375  3.5825 2015-04-11  Diana

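print(df_soil.mean())
print("*"*50)
print(df_soil.mean(axis=1))  # axis selects the axis the operation runs along
print("*"*50)
df_soil.mean(level=1)  # level takes an integer or a level name and selects the index level the operation is grouped by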

Measures
pH         5.200833
Dens       1.177917
Ca        10.602917
Conduc     3.144583
dtype: float64
**************************************************
Depth  Contour   
0-10   Depression    4.621875
       Slope         5.213750
       Top           5.273125
10-30  Depression    4.816250
       Slope         5.263750
       Top           5.000625
dtype: float64
**************************************************

Measures         pH     Dens        Ca   Conduc
Contour                                        
Depression  5.11625  1.16750   9.11625  3.47625
Slope       5.39500  1.19875  10.88125  3.48000
Top         5.09125  1.16750  11.81125  2.47750
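The level argument of mean() used above belongs to the pandas version this article was written with (0.25.3); it was deprecated later and removed in pandas 2.0. A sketch of the same per-level aggregation written with groupby(level=...) follows. The frame df_num and its values are invented for illustration.

```python
import pandas as pd

# df_num is an illustrative numeric frame with a two-level row index.
idx = pd.MultiIndex.from_product(
    [["0-10", "10-30"], ["Depression", "Slope"]],
    names=["Depth", "Contour"],
)
df_num = pd.DataFrame({"pH": [5.0, 5.5, 4.5, 5.25]}, index=idx)

# Group the rows by the Contour level (level=1 would select the same level)
# and average within each group.
by_contour = df_num.groupby(level="Contour").mean()
print(by_contour)
```

For a frame with non-numeric columns such as df_soil, recent pandas versions also expect the numeric columns to be selected first, or numeric_only=True to be passed to mean().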

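Internal structure of a `DataFrame`

A DataFrame object stores its data internally in NumPy arrays, so it can run into the same shared-storage issues as arrays.

The commented-out code below imports GraphvizDataFrame from a package named scpy2 to draw this internal structure; scpy2 is not part of pandas and is not needed for the rest of this section.

#%fig=Internal structure of a DataFrame object
# from scpy2.common import GraphvizDataFrame
# %dot GraphvizDataFrame.graphviz(df_soil)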

type(df_soil)

pandas.core.frame.DataFrame

The columns attribute of this DataFrame object is an Index object, while its index attribute is a MultiIndex object representing a multi-level index.

type(df_soil.index)

pandas.core.indexes.multi.MultiIndex

type(df_soil.columns)

pandas.core.indexes.base.Index

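The lookup functionality of an Index object is provided by its _engine attribute, which is an ObjectEngine object. That object uses a hash table, a PyObjectHashTable object, to map each label to its integer position.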

df_soil.columns._engine.mapping.get_item("Date")

Selecting a single column of a DataFrame object returns a Series that shares memory with the original DataFrame object.

s = df_soil["Dens"]
s.values.base is df_soil._data.blocks[0].values

True

Selecting multiple columns with [] copies all of the data, so the base attribute of the array that stores the new DataFrame object's data is None.

print (df_soil[["Dens"]]._data.blocks[0].values.base)

None
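The checks in this section rely on the private _data attribute, and whether a selection shares memory with its source is an implementation detail that can differ between pandas versions. When independent data is required, an explicit copy() does not depend on those details. A minimal sketch, with df_src as an invented example frame:

```python
import numpy as np
import pandas as pd

# df_src is an illustrative frame with made-up values.
df_src = pd.DataFrame({"Dens": [1.0, 1.1], "Ca": [10.0, 12.0]})

# copy() returns a Series with its own data buffer.
dens_copy = df_src["Dens"].copy()
dens_copy.iloc[0] = 99.0

# The original frame is unaffected by the change made to the copy.
print(df_src["Dens"].iloc[0])
print(np.shares_memory(dens_copy.values, df_src["Dens"].values))
```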

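If a DataFrame object has only one data block, the array returned by its values attribute is the transpose of the array in that block, so it shares memory with the DataFrame object.

# All elements of df_float have the same type, so it has only one data block.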
df_float = df_soil[['pH', 'Dens', 'Ca', 'Conduc']]
df_float.values.base is df_float._data.blocks[0].values

True

# When a DataFrame object has only one data block, the Series object obtained by selecting one of its rows also shares memory with it.
df_float.loc["0-10", "Top"].values.base is df_float._data.blocks[0].values

True

df_soil.values.dtype

dtype('O')
