An Introduction to the Data Structures in Python's pandas Library

Latest recommended article published 2022-06-13 22:30:51

orchidzouqr

Category: Python. Tags: python, pandas data structures (1): Series

Article link: https://blog.csdn.net/orchidzouqr/article/details/52809571


We begin with the first data structure: Series (note the capitalization).

Definition: Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

<1> Typical ways to create a Series object:

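import pandas as pd
import numpy as np

# Build a Series from a dict; the explicit index selects and orders the labels.
# Labels that are not keys of the dict ('d' and 'f') get NaN, and because NaN
# is a float the whole Series is stored as float64.
dic = {'a': 1, 'b': 2, 'c': 3}
p1 = pd.Series(dic, index=['b', 'a', 'd', 'c', 'f'])
print(p1)

output is:
b    2.0
a    1.0
d    NaN
c    3.0
f    NaN
dtype: float64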
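
To make the role of the index argument more concrete, here is a small sketch that builds the same dict twice, once without and once with an explicit index. The variable names `full` and `partial` are only illustrative and do not appear in the original post.

```python
import pandas as pd

dic = {'a': 1, 'b': 2, 'c': 3}

# Without an explicit index every dict key is kept and the values stay integers.
full = pd.Series(dic)

# With an explicit index, labels missing from the dict ('d', 'f') become NaN,
# and the presence of NaN means the values are stored as floats.
partial = pd.Series(dic, index=['b', 'a', 'd', 'c', 'f'])

print(full.dtype)               # int64
print(partial.dtype)            # float64
print(partial.isna().tolist())  # [False, False, True, False, True]
```

Checking `isna()` like this is a quick way to see which requested labels had no matching key.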

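# Build a Series from a list, supplying one label per value.
p2 = pd.Series([1, 2], index=['apple', 'orange'])
print(p2)
>>>
apple     1
orange    2
dtype: int64

# Build a Series from a scalar; the value is repeated for every index label.
p3 = pd.Series(6, index=range(3))
print(p3)
>>>
0    6
1    6
2    6
dtype: int64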

<2> Operations on Series objects

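1. Operations on a Series closely resemble operations on an ndarray: a Series can be sliced, filtered with a boolean mask, and passed to most NumPy functions, and the labels travel with the values.

The remaining examples in this section operate on a Series named s that holds float values labeled 'a' to 'd'. The original post does not show how s was created, and the printed values differ from snippet to snippet, which suggests random data that was regenerated between runs.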
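
As a rough illustration of the ndarray-like behavior, the sketch below creates its own `s`. This is an assumption: the post never defines `s`, so a seeded random generator is used here as a stand-in, and the names `first_two`, `above_median` and `exp_s` are made up for the example.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator stands in for the article's undefined `s`.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(4), index=['a', 'b', 'c', 'd'])

first_two = s.iloc[:2]            # positional slice; labels 'a' and 'b' come along
above_median = s[s > s.median()]  # boolean mask, as with an ndarray
exp_s = np.exp(s)                 # a NumPy ufunc applied to a Series returns a Series

print(first_two.index.tolist())   # ['a', 'b']
print(type(exp_s).__name__)       # Series
print(exp_s.index.tolist())       # ['a', 'b', 'c', 'd']
```

The point to notice is that every result is still a Series and keeps the original labels.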

2. A Series also behaves like a dict: values can be read and set by index label.

For example:

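print(s['a'])
>>>
-0.325154630498

# Assigning to a label that does not exist yet appends a new entry.
# Note: the printout below shows a different value for 'a' than the line above,
# so the two outputs were presumably captured from separate runs.
s['e'] = 12
print(s)
>>>
a     1.221680
b     0.676150
c    -0.966844
d     1.123409
e    12.000000
dtype: float64

# `in` tests membership among the index labels.
print('e' in s)
>>>
True

# Indexing with a label that is not present raises KeyError.
print(s['f'])
>>>
KeyError: 'f'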

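To avoid the exception, use the get() method: for a label that is not present it returns None, or a default value that you specify. For example:

print(s.get('f'))
>>>
None

print(s.get('f', np.nan))
>>>
nan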
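
The dict-like operations above can be collected into one runnable sketch. The values in `s` are illustrative placeholders, not the author's data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, -0.5, 2.0], index=['a', 'b', 'c'])  # illustrative values

s['e'] = 12                # assigning to a new label appends it, like a dict
print('e' in s)            # True: membership tests the labels
print(12 in s)             # False: values are not searched

try:
    s['f']
except KeyError as err:
    print('missing label:', err)

print(s.get('f'))          # None
print(s.get('f', np.nan))  # nan
```

Using `get()` with a default keeps lookups for optional labels from interrupting a script with an exception.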

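3. Vectorized operations and label alignment

For example:

# Arithmetic is applied element by element, with no explicit loop.
# For a given s, s + s and s * 2 produce identical values; the two printouts
# here differ, presumably because they were captured from separate runs
# with different random data.
print(s + s)
>>>
a     1.515210
b     1.236786
c    -1.338008
d    -2.445264
e    24.000000
dtype: float64

print(s * 2)
>>>
a     4.028001
b     1.056986
c    -2.799877
d    -3.421514
e    24.000000
dtype: float64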
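
With fixed values instead of random ones, it is easy to confirm that the two expressions agree. This is a minimal sketch with made-up numbers.

```python
import pandas as pd

s = pd.Series([1.0, 2.5, -3.0], index=['a', 'b', 'c'])  # illustrative values

doubled_by_sum = s + s
doubled_by_scalar = s * 2

print(doubled_by_sum.equals(doubled_by_scalar))  # True
print((s ** 2).tolist())                         # [1.0, 6.25, 9.0]
```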

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.

For example:

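# s[1:] has no 'a' and s[:-1] has no 'e'. The result carries the union of
# both label sets, with NaN wherever a label is missing from one operand.
s1 = s[1:] + s[:-1]
print(s1)
>>>
a         NaN
b   -1.721291
c    0.848257
d    0.410060
e         NaN
dtype: float64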
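
The same alignment rule is easier to follow with two small Series whose labels only partly overlap. The names `left`, `right` and `total` are illustrative.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label, not by position.
total = left + right
print(total)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64
```

Only 'b' and 'c' exist on both sides, so only those two labels get a numeric result.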

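Note: in routine data processing there is usually no need to drop the entries with missing data from a Series, because doing so discards information. If you do want to remove the missing data from a Series, use dropna().

(Having an index label, though the data is missing, is typically important information as part of a computation.)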

print(s1.dropna())
>>>
b   -1.145174
c    0.802023
d    1.354568
dtype: float64
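
One detail worth seeing in code is that dropna() returns a new Series and leaves the original untouched. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 0.5, -1.25, np.nan], index=['a', 'b', 'c', 'd'])  # illustrative

cleaned = s1.dropna()          # a new Series; s1 itself is unchanged
print(len(s1), len(cleaned))   # 4 2
print(cleaned.index.tolist())  # ['b', 'c']
print(s1.isna().sum())         # 2
```

So the labels with missing data remain available in `s1` if a later step needs them.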

4. The name attribute

For example:

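# The name is set at construction time and shown in the printed footer.
s = pd.Series(np.random.randn(5), name='something')
print(s)
>>>
0    0.921333
1   -0.736353
2   -1.202390
3   -0.101308
4    0.017583
Name: something, dtype: float64

# print() shows the name without quotes.
print(s.name)
>>>
something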

A Series can be renamed with pandas.Series.rename().

s2 = s.rename('difference')
print(s2.name)
>>>
difference

Note: s and s2 are different objects.
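
A short check of that last point, using fixed values instead of random ones so the result is reproducible:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='something')
s2 = s.rename('difference')

print(s.name)                     # something (the original keeps its name)
print(s2.name)                    # difference
print(s2 is s)                    # False
print(s2.tolist() == s.tolist())  # True: same values, different object
```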
