To use pandas, you first need to understand its two main data structures, Series and DataFrame. This post gives a brief introduction to Series.
- Import the relevant libraries
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> import numpy as np
- A Series is a one-dimensional array-like object. It consists of an array of data and an associated array of data labels, called its index.
>>> x=[1,2,np.nan,7,9]
>>> obj=Series(x)
>>> obj
0 1.0
1 2.0
2 NaN
3 7.0
4 9.0
dtype: float64
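The float64 dtype above comes from the np.nan entry. NaN is a floating-point value, so pandas stores the integers as floats too. The sketch below compares the two cases. The names `ints` and `with_nan` are only illustrative.

```python
import numpy as np
import pandas as pd

# Integers only: pandas infers an integer dtype (typically int64)
ints = pd.Series([1, 2, 7, 9])

# Same kind of data plus np.nan: NaN is a float, so the whole Series becomes float64
with_nan = pd.Series([1, 2, np.nan, 7, 9])

print(ints.dtype)
print(with_nan.dtype)   # float64
# Without an explicit index, the labels default to 0 .. n-1
print(with_nan.index)   # RangeIndex(start=0, stop=5, step=1)
```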
- You can inspect a Series's values and index, and you can also specify the index yourself
>>> obj.values
array([ 1., 2., nan, 7., 9.])
>>> obj.index
RangeIndex(start=0, stop=5, step=1)
>>> obj2=Series(x,index=list('bacde'))
>>> obj2
b 1.0
a 2.0
c NaN
d 7.0
e 9.0
dtype: float64
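Here is a minimal sketch of how the values and the index pair up.

- `to_numpy()` returns the same data as `.values`; newer pandas documentation recommends `to_numpy()` over `.values`.
- The index must supply exactly one label per value.
- The loop is purely illustrative.

```python
import numpy as np
import pandas as pd

x = [1, 2, np.nan, 7, 9]
obj2 = pd.Series(x, index=list('bacde'))

vals = obj2.to_numpy()   # the data as a NumPy ndarray (same data as obj2.values)
labels = obj2.index      # an Index holding the labels 'b', 'a', 'c', 'd', 'e'

print(type(vals))        # <class 'numpy.ndarray'>
print(list(labels))      # ['b', 'a', 'c', 'd', 'e']

# Each label is paired with the value at the same position
for label, value in zip(labels, vals):
    print(label, value)

# The index needs one label per value; a length mismatch raises ValueError
try:
    pd.Series(x, index=list('ab'))
except ValueError as err:
    print("ValueError:", err)
```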
- Compared with a NumPy ndarray, a Series also lets you select a single value or a set of values by index label
>>> obj2['b']
1.0
>>> obj2[list('abc')]
a 2.0
b 1.0
c NaN
dtype: float64
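Plain square brackets select by label here because the index consists of strings. To make the intent explicit, pandas also provides `.loc` for label-based selection and `.iloc` for position-based selection. The sketch below uses the same data as obj2. Note that a label slice with `.loc` includes its end label.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([1, 2, np.nan, 7, 9], index=list('bacde'))

first_by_label = obj2.loc['b']      # the value whose label is 'b'
first_by_position = obj2.iloc[0]    # the value at position 0 (also 'b' here)
subset = obj2.loc[['a', 'b', 'c']]  # a list of labels returns a Series in that order
first_two = obj2.iloc[:2]           # positional slice: labels 'b' and 'a'
label_slice = obj2.loc['b':'c']     # label slice includes both endpoints: 'b', 'a', 'c'

print(first_by_label, first_by_position)  # 1.0 1.0
print(subset)
print(first_two)
print(label_slice)
```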
- Some arithmetic and other operations
>>> obj2[obj2>4]
d 7.0
e 9.0
dtype: float64
>>> obj2*2  # does not modify the original data
b 2.0
a 4.0
c NaN
d 14.0
e 18.0
dtype: float64
>>> np.exp(obj2)  # e^x; the original data is not modified either
b 2.718282
a 7.389056
c NaN
d 1096.633158
e 8103.083928
dtype: float64
>>> 'd' in obj2  # 'in' checks the index labels
True
>>> 9.0 in obj2  # 9.0 is a value, not a label
False
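Two points from these examples can be checked directly.

- Vectorized operations such as `obj2 * 2` return a new Series and leave obj2 untouched.
- The `in` operator tests index labels, not values.

The sketch below verifies both and shows two ways to test for a value instead. The names `doubled` and `filtered` are only illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([1, 2, np.nan, 7, 9], index=list('bacde'))
original = obj2.copy()

doubled = obj2 * 2          # a new Series; obj2 itself is unchanged
filtered = obj2[obj2 > 4]   # the boolean mask keeps 'd' and 'e' (NaN > 4 is False)

# equals() treats NaNs in the same positions as equal
print(obj2.equals(original))   # True

# 'in' looks at the index labels
print('d' in obj2)             # True
print(9.0 in obj2)             # False: 9.0 is a value, not a label

# To test for a value, check the underlying values instead
print(9.0 in obj2.values)      # True
print(obj2.isin([9.0]).any())  # True
```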
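- Create a Series from a dict. The dict keys become the index. Since pandas 0.23 on Python 3.6+, the keys keep the dict's insertion order; older versions sorted them.
>>> family={'jun':51,'aiqin':49,'dan':23,'hao':21,'lianying':84}
>>> obj3=Series(family)
>>> obj3
jun 51
aiqin 49
dan 23
hao 21
lianying 84
dtype: int64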
- Passing an explicit index instead uses those labels, looking each one up in the dict
>>> people=['lianying','aiqin','jun','dan','wang']
>>> obj4=Series(family,index=people)
>>> obj4
lianying 84.0
aiqin 49.0
jun 51.0
dan 23.0
wang NaN  # 'wang' is not a key in family, so its value is NaN ("not a number")
dtype: float64
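When an explicit index is passed, it decides which keys appear and in what order. The sketch below makes three effects visible.

- Keys missing from the index (here 'hao') are dropped.
- Labels missing from the dict (here 'wang') become NaN.
- That NaN turns the integer ages into float64.

```python
import pandas as pd

family = {'jun': 51, 'aiqin': 49, 'dan': 23, 'hao': 21, 'lianying': 84}
people = ['lianying', 'aiqin', 'jun', 'dan', 'wang']

obj3 = pd.Series(family)                # index taken from the dict keys
obj4 = pd.Series(family, index=people)  # index taken from `people` instead

print(list(obj3.index))  # dict key order (pandas >= 0.23 on Python >= 3.6)
print(obj3.dtype)        # an integer dtype: every value was found
print('hao' in obj4)     # False: 'hao' is in family but not in people, so it is dropped
print(obj4['wang'])      # nan: 'wang' has no entry in family
print(obj4.dtype)        # float64: the NaN forces a float dtype
```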
- Use pandas's isnull and notnull functions to detect missing data
>>> pd.isnull(obj4)
lianying False
aiqin False
jun False
dan False
wang True
dtype: bool
>>> pd.notnull(obj4)
lianying True
aiqin True
jun True
dan True
wang False
dtype: bool
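The boolean Series returned by these functions can be used like any other mask. A Series also has `isnull()` and `notnull()` methods that give the same results as the top-level functions. The sketch below counts the missing entries and keeps only the present ones. The names `missing_count` and `present` are illustrative.

```python
import pandas as pd

family = {'jun': 51, 'aiqin': 49, 'dan': 23, 'hao': 21, 'lianying': 84}
people = ['lianying', 'aiqin', 'jun', 'dan', 'wang']
obj4 = pd.Series(family, index=people)

# The method form matches the top-level function
print(obj4.isnull().equals(pd.isnull(obj4)))  # True

missing_count = obj4.isnull().sum()  # each True counts as 1
present = obj4[obj4.notnull()]       # the boolean mask drops the NaN entry

print(missing_count)        # 1
print(list(present.index))  # ['lianying', 'aiqin', 'jun', 'dan']
```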
- Both the Series object itself and its index have a name attribute
>>> obj4.index.name='name'
>>> obj4.name='WANG FAMILY'
>>> obj4
name
lianying 84.0
aiqin 49.0
jun 51.0
dan 23.0
wang NaN
Name: WANG FAMILY, dtype: float64
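These names carry over when a Series is converted to other shapes.

- `to_frame()` uses the Series name as the column label.
- `reset_index()` turns the named index into a column.
- `rename()` with a scalar returns a copy with a new Series name. The name 'ages' below is only an example.

```python
import pandas as pd

family = {'jun': 51, 'aiqin': 49, 'dan': 23, 'hao': 21, 'lianying': 84}
people = ['lianying', 'aiqin', 'jun', 'dan', 'wang']
obj4 = pd.Series(family, index=people)

obj4.index.name = 'name'
obj4.name = 'WANG FAMILY'

df = obj4.to_frame()      # the Series name becomes the column label
print(list(df.columns))   # ['WANG FAMILY']
print(df.index.name)      # name

flat = obj4.reset_index() # the index name becomes a regular column
print(list(flat.columns)) # ['name', 'WANG FAMILY']

renamed = obj4.rename('ages')   # a copy with a different Series name
print(renamed.name, obj4.name)  # ages WANG FAMILY
```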