Three ways to construct/initialize a Series:
(1) Build a Series from a list
import pandas as pd
my_list = [7,"beijing","19大",3.1415,-10000,"Happy"]
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0 7
1 beijing
2 19大
3 3.1415
4 -10000
5 Happy
dtype: object
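The `dtype: object` in the output above comes from mixing ints, floats, and strings in one list. A minimal sketch of the difference follows; the names `mixed` and `numbers` are illustrative and not from the original notes.

```python
import pandas as pd

# Mixed Python types make pandas fall back to the generic object dtype.
mixed = pd.Series([7, "beijing", 3.1415])
# A list of plain ints is stored with a numeric dtype instead.
numbers = pd.Series([7, 8, 9])

print(mixed.dtype)    # object
print(numbers.dtype)  # an integer dtype, typically int64
```

Numeric dtypes are what make vectorized arithmetic fast, so it is usually worth keeping one type per Series.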
(1.2) By default pandas uses the integers 0 through n-1 as the Series index, but you can also supply your own; think of the index as the keys of a dict
s = pd.Series([7,"beijing","19大",3.1415,-10000,"Happy"],index=["A","B","C","D","E","F"])
print(s)
A 7
B beijing
C 19大
D 3.1415
E -10000
F Happy
dtype: object
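To make the dict analogy concrete, here is a small sketch that looks a value up by its label and converts the Series back to a plain dict. It uses a shortened version of the data above.

```python
import pandas as pd

s = pd.Series([7, "beijing", 3.1415], index=["A", "B", "C"])

print(s["B"])         # look up by label, like a dict key
print(list(s.index))  # the labels themselves
print(s.to_dict())    # convert back to a plain dict
```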
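(2) Build a Series from a dict, since a Series is essentially a key-value structure
Note: the outputs in these notes list the keys alphabetically, as older pandas releases did; recent releases keep the dict's insertion order instead, so the row order (and any positional lookup) can differ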
cities={"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
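The `None` for Suzhou shows up as `NaN` above, and the whole Series becomes `float64` because `NaN` is a floating-point value. A reduced sketch with two cities (an illustrative subset of the data above):

```python
import pandas as pd

cities = {"Beijing": 55000, "Suzhou": None}
apts = pd.Series(cities, name="income")

print(apts.dtype)             # float64, because NaN is a float
print(apts.isna()["Suzhou"])  # True: None was stored as NaN
print(apts.name)              # income
```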
(3) Build a Series from a numpy array
import numpy as np
d = pd.Series(np.random.randn(6),index=["a","b","c","d","e","f"])
print(d)
a 0.405263
b 0.092748
c 0.380906
d -0.580236
e -1.149417
f -0.659308
dtype: float64
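Because `np.random.randn` draws fresh numbers on every run, the values above will not repeat. One way to get repeatable values is a seeded generator. This is a sketch, not what the original notes did, and the seed value is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, chosen only for this sketch
d = pd.Series(rng.standard_normal(6), index=["a", "b", "c", "d", "e", "f"])

print(d.dtype)  # float64
print(len(d))   # 6
```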
Selecting data
(1) A Series can be treated much like a list, including all the usual slicing operations
import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts = pd.Series(cities,name="income")
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
print(apts.iloc[3])  # element at position 3
60000.0
print(apts[3:])
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
print(apts[1:]+(apts[:-1]))
Beijing NaN
Guangzhou 90000.0
Hangzhou 40000.0
Shanghai 120000.0
Suzhou NaN
shenzhen NaN
Name: income, dtype: float64
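The NaN rows in the sum above appear because pandas adds two Series by matching labels, not positions: a label present in only one slice has nothing to pair with. A small sketch with made-up values (`s` is illustrative) contrasts label-aligned and positional addition.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# Label alignment: "a" and "c" each appear in only one operand, so they become NaN.
by_label = s.iloc[1:] + s.iloc[:-1]

# Positional addition: strip the labels from one side by converting it to a NumPy array.
by_position = s.iloc[1:] + s.iloc[:-1].to_numpy()

print(by_label)
print(by_position)
```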
(2) A Series also behaves like a dict: the index defined earlier is what you use to select data
import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts["Shanghai"])
60000.0
print("Hangzhou" in apts)
True
print("Jilin" in apts)
False
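Continuing the dict analogy, `Series.get` returns a default instead of raising `KeyError` for a missing label. A short sketch on a two-city subset of the data:

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000, "Shanghai": 60000}, name="income")

print(apts.get("Shanghai"))          # 60000
print(apts.get("Jilin"))             # None, no KeyError raised
print(apts.get("Jilin", default=0))  # 0
```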
(3) Boolean indexing, much as in numpy
import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
at_most_50000=(apts<=50000)
print(apts[at_most_50000])
Guangzhou 45000.0
Hangzhou 20000.0
shenzhen 50000.0
Name: income, dtype: float64
Note: the usual numpy-style reductions are available as methods: mean, median, max, min
For example: print(apts.max())  # maximum value
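A short sketch of these reductions on a three-city subset; note that they skip NaN by default, which can be turned off with `skipna=False`.

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000, "Hangzhou": 20000, "Suzhou": None}, name="income")

print(apts.max())               # 55000.0
print(apts.min())               # 20000.0
print(apts.mean())              # 37500.0, the NaN for Suzhou is skipped
print(apts.median())            # 37500.0
print(apts.mean(skipna=False))  # nan
```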
Assigning Series elements
(1) Assign directly by index label
import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0
apts['shenzhen']=70000
print(apts)
print('new income of shenzhen:{}'.format(apts['shenzhen']))
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
new income of shenzhen:70000.0
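The same assignment can be written with `.loc`, which makes it explicit that the key is a label. Assigning to a label that does not exist yet appends a new entry. A sketch on a two-city subset (the Tianjin value is only for illustration):

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000.0, "shenzhen": 50000.0}, name="income")

apts.loc["shenzhen"] = 70000  # overwrite an existing label
apts.loc["Tianjin"] = 40000   # assigning to a new label appends it

print(apts)
```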
(2) Boolean indexing also works in assignments
import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
apts['shenzhen']=70000
print(apts)
print('new income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000= (apts<50000)
print(less_than_50000)
apts[less_than_50000]=40000
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
new income of shenzhen:70000.0
Beijing False
Guangzhou True
Hangzhou True
Shanghai False
Suzhou False
shenzhen False
Name: income, dtype: bool
Beijing 55000.0
Guangzhou 40000.0
Hangzhou 40000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
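Suzhou stays NaN above because any comparison with NaN is False, so the mask never selects it. If the original Series should stay unchanged, `Series.mask` returns a modified copy instead of assigning in place. A sketch, with `below` and `capped` as illustrative names:

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000, "Hangzhou": 20000, "Suzhou": None}, name="income")

below = apts < 50000
print(below["Suzhou"])  # False: comparisons with NaN are never true

# mask() returns a new Series with the selected entries replaced,
# leaving the original untouched.
capped = apts.mask(below, 40000)
print(capped)
```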
Arithmetic operations
import pandas as pd
import numpy as np
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
apts['shenzhen']=70000  # assignment
print('new income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts/2)  # division
new income of shenzhen:70000.0
Beijing 27500.0
Guangzhou 20000.0
Hangzhou 20000.0
Shanghai 30000.0
Suzhou NaN
shenzhen 35000.0
Name: income, dtype: float64
print(apts**1.5)  # exponentiation
Beijing 1.289864e+07
Guangzhou 8.000000e+06
Hangzhou 8.000000e+06
Shanghai 1.469694e+07
Suzhou NaN
shenzhen 1.852026e+07
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,"shenzhen":60000,"guangzhou":7000,"chongqing":3000})
print(apts2)
print(apts+apts2)
Beijing 65000.0
Guangzhou NaN
Hangzhou NaN
Shanghai 68000.0
Suzhou NaN
chongqing NaN
guangzhou NaN
shenzhen 130000.0
dtype: float64
# Note: index labels are case-sensitive ("Guangzhou" and "guangzhou" do not match); a label found in only one Series yields NaN (missing data)
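When NaN for unmatched labels is not what you want, `Series.add` accepts a `fill_value` that stands in for the missing side. A sketch with two small Series (`a` and `b` are illustrative names):

```python
import pandas as pd

a = pd.Series({"Beijing": 55000, "Hangzhou": 40000})
b = pd.Series({"Beijing": 10000, "chongqing": 3000})

plain = a + b                     # labels found on only one side become NaN
filled = a.add(b, fill_value=0)   # treat the missing side as 0 instead

print(plain)
print(filled)
```

Whether 0 is a sensible stand-in depends on the data; for incomes it silently treats an unknown value as zero.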
Missing data
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
Beijing 55000.0
Guangzhou 40000.0
Hangzhou 40000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Gunagzhou':7000,'Chongqing':30000})
print(apts2)
Beijing 10000
Chongqing 30000
Gunagzhou 7000
Shanghai 8000
Tianjin 40000
shenzhen 6000
dtype: int64
print('Hangzhou' in apts)  # membership test
True
print(apts.notnull())  # True where a value is present, False where it is missing
Beijing True
Guangzhou True
Hangzhou True
Shanghai True
Suzhou False
shenzhen True
Name: income, dtype: bool
print(apts[apts.isnull()])  # select the missing values
Suzhou NaN
Name: income, dtype: float64
print(apts[apts.notnull()])  # select the non-missing values
Beijing 55000.0
Guangzhou 40000.0
Hangzhou 40000.0
Shanghai 60000.0
shenzhen 70000.0
Name: income, dtype: float64
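Filtering with `notnull()` as above is common enough that pandas offers `dropna()` as a shortcut. A sketch showing that the two give the same result on a three-city subset:

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000, "Suzhou": None, "shenzhen": 70000}, name="income")

kept = apts.dropna()  # same rows as apts[apts.notnull()]
print(kept)
print(kept.equals(apts[apts.notnull()]))  # True
```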
apts = apts + apts2  # addition; a label present in only one Series yields NaN
print(apts)
Beijing 65000.0
Chongqing NaN
Guangzhou NaN
Gunagzhou NaN
Hangzhou NaN
Shanghai 68000.0
Suzhou NaN
Tianjin NaN
shenzhen 76000.0
dtype: float64
apts[apts.isnull()]=apts.mean()  # fill the missing positions with the mean
print(apts)
Beijing 65000.000000
Chongqing 69666.666667
Guangzhou 69666.666667
Gunagzhou 69666.666667
Hangzhou 69666.666667
Shanghai 68000.000000
Suzhou 69666.666667
Tianjin 69666.666667
shenzhen 76000.000000
dtype: float64
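The same fill can be written with `fillna`, which returns a new Series rather than modifying in place. A sketch using the three known sums from above plus one missing entry:

```python
import pandas as pd

total = pd.Series({"Beijing": 65000.0, "Chongqing": None,
                   "Shanghai": 68000.0, "shenzhen": 76000.0})

# mean() ignores NaN, so it averages only the three known values.
filled = total.fillna(total.mean())
print(filled)
```

Here the mean is 209000 / 3, about 69666.67, matching the filled values in the output above.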