Basics of pandas Series in Python 3

Three ways to construct/initialize a Series:

(1) Build a Series from a list

import pandas as pd
my_list = [7,"beijing","19大",3.1415,-10000,"Happy"]
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0          7
1    beijing
2        19大
3     3.1415
4     -10000
5      Happy
dtype: object

(1.2) By default pandas uses the integers 0 to n-1 as the Series index, but you can also specify your own index. You can think of the index as the keys of a dict.

s = pd.Series([7,"beijing","19大",3.1415,-10000,"Happy"],index=["A","B","C","D","E","F"])
print(s)
A          7
B    beijing
C        19大
D     3.1415
E     -10000
F      Happy
dtype: object
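
The dict analogy above can be sketched as follows. This is a minimal illustration rather than anything from the original post: the shortened three-element Series is an assumption made for brevity. One difference from a dict is that index labels are allowed to repeat.

```python
import pandas as pd

# Shortened version of the Series above (illustrative data).
s = pd.Series([7, "beijing", 3.1415], index=["A", "B", "C"])

# Label lookup works like a dict lookup.
first = s["A"]

# The labels themselves live in s.index.
labels = list(s.index)

# A Series converts to a plain dict keyed by its index.
as_dict = s.to_dict()

print(first)
print(labels)
print(as_dict)
```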

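(2) Build a Series from a dict, since a Series is itself essentially a key-value structure. Note: the outputs in this post come from an older pandas release that sorted dict keys alphabetically; pandas 0.23 and later (on Python 3.6+) keep the dict's insertion order.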

cities={"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
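
Two details of dict construction are worth a quick sketch: None is stored as NaN (which is why the dtype above is float64), and an explicit index= argument selects and orders the keys. The shortened dict and the "Jilin" label are illustrative assumptions.

```python
import pandas as pd

# Shortened version of the dict above (illustrative data).
cities = {"Beijing": 55000, "Shanghai": 60000, "Suzhou": None}

# None is stored as NaN, which forces the dtype to float64.
apts = pd.Series(cities, name="income")

# Passing index= selects and orders keys; "Jilin" is not in the dict,
# so its value is NaN.
picked = pd.Series(cities, index=["Shanghai", "Jilin"])

print(apts.dtype)
print(picked)
```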

(3) Build a Series from a NumPy array

import numpy as np
d = pd.Series(np.random.randn(6),index=["a","b","c","d","e","f"])
print(d)
a    0.405263
b    0.092748
c    0.380906
d   -0.580236
e   -1.149417
f   -0.659308
dtype: float64
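
Because np.random.randn draws new numbers on every run, the values printed above will differ each time. A possible variation, assuming a repeatable result is wanted, uses a seeded generator; the seed value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values repeatable between runs.
rng = np.random.default_rng(seed=0)
d = pd.Series(rng.standard_normal(6), index=list("abcdef"))

# The underlying data is still available as a NumPy array.
values = d.to_numpy()

print(d)
print(type(values), values.shape)
```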

Selecting data

(1) A Series can be treated much like a list, including all the usual slicing operations

import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts = pd.Series(cities,name="income")
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
print(apts.iloc[3])  # element at position 3; plain apts[3] is deprecated on a labeled index in recent pandas
60000.0
print(apts[3:])
Shanghai    60000.0
Suzhou          NaN
shenzhen    50000.0
Name: income, dtype: float64
print(apts[1:]+(apts[:-1]))
Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
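
The NaN entries in the last result come from label alignment: arithmetic pairs values by index label, not by position. A minimal sketch with a hypothetical three-element Series shows the same effect, along with the explicit .iloc (positional) and .loc (label) accessors.

```python
import pandas as pd

# Small illustrative Series (not from the original data).
s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# .iloc is always positional, .loc is always label-based.
by_position = s.iloc[1]
by_label = s.loc["b"]

# Arithmetic pairs values by label. "a" is missing from s.iloc[1:] and
# "c" is missing from s.iloc[:-1], so both become NaN in the sum.
total = s.iloc[1:] + s.iloc[:-1]

print(by_position, by_label)
print(total)
```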
(2) A Series also behaves like a dict: the index defined earlier is what you use to select data

import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts["Shanghai"])
60000.0
print("Hangzhou" in apts)
True
print("Jilin" in apts)
False
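
A few more dict-like behaviours, sketched on a shortened version of the data (the two-city Series is an assumption for brevity): a missing label raises KeyError, .get returns a default instead, and a list of labels selects several entries at once.

```python
import pandas as pd

apts = pd.Series({"Beijing": 55000, "Shanghai": 60000}, name="income")

# A missing label raises KeyError with [], just like a dict.
try:
    apts["Jilin"]
    missing_raises = False
except KeyError:
    missing_raises = True

# .get returns a default instead of raising.
fallback = apts.get("Jilin", 0)

# Several labels can be selected at once by passing a list.
subset = apts[["Shanghai", "Beijing"]]

print(missing_raises, fallback)
print(subset)
```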
(3) Boolean indexing, much like in NumPy

import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
at_most_50000=(apts<=50000)
print(apts[at_most_50000])
Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64
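
Masks can also be combined. The sketch below assumes a shortened version of the data and shows two points: each comparison needs its own parentheses when combined with & or |, and comparisons against NaN are False, which is why Suzhou never appears in the filtered result above.

```python
import pandas as pd

apts = pd.Series(
    {"Beijing": 55000, "Hangzhou": 20000, "Guangzhou": 45000, "Suzhou": None},
    name="income",
)

# Combine masks with & (and), | (or), ~ (not); each comparison needs its
# own parentheses because these operators bind tighter than <, >=, etc.
mid_range = apts[(apts >= 30000) & (apts <= 50000)]

# Comparisons against NaN are False, so Suzhou never matches.
suzhou_matches = bool((apts <= 50000)["Suzhou"])

print(mid_range)
print(suzhou_matches)
```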


 

Note: a Series also provides NumPy-style aggregation methods such as mean, median, max, and min.

For example: print(apts.max())  # maximum value
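
As a rough illustration of these aggregations (using a shortened, assumed version of the data), note that the Series methods skip NaN by default, whereas calling the same method on the raw NumPy array does not.

```python
import pandas as pd

apts = pd.Series(
    {"Beijing": 55000, "Shanghai": 60000, "Hangzhou": 20000, "Suzhou": None},
    name="income",
)

# Series methods skip NaN by default (skipna=True).
print(apts.mean())
print(apts.median())
print(apts.max(), apts.min())

# idxmax returns the label of the largest value.
print(apts.idxmax())

# The raw NumPy array does not skip NaN, so the result here is nan.
print(apts.to_numpy().mean())
```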

Assigning to Series elements

(1) Assign directly by index label

import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0

apts['shenzhen']=70000
print(apts)
print('new income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
new income of shenzhen:70000.0

(2) Boolean indexing also works for assignment

import pandas as pd
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
apts['shenzhen']=70000
print(apts)
print('new income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000= (apts<50000)
print(less_than_50000)
apts[less_than_50000]=40000
print(apts)

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
new income of shenzhen:70000.0
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
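
Assigning through a boolean mask modifies the Series in place. If the original values should be kept, one alternative (not used in the original post) is Series.mask, which returns a new Series. The sketch below assumes a shortened version of the data.

```python
import pandas as pd

apts = pd.Series(
    {"Beijing": 55000, "Guangzhou": 45000, "Hangzhou": 20000, "Suzhou": None},
    name="income",
)

# mask() replaces values where the condition is True and returns a new
# Series; the original is left untouched.
raised = apts.mask(apts < 50000, 40000)

# .loc with a boolean mask assigns in place, like apts[mask] = value.
in_place = apts.copy()
in_place.loc[in_place < 50000] = 40000

print(raised)
print(apts["Hangzhou"])
```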

Arithmetic operations

import pandas as pd
import numpy as np
cities = {"Beijing":55000,"Shanghai":60000,"shenzhen":50000,"Hangzhou":20000,"Guangzhou":45000,"Suzhou":None}
apts=pd.Series(cities,name="income")
apts['shenzhen']=70000  # assignment
print('new income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts/2)  # division
new income of shenzhen:70000.0
Beijing      27500.0
Guangzhou    20000.0
Hangzhou     20000.0
Shanghai     30000.0
Suzhou           NaN
shenzhen     35000.0
Name: income, dtype: float64


Exponentiation
print(apts**1.5)
Beijing      1.289864e+07
Guangzhou    8.000000e+06
Hangzhou     8.000000e+06
Shanghai     1.469694e+07
Suzhou                NaN
shenzhen     1.852026e+07
Name: income, dtype: float64


Adding two Series
apts2=pd.Series({'beijing':10000,'shanghai':8000,"shenzhen":60000,"guangzhou":7000,"chongqing":3000})
print(apts2)
print(apts+apts2)  # only the output of the sum is shown below
Beijing           NaN
Guangzhou         NaN
Hangzhou          NaN
Shanghai          NaN
Suzhou            NaN
beijing           NaN
chongqing         NaN
guangzhou         NaN
shanghai          NaN
shenzhen     130000.0
dtype: float64
# Labels are case-sensitive: any label that appears in only one Series yields NaN (missing data)
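
If a label missing on one side should count as zero rather than produce NaN, Series.add with fill_value is one option. This is a minimal sketch with two small, hypothetical Series, not part of the original example.

```python
import pandas as pd

# Small illustrative Series (not from the original data).
a = pd.Series({"Beijing": 55000, "Shanghai": 60000})
b = pd.Series({"Beijing": 10000, "chongqing": 3000})

# Plain + yields NaN for any label present on only one side.
plain = a + b

# add(fill_value=0) treats a label missing on one side as 0 instead.
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```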

Missing data

cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Gunagzhou':7000,'Chongqing':30000})
print(apts2)
Beijing      10000
Chongqing    30000
Gunagzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64

print('Hangzhou' in apts)  # membership test on the index labels
True
print(apts.notnull())  # True where a value is present, False where it is missing
Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool
print(apts[apts.isnull()])  # select the missing entries
Suzhou   NaN
Name: income, dtype: float64
print(apts[apts.notnull()])  # keep only the non-missing values
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
shenzhen     70000.0
Name: income, dtype: float64
apts = apts + apts2  # add; labels present in only one Series become NaN
print(apts)
Beijing      65000.0
Chongqing        NaN
Guangzhou        NaN
Gunagzhou        NaN
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64
apts[apts.isnull()]=apts.mean()  # fill the missing positions with the mean
print(apts)
Beijing      65000.000000
Chongqing    69666.666667
Guangzhou    69666.666667
Gunagzhou    69666.666667
Hangzhou     69666.666667
Shanghai     68000.000000
Suzhou       69666.666667
Tianjin      69666.666667
shenzhen     76000.000000
dtype: float64
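
The same fill can be written with fillna, which returns a new Series instead of modifying the original, and dropna is the counterpart that discards missing entries. The sketch below uses a shortened, assumed subset of the summed data.

```python
import pandas as pd

# Shortened subset of the summed Series above (illustrative data).
apts = pd.Series(
    {"Beijing": 65000.0, "Guangzhou": None, "Shanghai": 68000.0, "shenzhen": 76000.0}
)

# fillna returns a new Series with NaN replaced; mean() ignores NaN.
filled = apts.fillna(apts.mean())

# dropna removes the missing entries instead of filling them.
dropped = apts.dropna()

print(filled)
print(dropped)
```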






