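pandas is a library built on top of NumPy that provides a large set of data structures and functions for working with structured data quickly and conveniently. The two most commonly used pandas objects are Series (a sequence of values together with their index labels) and DataFrame (a two-dimensional table).
A Series is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, and so on). Its axis labels are collectively called the index.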
Creating from a list
import pandas as pd
import numpy as np
ser1 = pd.Series([10, 20, 30, 40, 50])
ser1
*********************
0 10
1 20
2 30
3 40
4 50
dtype: int64
*********************
Creating from a scalar value
ser3 = pd.Series(100, index=['A', 'B', 'C', 'D', 'E'])  # the scalar is repeated for every label
ser3
***************
A 100
B 100
C 100
D 100
E 100
dtype: int64
***************
Creating from random numbers
ser6 = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(ser6)
***************
a -0.329401
b -0.435921
c -0.232267
d -0.846713
e -0.406585
dtype: float64
***************
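The values printed above change on every run. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator is acceptable in place of `np.random.randn` (the seed value 42 and the names `ser_random` and `ser_repeat` are illustrative only):

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values repeatable between runs.
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary
labels = ['a', 'b', 'c', 'd', 'e']
ser_random = pd.Series(rng.standard_normal(5), index=labels)

# Re-creating the generator with the same seed yields identical values.
rng_again = np.random.default_rng(seed=42)
ser_repeat = pd.Series(rng_again.standard_normal(5), index=labels)

print(ser_random.equals(ser_repeat))  # True
```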
Creating from a dictionary
ser4 = pd.Series({'咖啡': 30, '可乐': 10, '奶茶': 20}, name="price")  # dict keys become the index
ser4
***************
咖啡 30
可乐 10
奶茶 20
Name: price, dtype: int64
***************
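When a dictionary is combined with an explicit `index`, pandas looks each label up in the dictionary, so the index controls both the order and which keys are kept. The sketch below illustrates this; the label '果汁' is a hypothetical entry that is deliberately missing from the dictionary, and the name `menu` is illustrative.

```python
import pandas as pd

prices = {'咖啡': 30, '可乐': 10, '奶茶': 20}

# '果汁' is a hypothetical label with no matching key in the dict,
# so its value becomes NaN and the dtype is promoted to float64.
# '可乐' is in the dict but not in the index, so it is dropped.
menu = pd.Series(prices, index=['奶茶', '咖啡', '果汁'], name='price')
print(menu)
```

Because NaN is a floating-point value, the integer prices are shown as 20.0 and 30.0 in the result.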
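Creating from an ndarray
# The dtype follows np.arange's default integer type: int32 on Windows with NumPy 1.x, int64 on most other platforms
ser5 = pd.Series(np.arange(5))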
ser5
***************
0 0
1 1
2 2
3 3
4 4
dtype: int32
***************
Series attributes
ser4
***************
咖啡 30
可乐 10
奶茶 20
Name: price, dtype: int64
***************
ser4.index
# Index(['咖啡', '可乐', '奶茶'], dtype='object')
ser4.values
# array([30, 10, 20], dtype=int64)
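ser4.iloc[2]  # access by position; recent pandas versions warn when plain ser4[2] is used on a labeled index
# 20
ser4['奶茶']  # access by label
# 20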
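Label-based and position-based lookups can be made explicit with `.loc` and `.iloc`. The following is a small sketch on the same price data; the variable names are illustrative only.

```python
import pandas as pd

ser = pd.Series({'咖啡': 30, '可乐': 10, '奶茶': 20}, name='price')

by_label = ser.loc['奶茶']            # label-based lookup
by_position = ser.iloc[2]             # position-based lookup (third element)

label_slice = ser.loc['咖啡':'可乐']  # label slices include both endpoints
position_slice = ser.iloc[0:1]        # positional slices exclude the stop

print(by_label, by_position, len(label_slice), len(position_slice))
```

Note the asymmetry: slicing by label keeps the end label, while slicing by position follows the usual Python half-open convention.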
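# ser2 is not defined earlier in these notes; this definition is inferred from the output shown below
ser2 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
ser2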
**********
A 10
B 20
C 30
D 40
E 50
dtype: int64
**********
ser2[ser2>ser2.median()]
*********
D 40
E 50
dtype: int64
*********
'D' in ser2
# True
Basic Series operations
import pandas as pd
import numpy as np
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts
***********************
Beijing 55000.0
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Suzhou NaN
Name: income, dtype: float64
***********************
apts.iloc[3]  # fourth element by position
# 20000.0
apts.iloc[[3, 4, 1]]  # several positions at once, in the order given
***************************
Hangzhou 20000.0
Guangzhou 45000.0
Shanghai 60000.0
Name: income, dtype: float64
***************************
apts[1:]
*****************************
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Suzhou NaN
Name: income, dtype: float64
*****************************
apts[:-2]
*****************************
Beijing 55000.0
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Name: income, dtype: float64
*****************************
apts[1:]+apts[:-1]
*****************************
Beijing NaN
Guangzhou 90000.0
Hangzhou 40000.0
Shanghai 120000.0
Suzhou NaN
shenzhen 100000.0
Name: income, dtype: float64
*****************************
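Arithmetic between two Series aligns on index labels, so a label present on only one side produces NaN, as Beijing does above. A hedged sketch of one way to avoid that, using `Series.add` with `fill_value` (the data repeats the income figures above; the names `income` and `total` are illustrative):

```python
import numpy as np
import pandas as pd

income = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                    'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': np.nan},
                   name='income')

# fill_value=0 substitutes 0 where a label is missing on exactly one side.
# Beijing exists only in income[:-1], so it becomes 0 + 55000.
# Suzhou is NaN on the left and absent on the right, i.e. missing on
# both sides, so it stays NaN.
total = income[1:].add(income[:-1], fill_value=0)
print(total)
```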
apts['Shanghai']
# 60000.0
'Hangzhou' in apts
# True
'Chongqing' in apts
# False
at_most_50000 = (apts <= 50000)  # <= includes 50000 itself; NaN compares as False, so Suzhou is excluded
apts[at_most_50000]
*****************************
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Name: income, dtype: float64
*****************************
apts.mean()
# 46000.0
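The mean above is 46000.0 rather than NaN because pandas reductions skip missing values by default. A short sketch of that behaviour, using the same six income values without their labels (the name `income` is illustrative):

```python
import numpy as np
import pandas as pd

income = pd.Series([55000, 60000, 50000, 20000, 45000, np.nan])

mean_default = income.mean()             # NaN ignored: 230000 / 5
mean_strict = income.mean(skipna=False)  # NaN propagates
n_valid = income.count()                 # number of non-missing values

print(mean_default, mean_strict, n_valid)
```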
'Old income of shenzhen:{}'.format(apts['shenzhen'])
# 'Old income of shenzhen:50000.0'
apts['shenzhen']=70000
apts
*****************************
Beijing 55000.0
Shanghai 60000.0
shenzhen 70000.0
Hangzhou 20000.0
Guangzhou 45000.0
Suzhou NaN
Name: income, dtype: float64
*****************************
# Set every value below 50000 to 40000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
apts
*****************************
Beijing 55000.0
Shanghai 60000.0
shenzhen 70000.0
Hangzhou 40000.0
Guangzhou 40000.0
Suzhou NaN
Name: income, dtype: float64
*****************************
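Assigning through a boolean mask changes the Series in place. If the original values need to be kept, `Series.mask` returns a modified copy instead. This is a sketch of that alternative, not the approach used in these notes; the names `income` and `capped` are illustrative.

```python
import numpy as np
import pandas as pd

income = pd.Series({'Beijing': 55000.0, 'Shanghai': 60000.0, 'shenzhen': 70000.0,
                    'Hangzhou': 20000.0, 'Guangzhou': 45000.0, 'Suzhou': np.nan},
                   name='income')

# mask() replaces values where the condition is True and returns a new Series.
# NaN < 50000 evaluates to False, so the missing Suzhou entry is left as NaN.
capped = income.mask(income < 50000, 40000)

print(capped)
print(income['Hangzhou'])  # the original Series is unchanged
```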
apts/2
*****************************
Beijing 27500.0
Shanghai 30000.0
shenzhen 35000.0
Hangzhou 20000.0
Guangzhou 20000.0
Suzhou NaN
Name: income, dtype: float64
*****************************
apts**1.5
*****************************
Beijing 1.289864e+07
Shanghai 1.469694e+07
shenzhen 1.852026e+07
Hangzhou 8.000000e+06
Guangzhou 8.000000e+06
Suzhou NaN
Name: income, dtype: float64
*****************************
np.log(apts)
*****************************
Beijing 10.915088
Shanghai 11.002100
shenzhen 11.156251
Hangzhou 10.596635
Guangzhou 10.596635
Suzhou NaN
Name: income, dtype: float64
*****************************
apts.notnull()
*****************************
Beijing True
Shanghai True
shenzhen True
Hangzhou True
Guangzhou True
Suzhou False
Name: income, dtype: bool
*****************************
apts.isnull()
*****************************
Beijing False
Shanghai False
shenzhen False
Hangzhou False
Guangzhou False
Suzhou True
Name: income, dtype: bool
*****************************
apts[apts.isnull()]
*****************************
Suzhou NaN
Name: income, dtype: float64
*****************************
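The boolean results of `isnull()` and `notnull()` are often used to count, locate, or drop missing entries. A minimal sketch with a shortened, illustrative version of the income data:

```python
import numpy as np
import pandas as pd

s = pd.Series({'Beijing': 55000.0, 'Suzhou': np.nan, 'Hangzhou': 40000.0},
              name='income')

n_missing = int(s.isnull().sum())               # True counts as 1 when summed
missing_labels = s.index[s.isnull()].tolist()   # labels whose value is NaN
complete = s.dropna()                           # copy without the missing rows

print(n_missing, missing_labels, list(complete.index))
```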
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
apts2
*****************************
Beijing 10000
Shanghai 8000
shenzhen 6000
Tianjin 40000
Guangzhou 7000
Chongqing 30000
dtype: int64
*****************************
# Adding Series whose indexes only partly overlap: labels found in just one Series give NaN
apts3 = apts+apts2
apts3
*****************************
Beijing 65000.0
Chongqing NaN
Guangzhou 47000.0
Hangzhou NaN
Shanghai 68000.0
Suzhou NaN
Tianjin NaN
shenzhen 76000.0
dtype: float64
*****************************
apts3[apts3.isnull()] = apts3.mean()  # fill the missing positions with the mean (mean() ignores NaN)
apts3
*****************************
Beijing 65000.0
Chongqing 64000.0
Guangzhou 47000.0
Hangzhou 64000.0
Shanghai 68000.0
Suzhou 64000.0
Tianjin 64000.0
shenzhen 76000.0
dtype: float64
*****************************
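The same fill can be written with `Series.fillna`, which returns a new Series and makes the choice of statistic explicit. The sketch below rebuilds the summed values shown above by hand and compares the mean with the median; the names `total`, `filled_mean` and `filled_median` are illustrative.

```python
import numpy as np
import pandas as pd

# The summed values from above, before any filling.
total = pd.Series({'Beijing': 65000.0, 'Chongqing': np.nan, 'Guangzhou': 47000.0,
                   'Hangzhou': np.nan, 'Shanghai': 68000.0, 'Suzhou': np.nan,
                   'Tianjin': np.nan, 'shenzhen': 76000.0})

# Both statistics are computed over the four non-missing values only.
filled_mean = total.fillna(total.mean())      # mean of the valid values is 64000.0
filled_median = total.fillna(total.median())  # median of the valid values is 66500.0

print(filled_mean['Chongqing'], filled_median['Chongqing'])
```

With only four valid values the two statistics already differ, so it is worth stating which one is intended.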