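Pandas: Series (one-dimensional)
A Series is like a cross between a one-dimensional array and a dictionary: it can hold any data type, and every element has an index label.
The constructor is:
pandas.Series(data,index,dtype,name,copy)
Parameters:
data The data (several input types are accepted: ndarray, list, dict, and so on)
index Index labels for the data; defaults to 0, 1, 2, ... when not given
dtype Data type; inferred from the data when not given
name Name of the Series
copy Whether to copy the input data; defaults to False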
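The parameters listed above can be tried out together. The following is only a minimal sketch; the variable names (arr, s, t) and the label values are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# dtype forces the element type; name labels the Series.
s = pd.Series(arr, index=["a", "b", "c"], dtype="float64", name="demo")
print(s.dtype)  # float64
print(s.name)   # demo

# copy=True makes the Series own its data, so editing the
# source array afterwards does not change the Series.
t = pd.Series(arr, copy=True)
arr[0] = 100
print(t.iloc[0])  # 1
```

Passing copy=True explicitly is the safe choice whenever the source array will be modified later.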
Creating a simple Series object
import pandas as pd
a = [1, 2, 3]
x = pd.Series(a)
print(x)
'''
0 1
1 2
2 3
dtype: int64
'''
print(x[2])
# 3
Creating a Series from a dictionary (the keys become the index labels, replacing the default 0, 1, 2, ...)
import pandas as pd
z={'a':1,'b':2,'c':3}
zz=pd.Series(z)
print(zz)
'''
a 1
b 2
c 3
dtype: int64
'''
# The index labels can also be given explicitly at creation time
import pandas as pd
a = ["Google", "Runoob", "Wiki"]
yebi = pd.Series(a, index = ["x", "y", "z"])
print(yebi)
'''
x Google
y Runoob
z Wiki
dtype: object
'''
# Values can then be read by index label (like a dictionary)
print(yebi["y"])
#Runoob
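Square brackets look a key up by label. For clarity, the same element can be read explicitly by label with loc or by position with iloc. This is a small sketch reusing the values from the example above.

```python
import pandas as pd

yebi = pd.Series(["Google", "Runoob", "Wiki"], index=["x", "y", "z"])

print(yebi.loc["y"])              # Runoob (selected by label)
print(yebi.iloc[1])               # Runoob (selected by position, counted from 0)
print(yebi[["x", "z"]].tolist())  # ['Google', 'Wiki'] (a list of labels selects several)
```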
Parameters
index Index labels for the data
# Select only part of the data
import pandas as pd
a={1:"yebi",2:"sandy",3:"hapi"}
b=pd.Series(a,index=[1,2])
print(b)
'''
1 yebi
2 sandy
dtype: object
'''
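When the data is a dictionary, index works as a selector over its keys. A short sketch of what happens when a requested label is not a key of the dictionary (label 4 is chosen here purely for illustration):

```python
import pandas as pd

a = {1: "yebi", 2: "sandy", 3: "hapi"}

# Label 4 is not a key of the dict, so its value is filled with NaN.
b = pd.Series(a, index=[1, 2, 4])
print(b)
print(pd.isna(b.loc[4]))  # True
```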
The name parameter sets the name of the Series
import pandas as pd
a={1:"yebi",2:"sandy",3:"hapi"}
b=pd.Series(a,index=[1,2],name="小蔡不菜")
print(b)
'''
1 yebi
2 sandy
Name: 小蔡不菜, dtype: object
'''
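Pandas: DataFrame (two-dimensional)
A tabular data structure with a row index and a column index. It resembles a two-dimensional array, but its rows and columns are labeled and each column can have its own data type.
Signature:
pandas.DataFrame(data,index,columns,dtype,copy)
Parameters:
data The data (ndarray, Series, map, list, dict, and so on)
index Row labels
columns Column labels
dtype Data type
copy Whether to copy the input data; defaults to False
Creating a DataFrame
From a list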
import pandas as pd
data=[['spring',3],['summer',6],['autumn',9],['winter',12]]
a=pd.DataFrame(data,columns=['season','month'],dtype=float)
print(a)
'''
season month
0 spring 3.0
1 summer 6.0
2 autumn 9.0
3 winter 12.0
'''
From a dictionary whose values are lists
import pandas as pd
data={'spring':[1,2,3],'summer':[4,5,6],'autumn':[7,8,9],'winter':[10,11,12]}
a=pd.DataFrame(data)
print(a)
'''
spring summer autumn winter
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
'''
From a dictionary of ndarrays or lists
import pandas as pd
data={'season':['spring','summer','autumn','winter'],'month':[3,6,9,12]}
a=pd.DataFrame(data)
print(a)
'''
season month
0 spring 3
1 summer 6
2 autumn 9
3 winter 12
'''
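The example above passes plain Python lists as the dictionary values. A sketch of the same construction with actual NumPy arrays and explicit row labels follows; the labels q1 to q4 are made up for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "season": np.array(["spring", "summer", "autumn", "winter"]),
    "month": np.array([3, 6, 9, 12]),
}

# index must supply exactly one label per row (four here).
a = pd.DataFrame(data, index=["q1", "q2", "q3", "q4"])
print(a)
print(a.loc["q3", "month"])  # 9
```

All arrays in the dictionary need the same length, otherwise the constructor raises a ValueError.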
From a list of dictionaries
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
d = pd.DataFrame(data)
print (d)
'''
a b c
0 1 2 NaN
1 5 10 20.0
Entries with no corresponding data are shown as NaN
'''
Reading, adding, and deleting
Sample data
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
print(data)
'''
spring summer
a 1.0 4
b 2.0 5
c 3.0 6
d NaN 7
'''
Reading
Reading a column
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
print(data['spring'])
'''
a 1.0
b 2.0
c 3.0
d NaN
Name: spring, dtype: float64
'''
Reading a row
Pass a row label to the loc[] indexer to select a row
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
print(data.loc['b'])
'''
spring 2.0
summer 5.0
Name: b, dtype: float64
'''
Pass an integer position to the iloc[] indexer to select a row
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
print(data.iloc[2])
'''
spring 3.0
summer 6.0
Name: c, dtype: float64
'''
# Integer positions start at 0
Slicing (start included, stop excluded)
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
print(data[1:3])
'''
spring summer
b 2.0 5
c 3.0 6
'''
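The half-open rule applies to positional slices. Label slices through loc behave differently, because they include the stop label. The following sketch contrasts the two on a frame shaped like the sample data; the values are simplified for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {"spring": [1, 2, 3, 4], "summer": [4, 5, 6, 7]},
    index=["a", "b", "c", "d"],
)

by_position = data.iloc[1:3]  # positions 1 and 2; the stop position is excluded
by_label = data.loc["b":"c"]  # labels b through c; the stop label is included

print(by_position.index.tolist())        # ['b', 'c']
print(by_label.index.tolist())           # ['b', 'c']
print(data.loc["b":"d"].index.tolist())  # ['b', 'c', 'd']
```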
Adding
Adding a column
Adding a column filled with empty strings
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
d['winter']=''
data=pd.DataFrame(d)
print(data)
'''
spring summer winter
a 1.0 4
b 2.0 5
c 3.0 6
d NaN 7
'''
Adding a column from a list
The list must have as many elements as there are rows
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
d['winter']=[10,11,12,13]
data=pd.DataFrame(d)
print(data)
'''
spring summer winter
a 1.0 4 10
b 2.0 5 11
c 3.0 6 12
d NaN 7 13
'''
To add a new column at a specific position, use the insert() method
Empty values can be inserted the same way
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
data.insert(1,'winter',[10,11,12,13])
print(data)
'''
spring winter summer
a 1.0 10 4
b 2.0 11 5
c 3.0 12 6
d NaN 13 7
'''
Columns can also be added together element by element
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d'])}
data=pd.DataFrame(d)
data['autumn']=data['spring']+data['summer']
print(data)
'''
spring summer autumn
a 1.0 4 5.0
b 2.0 5 7.0
c 3.0 6 9.0
d NaN 7 NaN
'''
# As the output shows, adding a missing value (NaN) to a real value gives NaN
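If a missing operand should count as 0 instead of propagating NaN, one option is the add() method with fill_value. This is a sketch on the same sample data, not the only way to handle missing values.

```python
import pandas as pd

data = pd.DataFrame({
    "spring": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "summer": pd.Series([4, 5, 6, 7], index=["a", "b", "c", "d"]),
})

# fill_value=0 substitutes 0 for a missing operand before adding,
# so row d becomes 0 + 7 instead of NaN.
data["autumn"] = data["spring"].add(data["summer"], fill_value=0)
print(data["autumn"].tolist())  # [5.0, 7.0, 9.0, 7.0]
```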
Adding rows
Use pd.concat() to add rows (DataFrame.append() was removed in pandas 2.0)
import pandas as pd
d=pd.DataFrame([[1,4],[2,5]],columns=['spring','summer'])
d2=pd.DataFrame([[3,6],[4,7]],columns=['spring','summer'])
d=pd.concat([d,d2])
print(d)
'''
spring summer
0 1 4
1 2 5
0 3 6
1 4 7
'''
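The output above repeats the labels 0 and 1 because each frame keeps its own row index. A sketch of renumbering the rows while combining, using pd.concat with ignore_index=True (the name combined is illustrative):

```python
import pandas as pd

d = pd.DataFrame([[1, 4], [2, 5]], columns=["spring", "summer"])
d2 = pd.DataFrame([[3, 6], [4, 7]], columns=["spring", "summer"])

# ignore_index=True discards the original labels and numbers the rows 0..n-1.
combined = pd.concat([d, d2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3]
```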
Deleting
Deleting a column
import pandas as pd
d={'spring':pd.Series([1,2,3],index=['a','b','c']),
'summer':pd.Series([4,5,6,7],index=['a','b','c','d']),
'autumn':pd.Series([8,9,10],index=['a','b','c'])}
data=pd.DataFrame(d)
print ("Our dataframe is:")
print(data)
print ("after dele:")
del data['spring']
print(data)
'''
Our dataframe is:
spring summer autumn
a 1.0 4 8.0
b 2.0 5 9.0
c 3.0 6 10.0
d NaN 7 NaN
after dele:
summer autumn
a 4 8.0
b 5 9.0
c 6 10.0
d 7 NaN
'''
Deleting a row
import pandas as pd
d=pd.DataFrame([[1,4],[2,5]],columns=['spring','summer'])
d2=pd.DataFrame([[3,6],[4,7]],columns=['spring','summer'])
d=pd.concat([d,d2])
print ("Our dataframe is:")
print(d)
print ("after drop:")
d=d.drop(0)
print(d)
'''
Our dataframe is:
spring summer
0 1 4
1 2 5
0 3 6
1 4 7
after drop:
spring summer
1 2 5
1 4 7
'''
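Note that drop(0) removed two rows above, because drop works on labels and the label 0 occurs twice. A sketch of removing only the first physical row, either by renumbering first or by slicing on position:

```python
import pandas as pd

d = pd.concat([
    pd.DataFrame([[1, 4], [2, 5]], columns=["spring", "summer"]),
    pd.DataFrame([[3, 6], [4, 7]], columns=["spring", "summer"]),
])

# drop(0) removes every row labeled 0, which is two rows here.
print(len(d.drop(0)))  # 2

# Option 1: make the labels unique first, then drop label 0.
renumbered = d.reset_index(drop=True)
print(renumbered.drop(0)["spring"].tolist())  # [2, 3, 4]

# Option 2: skip the first row by position.
print(d.iloc[1:]["spring"].tolist())  # [2, 3, 4]
```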
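The drop, del, and pop operations on DataFrame objects
drop
Signature
DataFrame.drop(labels=None,axis=0,index=None,columns=None,level=None,inplace=False,errors='raise')
Parameters
labels : single label or list-like; the index or column labels to drop.
axis : {0 or 'index', 1 or 'columns'}, default 0; whether to drop the labels from the index (0 or 'index') or from the columns (1 or 'columns').
index : single label or list-like; an alternative to specifying the axis (labels, axis=0 is equivalent to index=labels).
columns : single label or list-like; an alternative to specifying the axis (labels, axis=1 is equivalent to columns=labels).
level : int or level name, optional; for a MultiIndex, the level from which the labels are removed.
inplace : bool, default False; if True, the operation is done in place and None is returned.
errors : {'ignore', 'raise'}, default 'raise'; if 'ignore', errors are suppressed and only the labels that exist are dropped.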
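The equivalences between these parameters are easiest to see on a small frame. This is a minimal sketch; the frame and its labels (A, B, C, x, y) are made up for illustration.

```python
import pandas as pd

d = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]}, index=["x", "y"])

# Two equivalent ways to drop column A.
print(d.drop("A", axis=1).columns.tolist())  # ['B', 'C']
print(d.drop(columns="A").columns.tolist())  # ['B', 'C']

# index= targets row labels.
print(d.drop(index="x").index.tolist())  # ['y']

# errors='ignore' skips labels that do not exist instead of raising KeyError.
print(d.drop(columns=["A", "Z"], errors="ignore").columns.tolist())  # ['B', 'C']

# drop returns a new object by default, so d itself is unchanged.
print(d.shape)  # (2, 3)
```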
Usage example
import pandas as pd
import numpy as np
d=pd.DataFrame(np.random.randn(4,5),
columns=list('ABCDE'),
index=range(1,5))
print(d)
'''
A B C D E
1 -0.213820 0.153960 -0.618904 -0.356701 -1.088774
2 0.545769 -0.615198 0.016386 -1.079026 1.314265
3 0.090459 -1.938242 1.101826 1.377980 0.984999
4 -0.395841 0.229053 -2.652719 1.276688 0.606977
'''
# axis=1 drops a column; drop() returns a new DataFrame, so d itself is unchanged
print(d.drop(['A'],axis=1))
'''
B C D E
1 0.153960 -0.618904 -0.356701 -1.088774
2 -0.615198 0.016386 -1.079026 1.314265
3 -1.938242 1.101826 1.377980 0.984999
4 0.229053 -2.652719 1.276688 0.606977
'''
d.drop(1,inplace=True)
print(d)
'''
A B C D E
2 0.545769 -0.615198 0.016386 -1.079026 1.314265
3 0.090459 -1.938242 1.101826 1.377980 0.984999
4 -0.395841 0.229053 -2.652719 1.276688 0.606977
'''
del
import pandas as pd
import numpy as np
d=pd.DataFrame(np.random.randn(4,5),
columns=list('ABCDE'),
index=range(1,5))
print(d)
del d['A']
print(d)
'''
A B C D E
1 0.009558 0.527784 -2.023703 -1.549618 -0.104811
2 -1.585624 0.533658 -1.025577 0.991063 -0.255559
3 0.446651 0.265904 -0.593538 -1.591720 -1.098437
4 0.815546 -0.794852 -1.149644 1.263543 -1.834696
B C D E
1 0.527784 -2.023703 -1.549618 -0.104811
2 0.533658 -1.025577 0.991063 -0.255559
3 0.265904 -0.593538 -1.591720 -1.098437
4 -0.794852 -1.149644 1.263543 -1.834696
'''
pop
# -*- coding: UTF-8 -*-
import pandas as pd
import numpy as np
d=pd.DataFrame(np.random.randn(4,5),
columns=list('ABCDE'),
index=range(1,5))
print("Original:")
print(d)
print("Removed data:")
z=d.pop('B')
print(z)
print("Type:")
print(type(z))
print("Data after pop:")
print(d)
'''
Original:
A B C D E
1 -0.223835 0.609844 -0.290016 2.062144 0.137548
2 -0.960690 2.171232 -0.387543 0.421532 1.731145
3 -0.563288 -0.003827 0.107555 0.550450 0.369825
4 -0.954595 -0.705379 1.052727 0.215343 -0.022260
Removed data:
1 0.609844
2 2.171232
3 -0.003827
4 -0.705379
Name: B, dtype: float64
Type:
<class 'pandas.core.series.Series'>
Data after pop:
A C D E
1 -0.223835 -0.290016 2.062144 0.137548
2 -0.960690 -0.387543 0.421532 1.731145
3 -0.563288 0.107555 0.550450 0.369825
4 -0.954595 1.052727 0.215343 -0.022260
'''
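Panel
A storage structure for three-dimensional data, roughly a dictionary of DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the examples in this section only run on older versions. The three axes have the following meanings:
axis 0 items Each item corresponds to one inner DataFrame
axis 1 major_axis The index (rows) of each DataFrame
axis 2 minor_axis The columns of each DataFrame
Signature
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
Parameters
data Accepts several data types: ndarray, Series, map, list, dict, constants, and other DataFrames
items axis=0
major_axis axis=1
minor_axis axis=2
dtype Data type of each column
copy Whether to copy the data; defaults to False
Creating an empty Panel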
import pandas as pd
p=pd.Panel()
print(p)
'''
<class 'pandas.core.panel.Panel'>
Dimensions: 0 (items) x 0 (major_axis) x 0 (minor_axis)
Items axis: None
Major_axis axis: None
Minor_axis axis: None
'''
From an ndarray
import pandas as pd
import numpy as np
data=np.random.rand(2,4,5)
p=pd.Panel(data)
print(p)
'''
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
'''
From a dictionary of DataFrames
# -*- coding: UTF-8 -*-
import pandas as pd
import numpy as np
data = {'one' : pd.DataFrame(np.random.randn(4, 3)),
'two' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print (p)
'''
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: one to two
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2
'''
Reading
Reading by item
# -*- coding: UTF-8 -*-
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print (p['Item1'])
'''
0 1 2
0 -0.387732 0.233845 0.577483
1 -0.644665 0.563973 0.615362
2 1.815091 -1.008655 0.598779
3 1.268601 -1.007749 0.459688
'''
Reading by major_axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print (p.major_xs(1))
'''
Item1 Item2
0 -0.641352 -0.109204
1 -0.149680 -1.085415
2 0.528007 NaN
'''
Reading by minor_axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print (p.minor_xs(1))
'''
Item1 Item2
0 0.565158 -0.148676
1 -0.574921 -1.188221
2 0.542308 -0.500549
3 0.123562 1.895957
'''
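pd.Panel was removed in pandas 0.25, so the Panel examples above fail on current versions. One possible substitute, sketched here under that assumption, is a DataFrame with a two-level row index built by passing the dictionary to pd.concat. The level names item and major are made up for illustration, and fixed values replace the random data so the results are predictable.

```python
import numpy as np
import pandas as pd

data = {
    "Item1": pd.DataFrame(np.arange(12).reshape(4, 3)),
    "Item2": pd.DataFrame(np.arange(8).reshape(4, 2)),
}

# Stack the frames; the dict keys become the outer level of the row index.
stacked = pd.concat(data, names=["item", "major"])

item1 = stacked.loc["Item1"]             # plays the role of p['Item1']
major1 = stacked.xs(1, level="major").T  # plays the role of p.major_xs(1)
minor1 = stacked[1].unstack("item")      # plays the role of p.minor_xs(1)

print(item1.shape)   # (4, 3)
print(major1.shape)  # (3, 2)
print(minor1.shape)  # (4, 2)
```

As with the Panel output above, the cell that Item2 lacks (its third column) shows up as NaN.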
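DataFrame attributes and methods
T Transposes rows and columns (like a matrix transpose)
axes Returns a list whose only members are the row axis labels and the column axis labels (that is, the row and column names)
dtypes Returns the data type of each column
empty True if the DataFrame is empty, that is, if any axis has length 0
ndim Returns the number of dimensions, which is 2 for a DataFrame
shape Returns a tuple giving the dimensions of the DataFrame (number of rows, number of columns)
size Returns the number of elements
values Returns the actual data in the DataFrame as a NumPy ndarray
head() Returns the first n rows
tail() Returns the last n rows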
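A quick sketch that exercises the attributes listed above on a small frame (the frame contents are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "James", "Ricky"], "Age": [25, 26, 25]})

print(df.shape)    # (3, 2)
print(df.size)     # 6
print(df.ndim)     # 2
print(df.empty)    # False
print(df.T.shape)  # (2, 3)
print(df.axes)     # list of two Index objects: row labels, then column labels
print(df.dtypes)   # one dtype per column
print(df.values)   # 2-D NumPy array holding the data
```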
head() and tail()
# -*- coding: UTF-8 -*-
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
data = pd.DataFrame(d)
print ("Original data:")
print (data)
print ("First 2 rows:")
print (data.head(2))
print("Last 3 rows:")
print(data.tail(3))
'''
Original data:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
First 2 rows:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
Last 3 rows:
Name Age Rating
4 Steve 30 3.2
5 Minsu 29 4.6
6 Jack 23 3.8
'''
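Series attributes and methods
These are used the same way as on a DataFrame
axes Returns a list containing the row index
dtype Returns the data type of the object
empty True if the Series is empty
ndim Returns the number of dimensions, which is 1 for a Series
size Returns the number of elements
values Returns the actual data in the Series as a NumPy ndarray
head() Returns the first n rows
tail() Returns the last n rows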
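The same kind of check for a Series, again only as a sketch with made-up values:

```python
import pandas as pd

s = pd.Series([4.23, 3.24, 3.98, 2.56], name="Rating")

print(s.axes)              # list holding the single row index
print(s.dtype)             # float64
print(s.empty)             # False
print(s.ndim)              # 1
print(s.size)              # 4
print(s.values)            # 1-D NumPy array holding the data
print(s.head(2).tolist())  # [4.23, 3.24]
print(s.tail(1).tolist())  # [2.56]
```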