https://pandas.pydata.org/
This post introduces the basics of pandas: creating a Series and a DataFrame, plus simple create, read, update, and delete operations.
1.1、Series
Import the packages
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
Create a simple Series
# Basic creation (the index defaults to 0..n-1)
obj=Series([4,7,-5,3])
# Specify a custom index
obj2=Series([4,5,6,7],index=['a','b','c','d'])
--------------------------------------------------------------------------------------------------------------------------------
Out[2]:
0 4
1 7
2 -5
3 3
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------
a 4
b 5
c 6
d 7
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------
Create a Series from a dict
sdata={'Ohio':35000,'Texas':7000,'Oregon':16000,'Utah':5000,'California':5300}
obj3=Series(sdata)
obj3
--------------------------------------------------------------------------------------------------------------------------------
California 5300
Ohio 35000
Oregon 16000
Texas 7000
Utah 5000
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------
# Convert back to a dict
s1=obj3.to_dict()
--------------------------------------------------------------------------------------------------------------------------------
# Set the index at creation, then replace the index labels
states=['Ohio','Texas','Oregon','Utah','California']
obj4=Series(sdata,index=states)
obj4.index=['a','b','c','d','f']
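Passing index= together with a dict selects and orders the entries by label; it does not relabel them by position. A minimal sketch, assuming a hypothetical label 'Nevada' that is not a key in the dict:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 7000, 'Oregon': 16000, 'Utah': 5000}

# 'Nevada' is a hypothetical label that does not appear in sdata
labels = ['Ohio', 'Nevada', 'Texas']
aligned = pd.Series(sdata, index=labels)

# Labels missing from the dict become NaN, which promotes the dtype to float64
print(aligned)
print(aligned.isnull())
```

Assigning to obj4.index afterwards, as above, only swaps the labels and requires the same number of labels as values.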
View the values and the index
obj.values
-------------------------------------------------------------------
array([ 4, 7, -5, 3], dtype=int64)
-------------------------------------------------------------------
obj.index
-------------------------------------------------------------------
RangeIndex(start=0, stop=4, step=1)
-------------------------------------------------------------------
Lookup
- Look up by index label
obj2[['a','c']]
---------------------------------------------------------------
a 4
c 6
dtype: int64
---------------------------------------------------------------
obj2['a':'c']  # slicing by label includes the end label 'c'
---------------------------------------------------------------
a 4
b 5
c 6
dtype: int64
---------------------------------------------------------------
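Label slices and positional slices treat the end point differently, which is easy to trip over. A small sketch of the difference, using the explicit .loc and .iloc accessors:

```python
import pandas as pd

obj2 = pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])

by_label = obj2.loc['a':'c']    # label slice: the end label 'c' is included
by_position = obj2.iloc[0:2]    # positional slice: position 2 is excluded

print(by_label.tolist())        # [4, 5, 6]
print(by_position.tolist())     # [4, 5]
```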
Modify
- Look up a value and assign a new value to it directly
obj2['a']=9
obj2
a 9
b 5
c 6
d 7
dtype: int64
1.2、DataFrame
Create a simple DataFrame
# Create three columns of data from a dict
data={'state':['Ohio','Texas','Oregon','Utah'],'year':[2000,2001,2002,2003],'pop':[1.5,1.6,0.8,1.7]}
df1=DataFrame(data)
df1
---------------------------------------------------------------
    state  year  pop
0    Ohio  2000  1.5
1   Texas  2001  1.6
2  Oregon  2002  0.8
3    Utah  2003  1.7
---------------------------------------------------------------
# Reorder the columns
df2=DataFrame(data,columns=['year','state','pop'])
----------------------------------------------------------------
   year   state  pop
0  2000    Ohio  1.5
1  2001   Texas  1.6
2  2002  Oregon  0.8
3  2003    Utah  1.7
Create a DataFrame from a nested dict
pop={'Nevada':{2001:2.4,2002:3.1},'Ohio':{2000:1.2,2001:2.4,2002:1.9}}
df3=DataFrame(pop)
df3
---------------------------------------------------------------------
      Nevada  Ohio
2000     NaN   1.2
2001     2.4   2.4
2002     3.1   1.9
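With a nested dict, the outer keys become columns and the union of the inner keys becomes the row index; a cell with no inner key is NaN. A short sketch that checks this on the same data (the row order is sorted explicitly here because it can differ between pandas versions):

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 3.1},
       'Ohio': {2000: 1.2, 2001: 2.4, 2002: 1.9}}
df3 = pd.DataFrame(pop)

# Outer keys -> columns, union of inner keys -> row index
print(df3.columns.tolist())
print(sorted(df3.index.tolist()))

# Count the missing cells per column; Nevada has no entry for 2000
missing = df3.isna().sum()
print(missing)
```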
Name the index and the columns
df3.index.name='year'
df3.columns.name='state'
df3
--------------------------------------------
state  Nevada  Ohio
year
2000      NaN   1.2
2001      2.4   2.4
2002      3.1   1.9
Rename index labels and column names
df1=DataFrame(np.arange(9).reshape(3,3),index=['bj','sh','gz'],columns=['c1','c2','c3'])
df1
----------------------------------------------------------------------------------------------------------------
    c1  c2  c3
bj   0   1   2
sh   3   4   5
gz   6   7   8
************************************************************************************************
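# map
# The index can also be replaced wholesale, e.g. df1.index=Series(['北京','上海','广州']);
# the steps below keep the original 'bj','sh','gz' labels so that they match the output shown.
df1.index=df1.index.map(str.upper)  # the index is now 'BJ','SH','GZ'
# rename (returns a new DataFrame; df1 itself is not modified)
df1.rename(index={'BJ':'beijing'})
df1.rename(index=str.lower,columns=str.upper)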
-----------------------------------------------------------------------------------------------------------------
    C1  C2  C3
bj   0   1   2
sh   3   4   5
gz   6   7   8
# Rename with a custom function
def test_map(x):
    return x+"_abc"
df1.index=df1.index.map(test_map)
df1.rename(index=test_map)
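The two approaches above differ in one respect: assigning to df1.index changes df1 itself, while rename returns a new DataFrame and leaves the original alone. A minimal sketch of that difference (the target name 'first' is a made-up label for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['bj', 'sh', 'gz'], columns=['c1', 'c2', 'c3'])

# rename accepts a function or a dict; 'first' is a hypothetical new column name
renamed = df1.rename(index=str.upper, columns={'c1': 'first'})

labels_after_rename = df1.index.tolist()  # df1 is untouched by rename
print(labels_after_rename)                # ['bj', 'sh', 'gz']
print(renamed.index.tolist())             # ['BJ', 'SH', 'GZ']

# To keep the result, assign it back
df1 = df1.rename(index=str.upper)
```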
Lookup
# Select a single column
df2['year']
df2.year
--------------------------------------------------------
0 2000
1 2001
2 2002
3 2003
Name: year, dtype: int64
--------------------------------------------------------
# Select a single value
df2['year'][1]
df2.year[1]
-------------------------------------------------------
2001
-------------------------------------------------------
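The same single value can be read in one step with an accessor instead of chaining two lookups. A sketch of three equivalent reads, assuming the default 0..3 row labels used above:

```python
import pandas as pd

data = {'state': ['Ohio', 'Texas', 'Oregon', 'Utah'],
        'year': [2000, 2001, 2002, 2003],
        'pop': [1.5, 1.6, 0.8, 1.7]}
df2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])

by_label = df2.loc[1, 'year']   # row label 1, column label 'year'
by_position = df2.iloc[1, 0]    # second row, first column
scalar = df2.at[1, 'year']      # label-based access for a single cell

print(by_label, by_position, scalar)  # 2001 2001 2001
```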
Assignment
# Add a new column (debt): from an array first, then overwritten with a scalar
df2['debt']=np.arange(4.)
df2['debt']=16.5
------------------------------------------------------------------
   year   state  pop  debt
0  2000    Ohio  1.5  16.5
1  2001   Texas  1.6  16.5
2  2002  Oregon  0.8  16.5
3  2003    Utah  1.7  16.5
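The two assignments above follow different rules: a scalar is broadcast to every row, while an array must have exactly as many elements as the frame has rows. A small sketch, using hypothetical column names 'rank' and 'bad':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['Ohio', 'Texas', 'Oregon', 'Utah']})

df['debt'] = 16.5            # scalar: repeated for every row
df['rank'] = np.arange(4.)   # array: length must equal the number of rows

length_error = None
try:
    df['bad'] = np.arange(3.)  # 3 values for 4 rows
except ValueError as exc:
    length_error = exc         # pandas rejects the mismatched length

print(df)
print(type(length_error).__name__)
```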
Modify
# Assign to a single value (.loc avoids chained assignment, which may not write through to df2)
df2.loc[0,'pop']=2
---------------------------------------------------------------------
# Bulk update aligned on the index (rows whose label is missing from val become NaN)
val=Series([-1,-2,-3],index=[0,2,3])
df2['debt']=val
df2
---------------------------------------------------------------------
   year   state  pop  debt
0  2000    Ohio  2.0  -1.0
1  2001   Texas  1.6   NaN
2  2002  Oregon  0.8  -2.0
3  2003    Utah  1.7  -3.0
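Assigning a Series to a column matches on index labels, not on position. The sketch below, with a hypothetical label 9 that does not exist in the frame, shows that unmatched rows become NaN and that extra labels in the Series are dropped:

```python
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001, 2002, 2003]})

# Label 9 is hypothetical: it has no matching row in df
val = pd.Series([-1, -2, -3], index=[0, 2, 9])
df['debt'] = val

# Rows 1 and 3 have no match in val, so they are NaN; the value for label 9 is discarded
print(df)
```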
Delete
# Test whether state is Ohio and store the result in a new column
df2['eastern']=df2.state=='Ohio'
-----------------------------------------------------------
   year   state  pop  debt  eastern
0  2000    Ohio  2.0  -1.0     True
1  2001   Texas  1.6   NaN    False
2  2002  Oregon  0.8  -2.0    False
3  2003    Utah  1.7  -3.0    False
-----------------------------------------------------------
# Delete the eastern column
del df2['eastern']
----------------------------------------------------------
   year   state  pop  debt
0  2000    Ohio  2.0  -1.0
1  2001   Texas  1.6   NaN
2  2002  Oregon  0.8  -2.0
3  2003    Utah  1.7  -3.0
----------------------------------------------------------
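# Delete multiple columns (a is an existing DataFrame, not defined in this post, with columns 'R10D' and 'R20D')
df=a.drop(['R10D','R20D'],axis=1)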
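Since a is not defined in this post, here is a self-contained sketch of the same drop call on a small stand-in frame shaped like df2. Unlike del, drop returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

# Stand-in data for illustration, shaped like df2 above
df2 = pd.DataFrame({'year': [2000, 2001],
                    'state': ['Ohio', 'Texas'],
                    'pop': [1.5, 1.6],
                    'debt': [16.5, 16.5]})

# axis=1 means the labels refer to columns
trimmed = df2.drop(['pop', 'debt'], axis=1)

print(trimmed.columns.tolist())  # ['year', 'state']
print(df2.columns.tolist())      # ['year', 'state', 'pop', 'debt']
```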