【Python Data Analysis and Presentation】(4) Basic Operations of the pandas Library

Article link: https://blog.csdn.net/polarislove36/article/details/78787706

Series

A Series consists of a set of data values together with an index that labels those values.
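To make the "data plus index" split concrete, here is a small sketch assuming a recent pandas (1.x or 2.x). When no index is passed, pandas supplies a default integer index 0..n-1, which is the "default index" referred to further below.

```python
import pandas as pd

s = pd.Series([10, 20, 30])          # no index given
print(s.index)                       # RangeIndex(start=0, stop=3, step=1)
print(s.values)                      # [10 20 30]

t = pd.Series([10, 20, 30], index=['x', 'y', 'z'])  # custom labels
print(t.index.tolist())              # ['x', 'y', 'z']
```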

import numpy as np
import pandas as pd
a = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])  # index is the second parameter, so "index =" may be omitted: pd.Series([9, 8, 7, 6], ['a', 'b', 'c', 'd'])
# a    9
# b    8
# c    7
# d    6
# dtype: int64
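a = pd.Series(25, index=['a', 'b', 'c', 'd'])  # from a scalar: the value is repeated once per label, so the index must be supplied
d = {'a': 1, 'b': 5, 'c': 7}
b = pd.Series(d)  # create a Series from a dict (the keys become the index)
b = pd.Series(d, index=["b", "c", "d"])  # the index selects keys; the missing key 'd' becomes NaN
# b    5.0
# c    7.0
# d    NaN
# dtype: float64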
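e = pd.Series(np.arange(0, 5), index=np.arange(9, 4, -1))  # create a Series from an ndarray
# 9    0
# 8    1
# 7    2
# 6    3
# 5    4
# dtype: int32   (the default integer dtype is platform-dependent; int64 on most Linux/macOS builds)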
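b.values   # array([ 5.,  7., nan])
b.index    # Index(['b', 'c', 'd'], dtype='object')
b.iloc[1]  # 7.0  position-based access; a plain b[1] used to fall back to positions, but that is deprecated in recent pandas, and if the custom index is itself made of integers, b[1] is treated as a label instead
b['b']     # 5.0  label-based access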
f = pd.Series([9,8,7,6],index = ['a','b','c','d'])
# a    9
# b    8
# c    7
# d    6
# dtype: int64
np.exp(f)  # NumPy ndarray operations also work on a Series, and the index is kept
# a    8103.083928
# b    2980.957987
# c    1096.633158
# d     403.428793
# dtype: float64
f[:3]  # slicing by position
# a    9
# b    8
# c    7
# dtype: int64
f[f > 7]  # boolean selection
# a    9
# b    8
# dtype: int64
g = pd.Series([1, 2, 3, 5], index=['b', 'c', 'd', "e"])
f + g  # note how the result is aligned on the index: labels present on only one side give NaN
# a    NaN
# b    9.0
# c    9.0
# d    9.0
# e    NaN
# dtype: float64
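f.name = "f Series object"   # name of the Series
f.index.name = "index name"  # name of the index
f
# index name
# a    9
# b    8
# c    7
# d    6
# Name: f Series object, dtype: int64
f['a'] = 18  # modify a value in place by label
f
# index name
# a    18
# b     8
# c     7
# d     6
# Name: f Series object, dtype: int64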
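The label/position distinction and the alignment rule above can be checked with a short sketch. It rebuilds f, g and e from the examples above and assumes pandas 2.x, where .loc and .iloc are the explicit label and position accessors; fill_value is a documented parameter of Series.add.

```python
import numpy as np
import pandas as pd

# Same data as in the examples above
f = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
g = pd.Series([1, 2, 3, 5], index=['b', 'c', 'd', 'e'])
e = pd.Series(np.arange(0, 5), index=np.arange(9, 4, -1))

# Explicit label-based and position-based access
print(f.loc['b'])   # 8, looked up by label
print(f.iloc[1])    # 8, looked up by position

# With an integer index, [] uses labels, so position 0 is not reachable this way
print(e[9])         # 0, label 9 is the first row
try:
    e[0]
except KeyError:
    print("KeyError: 0 is looked up as a label")
print(e.iloc[0])    # 0, position 0 via .iloc

# Alignment: labels present on only one side give NaN with '+'
total = f + g
# .add() with fill_value treats a missing entry as 0 before adding
total_filled = f.add(g, fill_value=0)
print(total)
print(total_filled)
```

Using .loc and .iloc avoids the ambiguity of plain [] whenever the index is made of integers.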

DataFrame

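A DataFrame is made up of a group of columns that share the same index; equivalently, it is a two-dimensional array with both a row index and a column index.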
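Both views can be seen in a few lines. This is a minimal sketch with made-up data, assuming a recent pandas: each column taken out of a DataFrame is a Series whose index is the frame's row index, and the whole frame can be addressed like a labeled 2-D array.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4.0, 5.0, 6.0]},
                  index=['a', 'b', 'c'])

col = df["one"]                      # a single column is a Series
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True: every column shares the row index

# Viewed as a 2-D array with row and column labels
print(df.shape)                      # (3, 2)
print(df.to_numpy())
print(df.loc['b', 'two'])            # 5.0, addressed by row label and column label
```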

a = pd.DataFrame(np.arange(0, 10).reshape(5, 2))  # create from a 2-D ndarray
a
#    0  1
# 0  0  1
# 1  2  3
# 2  4  5
# 3  6  7
# 4  8  9
dt = {"one": pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      "two": pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
d = pd.DataFrame(dt)  # create from a dict of Series; rows are aligned on the union of their indexes
d
#    one  two
# a  1.0    9
# b  2.0    8
# c  3.0    7
# d  NaN    6
d.columns = ["一", "二"]  # rename the columns (一 = "one", 二 = "two")
d.columns  # Index(['一', '二'], dtype='object')
d1 = {"one": [1, 2, 3, 4], "two": [5, 6, 7, 8]}  # create from a dict of lists
pd.DataFrame(d1, index=['a', 'b', 'c', 'd'])
#    one  two
# a    1    5
# b    2    6
# c    3    7
# d    4    8
d.reindex(index=['a', 'd', 'c', 'b'], columns=["二", "一"])  # reorder rows and columns; fill_value= or method= can fill labels that are new to the index
#    二    一
# a  9  1.0
# d  6  NaN
# c  7  3.0
# b  8  2.0
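reindex can also fill labels that did not exist before. The sketch below, assuming pandas 2.x, rebuilds the d frame from above with its original column names ("one", "two"; the new label "three" is made up for illustration) and shows that fill_value fills only newly introduced labels, while a NaN that was already in the data stays as it is.

```python
import pandas as pd

dt = {"one": pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      "two": pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
d = pd.DataFrame(dt)

# 'e' is a new row label and "three" a new column label
r = d.reindex(index=['c', 'd', 'e'], columns=["one", "two", "three"], fill_value=0)
print(r)

# Row 'd' already existed, so its NaN in "one" is kept;
# only the new row 'e' and the new column "three" receive 0
print(r.isna().sum().sum())   # 1
```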

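The .index and .columns attributes are Index objects, and Index objects are immutable: individual elements cannot be modified. .drop() removes the specified rows or columns from a Series or DataFrame; to drop columns, pass the argument axis=1.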
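A minimal sketch of both points, assuming pandas 2.x: assigning to a single element of an Index raises TypeError, replacing the whole index is allowed, and .drop returns a new object rather than modifying the original. The columns= keyword shown is an alternative to axis=1 offered by DataFrame.drop.

```python
import pandas as pd

d = pd.DataFrame({"one": [1, 2, 3, 4], "two": [5, 6, 7, 8]},
                 index=['a', 'b', 'c', 'd'])

# Index objects are immutable: changing one element raises TypeError
index_is_immutable = False
try:
    d.index[0] = 'z'
except TypeError:
    index_is_immutable = True

# Assigning a whole new index (or new column labels) is allowed
d.index = ['w', 'x', 'y', 'z']

# .drop returns a new object and leaves d unchanged
rows_dropped = d.drop(['w', 'x'])        # rows: axis=0 is the default
cols_dropped = d.drop('two', axis=1)     # columns: pass axis=1
cols_dropped_kw = d.drop(columns='two')  # equivalent keyword form

# .drop works the same way on a Series
s = pd.Series([9, 8, 7], index=['a', 'b', 'c'])
s_dropped = s.drop('b')
print(rows_dropped, cols_dropped, s_dropped, sep="\n\n")
```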
Common Index operations

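Method	Description
.append(idx)	Concatenate with another Index object, producing a new Index
.difference(idx)	Compute the set difference, producing a new Index
.intersection(idx)	Compute the intersection
.union(idx)	Compute the union
.delete(loc)	Delete the element at position loc
.insert(loc, e)	Insert element e at position loc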
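The methods in the table can be tried on two small Index objects. This sketch uses made-up labels and assumes a recent pandas; every call returns a new Index, which is consistent with Index objects being immutable.

```python
import pandas as pd

i1 = pd.Index(['a', 'b', 'c', 'd'])
i2 = pd.Index(['c', 'd', 'e'])

appended = i1.append(i2)          # concatenation, duplicates kept
diff = i1.difference(i2)          # in i1 but not in i2
common = i1.intersection(i2)      # in both
combined = i1.union(i2)           # in either
deleted = i1.delete(0)            # element at position 0 removed
inserted = i1.insert(1, 'x')      # 'x' inserted at position 1

for name, idx in [("append", appended), ("difference", diff),
                  ("intersection", common), ("union", combined),
                  ("delete", deleted), ("insert", inserted)]:
    print(f"{name:>12}: {list(idx)}")

# Every method returned a new Index; i1 itself is unchanged
print(list(i1))
```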

Arithmetic in pandas

b1 = pd.DataFrame(np.arange(0, 20).reshape(4, 5))
b1
#     0   1   2   3   4
# 0   0   1   2   3   4
# 1   5   6   7   8   9
# 2  10  11  12  13  14
# 3  15  16  17  18  19
c = pd.Series(np.arange(4))
c
# 0    0
# 1    1
# 2    2
# 3    3
# dtype: int32
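b1 - c  # by default the Series is aligned with the columns (axis 1): c's labels 0-3 are matched to columns 0-3 and subtracted from every row; column 4 has no match and becomes NaN
#       0     1     2     3   4
# 0   0.0   0.0   0.0   0.0 NaN
# 1   5.0   5.0   5.0   5.0 NaN
# 2  10.0  10.0  10.0  10.0 NaN
# 3  15.0  15.0  15.0  15.0 NaN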
b1.sub(c, axis=0)  # axis=0: align c with the row index instead, so row i has c[i] subtracted
#     0   1   2   3   4
# 0   0   1   2   3   4
# 1   4   5   6   7   8
# 2   8   9  10  11  12
# 3  12  13  14  15  16
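# Arithmetic methods: .add() .sub() .mul() .div()
# Comparison operators: between objects of the same dimensionality, the sizes (and labels) must match; between different dimensionalities (e.g. DataFrame vs Series), the smaller object is broadcast, by default along axis 1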
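The comparison rules and the fill_value option of the arithmetic methods can be sketched as follows, assuming pandas 2.x. The frames b2 and b3 are made-up data for illustration; the ValueError for differently labeled frames is what pandas raises for == and the other comparison operators.

```python
import numpy as np
import pandas as pd

b1 = pd.DataFrame(np.arange(0, 20).reshape(4, 5))

# Same dimensionality: both frames are 4x5 with identical labels
b2 = pd.DataFrame(np.arange(20, 0, -1).reshape(4, 5))
same_shape = b1 > b2                 # element-wise, True where b1 > b2

# Different dimensionality: the Series is broadcast along axis 1,
# i.e. matched to b1's column labels and compared with every row
c = pd.Series(np.arange(5))
broadcast = b1 > c

# Comparison operators refuse frames whose labels differ
mismatch_raised = False
try:
    b1 == pd.DataFrame(np.zeros((2, 2)))
except ValueError:
    mismatch_raised = True

# Arithmetic methods accept fill_value: a cell missing on one side counts as 0
b3 = pd.DataFrame(np.arange(0, 12).reshape(3, 4))
plain = b1 + b3                      # cells outside the 3x4 overlap become NaN
filled = b1.add(b3, fill_value=0)    # those cells keep b1's values instead

print(same_shape, broadcast, plain, filled, sep="\n\n")
```

The operator form (+, >) is convenient for aligned data, while the method form (.add(), .sub(), ...) exposes axis and fill_value for finer control.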