pandas core data structures

Creating a pandas Series and its properties:

import numpy as np
import pandas as pd

"Series"
# Series created from an array
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

# Created from a dict; labels missing from the dict ('c' here) become NaN
d = {'a': 0, 'b': 2, 'd': 5}
s = pd.Series(d, index=list('abcd'))
print(s)

# Indexing
print(s.iloc[1])  # positional access; s[1] on a string-labelled Series is deprecated in recent pandas
print(s[:2])

# print(s['a'])
s['a'] = 100
# print(s)
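
The difference between positional and label-based access on the dict-built Series above can be sketched as follows. This is a minimal illustration that reuses the same dict; the variable names `by_position` and `by_label` are made up for the example.

```python
import pandas as pd

# Same construction as above: 'c' is not in the dict, so it becomes NaN
# and the whole Series is upcast to float64.
s = pd.Series({'a': 0, 'b': 2, 'd': 5}, index=list('abcd'))

by_position = s.iloc[1]   # second element, regardless of its label
by_label = s.loc['b']     # element whose label is 'b'

print(by_position, by_label)
print(s.isna())           # True only for the missing label 'c'
```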

# Label alignment: labels present in only one operand give NaN
s1 = pd.Series(np.random.randn(3), index=['a', 'c', 'e'])
s2 = pd.Series(np.random.randn(3), index=['a', 'd', 'e'])
print(s1)
print(s2)
print(s1+s2)
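
Because the example above uses random values, the alignment is easier to see with fixed numbers. The sketch below assumes small hand-picked values and also shows `Series.add` with `fill_value`, which is one way to avoid the NaN results when a label is missing on one side.

```python
import pandas as pd

# Fixed values (instead of random ones) so the result is easy to follow.
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'c', 'e'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'd', 'e'])

# Plain addition aligns on labels; 'c' and 'd' exist on one side only -> NaN.
total = s1 + s2

# Series.add with fill_value treats a label missing on one side as 0.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```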

Creating a DataFrame and its properties:

import numpy as np
import pandas as pd

"dataFrame"
# Created from a dict of Series
d = {'one': pd.Series([1, 2, 3], index=list('abd')),
     'two': pd.Series([1, 2, 3, 4], index=list('abcd'))}
df = pd.DataFrame(d)
print(df)
print(pd.DataFrame(d, index=['d', 'b', 'a']))
print(pd.DataFrame(d, columns=['two', 'three']))

# Created from a list of tuples
data = [(1, 2.2, 'hello'), (2, 3, 'world')]
df1 = pd.DataFrame(data)
print(df1)
print(pd.DataFrame(data, index=['one', 'two'], columns=list('ABC')))

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]  # keys are aligned to columns automatically
print(pd.DataFrame(data2, index=['A', 'B']))

# Nested dict with tuple keys (produces hierarchical row and column labels)
d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
     ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
     ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
     ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}}

print(pd.DataFrame(d))
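
A shortened version of the nested dict shows what the tuple keys turn into. This is a sketch with only two outer keys; the name `nested` is chosen for the example.

```python
import pandas as pd

# Outer tuple keys become column labels, inner tuple keys become row labels.
d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4}}
nested = pd.DataFrame(d)

# Both axes are hierarchical (MultiIndex), one level per tuple element.
print(type(nested.columns).__name__, type(nested.index).__name__)

# A full (row, column) pair of tuples selects a single cell.
cell = nested.loc[('A', 'B'), ('a', 'a')]

# Selecting only the first column level returns the sub-frame below it.
sub = nested['a']
print(cell)
print(sub)
```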

Properties:

df = pd.DataFrame(np.random.randn(6, 4), columns=['a', 'b', 'c', 'd'])
df
Out[3]: 
          a         b         c         d
0  0.356749  0.205543  0.508012  1.168596
1 -1.667490  0.097956  0.342722 -0.590740
2  0.658181  0.425978  0.820667 -1.772120
3  1.305535  0.268792 -0.385685  1.122207
4  0.570168  0.023381  0.399424  0.076078
5  2.232768  0.560912  0.596014  2.268188
df['a']
Out[5]: 
0    0.356749
1   -1.667490
2    0.658181
3    1.305535
4    0.570168
5    2.232768
Name: a, dtype: float64
df.loc[2]
Out[11]: 
a    0.658181
b    0.425978
c    0.820667
d   -1.772120
Name: 2, dtype: float64
df['e']=df['a']+df['c']
df
Out[14]: 
          a         b         c         d         e
0  0.356749  0.205543  0.508012  1.168596  0.864761
1 -1.667490  0.097956  0.342722 -0.590740 -1.324768
2  0.658181  0.425978  0.820667 -1.772120  1.478847
3  1.305535  0.268792 -0.385685  1.122207  0.919850
4  0.570168  0.023381  0.399424  0.076078  0.969591
5  2.232768  0.560912  0.596014  2.268188  2.828782
del df['e']
s=df.pop('d')
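
The two removal styles used above differ in what they return. The sketch below, on a small made-up frame, contrasts `del`, `pop`, and `drop`; the last one is not used in the session above and is added here only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

del df['d']                       # removes the column in place, returns nothing
popped = df.pop('c')              # removes the column in place and returns it as a Series
smaller = df.drop(columns=['b'])  # returns a new frame; df itself is unchanged

print(list(df.columns))       # columns left in df
print(popped.tolist())        # values of the popped column
print(list(smaller.columns))  # columns of the new frame
```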

Inserting a column at a given position:

df
Out[20]: 
          a         b         c
0  0.356749  0.205543  0.508012
1 -1.667490  0.097956  0.342722
2  0.658181  0.425978  0.820667
3  1.305535  0.268792 -0.385685
4  0.570168  0.023381  0.399424
5  2.232768  0.560912  0.596014
df.insert(1,'bar',df['a']+df['c'])
df
Out[22]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
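
As a compact restatement of the step above: `DataFrame.insert` modifies the frame in place, takes the target position first, and refuses to add a column name that already exists unless `allow_duplicates=True` is passed. The data below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'c': [5, 6]})

# Insert 'bar' at position 1 (between 'a' and 'c'); this changes df in place.
df.insert(1, 'bar', df['a'] + df['c'])

# Inserting the same name again raises ValueError by default.
try:
    df.insert(0, 'bar', 0)
    raised = False
except ValueError:
    raised = True

print(df)
print(raised)
```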

Adding a new column with the assign method (returns a new DataFrame and leaves df unchanged):

df
Out[25]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
df.assign(Ratio=df['b']/df['c'])
Out[26]: 
          a       bar         b         c     Ratio
0  0.356749  0.864761  0.205543  0.508012  0.404603
1 -1.667490 -1.324768  0.097956  0.342722  0.285817
2  0.658181  1.478847  0.425978  0.820667  0.519063
3  1.305535  0.919850  0.268792 -0.385685 -0.696922
4  0.570168  0.969591  0.023381  0.399424  0.058537
5  2.232768  2.828782  0.560912  0.596014  0.941105
df
Out[31]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
df.assign(Ratio=lambda x: x.a-x.b)
Out[32]: 
          a       bar         b         c     Ratio
0  0.356749  0.864761  0.205543  0.508012  0.151206
1 -1.667490 -1.324768  0.097956  0.342722 -1.765446
2  0.658181  1.478847  0.425978  0.820667  0.232203
3  1.305535  0.919850  0.268792 -0.385685  1.036743
4  0.570168  0.969591  0.023381  0.399424  0.546786
5  2.232768  2.828782  0.560912  0.596014  1.671856
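
The session above shows that `df` is unchanged after each `assign` call. A minimal sketch of the same behavior with fixed values follows; the column names `ratio` and `diff` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [4.0, 8.0]})

# assign accepts either precomputed values or a callable that receives the frame.
out = df.assign(ratio=df['a'] / df['b'],
                diff=lambda x: x.a - x.b)

print(out)
print(list(df.columns))  # the original frame still has only 'a' and 'b'
```

To keep the new columns, bind the result to a name (for example `df = df.assign(...)`).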

loc and iloc (the rows have been relabeled A to F beforehand, for example with df.index = list('ABCDEF')):

df
Out[40]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
df.loc['B']
Out[41]: 
a     -1.667490
bar   -1.324768
b      0.097956
c      0.342722
Name: B, dtype: float64
df.iloc[1]
Out[42]: 
a     -1.667490
bar   -1.324768
b      0.097956
c      0.342722
Name: B, dtype: float64
df.iloc[1:3]
Out[49]: 
          a       bar         b         c
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
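
One detail that the outputs above do not show is how slices behave: a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': range(6)}, index=list('ABCDEF'))

by_label = df.loc['B':'C']   # label slice: both 'B' and 'C' are included
by_pos = df.iloc[1:3]        # positional slice: positions 1 and 2, position 3 excluded

# loc also accepts a (row label, column label) pair for a single cell.
cell = df.loc['B', 'a']

print(by_label)
print(by_pos)
print(cell)
```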

Boolean index selection:

df
Out[53]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
df.a>=1
Out[54]: 
A    False
B    False
C    False
D     True
E    False
F     True
Name: a, dtype: bool
df[df.a>=1]
Out[55]: 
          a       bar         b         c
D  1.305535  0.919850  0.268792 -0.385685
F  2.232768  2.828782  0.560912  0.596014
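
Several conditions can be combined into one mask. The sketch below assumes made-up data and uses `&` with parentheses around each comparison, since Python's `and` does not work element-wise on a Series.

```python
import pandas as pd

df = pd.DataFrame({'a': [0.5, -1.0, 1.5, 2.0],
                   'b': [0.2, 0.1, 0.9, 0.3]},
                  index=list('ABCD'))

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df['a'] >= 1) & (df['b'] < 0.5)
selected = df[mask]

# loc accepts a boolean mask for rows together with a column label.
b_values = df.loc[df['a'] >= 1, 'b']

print(selected)
print(b_values)
```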

NumPy computations:

df
Out[62]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
np.exp(df)
Out[63]: 
          a        bar         b         c
A  1.428677   2.374438  1.228192  1.661984
B  0.188720   0.265865  1.102914  1.408777
C  1.931275   4.387885  1.531087  2.272014
D  3.689664   2.508914  1.308384  0.679985
E  1.768564   2.636867  1.023657  1.490965
F  9.325644  16.924842  1.752270  1.814871
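
As the output above suggests, applying a NumPy universal function to a DataFrame keeps the row and column labels. A short sketch with fixed values, including the conversion to a plain array with `to_numpy()` for when the labels are not needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.0, 1.0], 'b': [2.0, 3.0]}, index=['A', 'B'])

expd = np.exp(df)        # element-wise; the result is still a DataFrame
back = np.log(expd)      # applying the inverse recovers the original values
arr = df.to_numpy()      # plain ndarray without labels

print(expd)
print(type(arr).__name__, arr.shape)
```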

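Creating a Panel and its properties (pd.Panel was deprecated in pandas 0.20 and is not available in pandas 1.0 and later, so the session below requires an older version):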

data={'item1':pd.DataFrame(np.random.randn(4,3)),'item2':pd.DataFrame(np.random.randn(4,2))}
pn=pd.Panel(data)
pn
Out[74]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: item1 to item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2
pn['item1']
Out[75]: 
          0         1         2
0 -2.224034  1.265258 -1.239071
1  0.884334  1.914626 -2.061604
2 -0.055456  1.014718  1.391456
3  0.371961 -1.768514  0.187332
pn['item2']
Out[76]: 
          0         1   2
0 -1.829447 -1.187154 NaN
1  0.708648  0.699697 NaN
2  1.699782 -0.063514 NaN
3 -0.235694 -1.529910 NaN

The three axes:

pn.items
Out[77]: Index(['item1', 'item2'], dtype='object')
pn.major_axis
Out[78]: RangeIndex(start=0, stop=4, step=1)
pn.minor_axis
Out[79]: RangeIndex(start=0, stop=3, step=1)

Indexing along the major and minor axes:

pn.major_xs(1)
Out[80]: 
      item1     item2
0  0.884334  0.708648
1  1.914626  0.699697
2 -2.061604       NaN

pn.minor_xs(1)
Out[81]: 
      item1     item2
0  1.265258 -1.187154
1  1.914626  0.699697
2  1.014718 -0.063514
3 -1.768514 -1.529910

Converting to a two-dimensional DataFrame:

pn.to_frame()
Out[82]: 
                item1     item2
major minor                    
0     0     -2.224034 -1.829447
      1      1.265258 -1.187154
1     0      0.884334  0.708648
      1      1.914626  0.699697
2     0     -0.055456  1.699782
      1      1.014718 -0.063514
3     0      0.371961 -0.235694
      1     -1.768514 -1.529910
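
On pandas versions without Panel, one possible substitute is a DataFrame with hierarchical columns built by `pd.concat`. The sketch below is an assumption about how the Panel operations above could be approximated, not an equivalent API; the names `wide`, `major`, and `minor` are made up, and the random values will differ from the session above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = {'item1': pd.DataFrame(rng.standard_normal((4, 3))),
        'item2': pd.DataFrame(rng.standard_normal((4, 2)))}

# Dict keys become the outer column level: columns are (item, minor) pairs.
wide = pd.concat(data, axis=1)

item1 = wide['item1']                  # roughly pn['item1']
minor = wide.xs(1, axis=1, level=1)    # roughly pn.minor_xs(1): column 1 of every item
major = wide.loc[1].unstack(level=0)   # roughly pn.major_xs(1): row 1 as minor x item

print(wide.shape)
print(minor)
print(major)   # item2 has no column 2, so that cell is NaN
```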