pandas core data structures

GSmate

Article link: https://blog.csdn.net/qq_44318499/article/details/104494735

Creating a pandas Series and its properties:

import numpy as np
import pandas as pd

# Series
# Create a Series from a NumPy array
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

# Create from a dict (labels listed in index but missing from the dict, here 'c', become NaN)
d = {'a': 0, 'b': 2, 'd': 5}
s = pd.Series(d, index=list('abcd'))
print(s)

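# Indexing: use iloc for positional access (integer keys in s[...] on a
# label-indexed Series are deprecated in recent pandas versions)
print(s.iloc[1])
print(s.iloc[:2])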

# print(s['a'])
s['a'] = 100
# print(s)

# Label alignment: arithmetic matches values by label, not by position
s1 = pd.Series(np.random.randn(3), index=['a', 'c', 'e'])
s2 = pd.Series(np.random.randn(3), index=['a', 'd', 'e'])
print(s1)
print(s2)
print(s1+s2)
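
Because the random values above make the result hard to read, here is a minimal sketch of the same label alignment with fixed numbers. The values are made up for illustration; `Series.add` with `fill_value` is one way to treat a missing label as 0 instead of producing NaN.

```python
import numpy as np
import pandas as pd

# Two Series that share only the labels 'a' and 'e'
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'c', 'e'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'd', 'e'])

# The result covers the union of labels; labels present in only one
# operand ('c' and 'd') come out as NaN
total = s1 + s2
print(total)

# fill_value substitutes 0 for a label missing from one side
filled = s1.add(s2, fill_value=0)
print(filled)
```
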

Creating a DataFrame and its properties:

import numpy as np
import pandas as pd

# DataFrame
# Create from a dict of Series (rows are aligned on the union of the indexes)
d = {'one': pd.Series([1, 2, 3], index=list('abd')),
     'two': pd.Series([1, 2, 3, 4], index=list('abcd'))}
df = pd.DataFrame(d)
print(df)
print(pd.DataFrame(d, index=['d', 'b', 'a']))
print(pd.DataFrame(d, columns=['two', 'three']))

# Create from a list of tuples (one tuple per row)
data = [(1, 2.2, 'hello'), (2, 3, 'world')]
df1 = pd.DataFrame(data)
print(df1)
print(pd.DataFrame(data, index=['one', 'two'], columns=list('ABC')))

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]  # dict keys are aligned to columns automatically; missing entries become NaN
print(pd.DataFrame(data2, index=['A', 'B']))

# Nested dict with tuple keys (produces MultiIndex columns and rows)
d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
     ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
     ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
     ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}}

print(pd.DataFrame(d))
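
The nested dict above yields hierarchical (MultiIndex) labels on both axes. As a small sketch with a reduced version of that dict, this is how such a frame might be inspected and indexed; the variable names `nested`, `wide`, `cell`, and `under_a` are illustrative only.

```python
import pandas as pd

# Outer tuple keys become column labels, inner tuple keys become row labels
nested = {('a', 'a'): {('A', 'B'): 4, ('A', 'C'): 3},
          ('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2}}
wide = pd.DataFrame(nested)

print(type(wide.columns).__name__)  # MultiIndex
print(type(wide.index).__name__)    # MultiIndex

# A full (row tuple, column tuple) pair addresses a single cell
cell = wide.loc[('A', 'B'), ('a', 'b')]

# Selecting the top-level column label returns the sub-frame under it
under_a = wide['a']
print(under_a)
```
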

Properties:

df = pd.DataFrame(np.random.randn(6, 4), columns=['a', 'b', 'c', 'd'])
df
Out[3]: 
          a         b         c         d
0  0.356749  0.205543  0.508012  1.168596
1 -1.667490  0.097956  0.342722 -0.590740
2  0.658181  0.425978  0.820667 -1.772120
3  1.305535  0.268792 -0.385685  1.122207
4  0.570168  0.023381  0.399424  0.076078
5  2.232768  0.560912  0.596014  2.268188

df['a']
Out[5]: 
0    0.356749
1   -1.667490
2    0.658181
3    1.305535
4    0.570168
5    2.232768
Name: a, dtype: float64

df.loc[2]
Out[11]: 
a    0.658181
b    0.425978
c    0.820667
d   -1.772120
Name: 2, dtype: float64

df['e']=df['a']+df['c']
df
Out[14]: 
          a         b         c         d         e
0  0.356749  0.205543  0.508012  1.168596  0.864761
1 -1.667490  0.097956  0.342722 -0.590740 -1.324768
2  0.658181  0.425978  0.820667 -1.772120  1.478847
3  1.305535  0.268792 -0.385685  1.122207  0.919850
4  0.570168  0.023381  0.399424  0.076078  0.969591
5  2.232768  0.560912  0.596014  2.268188  2.828782

del df['e']
s=df.pop('d')
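
The session above removes columns with `del` and `pop`, both of which modify the frame in place. A short sketch contrasting them with `drop`, which returns a new frame; the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# del removes the column in place and returns nothing
del df['c']

# pop removes the column in place and returns it as a Series
b = df.pop('b')

# drop returns a new DataFrame and leaves df untouched
trimmed = df.drop(columns=['a'])

print(list(df.columns))       # only 'a' is left in df
print(b.tolist())
print(list(trimmed.columns))  # empty, while df still has 'a'
```
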

Inserting a column at a given position:

df
Out[20]: 
          a         b         c
0  0.356749  0.205543  0.508012
1 -1.667490  0.097956  0.342722
2  0.658181  0.425978  0.820667
3  1.305535  0.268792 -0.385685
4  0.570168  0.023381  0.399424
5  2.232768  0.560912  0.596014
df.insert(1,'bar',df['a']+df['c'])
df
Out[22]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
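
A minimal sketch of `insert` on a small frame with made-up values. It shows that the call works in place and that, by default, inserting a column name that already exists raises a `ValueError`; the flag `duplicate_rejected` is just a helper for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# insert(loc, column, value) modifies df in place; loc is the 0-based position
df.insert(1, 'total', df['a'] + df['b'])
print(df)

# Inserting an existing name fails unless allow_duplicates=True is passed
duplicate_rejected = False
try:
    df.insert(0, 'total', 0)
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```
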

Adding a new column with the assign method (assign returns a new DataFrame; df itself is left unchanged):

df
Out[25]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
df.assign(Ratio=df['b']/df['c'])
Out[26]: 
          a       bar         b         c     Ratio
0  0.356749  0.864761  0.205543  0.508012  0.404603
1 -1.667490 -1.324768  0.097956  0.342722  0.285817
2  0.658181  1.478847  0.425978  0.820667  0.519063
3  1.305535  0.919850  0.268792 -0.385685 -0.696922
4  0.570168  0.969591  0.023381  0.399424  0.058537
5  2.232768  2.828782  0.560912  0.596014  0.941105

df
Out[31]: 
          a       bar         b         c
0  0.356749  0.864761  0.205543  0.508012
1 -1.667490 -1.324768  0.097956  0.342722
2  0.658181  1.478847  0.425978  0.820667
3  1.305535  0.919850  0.268792 -0.385685
4  0.570168  0.969591  0.023381  0.399424
5  2.232768  2.828782  0.560912  0.596014
df.assign(Ratio=lambda x: x.a-x.b)
Out[32]: 
          a       bar         b         c     Ratio
0  0.356749  0.864761  0.205543  0.508012  0.151206
1 -1.667490 -1.324768  0.097956  0.342722 -1.765446
2  0.658181  1.478847  0.425978  0.820667  0.232203
3  1.305535  0.919850  0.268792 -0.385685  1.036743
4  0.570168  0.969591  0.023381  0.399424  0.546786
5  2.232768  2.828782  0.560912  0.596014  1.671856
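
To make the copy semantics of `assign` explicit, here is a small sketch with fixed numbers. The column names `ratio`, `diff`, `total`, and `share` are made up for this example. Passing a callable lets a new column be computed from the frame being built, and in reasonably recent pandas versions a later keyword can refer to a column created by an earlier keyword in the same call.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [4.0, 8.0]})

# A precomputed Series and a callable can be mixed in one call
out = df.assign(ratio=df['b'] / df['a'], diff=lambda x: x.b - x.a)

# 'share' uses 'total', which is created earlier in the same call
out2 = df.assign(total=lambda x: x.a + x.b,
                 share=lambda x: x.a / x.total)

print(out)
print(out2)
print(list(df.columns))  # df still has only 'a' and 'b'
```
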

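loc and iloc (the row index has been relabeled 'A' to 'F' at this point; the relabeling step is not shown in the session):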

df
Out[40]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
df.loc['B']
Out[41]: 
a     -1.667490
bar   -1.324768
b      0.097956
c      0.342722
Name: B, dtype: float64
df.iloc[1]
Out[42]: 
a     -1.667490
bar   -1.324768
b      0.097956
c      0.342722
Name: B, dtype: float64
df.iloc[1:3]
Out[49]: 
          a       bar         b         c
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
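
A short sketch of the difference between label-based and position-based selection, using made-up data. The index assignment in the constructor stands in for the relabeling step that the session above does not show. Note that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]},
                  index=list('ABCD'))

# Label slice: both 'B' and 'C' are included
by_label = df.loc['B':'C']

# Position slice: rows 1 and 2; position 3 is excluded
by_pos = df.iloc[1:3]

# Single cell, addressed by label and by position
cell_label = df.loc['C', 'b']
cell_pos = df.iloc[2, 1]

print(by_label)
print(by_pos)
print(cell_label, cell_pos)
```
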

Boolean indexing:

df
Out[53]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
df.a>=1
Out[54]: 
A    False
B    False
C    False
D     True
E    False
F     True
Name: a, dtype: bool
df[df.a>=1]
Out[55]: 
          a       bar         b         c
D  1.305535  0.919850  0.268792 -0.385685
F  2.232768  2.828782  0.560912  0.596014
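
Conditions can also be combined. The sketch below, with made-up values, uses `&` and `~` on boolean Series; each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({'a': [0.5, -1.0, 1.5, 2.0], 'b': [1, 2, 3, 4]},
                  index=list('ABCD'))

# Rows where a >= 1 AND b < 4
mask = (df['a'] >= 1) & (df['b'] < 4)
picked = df[mask]

# ~ negates the mask; loc can select rows and a column in one step
not_picked = df.loc[~mask, 'a']

print(picked)
print(not_picked)
```
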

NumPy computations:

df
Out[62]: 
          a       bar         b         c
A  0.356749  0.864761  0.205543  0.508012
B -1.667490 -1.324768  0.097956  0.342722
C  0.658181  1.478847  0.425978  0.820667
D  1.305535  0.919850  0.268792 -0.385685
E  0.570168  0.969591  0.023381  0.399424
F  2.232768  2.828782  0.560912  0.596014
np.exp(df)
Out[63]: 
          a        bar         b         c
A  1.428677   2.374438  1.228192  1.661984
B  0.188720   0.265865  1.102914  1.408777
C  1.931275   4.387885  1.531087  2.272014
D  3.689664   2.508914  1.308384  0.679985
E  1.768564   2.636867  1.023657  1.490965
F  9.325644  16.924842  1.752270  1.814871
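
As the output above suggests, applying a NumPy ufunc to a DataFrame returns a DataFrame with the same labels. A minimal sketch with made-up values, which also shows `to_numpy()` for the case where a plain array is wanted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.0, 1.0], 'b': [2.0, 3.0]}, index=['A', 'B'])

# Element-wise ufunc: the result keeps the row and column labels
expd = np.exp(df)
print(expd)

# to_numpy() drops the labels and returns the underlying values
arr = df.to_numpy()
print(arr.shape)

# log undoes exp, so the round trip recovers the original values
print(np.allclose(np.log(expd), df))
```
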

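Creating a Panel and its properties (note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the session below only runs on older pandas versions):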

data={'item1':pd.DataFrame(np.random.randn(4,3)),'item2':pd.DataFrame(np.random.randn(4,2))}
pn=pd.Panel(data)
pn
Out[74]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: item1 to item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2

pn['item1']
Out[75]: 
          0         1         2
0 -2.224034  1.265258 -1.239071
1  0.884334  1.914626 -2.061604
2 -0.055456  1.014718  1.391456
3  0.371961 -1.768514  0.187332
pn['item2']
Out[76]: 
          0         1   2
0 -1.829447 -1.187154 NaN
1  0.708648  0.699697 NaN
2  1.699782 -0.063514 NaN
3 -0.235694 -1.529910 NaN

The three axes:

pn.items
Out[77]: Index(['item1', 'item2'], dtype='object')
pn.major_axis
Out[78]: RangeIndex(start=0, stop=4, step=1)
pn.minor_axis
Out[79]: RangeIndex(start=0, stop=3, step=1)

Indexing along the major and minor axes:

pn.major_xs(1)
Out[80]: 
      item1     item2
0  0.884334  0.708648
1  1.914626  0.699697
2 -2.061604       NaN

pn.minor_xs(1)
Out[81]: 
      item1     item2
0  1.265258 -1.187154
1  1.914626  0.699697
2  1.014718 -0.063514
3 -1.768514 -1.529910

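Converting to a two-dimensional DataFrame (by default to_frame drops (major, minor) pairs that contain NaN, which is why minor label 2 does not appear):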

pn.to_frame()
Out[82]: 
                item1     item2
major minor                    
0     0     -2.224034 -1.829447
      1      1.265258 -1.187154
1     0      0.884334  0.708648
      1      1.914626  0.699697
2     0     -0.055456  1.699782
      1      1.014718 -0.063514
3     0      0.371961 -0.235694
      1     -1.768514 -1.529910
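
On current pandas versions, where `Panel` no longer exists, one possible substitute is a DataFrame with a MultiIndex built by `pd.concat` (the xarray package is another commonly suggested option). The sketch below is an assumption about how the example above could be ported, not the original author's code; the level names `item` and `major` and the fixed values are chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {'item1': pd.DataFrame(np.arange(12.0).reshape(4, 3)),
        'item2': pd.DataFrame(np.arange(8.0).reshape(4, 2))}

# Dict keys become the outer index level; column 2 of item2 is filled with NaN
stacked = pd.concat(data, names=['item', 'major'])
print(stacked)

# Roughly pn['item1']: all rows belonging to one item
item1 = stacked.loc['item1']

# Roughly pn.major_xs(1), transposed: row 1 of every item
row1 = stacked.xs(1, level='major')
print(row1)
```
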
