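1 Data operations and arithmetic alignment in pandas
- pandas can perform arithmetic between objects whose indexes differ. When two objects are added, the index of the result is the union of their indexes. If an axis label in one object is not found in the other, the corresponding result is filled with NaN automatically; you can also supply a fill value of your own (for example 0).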
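As a minimal sketch of the alignment rule described above, the following uses two small Series with partly overlapping labels. The names `s1`, `s2`, `plain`, and `filled` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Two Series whose indexes only partly overlap (illustrative data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Plain addition: the result index is the union of both indexes,
# and labels present on only one side produce NaN.
plain = s1 + s2

# add() with fill_value: a label missing on one side is treated as 0.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Here `plain` has NaN at labels `a` and `d`, while `filled` keeps the value from whichever side has the label.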
from pandas import Series,DataFrame
import pandas as pd
import numpy as np
from numpy import nan
df1 = DataFrame(np.arange(12).reshape((3,4)),columns=list("abcd"))
df2 = DataFrame(np.arange(20).reshape((4,5)),columns=list("abcde"))
df1
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
df2
| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 |
| 1 | 5 | 6 | 7 | 8 | 9 |
| 2 | 10 | 11 | 12 | 13 | 14 |
| 3 | 15 | 16 | 17 | 18 | 19 |
df1.add(df2,fill_value=0)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.0 | 2.0 | 4.0 | 6.0 | 4.0 |
| 1 | 9.0 | 11.0 | 13.0 | 15.0 | 9.0 |
| 2 | 18.0 | 20.0 | 22.0 | 24.0 | 14.0 |
| 3 | 15.0 | 16.0 | 17.0 | 18.0 | 19.0 |
df1.add(df2).fillna(0)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.0 | 2.0 | 4.0 | 6.0 | 0.0 |
| 1 | 9.0 | 11.0 | 13.0 | 15.0 | 0.0 |
| 2 | 18.0 | 20.0 | 22.0 | 24.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
'''
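Note: the following three calls are fundamentally different.
df1.add(df2) leaves NaN wherever a label is missing from either operand.
df1.add(df2, fill_value=0) substitutes 0 for the missing operand before adding.
df1.add(df2).fillna(0) adds first and then replaces the resulting NaN with 0.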
'''
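The difference between the three calls can be checked on individual cells. The sketch below rebuilds `df1` and `df2` as defined earlier; the variable names `plain`, `pre_filled`, and `post_filled` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list("abcde"))

# No filling: any cell that exists in only one frame becomes NaN.
plain = df1.add(df2)

# fill_value=0: the missing operand is replaced by 0 BEFORE the addition,
# so the value from the other frame survives.
pre_filled = df1.add(df2, fill_value=0)

# fillna(0): the addition runs first, then every NaN result becomes 0,
# so the value from the other frame is lost.
post_filled = df1.add(df2).fillna(0)

# Row 3 exists only in df2, where column "a" holds 15.
print(plain.loc[3, "a"], pre_filled.loc[3, "a"], post_filled.loc[3, "a"])
```

One further point worth knowing: with `fill_value`, a cell that is missing in both frames still comes out as NaN. That case does not arise here because `df2` covers every row and column of `df1`.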
2 Slicing and indexing with iloc and loc
- loc: label-based indexing
- iloc: purely position-based indexing
frame = DataFrame(np.arange(12).reshape((4,3)),
columns=list("bde"),
index=["Utah","Ohio","Texas","Oregon"])
frame
| | b | d | e |
|---|---|---|---|
| Utah | 0 | 1 | 2 |
| Ohio | 3 | 4 | 5 |
| Texas | 6 | 7 | 8 |
| Oregon | 9 | 10 | 11 |
frame.iloc[1]
b 3
d 4
e 5
Name: Ohio, dtype: int32
frame.index
Index(['Utah', 'Ohio', 'Texas', 'Oregon'], dtype='object')
frame.loc["Oregon"]
b 9
d 10
e 11
Name: Oregon, dtype: int32
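The examples above select a single row. For slicing, the two indexers also differ in how they treat the end of the range, which the following sketch illustrates on the same `frame`. The names `by_label` and `by_position` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

# loc slices by label, and the end label is INCLUDED.
by_label = frame.loc["Utah":"Texas", ["b", "e"]]

# iloc slices by integer position, and the end position is EXCLUDED,
# as with ordinary Python slices.
by_position = frame.iloc[0:2, [0, 2]]

print(by_label)
print(by_position)
```

`by_label` contains Utah, Ohio, and Texas, whereas `by_position` contains only Utah and Ohio.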
frame
| | b | d | e |
|---|---|---|---|
| Utah | 0 | 1 | 2 |
| Ohio | 3 | 4 | 5 |
| Texas | 6 | 7 | 8 |
| Oregon | 9 | 10 | 11 |
series = frame.iloc[0]
series
b 0
d 1
e 2
Name: Utah, dtype: int32
3 Arithmetic between DataFrame and Series
- By default, arithmetic between a DataFrame and a Series matches the Series index against the DataFrame's columns and then broadcasts down the rows.
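To contrast the default with the other direction, here is a small sketch that subtracts a row (matched on columns) and then a column (matched on the row index). The names `row`, `col`, `by_columns`, and `by_rows` are assumptions introduced for illustration; `sub` with `axis="index"` is the pandas method form of subtraction that matches on row labels instead.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

row = frame.iloc[0]   # Series indexed by the column labels b, d, e
col = frame["d"]      # Series indexed by the row labels

# Default: the Series index is matched to the columns and the
# subtraction is repeated for every row.
by_columns = frame - row

# axis="index": the Series index is matched to the row labels and the
# subtraction is repeated for every column.
by_rows = frame.sub(col, axis="index")

print(by_columns)
print(by_rows)
```

In `by_rows`, every row becomes -1, 0, 1 because each row's own `d` value is subtracted from it.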
frame - series
| | b | d | e |
|---|---|---|---|
| Utah | 0 | 0 | 0 |
| Ohio | 3 | 3 | 3 |
| Texas | 6 | 6 | 6 |