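13. Column Operations on the Pandas DataFrame
This chapter looks at how to rename, add, replace, and delete columns of a DataFrame.
13.1 Renaming columns with rename
Calling rename on a DataFrame returns a new DataFrame and leaves the original unchanged.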
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)
# rename returns a new DataFrame; df1 keeps its original column names
df2 = df1.rename(columns={"ax": "close", "bx": "open"})
print(df2)
print("*" * 21)
Program output:
ax bx cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
close open cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
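The copy semantics described above can be checked directly. The following is a minimal sketch on a smaller, illustrative three-row frame (the frame size and the unmatched key "zz" are assumptions made for the example, not part of the original listing). It also shows that keys in the mapping that match no column are ignored by default.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 25).reshape(3, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abc"),
)

# rename returns a new DataFrame; df1 keeps its original labels.
df2 = df1.rename(columns={"ax": "close", "bx": "open"})
print(list(df1.columns))  # ['ax', 'bx', 'cx', 'dx', 'ex']
print(list(df2.columns))  # ['close', 'open', 'cx', 'dx', 'ex']

# A key that matches no column is silently ignored (errors="ignore" is the default).
df3 = df1.rename(columns={"zz": "unused"})
print(list(df3.columns))  # ['ax', 'bx', 'cx', 'dx', 'ex']
```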
To modify the DataFrame itself instead of getting a copy, pass inplace=True.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)
# inplace=True changes df1 directly; the call itself returns None
df1.rename(columns={"ax": "close", "bx": "open"}, inplace=True)
print(df1)
print("*" * 21)
Program output:
ax bx cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
close open cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
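One detail worth keeping in mind with inplace=True is that the call returns None, so its result should not be assigned back to the variable. A small sketch, again on an illustrative three-row frame (the variable name result is chosen only for this example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 25).reshape(3, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abc"),
)

# With inplace=True the frame is changed and the call returns None.
result = df1.rename(columns={"ax": "close", "bx": "open"}, inplace=True)
print(result)             # None
print(list(df1.columns))  # ['close', 'open', 'cx', 'dx', 'ex']
```

Writing df1 = df1.rename(..., inplace=True) would therefore replace df1 with None.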
13.2 Adding a column
In pandas, a column can be added to a DataFrame with [] assignment, with the insert function, or with loc[].
Assigning with [] appends the new column at the end of the original DataFrame.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)
nval = np.arange(100, 110)  # one new value per row
df1["fx"] = nval  # appended as the last column
print(df1)
Program output:
ax bx cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
ax bx cx dx ex fx
a 10 11 12 13 14 100
b 15 16 17 18 19 101
c 20 21 22 23 24 102
d 25 26 27 28 29 103
e 30 31 32 33 34 104
f 35 36 37 38 39 105
g 40 41 42 43 44 106
h 45 46 47 48 49 107
i 50 51 52 53 54 108
j 55 56 57 58 59 109
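When the value assigned with [] is a plain array or list, it is matched to the rows by position. When it is a Series, pandas aligns it on the index labels instead. The sketch below illustrates the difference on a small frame; the column name "sx" and the reversed index are assumptions introduced only for this example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 16).reshape(3, 2),
    columns=["ax", "bx"],
    index=list("abc"),
)

# A NumPy array is assigned by position: first value to row a, and so on.
df1["fx"] = np.arange(100, 103)

# A Series is aligned on its index labels, not on its order.
df1["sx"] = pd.Series([1, 2, 3], index=list("cba"))

print(list(df1.columns))    # ['ax', 'bx', 'fx', 'sx']
print(df1["sx"].tolist())   # [3, 2, 1]
```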
The insert function, by contrast, places the new column at a specified position.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)
nval = np.arange(100, 110)  # one new value per row
df1["fx"] = nval  # appended as the last column
print(df1)
print("*" * 21)
df1.insert(1, "gx", nval)  # placed at column position 1, right after ax
print(df1)
print("*" * 21)
Program output:
ax bx cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
ax bx cx dx ex fx
a 10 11 12 13 14 100
b 15 16 17 18 19 101
c 20 21 22 23 24 102
d 25 26 27 28 29 103
e 30 31 32 33 34 104
f 35 36 37 38 39 105
g 40 41 42 43 44 106
h 45 46 47 48 49 107
i 50 51 52 53 54 108
j 55 56 57 58 59 109
*********************
ax gx bx cx dx ex fx
a 10 100 11 12 13 14 100
b 15 101 16 17 18 19 101
c 20 102 21 22 23 24 102
d 25 103 26 27 28 29 103
e 30 104 31 32 33 34 104
f 35 105 36 37 38 39 105
g 40 106 41 42 43 44 106
h 45 107 46 47 48 49 107
i 50 108 51 52 53 54 108
j 55 109 56 57 58 59 109
*********************
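insert works in place and, by default, refuses to add a column whose name already exists. A minimal sketch of both behaviours, using an illustrative three-row frame and a flag variable (duplicate_rejected) that exists only for this example:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 16).reshape(3, 2),
    columns=["ax", "bx"],
    index=list("abc"),
)

# Insert at position 1, between ax and bx. The frame is modified in place.
df1.insert(1, "gx", np.arange(100, 103))
print(list(df1.columns))  # ['ax', 'gx', 'bx']

# Inserting a second column named gx raises ValueError by default.
duplicate_rejected = False
try:
    df1.insert(0, "gx", np.arange(3))
except ValueError as exc:
    duplicate_rejected = True
    print("insert refused:", exc)
```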
A new column can also be added with loc[].
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)
nval = np.arange(100, 110)  # one new value per row
df1.loc[:, "ix"] = nval  # all rows, new column label ix
print(df1)
print("*" * 21)
Program output:
ax bx cx dx ex
a 10 11 12 13 14
b 15 16 17 18 19
c 20 21 22 23 24
d 25 26 27 28 29
e 30 31 32 33 34
f 35 36 37 38 39
g 40 41 42 43 44
h 45 46 47 48 49
i 50 51 52 53 54
j 55 56 57 58 59
*********************
ax bx cx dx ex ix
a 10 11 12 13 14 100
b 15 16 17 18 19 101
c 20 21 22 23 24 102
d 25 26 27 28 29 103
e 30 31 32 33 34 104
f 35 36 37 38 39 105
g 40 41 42 43 44 106
h 45 46 47 48 49 107
i 50 51 52 53 54 108
j 55 56 57 58 59 109
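As a short sketch of the loc[] form, the example below adds one column from an array and one from a scalar, which is broadcast to every row. The frame size and the column name "flag" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 16).reshape(3, 2),
    columns=["ax", "bx"],
    index=list("abc"),
)

# The row selector ":" means all rows; a new label creates a new column.
df1.loc[:, "ix"] = np.arange(100, 103)

# A scalar is repeated for every row.
df1.loc[:, "flag"] = 0

print(list(df1.columns))   # ['ax', 'bx', 'ix', 'flag']
print(df1["ix"].tolist())
```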
13.3 Joining columns with concat
pandas provides a concat function that joins several DataFrames into one larger DataFrame.
import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1)
print("*" * 21)
print(df2)
print("*" * 21)
# axis=1 places the frames side by side, aligning rows on the index;
# df2[5:] and df1[:5] each cover only half of the rows
df3 = pd.concat([df1, df2[5:], df1[:5], df2], axis=1)
print(df3)
Program output:
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
d 19 20 21
e 22 23 24
f 25 26 27
g 28 29 30
h 31 32 33
i 34 35 36
j 37 38 39
*********************
cx dx ex
a 50 51 52
b 53 54 55
c 56 57 58
d 59 60 61
e 62 63 64
f 65 66 67
g 68 69 70
h 71 72 73
i 74 75 76
j 77 78 79
*********************
ax bx cx cx dx ex ax bx cx cx dx ex
a 10 11 12 NaN NaN NaN 10 11 12 50 51 52
b 13 14 15 NaN NaN NaN 13 14 15 53 54 55
c 16 17 18 NaN NaN NaN 16 17 18 56 57 58
d 19 20 21 NaN NaN NaN 19 20 21 59 60 61
e 22 23 24 NaN NaN NaN 22 23 24 62 63 64
f 25 26 27 65 66 67 NaN NaN NaN 65 66 67
g 28 29 30 68 69 70 NaN NaN NaN 68 69 70
h 31 32 33 71 72 73 NaN NaN NaN 71 72 73
i 34 35 36 74 75 76 NaN NaN NaN 74 75 76
j 37 38 39 77 78 79 NaN NaN NaN 77 78 79
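The result shows that when the DataFrames being joined do not cover the same rows, the positions with no data are filled with NaN. Note that pandas stores a column containing NaN as float, so in practice such columns may print as 65.0 rather than 65.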
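The NaN filling comes from the default join="outer", which keeps the union of all row labels. Passing join="inner" keeps only the rows present in every frame. The following sketch contrasts the two on small illustrative frames (the names left and right are assumptions for this example):

```python
import pandas as pd

left = pd.DataFrame({"ax": [1, 2, 3]}, index=list("abc"))
right = pd.DataFrame({"dx": [20, 30]}, index=list("bc"))

# Default outer join: row a has no match in right, so dx is NaN there.
outer = pd.concat([left, right], axis=1)
print(outer)

# Inner join: only rows b and c, which exist in both frames, are kept.
inner = pd.concat([left, right], axis=1, join="inner")
print(inner)
```

In the outer result the dx column is upcast to float64 because it holds NaN, while in the inner result it stays an integer column.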
13.4 Replacing the contents of a column
The values of a column can be replaced by assignment.
import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)
# equivalent to df1.cx = df2.cx, which works because cx already exists in df1
df1["cx"] = df2["cx"]
print(df1[:3])
The cx column of df1 now holds the values of the cx column of df2:
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
*********************
cx dx ex
a 50 51 52
b 53 54 55
c 56 57 58
*********************
ax bx cx
a 10 11 50
b 13 14 53
c 16 17 56
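Attribute-style assignment such as df1.cx = ... only replaces a column that already exists. For a name that is not yet a column, it sets a plain Python attribute and the frame's columns stay the same, which is why the bracket form is the safer habit. A small sketch, where the attribute name zx is hypothetical and the warning pandas emits in this case is silenced for the demonstration:

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 19).reshape(3, 3),
    columns=["ax", "bx", "cx"],
    index=list("abc"),
)

# cx exists, so attribute assignment replaces the column values.
df1.cx = [50, 53, 56]
print(df1["cx"].tolist())      # [50, 53, 56]

# zx does not exist: this creates an attribute, not a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df1.zx = [7, 8, 9]
print("zx" in df1.columns)     # False

# Bracket assignment always creates or replaces a column.
df1["zx"] = [7, 8, 9]
print("zx" in df1.columns)     # True
```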
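13.5 Deleting columns
A column can be removed from a DataFrame with the del statement, the pop function, or the drop function. del removes the column from the original DataFrame directly. pop also removes the column from the original DataFrame and returns the removed column as a Series. drop can remove several columns at once and, by default, returns a new DataFrame without changing the original.
import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])
del df1["cx"]  # removes cx from df1 in place
print("*" * 21)
print(df1[:3])
df3 = df2.pop("cx")  # removes cx from df2 and returns it as a Series
print("+" * 21)
print(df2[:3])
print("-" * 21)
print(df3[:3])
print("/" * 21)
df1 = pd.DataFrame(val1, columns=col1, index=idx)  # rebuild df1 with all three columns
df4 = df1.drop(["ax", "cx"], axis=1)  # returns a new frame; df1 is unchanged
print(df1[:3])
print(df4[:3])
Program output:
*********************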
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
*********************
cx dx ex
a 50 51 52
b 53 54 55
c 56 57 58
*********************
ax bx
a 10 11
b 13 14
c 16 17
+++++++++++++++++++++
dx ex
a 51 52
b 54 55
c 57 58
---------------------
a 50
b 53
c 56
Name: cx, dtype: int64
/////////////////////
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
bx
a 11
b 14
c 17
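The differences between the three approaches can be summarized in a compact sketch. The frame below and the names popped and trimmed are illustrative assumptions, and drop is called with its columns keyword, which is equivalent to passing axis=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["ax", "bx", "cx", "dx"],
    index=list("abc"),
)

# del: removes the column in place and returns nothing.
del df["ax"]

# pop: removes the column in place and returns it as a Series.
popped = df.pop("bx")
print(popped.tolist())        # [1, 5, 9]

# drop: returns a new frame; df itself still has cx and dx.
trimmed = df.drop(columns=["cx"])
print(list(df.columns))       # ['cx', 'dx']
print(list(trimmed.columns))  # ['dx']
```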