1. Viewing data
.head()
Shows the first rows
.tail()
Shows the last rows
Both show 5 rows by default
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(16).reshape(8,2)*100,
                  columns = ['a','b'])
print(df.head(2))   # first 2 rows
print(df.tail())    # last 5 rows (the default)
--> Output:
a b
0 80.800250 97.333282
1 91.433429 81.323805
a b
3 3.655392 81.143852
4 70.394713 52.598872
5 62.170747 73.813017
6 40.934632 7.242002
7 75.889400 84.418156
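Because the frame above is random, the printed numbers change on every run. The sketch below swaps in a deterministic frame (the `np.arange` data is an assumption made for illustration, not part of the original example) to show how the row count argument behaves, including a negative count.

```python
import numpy as np
import pandas as pd

# Deterministic 8x2 frame (values 0..15) instead of random data
df = pd.DataFrame(np.arange(16).reshape(8, 2), columns=['a', 'b'])

first_two = df.head(2)        # rows 0 and 1
last_five = df.tail()         # n defaults to 5, so rows 3 through 7
all_but_last3 = df.head(-3)   # negative n: every row except the last 3

print(first_two)
print(last_five)
print(all_but_last3)
```

Both methods return a new object and leave `df` itself untouched.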
2. Transposing data
print(df.T)
--> Output:
0 1 2 ... 5 6 7
a 80.800250 91.433429 5.563492 ... 62.170747 40.934632 75.889400
b 97.333282 81.323805 10.411445 ... 73.813017 7.242002 84.418156
[2 rows x 8 columns]
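A small check can make the effect of `.T` concrete: row labels become column labels and vice versa. The sketch below uses a deterministic frame (an assumption for reproducibility) rather than the random one above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(8, 2), columns=['a', 'b'])
transposed = df.T

print(df.shape)           # shape of the original frame
print(transposed.shape)   # rows and columns swapped
# The value at (row 3, column 'a') moves to (row 'a', column 3)
print(transposed.loc['a', 3] == df.loc[3, 'a'])
```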
3. Adding and modifying
3.1 Adding a column/row and assigning values
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df)
df['e'] = 10
df.loc[4] = 20
print(df)
--> Output:
a b c d
0 14.552288 22.852489 50.584815 31.153962
1 91.475232 27.827945 98.790335 74.487188
2 94.963093 5.227859 33.461076 71.792757
3 52.321047 77.474292 0.497665 7.623358
a b c d e
0 14.552288 22.852489 50.584815 31.153962 10
1 91.475232 27.827945 98.790335 74.487188 10
2 94.963093 5.227859 33.461076 71.792757 10
3 52.321047 77.474292 0.497665 7.623358 10
4 20.000000 20.000000 20.000000 20.000000 20
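The example above fills the new column and the new row with a single scalar. As a possible variation, the sketch below derives the new column from existing columns and supplies one value per column for the new row. The data and the label `4` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                  columns=['a', 'b', 'c', 'd'])

# New column computed from two existing columns
df['e'] = df['a'] + df['b']

# New row at label 4: one value for each of the five columns
df.loc[4] = [0.0, 1.0, 2.0, 3.0, 4.0]

print(df)
```

Assigning to a label that does not exist yet is what creates the row or column; assigning to an existing label overwrites it instead.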
3.2 Modifying values directly after indexing
df['e'] = 20
df[['a','c']] = 100
print(df)
--> Output:
a b c d e
0 100 22.852489 100 31.153962 20
1 100 27.827945 100 74.487188 20
2 100 5.227859 100 71.792757 20
3 100 77.474292 100 7.623358 20
4 100 20.000000 100 20.000000 20
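Whole-column assignment is the simplest case. When only some cells should change, `.loc` with a row selector and a column label is one common approach. The following is a minimal sketch on made-up data, not the frame used above.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 60, 90], 'b': [1, 2, 3]})

# Set 'b' to 0 only in rows where 'a' is greater than 50
df.loc[df['a'] > 50, 'b'] = 0

# Overwrite a single cell: row label 0, column 'a'
df.loc[0, 'a'] = -1

print(df)
```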
4. Deleting data
4.1 del statement - deleting a column
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df)
del df['a']
print(df)
--> Output:
a b c d
0 30.469916 41.632874 5.182408 21.456072
1 22.080842 96.395829 17.761205 54.596288
2 89.695677 32.556029 36.625757 22.049501
3 43.686114 96.212541 7.441507 80.726133
b c d
0 41.632874 5.182408 21.456072
1 96.395829 17.761205 54.596288
2 32.556029 36.625757 22.049501
3 96.212541 7.441507 80.726133
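`del` modifies the frame in place and discards the column. If the removed column is still needed, `DataFrame.pop` is a related option that deletes the column and returns it as a Series. A short sketch on deterministic data (an assumption for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=['a', 'b', 'c', 'd'])

removed = df.pop('a')   # removes 'a' in place and returns it
del df['b']             # removes 'b' in place, returns nothing

print(removed)
print(df)
```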
4.2 drop() - deleting rows
Default parameter: inplace=False
→ Returns new data with the rows removed; the original data is unchanged
print(df.drop(0))
print(df.drop([1,2]))
print(df)
--> Output:
b c d
1 96.395829 17.761205 54.596288
2 32.556029 36.625757 22.049501
3 96.212541 7.441507 80.726133
b c d
0 41.632874 5.182408 21.456072
3 96.212541 7.441507 80.726133
b c d
0 41.632874 5.182408 21.456072
1 96.395829 17.761205 54.596288
2 32.556029 36.625757 22.049501
3 96.212541 7.441507 80.726133
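To make the deletion stick, either assign the result back to a name or pass `inplace=True`. The sketch below contrasts the two on a small deterministic frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['b', 'c', 'd'])

# Default (inplace=False): a new frame is returned, df keeps all 4 rows
dropped = df.drop([1, 2])

# inplace=True: the frame itself is modified and the call returns None
df_inplace = df.copy()
result = df_inplace.drop(0, inplace=True)

print(dropped)
print(df)
print(df_inplace)
print(result)
```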
4.3 drop() - deleting columns
Requires axis = 1; inplace still defaults to False
→ Returns new data with the columns removed; the original data is unchanged
print(df.drop(['d'], axis = 1))
print(df)
--> Output:
b c
0 41.632874 5.182408
1 96.395829 17.761205
2 32.556029 36.625757
3 96.212541 7.441507
b c d
0 41.632874 5.182408 21.456072
1 96.395829 17.761205 54.596288
2 32.556029 36.625757 22.049501
3 96.212541 7.441507 80.726133
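As an alternative to `axis = 1`, `drop()` also accepts a `columns=` keyword, which some readers find easier to read. The sketch below, on made-up data, checks that the two spellings give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['b', 'c', 'd'])

by_axis = df.drop(['d'], axis=1)       # positional labels plus axis
by_keyword = df.drop(columns=['d'])    # same deletion via the keyword

print(by_axis.equals(by_keyword))
print(df.columns.tolist())             # original frame still has 'd'
```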
5. Data alignment
Operations between DataFrame objects align the data automatically on both columns and index (row labels)
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
print(df1 + df2)
--> Output:
A B C D
0 1.951824 0.119430 -1.982792 NaN
1 -1.063710 0.682838 -0.484747 NaN
2 0.543521 3.587741 0.121565 NaN
3 0.501066 1.992348 0.569522 NaN
4 2.074808 -0.544962 0.403096 NaN
5 -0.565621 -0.232803 -0.830447 NaN
6 -1.384398 -0.675027 -0.314824 NaN
7 NaN NaN NaN NaN
8 NaN NaN NaN NaN
9 NaN NaN NaN NaN
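Labels present in only one frame produce NaN, as the output above shows. If treating the missing side as zero is acceptable for the task, `DataFrame.add` with `fill_value` is one way to avoid those NaN values. This is a sketch on small frames of ones (an assumption chosen so the result is easy to verify), not a recommendation for every case.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 2)), columns=['A', 'B'])
df2 = pd.DataFrame(np.ones((2, 1)), columns=['A'])

# Plain +: row 2 and column 'B' have no partner in df2, so they become NaN
plain = df1 + df2

# add() with fill_value: a value missing on one side is treated as 0
filled = df1.add(df2, fill_value=0)

print(plain)
print(filled)
```

A cell that is missing in both frames would still come out as NaN even with `fill_value`.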
6. Sorting data
6.1 Sorting by value: .sort_values
Works on Series as well as DataFrame
ascending = True
Ascending order (the default)
ascending = False
Descending order
6.1.1 Sorting by a single column
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df1)
print(df1.sort_values(['a'], ascending = True)) # ascending
print(df1.sort_values(['a'], ascending = False)) # descending
--> Output:
a b c d
0 9.819919 71.007572 22.839585 63.658534
1 10.029993 54.830601 46.236912 19.465751
2 1.837689 98.963422 64.585373 29.611975
3 16.754768 50.427218 14.561929 6.969858
a b c d
2 1.837689 98.963422 64.585373 29.611975
0 9.819919 71.007572 22.839585 63.658534
1 10.029993 54.830601 46.236912 19.465751
3 16.754768 50.427218 14.561929 6.969858
a b c d
3 16.754768 50.427218 14.561929 6.969858
1 10.029993 54.830601 46.236912 19.465751
0 9.819919 71.007572 22.839585 63.658534
2 1.837689 98.963422 64.585373 29.611975
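Two details of the output above are worth a quick check: the same method works on a Series, and sorted rows keep their original index labels. The sketch below uses small hand-written data (an assumption for reproducibility) and shows `reset_index(drop=True)` as one way to renumber the rows afterwards.

```python
import pandas as pd

# sort_values on a Series
s = pd.Series([3, 1, 2], index=['x', 'y', 'z'])
s_desc = s.sort_values(ascending=False)

# sort_values on a DataFrame keeps the original row labels
df = pd.DataFrame({'a': [3, 1, 2], 'b': [10, 20, 30]})
sorted_df = df.sort_values('a')

# Optional: discard the old labels and renumber from 0
renumbered = sorted_df.reset_index(drop=True)

print(s_desc)
print(sorted_df)
print(renumbered)
```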
6.1.2 Sorting by multiple columns
df2 = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':list(range(8)),
'c':list(range(8,0,-1))})
print(df2)
print(df2.sort_values(['a','c']))
--> Output: (sorted by a first, then by c within equal values of a)
a b c
0 1 0 8
1 1 1 7
2 1 2 6
3 1 3 5
4 2 4 4
5 2 5 3
6 2 6 2
7 2 7 1
a b c
3 1 3 5
2 1 2 6
1 1 1 7
0 1 0 8
7 2 7 1
6 2 6 2
5 2 5 3
4 2 4 4
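`ascending` can also be a list with one flag per sort column, so each column can have its own direction. The sketch below reuses the same data as above and sorts `a` descending while keeping `c` ascending; the variable name `mixed` is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                    'b': list(range(8)),
                    'c': list(range(8, 0, -1))})

# 'a' descending first, then 'c' ascending within each value of 'a'
mixed = df2.sort_values(['a', 'c'], ascending=[False, True])

print(mixed)
```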
6.2 Sorting by index: .sort_index
Works on Series as well as DataFrame
ascending = True
Ascending order (the default)
ascending = False
Descending order
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = [5,4,3,2],
columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['h','s','x','g'],
columns = ['a','b','c','d'])
print(df1)
print(df1.sort_index())
print(df2)
print(df2.sort_index())
--> Output: (works for string labels as well as numeric ones)
a b c d
5 98.079322 81.223109 39.534693 39.763032
4 42.068402 83.658613 14.678341 97.784928
3 10.901214 15.797918 26.516650 37.804133
2 8.326599 2.813564 41.619509 74.280190
a b c d
2 8.326599 2.813564 41.619509 74.280190
3 10.901214 15.797918 26.516650 37.804133
4 42.068402 83.658613 14.678341 97.784928
5 98.079322 81.223109 39.534693 39.763032
a b c d
h 8.519638 39.267385 89.480081 25.455433
s 81.948385 2.519190 6.892622 43.315483
x 16.037407 56.810954 20.749150 19.843433
g 96.832274 77.508434 96.155294 33.028485
a b c d
g 96.832274 77.508434 96.155294 33.028485
h 8.519638 39.267385 89.480081 25.455433
s 81.948385 2.519190 6.892622 43.315483
x 16.037407 56.810954 20.749150 19.843433
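The examples above only sort row labels in ascending order. As a sketch of the other options, the code below sorts row labels in descending order and, with `axis=1`, sorts the column labels instead. The shuffled index and column order are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=[3, 5, 2, 4],
                  columns=['d', 'b', 'a', 'c'])

rows_desc = df.sort_index(ascending=False)   # row labels, descending
cols_sorted = df.sort_index(axis=1)          # column labels, ascending

print(rows_desc)
print(cols_sorted)
```

Like `sort_values`, both calls return a new frame and leave `df` in its original order.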