[Python Data Analysis (9)] The pandas DataFrame: viewing, transposing, adding/modifying and deleting values / alignment / sorting

1. Viewing data

.head() shows the first rows

.tail() shows the last rows

Both show 5 rows by default

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(16).reshape(8,2)*100,
                   columns = ['a','b'])
print(df.head(2))  # first 2 rows
print(df.tail())   # last 5 rows (the default)

Output:

           a          b
0  80.800250  97.333282
1  91.433429  81.323805

           a          b
3   3.655392  81.143852
4  70.394713  52.598872
5  62.170747  73.813017
6  40.934632   7.242002
7  75.889400  84.418156
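
The row-count argument can be sketched on a small frame. The values below are hand-picked for illustration (an assumption; the article uses random data), and the variable names are hypothetical. A negative n is also accepted: head(-2) returns everything except the last 2 rows.

```python
import pandas as pd

# Deterministic stand-in for the article's random frame (illustrative values)
df = pd.DataFrame({'a': range(8), 'b': range(10, 18)})

first_default = df.head()        # no argument: first 5 rows
last_three = df.tail(3)          # explicit n: last 3 rows
all_but_last_two = df.head(-2)   # negative n: all rows except the last 2

print(first_default.index.tolist())
print(last_three.index.tolist())
print(all_but_last_two.index.tolist())
```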

2. Transposing data

print(df.T)  # .T swaps rows and columns

Output:

           0          1          2  ...          5          6          7
a  80.800250  91.433429   5.563492  ...  62.170747  40.934632  75.889400
b  97.333282  81.323805  10.411445  ...  73.813017   7.242002  84.418156

[2 rows x 8 columns]
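
As a small check of what the transpose does to labels, the sketch below uses a hand-made 3 x 2 frame (illustrative values, not the article's data): the row labels become column labels and vice versa, and each value keeps its (row, column) pair in swapped order.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])
t = df.T  # rows become columns, columns become rows

print(df.shape, t.shape)
print(t.index.tolist(), t.columns.tolist())
print(df.loc['y', 'a'], t.loc['a', 'y'])  # same value, labels swapped
```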

3. Adding and modifying

3.1 Adding a new column/row by assignment
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df)

df['e'] = 10     # new column 'e': every row is set to 10
df.loc[4] = 20   # new row labelled 4: every column is set to 20
print(df)

Output:

           a          b          c          d
0  14.552288  22.852489  50.584815  31.153962
1  91.475232  27.827945  98.790335  74.487188
2  94.963093   5.227859  33.461076  71.792757
3  52.321047  77.474292   0.497665   7.623358

           a          b          c          d   e
0  14.552288  22.852489  50.584815  31.153962  10
1  91.475232  27.827945  98.790335  74.487188  10
2  94.963093   5.227859  33.461076  71.792757  10
3  52.321047  77.474292   0.497665   7.623358  10
4  20.000000  20.000000  20.000000  20.000000  20
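
The same two assignments also accept non-scalar values. The following is a minimal sketch with made-up data, assuming one wants a column computed from existing columns and a row with a different value per column (the list must have one entry per column, in column order).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

df['total'] = df['a'] + df['b']   # new column computed from existing ones
df.loc[2] = [10, 20, 30]          # new row: one value per column, in order

print(df)
```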

3.2 Modifying values directly after indexing
df['e'] = 20            # overwrite an existing column
df[['a','c']] = 100     # overwrite several columns at once
print(df)

Output:

     a          b    c          d   e
0  100  22.852489  100  31.153962  20
1  100  27.827945  100  74.487188  20
2  100   5.227859  100  71.792757  20
3  100  77.474292  100   7.623358  20
4  100  20.000000  100  20.000000  20
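
Whole-column assignment is the coarsest form of modification. For a single cell or a filtered set of rows, .loc with a row selector and a column label can be used in the same way. A small sketch with illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})

df.loc[0, 'a'] = 100.0            # one cell: row label 0, column 'a'
df.loc[df['b'] > 15, 'b'] = 0.0   # every row where b > 15, column 'b' only

print(df)
```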

4. Deleting data

4.1 The del statement: deleting a column
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df)

del df['a']   # removes column 'a' from df in place
print(df)

Output:

           a          b          c          d
0  30.469916  41.632874   5.182408  21.456072
1  22.080842  96.395829  17.761205  54.596288
2  89.695677  32.556029  36.625757  22.049501
3  43.686114  96.212541   7.441507  80.726133

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133
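
del discards the column. If the removed values are still needed, DataFrame.pop is a closely related alternative (not used in the article): it also removes the column in place but returns it as a Series. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

del df['a']             # removes column 'a' in place; nothing is returned
removed = df.pop('b')   # removes column 'b' in place and returns it

print(df.columns.tolist())
print(removed.tolist())
```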
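
4.2 drop(): deleting rows

With the default inplace=False, drop() returns a new DataFrame and leaves the original unchanged.

print(df.drop(0))       # drop the row labelled 0
print(df.drop([1,2]))   # drop several rows at once
print(df)               # the original is unchanged

Output: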

           b          c          d
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133

           b         c          d
0  41.632874  5.182408  21.456072
3  96.212541  7.441507  80.726133

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133
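
To make a deletion stick, either keep the returned frame or pass inplace=True. The sketch below contrasts the two on illustrative data; note that with inplace=True the call returns None, so its result should not be assigned.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3, 4], 'c': [5, 6, 7, 8]})

dropped = df.drop([1, 2])            # new frame; df still has 4 rows
rows_before = len(df)

result = df.drop(3, inplace=True)    # modifies df itself, returns None

print(dropped.index.tolist())
print(rows_before, len(df), result)
```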
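
4.3 drop(): deleting columns

Pass axis=1 to drop columns instead of rows. As before, the default inplace=False returns a new DataFrame and leaves the original unchanged.

print(df.drop(['d'], axis = 1))
print(df)   # the original still has column 'd'

Output: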

           b          c
0  41.632874   5.182408
1  96.395829  17.761205
2  32.556029  36.625757
3  96.212541   7.441507

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133
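
drop() also accepts a columns= keyword, which reads more explicitly than axis=1 and gives the same result. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2], 'c': [3, 4], 'd': [5, 6]})

by_axis = df.drop(['d'], axis=1)       # the form used in the article
by_keyword = df.drop(columns=['d'])    # equivalent keyword form

print(by_axis.equals(by_keyword))
print(df.columns.tolist())             # original is unchanged
```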

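5. Data alignment

Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels). Any position that exists in only one of the two operands becomes NaN in the result.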

df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
print(df1 + df2)

Output:

          A         B         C   D
0  1.951824  0.119430 -1.982792 NaN
1 -1.063710  0.682838 -0.484747 NaN
2  0.543521  3.587741  0.121565 NaN
3  0.501066  1.992348  0.569522 NaN
4  2.074808 -0.544962  0.403096 NaN
5 -0.565621 -0.232803 -0.830447 NaN
6 -1.384398 -0.675027 -0.314824 NaN
7       NaN       NaN       NaN NaN
8       NaN       NaN       NaN NaN
9       NaN       NaN       NaN NaN
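
When NaN for unmatched labels is not wanted, the method form DataFrame.add with fill_value is one option (not covered in the article): a value missing from one operand is treated as fill_value before adding. The sketch uses small hand-picked frames so the result is easy to follow.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
df2 = pd.DataFrame({'A': [10.0, 20.0]})   # fewer rows, no column 'B'

plain = df1 + df2                     # unmatched positions become NaN
filled = df1.add(df2, fill_value=0)   # missing side is treated as 0

print(plain)
print(filled)
```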

6. Sorting

6.1 Sorting by value: .sort_values

Works on Series as well as on DataFrames.

ascending=True: ascending order (the default)

ascending=False: descending order

6.1.1 Sorting by a single column
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df1)
print(df1.sort_values(['a'], ascending = True))  # ascending
print(df1.sort_values(['a'], ascending = False))  # descending

Output:

           a          b          c          d
0   9.819919  71.007572  22.839585  63.658534
1  10.029993  54.830601  46.236912  19.465751
2   1.837689  98.963422  64.585373  29.611975
3  16.754768  50.427218  14.561929   6.969858

           a          b          c          d
2   1.837689  98.963422  64.585373  29.611975
0   9.819919  71.007572  22.839585  63.658534
1  10.029993  54.830601  46.236912  19.465751
3  16.754768  50.427218  14.561929   6.969858

           a          b          c          d
3  16.754768  50.427218  14.561929   6.969858
1  10.029993  54.830601  46.236912  19.465751
0   9.819919  71.007572  22.839585  63.658534
2   1.837689  98.963422  64.585373  29.611975
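
The note that .sort_values also works on Series can be illustrated briefly. On a Series no column name is needed, and the index labels travel with their values. The data below is made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['x', 'y', 'z'])

asc = s.sort_values()                   # ascending by default
desc = s.sort_values(ascending=False)   # descending

print(asc.index.tolist(), asc.tolist())
print(desc.index.tolist(), desc.tolist())
```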

6.1.2 Sorting by multiple columns
df2 = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
                  'b':list(range(8)),
                  'c':list(range(8,0,-1))})
print(df2)
print(df2.sort_values(['a','c']))

Output (rows are sorted by a first, then by c within equal values of a):

   a  b  c
0  1  0  8
1  1  1  7
2  1  2  6
3  1  3  5
4  2  4  4
5  2  5  3
6  2  6  2
7  2  7  1

   a  b  c
3  1  3  5
2  1  2  6
1  1  1  7
0  1  0  8
7  2  7  1
6  2  6  2
5  2  5  3
4  2  4  4
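
ascending also accepts a list with one flag per sort column, so each key can have its own direction. A minimal sketch on illustrative data, sorting a ascending and c descending within each a:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 1, 2, 1], 'c': [1, 3, 2, 4]})

both_asc = df.sort_values(['a', 'c'])                         # a up, c up
mixed = df.sort_values(['a', 'c'], ascending=[True, False])   # a up, c down

print(both_asc.index.tolist())
print(mixed.index.tolist())
```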

6.2 Sorting by index: .sort_index

Works on Series as well as on DataFrames.

ascending=True: ascending order (the default)

ascending=False: descending order

df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                  index = [5,4,3,2],
                   columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                  index = ['h','s','x','g'],
                   columns = ['a','b','c','d'])
print(df1)
print(df1.sort_index())
print(df2)
print(df2.sort_index())

Output (works for string labels as well as numeric ones):

           a          b          c          d
5  98.079322  81.223109  39.534693  39.763032
4  42.068402  83.658613  14.678341  97.784928
3  10.901214  15.797918  26.516650  37.804133
2   8.326599   2.813564  41.619509  74.280190

           a          b          c          d
2   8.326599   2.813564  41.619509  74.280190
3  10.901214  15.797918  26.516650  37.804133
4  42.068402  83.658613  14.678341  97.784928
5  98.079322  81.223109  39.534693  39.763032

           a          b          c          d
h   8.519638  39.267385  89.480081  25.455433
s  81.948385   2.519190   6.892622  43.315483
x  16.037407  56.810954  20.749150  19.843433
g  96.832274  77.508434  96.155294  33.028485

           a          b          c          d
g  96.832274  77.508434  96.155294  33.028485
h   8.519638  39.267385  89.480081  25.455433
s  81.948385   2.519190   6.892622  43.315483
x  16.037407  56.810954  20.749150  19.843433
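
Two variations not shown above: ascending=False reverses the order, and axis=1 sorts the column labels instead of the row labels. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=[5, 3, 4])

rows_desc = df.sort_index(ascending=False)   # row labels, descending
cols_sorted = df.sort_index(axis=1)          # column labels, ascending

print(rows_desc.index.tolist())
print(cols_sorted.columns.tolist())
```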