DataFrame: set difference, deduplication, concatenation, and shuffling

Demo data df_nba:

+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    Oneil    |   Lakers  |    C    |  216   | 147.0  |
|   McGradyg  |  Rockets  |    SF   |  203   | 101.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|  Larry_Bird |   Boston  |    PF   |  206   | 100.0  |
|   Iverson   |    76s    |    PG   |  183   |  75.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|   Olajuwon  |  Rockets  |    C    |  208   | 116.0  |
+-------------+-----------+---------+--------+--------+

df_res:

+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    James    | Cavaliers |    PF   |  206   | 113.0  |
|    Jabbar   | Cavaliers |    PF   |  228   | 120.0  |
| Chamberlain |    76s    |    PF   |  216   | 125.0  |
|   Russell   |   Boston  |    PF   |  206   |  98.0  |
+-------------+-----------+---------+--------+--------+

Changing a column's data type

df['col_name'] = df['col_name'].astype('int')  # change the data type of df's col_name column to int
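As a minimal sketch of the conversion above, the snippet below uses a small hypothetical frame (not the df_nba data) whose height column arrived as strings, which is common after reading a CSV. The use of pd.to_numeric for dirty values is an addition for illustration, not something the original post covers.

```python
import pandas as pd

# Hypothetical frame: 'height' is stored as strings.
df = pd.DataFrame({"name": ["jordan", "kobe"], "height": ["198", "198"]})

# astype returns a new Series, so assign it back to replace the column.
# "int64" is spelled out so the result does not depend on the platform default.
df["height"] = df["height"].astype("int64")

# astype raises on values it cannot parse. For columns that may contain
# such values, pd.to_numeric with errors="coerce" yields NaN instead.
weights = pd.to_numeric(pd.Series(["98.0", "n/a"]), errors="coerce")
```

Note that a column containing NaN cannot be cast to a plain integer dtype; it would have to stay float or have its missing values filled first.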

Set difference of two DataFrame objects

 res = df[~(df['a'].isin(a_to_drop))]
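The line above filters on a single key column: isin marks the rows whose 'a' value appears in a_to_drop, and ~ inverts the mask. The sketch below shows that on two small hypothetical frames, and also a whole-row difference using a left merge with indicator=True. The frames df and other and the merge-based variant are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical frames for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})
other = pd.DataFrame({"a": [2, 4], "b": ["x", "q"]})

# Difference on one key column: keep rows of df whose 'a' is not in other['a'].
a_to_drop = other["a"]
res = df[~df["a"].isin(a_to_drop)]

# Difference on whole rows: a left merge on all shared columns with
# indicator=True labels rows found only in df as 'left_only'.
merged = df.merge(other.drop_duplicates(), how="left", indicator=True)
row_diff = merged.loc[merged["_merge"] == "left_only", ["a", "b"]]
```

The two results differ here: the key-based version drops the row with a == 4, while the row-based version keeps it because (4, 'z') does not appear in other as a full row.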

Deduplication

 

  • df_nba.drop_duplicates(subset=['name','team'], keep='last', inplace=False)
+------------+---------+---------+--------+--------+
|    name    |   team  | poision | height | weight |
+------------+---------+---------+--------+--------+
|   Oneil    |  Lakers |    C    |  216   | 147.0  |
|  McGradyg  | Rockets |    SF   |  203   | 101.0  |
| Larry_Bird |  Boston |    PF   |  206   | 100.0  |
|  Iverson   |   76s   |    PG   |  183   |  75.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|  Olajuwon  | Rockets |    C    |  208   | 116.0  |
+------------+---------+---------+--------+--------+
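The output above keeps the last occurrence of each (name, team) pair, which is why the first surviving row is Oneil. To make the effect of keep easier to see, here is a small sketch on a hypothetical five-row frame (a cut-down stand-in for df_nba) comparing the three values that keep accepts.

```python
import pandas as pd

# Hypothetical cut-down version of the demo data.
df = pd.DataFrame({
    "name": ["jordan", "kobe", "Oneil", "jordan", "kobe"],
    "team": ["Bulls", "Lakers", "Lakers", "Bulls", "Lakers"],
})

# keep="first": retain the first row of each duplicate group (the default).
first = df.drop_duplicates(subset=["name", "team"], keep="first")

# keep="last": retain the last row of each duplicate group.
last = df.drop_duplicates(subset=["name", "team"], keep="last")

# keep=False: drop every row that has a duplicate, leaving only unique rows.
unique_only = df.drop_duplicates(subset=["name", "team"], keep=False)
```

With inplace=False (the default) the original frame is left untouched and a new one is returned. The surviving rows keep their original index labels, so a reset_index(drop=True) may be wanted afterwards.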

Concatenation

  • data = pd.concat([df_nba, df_res], axis = 0)
+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    Oneil    |   Lakers  |    C    |  216   | 147.0  |
|   McGradyg  |  Rockets  |    SF   |  203   | 101.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|  Larry_Bird |   Boston  |    PF   |  206   | 100.0  |
|   Iverson   |    76s    |    PG   |  183   |  75.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|   Olajuwon  |  Rockets  |    C    |  208   | 116.0  |
|    James    | Cavaliers |    PF   |  206   | 113.0  |
|    Jabbar   | Cavaliers |    PF   |  228   | 120.0  |
| Chamberlain |    76s    |    PF   |  216   | 125.0  |
|   Russell   |   Boston  |    PF   |  206   |  98.0  |
+-------------+-----------+---------+--------+--------+
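With axis=0 the rows of the second frame are appended below the first, and each frame keeps its own index labels. The sketch below, using two small hypothetical frames, shows the repeated labels that result and how ignore_index=True renumbers them.

```python
import pandas as pd

# Hypothetical frames standing in for df_nba and df_res.
df_a = pd.DataFrame({"name": ["jordan", "kobe"], "height": [198, 198]})
df_b = pd.DataFrame({"name": ["James"], "height": [206]})

# Row-wise concatenation; original index labels are preserved (0, 1, 0).
stacked = pd.concat([df_a, df_b], axis=0)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df_a, df_b], axis=0, ignore_index=True)
```

Duplicate index labels are legal in pandas, but they make label-based lookups such as .loc[0] return several rows, so renumbering is often the safer choice when the old labels carry no meaning.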

Shuffling the data

  • data = df_nba.sample(frac=1)
+------------+---------+---------+--------+--------+
|    name    |   team  | poision | height | weight |
+------------+---------+---------+--------+--------+
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|  McGradyg  | Rockets |    SF   |  203   | 101.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
| Larry_Bird |  Boston |    PF   |  206   | 100.0  |
|  Iverson   |   76s   |    PG   |  183   |  75.0  |
|  Olajuwon  | Rockets |    C    |  208   | 116.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|   Oneil    |  Lakers |    C    |  216   | 147.0  |
+------------+---------+---------+--------+--------+
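sample(frac=1) draws all rows without replacement, which amounts to a random permutation, so the order shown above will differ from run to run. A minimal sketch for a reproducible shuffle follows. The random_state value and the reset_index step are additions for illustration, and the frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"name": ["jordan", "kobe", "Oneil", "Iverson"]})

# random_state fixes the seed so the same permutation is produced each run.
# reset_index(drop=True) replaces the shuffled labels with 0..n-1.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Repeating the call with the same seed gives the same order.
again = df.sample(frac=1, random_state=42).reset_index(drop=True)
```

Without reset_index the rows are reordered but still carry their original labels, which can be useful when the shuffled rows need to be traced back to the source frame.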

 
