Python-科学计算-pandas-23-按列去重

系统:Windows 10
编辑器:JetBrains PyCharm Community Edition 2018.2.2 x64
pandas:1.1.5

  • 这个系列讲讲Python的科学计算及可视化
  • 今天讲讲pandas模块
  • df按某列进行去重

Part 1:场景描述

  1. 已知df1,包括6列,"time", "pos", "value1", "value2", "value3", "value4
  2. 有两个需求:根据pos列,去除重复记录;根据posvalue1列,去除重复记录,即要求这两列都相等时去重

df_1
请添加图片描述

Part 2:根据pos列去重

import pandas as pd

dict_1 = {"time": ["2019-11-02", "2019-11-03", "2019-11-04", "2019-11-05",
                   "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05"],
          "pos": ["A", "A", "C", "D", "E", "E", "G", "H"],
          "value1": [20, 20, 30, 40, 50, 60, 70, 80],
          "value2": [100, 200, 300, 400, 500, 600, 700, 800],
          "value3": [50, 20, 30, 90, 50, 60, 80, 80],
          "value4": ['21W12', '21W10', '21W01', '21W05', '21W06', '21W36', '21W21', '21W23']}

df_1 = pd.DataFrame(dict_1, columns=["time", "pos", "value1", "value2", "value3", "value4"])
print("\n", "df_1", "\n", df_1, "\n")

df_2 = df_1

df_2.drop_duplicates(subset=["pos"], keep="first", inplace=True)
print("\n", "df_2", "\n", df_2, "\n")

print("\n", "df_1", "\n", df_1, "\n")

代码截图

请添加图片描述

执行结果

请添加图片描述

Part 3:根据posvalue1列去重

import pandas as pd

dict_1 = {"time": ["2019-11-02", "2019-11-03", "2019-11-04", "2019-11-05",
                   "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05"],
          "pos": ["A", "A", "C", "D", "E", "E", "G", "H"],
          "value1": [20, 20, 30, 40, 50, 60, 70, 80],
          "value2": [100, 200, 300, 400, 500, 600, 700, 800],
          "value3": [50, 20, 30, 90, 50, 60, 80, 80],
          "value4": ['21W12', '21W10', '21W01', '21W05', '21W06', '21W36', '21W21', '21W23']}

df_1 = pd.DataFrame(dict_1, columns=["time", "pos", "value1", "value2", "value3", "value4"])
print("\n", "df_1", "\n", df_1, "\n")

df_3 = df_1

df_3.drop_duplicates(subset=["pos", "value1"], keep="first", inplace=True)
print("\n", "df_3", "\n", df_3, "\n")

print("\n", "df_1", "\n", df_1, "\n")

代码截图
请添加图片描述

执行结果

请添加图片描述

Part 4:部分代码解读

  1. df_2.drop_duplicates(subset=["pos"], keep="first", inplace=True)subset对应列表取值去重参考列,若列表元素大于1个,要求同时满足多列对应记录相同才能去重。keep="first"表示去重后,保留第1个记录
  2. df_2=df_1后对,df_2进行去重后,df_1同时发生了变化,表明两个变量对应的地址应该是同一区域

本文为原创作品,欢迎分享朋友圈

长按图片识别二维码,关注本公众号
Python 优雅 帅气
12x0.8.jpg

  • 2
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值