MongoDB: group by multiple fields, delete the redundant documents, and keep only the maximum value

少威

Article link: https://blog.csdn.net/jew11111/article/details/106214528

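So that I don't forget the logic behind this script later, I decided to write a post to record it.
The post I referred to is: https://blog.csdn.net/haoyuexihuai/article/details/100084075
Many thanks to the author of that post for sharing.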

db.getCollection('ts_data_i').aggregate(
[
//$match is the filter condition
//roughly equivalent to: where time like '2020-02-19%' and tagCode = 'v0496'
{$match :{
          time:/^2020-02-19/,
          "tagCode" : "v0496"
          }
},

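//$group is the grouping condition
//roughly equivalent to: select time,tagCode,deviceId,max(value) from table_name group by time,tagCode,deviceId
//$group outputs only the group key (_id) and the accumulator fields, so $addToSet is used to collect the original _id of every document in the group into the uniqueIds array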
{$group : 
    {   _id : {
        'time': '$time',
        'tagCode': '$tagCode',
        'deviceId': '$deviceId'
        },
        value : {$max : "$value"},
        
        'uniqueIds': {
          '$addToSet': '$_id'
        }
    }
}], 
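//when the data being scanned for duplicates is large, the allowDiskUse option has to be added
//each pipeline stage may use at most 100 MB of RAM; allowDiskUse: true lets a stage that exceeds it spill to temporary files on disk (the 16 MB BSON limit applies to each individual result document)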
{
    allowDiskUse: true
}
//forEach hands each grouped result to the callback as doc, so doc is used below in place of the query result
//the statement below is roughly: delete from table_name where _id in (doc.uniqueIds) and value != doc.value
).forEach(function(doc){
    db.getCollection('ts_data_i').deleteMany({
        _id: {
            $in: doc.uniqueIds
        },
        value:{$ne:doc.value}
    });
})
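
To make the logic of the script above easier to check, here is a minimal in-memory sketch in plain JavaScript that needs no MongoDB. It only models the grouping and the delete filter; the function name `idsToDeleteKeepingMax`, the integer `_id` values, the deviceId "A" and the sample values are made up for illustration, and it assumes `value` is always a number.

```javascript
// In-memory model of the script above (illustration only, no MongoDB needed).
// `docs` stands in for the documents that passed the $match stage.
function idsToDeleteKeepingMax(docs) {
  const groups = new Map();
  for (const d of docs) {
    // Same grouping key as the $group stage: time + tagCode + deviceId.
    const key = JSON.stringify([d.time, d.tagCode, d.deviceId]);
    const g = groups.get(key) || { max: -Infinity, members: [] };
    g.max = Math.max(g.max, d.value); // models value: { $max: "$value" }
    g.members.push(d);                // models uniqueIds: { $addToSet: "$_id" }
    groups.set(key, g);
  }
  const toDelete = [];
  for (const g of groups.values()) {
    for (const d of g.members) {
      // Models the delete filter: _id in uniqueIds AND value != max.
      if (d.value !== g.max) toDelete.push(d._id);
    }
  }
  return toDelete;
}

// Hypothetical sample data: three documents share one group, one is alone.
const sample = [
  { _id: 1, time: "2020-02-19 00:00:00", tagCode: "v0496", deviceId: "A", value: 1.5 },
  { _id: 2, time: "2020-02-19 00:00:00", tagCode: "v0496", deviceId: "A", value: 3.0 },
  { _id: 3, time: "2020-02-19 00:00:00", tagCode: "v0496", deviceId: "A", value: 3.0 },
  { _id: 4, time: "2020-02-19 00:05:00", tagCode: "v0496", deviceId: "A", value: 0.2 },
];

console.log(idsToDeleteKeepingMax(sample));
```

With this sample only `_id` 1 is selected for deletion. Documents 2 and 3 both hold the maximum, so the `value != max` condition matches neither of them and both stay.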

One addition:
After the statements above have run, duplicate documents with identical values may be left behind, for example:

/* 1 */
{
    "_id" : ObjectId("5e4f5c47873c4c8"),
    "deviceId" : "SHKT_1",
    "tagCode" : "S0010",
    "time" : "2020-02-19 00:00:00",
    "type" : 7,
    "value" : 0.0
}

/* 2 */
{
    "_id" : ObjectId("5e4f5c47873c4c7"),
    "deviceId" : "SHKT_1",
    "tagCode" : "S0010",
    "time" : "2020-02-19 00:00:00",
    "type" : 7,
    "value" : 0.0
}

In that case the script from the referenced post has to be run as well. Adapted to my own business requirements, it looks like this:

db.getCollection('ts_data_i').aggregate(
[
//this $match is the query condition, i.e. the where clause
{$match :{
        time:/^2020-02-19/
        ,deviceId:"SHKT_1"
        ,tagCode:/^S000/
        
    }
},

{$group : 
    {   _id : {
        'time': '$time',
        'tagCode': '$tagCode',
        'deviceId': '$deviceId'
        },
        
        'count': {
                '$sum': 1
            },
        'uniqueIds': {
          '$addToSet': '$_id'
        }
    }
},
//this $match filters the grouped results, like having in mysql
{
        '$match': {
            'count': {
                '$gt': 1
            }
        }
    }
], {
    allowDiskUse: true
}

).forEach(function(doc){
    //doc.uniqueIds.shift() removes the first element of the array; this takes one _id out of each group of duplicates so that the delete below does not remove all of them
    doc.uniqueIds.shift();
    db.getCollection('ts_data_i').deleteMany({
        _id: {
            $in: doc.uniqueIds
        }
    });
})
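
The two passes could possibly be merged into one. The sketch below is not from the referenced post: it sorts by `value` in descending order, keeps the first `_id` of every group with `$first`, and deletes the rest, so both smaller values and equal-valued duplicates go in a single run. The helper names `buildKeepMaxPipeline` and `deleteFilterFor` and the output fields `keepId` and `allIds` are names chosen here for illustration. Treat it as an untested idea and try it on a copy of the collection first.

```javascript
// Sketch only: build a pipeline that marks one document per group to keep.
// Field names (time, tagCode, deviceId, value) follow the scripts above.
function buildKeepMaxPipeline(matchStage) {
  return [
    { $match: matchStage },
    // Highest value first, so $first below picks a document holding the maximum.
    { $sort: { value: -1 } },
    {
      $group: {
        _id: { time: "$time", tagCode: "$tagCode", deviceId: "$deviceId" },
        keepId: { $first: "$_id" }, // the one document that survives
        allIds: { $push: "$_id" },  // every document in the group
        count: { $sum: 1 },
      },
    },
    // Only groups that really contain more than one document need a delete.
    { $match: { count: { $gt: 1 } } },
  ];
}

// Delete filter for one group: every _id in the group except keepId.
function deleteFilterFor(groupDoc) {
  return { _id: { $in: groupDoc.allIds, $ne: groupDoc.keepId } };
}

// Intended use in the mongo shell (not executed here):
// db.getCollection('ts_data_i')
//   .aggregate(buildKeepMaxPipeline({ time: /^2020-02-19/, tagCode: "v0496" }),
//              { allowDiskUse: true })
//   .forEach(function (doc) {
//     db.getCollection('ts_data_i').deleteMany(deleteFilterFor(doc));
//   });

// Plain-JavaScript check of the filter shape with made-up ids.
console.log(JSON.stringify(deleteFilterFor({ keepId: "a", allIds: ["a", "b", "c"] })));
```

When several documents tie for the maximum, which one `$first` keeps is arbitrary, which matches the behaviour of `shift()` in the script above. The extra `$sort` stage costs memory on large inputs, so `allowDiskUse: true` is still worth passing.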
