数据源结构如下:
{
"_id": ObjectId("5d134888480fb40d91b36814"),
"labeler_label_result": {
"boxes": [{
"label_id": "test",
"area": 2785.185185185186
}, {
"label_id": "test",
"area": 9449.382716049386
}, {
"label_id": "ddddd-d",
"area": 14988.88888888889
}, {
"label_id": "test",
"area": 8444.444444444445
}, {
"label_id": "ddddd-d",
"area": 3887.6543209876545
}]
},
"checker_label_result": {
{
"label_id": "test",
"area": 9449.382716049386
}, {
"label_id": "ddddd-d",
"area": 14988.88888888889
}, {
"label_id": "test",
"area": 8444.444444444445
},
},
}
1. 统计要求:需要以label_id为组,统计每种label_id出现的次数,和每一种label_id对用的area的最大值、最小值、平均值。注意,假如存在checker_label_result(即该字段不为null),则优先使用它作为统计指标。
下面是语句实现:
db.t_tasks.aggregate([
{
$match: { // 筛选label id
"$or":[
{
"labeler_label_result.boxes.label_id": {$in:["test", "ddddd-d"]}
},
{
"checker_label_result.boxes.label_id": {$in:["test", "ddddd-d"]}
},
],
}
},
{
$project:{ // 判断checker_label_result是否为null,为空则使用labeler_label_result
"_id":0,
"label_result" : {$ifNull: ['$checker_label_result', '$labeler_label_result'] },
}
},
{
$unwind:{ // 以label_result.boxes数组为基准,分割成若干行
path:"$label_result.boxes"
},
},
{
$group:{ // 开始group
"_id": "$label_result.boxes.label_id",
"label_area_avg": {$avg:"$label_result.boxes.area"},
"label_area_max": {$max:"$label_result.boxes.area"},
"label_area_min": {$min:"$label_result.boxes.area"},
"label_num":{"$sum":1}
}
},
{
$project:{ //重命名字段
"_id":0,
"label_id":"$_id",
"label_area_avg":1,
"label_area_max":1,
"label_area_min":1,
"label_num":1,
}
},
])
最终效果:
{ "label_area_avg" : 9438.271604938273, "label_area_max" : 14988.88888888889, "label_area_min" : 3887.6543209876545, "label_num" : 3, "label_id" : "ddddd-d" }
{ "label_area_avg" : 6893.004115226339, "label_area_max" : 9449.382716049386, "label_area_min" : 2785.185185185186, "label_num" : 5, "label_id" : "test" }
2. 统计要求:需要以label_id为组,统计每种label_id出现的记录条数。注意,假如存在checker_label_result(即该字段不为null),则优先使用它作为统计指标。
db.t_tasks.aggregate([
{
"$match": {
"status": 6,
"task_set_id": "5d134888480fb40d91b36813"
}
},
{
"$project": {
"label_result": {
"$ifNull": ["$checker_label_result", "$labeler_label_result"]
}
}
},
{
$unwind:{
path:"$label_result.boxes"
},
},
{
$group: { //以_id与label id作为KEY,去重复
"_id": {
"id": "$_id",
"label_id": "$label_result.boxes.label_id"
},
"label_id": { $first: "$label_result.boxes.label_id" },
}
},
{
$group: {
"_id": "$label_id",
"sample_num": { $sum: 1 },
}
}
])
最终效果:
{ "_id" : "test", "sample_num" : 1 }
{ "_id" : "ddddd-d", "sample_num" : 1 }