1.原始数据
2. 需求
3.思路
- 首先考虑需要拆分code,分成三个字段device_code(设备编码),kpi_code(指标code), shift(班次)
- 再根据以上三个字段分组,再每个设备,当天当班次的指标到一个字典里。
4.操作
db.datas.aggregate([
{
'$match': {'code': {'$in': ['dev_plc_LS_01_001~zb_r_cl_bc1', 'dev_plc_LS_01_001~zb_r_cl_bc2', 'dev_plc_LS_01_001~zb_r_bhgs_bc1',
'dev_plc_LS_01_001~zb_r_bhgs_bc2', 'dev_plc_LS_01_001~zb_r_sbjgsj_bc1', 'dev_plc_LS_01_001~zb_r_sbjgsj_bc2',
'dev_plc_LS_01_001~zb_r_sbsjsj_bc1', 'dev_plc_LS_01_001~zb_r_sbsjsj_bc2', 'dev_plc_LS_01_001~zb_r_sbsbsj_bc1',
'dev_plc_LS_01_001~zb_r_sbsbsj_bc2', 'dev_plc_LS_01_001~zb_r_sbrlsj_bc1', 'dev_plc_LS_01_001~zb_r_sbrlsj_bc2',
'dev_plc_LS_01_001~zb_r_sbyxsj_bc1', 'dev_plc_LS_01_001~zb_r_sbyxsj_bc2', 'dev_plc_LS_01_002~zb_r_cl_bc1',
'dev_plc_LS_01_002~zb_r_cl_bc2', 'dev_plc_LS_01_002~zb_r_bhgs_bc1', 'dev_plc_LS_01_002~zb_r_bhgs_bc2',
'dev_plc_LS_01_002~zb_r_sbjgsj_bc1', 'dev_plc_LS_01_002~zb_r_sbjgsj_bc2', 'dev_plc_LS_01_002~zb_r_sbsjsj_bc1',
'dev_plc_LS_01_002~zb_r_sbsjsj_bc2', 'dev_plc_LS_01_002~zb_r_sbsbsj_bc1', 'dev_plc_LS_01_002~zb_r_sbsbsj_bc2',
'dev_plc_LS_01_002~zb_r_sbrlsj_bc1', 'dev_plc_LS_01_002~zb_r_sbrlsj_bc2', 'dev_plc_LS_01_002~zb_r_sbyxsj_bc1',
'dev_plc_LS_01_002~zb_r_sbyxsj_bc2', 'dev_plc_LS_01_003~zb_r_cl_bc1', 'dev_plc_LS_01_003~zb_r_cl_bc2',
'dev_plc_LS_01_003~zb_r_bhgs_bc1', 'dev_plc_LS_01_003~zb_r_bhgs_bc2', 'dev_plc_LS_01_003~zb_r_sbjgsj_bc1',
'dev_plc_LS_01_003~zb_r_sbjgsj_bc2', 'dev_plc_LS_01_003~zb_r_sbsjsj_bc1', 'dev_plc_LS_01_003~zb_r_sbsjsj_bc2',
'dev_plc_LS_01_003~zb_r_sbsbsj_bc1', 'dev_plc_LS_01_003~zb_r_sbsbsj_bc2', 'dev_plc_LS_01_003~zb_r_sbrlsj_bc1',
'dev_plc_LS_01_003~zb_r_sbrlsj_bc2', 'dev_plc_LS_01_003~zb_r_sbyxsj_bc1', 'dev_plc_LS_01_003~zb_r_sbyxsj_bc2',
'dev_plc_LS_01_004~zb_r_cl_bc1', 'dev_plc_LS_01_004~zb_r_cl_bc2', 'dev_plc_LS_01_004~zb_r_bhgs_bc1',
'dev_plc_LS_01_004~zb_r_bhgs_bc2', 'dev_plc_LS_01_004~zb_r_sbjgsj_bc1', 'dev_plc_LS_01_004~zb_r_sbjgsj_bc2',
'dev_plc_LS_01_004~zb_r_sbsjsj_bc1', 'dev_plc_LS_01_004~zb_r_sbsjsj_bc2', 'dev_plc_LS_01_004~zb_r_sbsbsj_bc1',
'dev_plc_LS_01_004~zb_r_sbsbsj_bc2', 'dev_plc_LS_01_004~zb_r_sbrlsj_bc1', 'dev_plc_LS_01_004~zb_r_sbrlsj_bc2',
'dev_plc_LS_01_004~zb_r_sbyxsj_bc1', 'dev_plc_LS_01_004~zb_r_sbyxsj_bc2']},
'timestamp': {'$gte': ISODate("2020-01-01T00:00:00.000+0000"), '$lte': ISODate("2020-12-31T00:00:00.000+0000")},
'value': {'$ne': 0}
}
},
{
'$addFields': {"d_c": {"$split": ["$code", "~"]}}
},
{
'$addFields': {
'device_code': {"$arrayElemAt": ["$d_c", 0]},
'kpi_code': {"$arrayElemAt": ["$d_c", 1]}
}
},
{
'$addFields': {
"length": {'$strLenCP': "$kpi_code"}
}
},
{
'$addFields': {
"kpi_length": {'$toInt': {'$subtract': ["$length", 4]}}
}
},
{
'$project': {
'_id': 0,
'device_code': '$device_code',
'timestamp': '$timestamp',
'value': '$value',
'kpi_code': {'$substr': ["$kpi_code", 0, "$kpi_length"]},
'shift': {'$substr': ["$kpi_code", {'$toInt': {'$add': ["$kpi_length", 1]}}, 3]}
}
},
{
'$group': {'_id': {'device_code': '$device_code', 'timestamp': '$timestamp', 'shift': '$shift'},
"kpi_dict": {"$push": {"k": "$kpi_code", "v": "$value"}}
}
},
{
'$project': {
"_id": 0,
'shift': '$_id.shift',
'device_code': '$_id.device_code',
'kpi_dict': {"$arrayToObject": "$kpi_dict"},
'timestamp': '$_id.timestamp'
}
},
{
'$sort': {'device_code': 1, 'timestamp': 1, 'shift': 1}
}
])
5.解析
5.1 '$match'
操作符先匹配出满足条件的数据,缩小数据操作范围
{
'$match': {'code': {'$in': ['dev_plc_LS_01_001~zb_r_cl_bc1', 'dev_plc_LS_01_001~zb_r_cl_bc2']},
'timestamp': {'$gte': ISODate("2020-01-01T00:00:00.000+0000"), '$lte': ISODate("2020-12-31T00:00:00.000+0000")},
'value': {'$ne': 0}
}
}
5.2 '$split'
拆分code字段分成两部分,设备编码和带班次的指标code. '$addFields'
增加字段d_c,内容是个数组。如['dev_plc_LS_01_001', 'zb_r_cl_bc1']
{
'$addFields': {"d_c": {"$split": ["$code", "~"]}}
}
5.3 '$arrayElemAt'
按下标取数组的数据区分出设备编码和指标code,再'$addFields'
增加以上两个字段,分别命名device_code,kpi_code
{
'$addFields': {
'device_code': {"$arrayElemAt": ["$d_c", 0]},
'kpi_code': {"$arrayElemAt": ["$d_c", 1]}
}
}
5.4 接下来要拆分kpi_code字段,可以确定的是班次名长度是3,所以先'$strLenCP'
统计每个kpi_code总的长度'length
’,再统计每个kpi_code使用'$subtract'
操作符减去’_bc1’后缀以后的’指标长度"kpi_length"
{
'$addFields': {
"length": {'$strLenCP': "$kpi_code"}
}
},
{
'$addFields': {
"kpi_length": {'$toInt': {'$subtract': ["$length", 4]}}
}
},
5.5 由于上述操作引入了很多中间过渡字段,所以这里'$project'
过滤出实际需要的字段。并使用'$substr'
操作符取’kpi_code’下标0开始长度为kpi_length长度即为拆分后的真正指标code,班次shift取’kpi_code’下标kpi_length+1,长度为3得到shift
{
'$project': {
'_id': 0,
'device_code': '$device_code',
'timestamp': '$timestamp',
'value': '$value',
'kpi_code': {'$substr': ["$kpi_code", 0, "$kpi_length"]},
'shift': {'$substr': ["$kpi_code", {'$toInt': {'$add': ["$kpi_length", 1]}}, 3]}
}
},
5.6 对数据进行分组,按设备编码、时间、班次分组,并使用"$push"
操作符将每个设备每天每个班次的指标汇总起来。
{
'$group': {'_id': {'device_code': '$device_code', 'timestamp': '$timestamp', 'shift': '$shift'},
"kpi_dict": {"$push": {"k": "$kpi_code", "v": "$value"}}
}
},
5.7 数据最终显示与排序"$arrayToObject"
将数组转为字典(文档)格式
{
'$project': {
"_id": 0,
'shift': '$_id.shift',
'device_code': '$_id.device_code',
'kpi_dict': {"$arrayToObject": "$kpi_dict"},
'timestamp': '$_id.timestamp'
}
},
{
'$sort': {'device_code': 1, 'timestamp': 1, 'shift': 1}
}
至此,本次mongo聚合操作结束,大功告成!水平有限,上述如有优化还请不吝赐教!