工作日记2017.08.16

最新推荐文章于 2019-01-04 20:25:21 发布

一_嚿_饭团

最新推荐文章于 2019-01-04 20:25:21 发布

阅读量176

点赞数

分类专栏：工作随笔文章标签：工作

本文链接：https://blog.csdn.net/y1yo2/article/details/77318960

版权

工作随笔专栏收录该内容

28 篇文章 0 订阅

订阅专栏

二.聚合操作之group,distinct,count
count()方法可以查询统计符合条件的集合的总数

db.COLLECTION_NAME.count(<query>) // 此方法等价于 db.COLLECTION_NAME.find(<query>).count()

在分布式集合中，会出现计算错误的情况，这个时候推荐使用aggregate；

distinct命令可以找出给定键的所有去重之后的值。使用时也必须指定集合和键

db.runCommand({ distinct: "<collection>", key: "<field>", query: <query> }) //此方法等价于 db.collection.distinct(field, query)

参数说明：

collection :要查询的集合的名称
key:需要去重的字段的名称
query :可选参数，指明查询条件，相当于SQL中的where语句
Group操作：mongodb2.2版本对于返回数据最多只包涵20000个元素，最多支持20000独立分组；对于超过20000的独立分组建议采用mapreduce；

db.runCommand({group:{ns: <namespace>,
                      key: <key>,
                      $reduce: <reduce function>,
                      $keyf: <key function>,
                      cond: <query>,
                      finalize: <finalize function>
                      }
                      }) //该方法等价于 db.collection.group({ key, reduce, initial [, keyf] [, cond] [, finalize] })

参数说明

ns:集合名称
key：用来分组文档的字段。和keyf两者必须有一个
keyf：可以接受一个javascript函数。用来动态的确定分组文档的字段。和key两者必须有一个
initial：reduce中使用变量的初始化
reduce：执行的reduce函数。函数需要返回值。
cond：执行过滤的条件。
finallize：在reduce执行完成，结果集返回之前对结果集最终执行的函数。可选的。

三.Mongodb数据统计分析：group by、count(distinct userId)

select 
      date_format(gmtCreate,'%Y%m%d') day,
      count(1) shareCount,
      count(distinct userId) shareUserCount
   from t_share_log
   where gmtCreate between str_to_date(20161001, '%Y%m%d') and str_to_date(20161231, '%Y%m%d');

mongodb命令行：
db.runCommand({group:
    {
        ns:"t_share_log",
        cond : { "$and":[{"gmtCreate":{"$gt":new ISODate("2016-10-01T00:00:00.000Z")}}, {"gmtCreate":{"$lt":new ISODate("2016-12-31T23:59:59.999Z")}}]},
        $keyf:function(doc){
          var myDate = new Date(doc.gmtCreate);
          var mm = '0'+(myDate.getMonth()+1); 
          var dd = '0'+myDate.getDate();
          return {day:myDate.getFullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};
        },
        initial:{"shareCount" : 0 , "shareUserCount" : 0 , "userIdMap" : {}},
        $reduce:function(doc, prev){ 
          if(doc.userId != null){  
            prev.shareCount ++;  
            if(prev.userIdMap[doc.userId] == null) {
              prev.shareUserCount ++;
              prev.userIdMap[doc.userId] = 1;
            }
           }
          },
        finalize: function(doc){ delete doc.userIdMap; }
    }
});

通过定义keyf的function格式化group by的日期值，再通过定义reduce:function函数借助userIdMap去重userId(相当于dintinct userId)

或者

db.col1.aggregate(
          [{$group: {
              _id: "$field1",
              count: { $sum : 1 }
            }}, {
            $group: {
              _id: "$_id",
              count: { $sum : "$count" }
            }},{
              $out: "distinctCount"
            }],
         {allowDiskUse:true}
)

解释：The second $group is not actually doing anything here since it’s just using the same _id value, as well as passing through the same output value to $sum. Also $out was introduced in the same version where the default .aggregate() output is actually a “cursor”, so previous problems with a result in a single BSON document are no longer a problem here. Other API’s than the shell might “default” to a BSON document, but they can easily be altered to return a “cursor” instead. You only need to use $out where you actually “intend” to create a new collection for the results.