MongoDB mapReduce summary


Official reference: https://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-map-mtd

1. Syntax:

db.collection.mapReduce(
	<map>,
	<reduce>,
	{
		out: <collection>,
		query: <document>,
		sort: <document>,
		limit: <number>,
		finalize: <function>,
		scope: <document>,
		jsMode: <boolean>,
		verbose: <boolean>,
		bypassDocumentValidation: <boolean>
	}
);

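Details:

The "clean" (pure function) concept:

The map, reduce and finalize functions should be pure: they must not connect to the database or cause other side effects. They may still use the following built-in properties and functions: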

Available Properties 	
args
MaxKey
MinKey

Available Functions 	 	
assert()
BinData()
DBPointer()
DBRef()
doassert()
emit()
gc()
HexData()
hex_md5()
isNumber()
isObject()
ISODate()
isString()
Map()
MD5()
NumberInt()
NumberLong()
ObjectId()
print()
printjson()
printjsononeline()
sleep()
Timestamp()
tojson()
tojsononeline()
tojsonObject()
UUID()
version()


1. map

Format:

function() {
   ...
   emit(key, value);
}

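The map function turns each document into zero or more emit() calls. The emitted key/value pairs are the initial data for the map-reduce operation.

A document can produce no output at all, for example when it fails a condition: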

function() {
    if (this.status == 'A')
        emit(this.cust_id, 1);
}

A document can also produce several emits:

function() {
    this.items.forEach(function(item){ emit(item.sku, 1); });
}

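REQUIREMENTS:

1. Inside the map function, this refers to the current document.

2. It must not access the database.

3. It must not interact with functions outside itself (no side effects).

It may read variables defined in scope.

4. A single emit can hold at most half of MongoDB's maximum BSON document size. The maximum BSON document size is 16 megabytes,

so the data in a single emit must not exceed 8 MB.

5. A document may produce zero, one, or many emits.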
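
The rules above can be tried out locally without a server. The following is a minimal sketch, assuming a hypothetical helper runMap and made-up sample documents: it binds this to each document and provides a stand-in emit() that only records the pairs. On a real server, MongoDB supplies both this and emit() itself.

```javascript
// Local simulation only: on a server, MongoDB supplies `this` and emit().
function runMap(mapFn, docs) {
  const emitted = [];
  // Hypothetical stand-in for MongoDB's emit(); it just records each pair.
  globalThis.emit = (key, value) => emitted.push({ key, value });
  try {
    for (const doc of docs) mapFn.call(doc); // `this` is the current document
  } finally {
    delete globalThis.emit;
  }
  return emitted;
}

// Made-up sample documents.
const sampleOrders = [
  { cust_id: "c1", status: "A", items: [{ sku: "apple" }, { sku: "pear" }] },
  { cust_id: "c2", status: "B", items: [] },
];

// Zero or one emit per document.
const byStatus = runMap(function () {
  if (this.status == "A") emit(this.cust_id, 1);
}, sampleOrders);

// Zero or more emits per document.
const bySku = runMap(function () {
  this.items.forEach(function (item) { emit(item.sku, 1); });
}, sampleOrders);

console.log(byStatus.length, bySku.length); // 1 2
```

The second sample document emits nothing in either case, which illustrates requirement 5.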


2. reduce

function(key, values) {
   ...
   return result;
}

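REQUIREMENTS:

1. It must not access the database or outside functions.

2. When a key has only a single value, the reduce function is not called; that value is used directly as the reduced result.

3. The reduce function may be called more than once for the same key, for example when partial results from several shards have to be combined. The value it returns must therefore be usable as an input value of a later reduce call. In the words of the official documentation:

  • MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

4. It may read variables defined in scope.

In short, the value returned by reduce must have the same format as the value emitted by map, so that reduce can be applied again to its own output.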
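
A quick way to check this property is to reduce the same values in one pass and in two passes and compare the results. The sketch below runs locally; reduceCounts and the sample values are illustrative assumptions, not part of MongoDB.

```javascript
// The return value has the same shape as the emitted values: { count, qty }.
function reduceCounts(key, values) {
  var result = { count: 0, qty: 0 };
  for (var i = 0; i < values.length; i++) {
    result.count += values[i].count;
    result.qty += values[i].qty;
  }
  return result;
}

// Made-up values, as map might emit them for one key.
const emittedValues = [
  { count: 1, qty: 5 },
  { count: 1, qty: 3 },
  { count: 1, qty: 10 },
];

// One pass over all values.
const onePass = reduceCounts("sku1", emittedValues);

// Two passes: the first output becomes an input of the second call.
const partial = reduceCounts("sku1", emittedValues.slice(0, 2));
const twoPass = reduceCounts("sku1", [partial, emittedValues[2]]);

console.log(JSON.stringify(onePass) === JSON.stringify(twoPass)); // true
```

If reduce returned a different shape than map emits, for example a bare number, the second pass would read undefined fields and the two results would differ.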


3. OPTIONS

3.1 out takes one of two forms when writing to a collection

out: <collectionName>

out: { <action>: <collectionName>
        [, db: <dbName>]
        [, sharded: <boolean> ]
        [, nonAtomic: <boolean> ] }

The first form is shorthand for:
out: { replace: <collectionName>
        [, db: <inputDB>]
        [, sharded: false ]
        [, nonAtomic: false ] }

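Values for action:

   replace: replaces the whole collection. If the collection already exists, its contents are replaced by the new results.

   merge: merges the new results into the existing collection. If a result key already exists in the collection, that document is overwritten; existing documents with other keys are kept.

   reduce: merges the new results into the existing collection. If a result key already exists, the reduce function is applied to the new value and the existing value, and the result is stored.

The reduce action suits a periodic job (cron, for example) that processes one time window per run. The results accumulate, so several incremental runs give the same result as one run over all the data, and the partial results can be inspected in near real time.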
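
The difference between the three actions can be sketched with plain Maps standing in for collections. This is an illustration of the semantics described above, not MongoDB's implementation; applyOut, sum and the sample data are hypothetical.

```javascript
// Combines new results with an existing output "collection" (a Map here).
function applyOut(action, existing, results, reduceFn) {
  if (action === "replace") return new Map(results); // old contents dropped
  const output = new Map(existing);
  for (const [key, value] of results) {
    if (action === "reduce" && output.has(key)) {
      // Existing and new value are reduced together.
      output.set(key, reduceFn(key, [output.get(key), value]));
    } else {
      output.set(key, value); // merge: overwrite an existing key or add a new one
    }
  }
  return output;
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const existing = new Map([["a", 1], ["b", 2]]);
const results = new Map([["b", 10], ["c", 20]]);

const replaced = applyOut("replace", existing, results, sum);
const merged = applyOut("merge", existing, results, sum);
const reduced = applyOut("reduce", existing, results, sum);

console.log(JSON.stringify([...replaced])); // [["b",10],["c",20]]
console.log(JSON.stringify([...merged]));   // [["a",1],["b",10],["c",20]]
console.log(JSON.stringify([...reduced]));  // [["a",1],["b",12],["c",20]]
```

Only the reduce action keeps the contribution of the earlier run for key "b" (2 + 10 = 12), which is why it fits incremental jobs.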

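db:

Defaults to the database of the input collection. Set it to write the output to a different database.

sharded:

Set to true to shard the output collection. Sharding must be enabled on the output database; map-reduce then shards the output collection using _id as the shard key.

nonAtomic:

Defaults to false, meaning atomic: the map-reduce operation locks the database during post-processing of the output.

It can be set to true only when action is merge or reduce.

If set to true, the database is not locked during post-processing, so clients may read intermediate states of the output collection.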


3.2 finalize Function

function(key, reducedValue) {
   ...
   return modifiedObject;
}

It must not access the database or outside functions.

It may read variables defined in scope.
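
As a rough local illustration of a finalize function that reads a scope variable: the fields of scope are visible as global variables inside map, reduce and finalize. The names unitPrice and finalizeTotal are hypothetical, and the Object.assign line only imitates what the server would do with the scope option.

```javascript
// Hypothetical scope document; on a server this is passed as the scope option.
const scope = { unitPrice: 2.5 };

function finalizeTotal(key, reducedValue) {
  // Reads unitPrice from scope; no database access, no external calls.
  reducedValue.total = reducedValue.qty * unitPrice;
  return reducedValue;
}

// Stand-in for the server: make the scope fields visible as globals.
Object.assign(globalThis, scope);

const finalized = finalizeTotal("sku1", { count: 3, qty: 18 });
console.log(finalized.total); // 45
```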


Example:

var mapFunction2 = function() {
                       for (var idx = 0; idx < this.items.length; idx++) {
                           var key = this.items[idx].sku;
                           var value = {
                                         count: 1,
                                         qty: this.items[idx].qty
                                       };
                           emit(key, value);
                       }
                    };

var reduceFunction2 = function(keySKU, countObjVals) {
                     // Declared with var so it does not leak as an implicit global
                     var reducedVal = { count: 0, qty: 0 };

                     for (var idx = 0; idx < countObjVals.length; idx++) {
                         reducedVal.count += countObjVals[idx].count;
                         reducedVal.qty += countObjVals[idx].qty;
                     }

                     return reducedVal;
                  };

var finalizeFunction2 = function (key, reducedVal) {

                       reducedVal.avg = reducedVal.qty/reducedVal.count;

                       return reducedVal;

                    };

db.orders.mapReduce( mapFunction2,
                     reduceFunction2,
                     {
                       out: { merge: "map_reduce_example" },
                       query: { ord_date:
                                  { $gt: new Date('01/01/2012') }
                              },
                       finalize: finalizeFunction2
                     }
                   )


This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). It then outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.
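
To see what the example computes, the pipeline can be walked through locally on a few made-up orders. This is a simplified sketch under stated assumptions: simulateMapReduce is a hypothetical helper, query is a plain predicate rather than a MongoDB query document, and emit is passed to map as an argument instead of being a global as it is on the server.

```javascript
// Filter -> map -> group by key -> reduce -> finalize, all in memory.
function simulateMapReduce(docs, { query, map, reduce, finalize }) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => map.call(doc, emit));
  const out = {};
  for (const [key, values] of groups) {
    // reduce is skipped for keys with a single value
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    out[key] = finalize(key, reduced);
  }
  return out;
}

// Made-up sample orders.
const orders = [
  { ord_date: new Date("2012-03-01"), items: [{ sku: "mmm", qty: 5 }, { sku: "nnn", qty: 5 }] },
  { ord_date: new Date("2012-04-01"), items: [{ sku: "mmm", qty: 10 }] },
  { ord_date: new Date("2011-12-01"), items: [{ sku: "mmm", qty: 99 }] }, // excluded by query
];

const result = simulateMapReduce(orders, {
  query: (doc) => doc.ord_date > new Date("2012-01-01"),
  map: function (emit) {
    for (const item of this.items) emit(item.sku, { count: 1, qty: item.qty });
  },
  reduce: (key, values) =>
    values.reduce(
      (acc, v) => ({ count: acc.count + v.count, qty: acc.qty + v.qty }),
      { count: 0, qty: 0 }
    ),
  finalize: (key, reduced) => ({ ...reduced, avg: reduced.qty / reduced.count }),
});

console.log(JSON.stringify(result.mmm)); // {"count":2,"qty":15,"avg":7.5}
console.log(JSON.stringify(result.nnn)); // {"count":1,"qty":5,"avg":5}
```

The key "nnn" has a single emitted value, so reduce never runs for it, yet finalize still adds avg. This works only because the emitted value already has the same shape as a reduced value.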



db.collection.mapReduce() takes the following parameters:

Field: map
Type: function
Description: A JavaScript function that associates or “maps” a value with a key and emits the key and value pair. See Requirements for the map Function for more information.

Field: reduce
Type: function
Description: A JavaScript function that “reduces” to a single object all the values associated with a particular key. See Requirements for the reduce Function for more information.

Field: options
Type: document
Description: A document that specifies additional parameters to db.collection.mapReduce().

Field: bypassDocumentValidation
Type: boolean
Description: Optional. Enables mapReduce to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. New in version 3.2.

The following table describes additional arguments that db.collection.mapReduce() can accept.

Field: out
Type: string or document
Description: Specifies the location of the result of the map-reduce operation. You can output to a collection, output to a collection with an action, or output inline. You may output to a collection when performing map reduce operations on the primary members of the set; on secondary members you may only use the inline output. See out Options for more information.

Field: query
Type: document
Description: Specifies the selection criteria using query operators for determining the documents input to the map function.

Field: sort
Type: document
Description: Sorts the input documents. This option is useful for optimization. For example, specify the sort key to be the same as the emit key so that there are fewer reduce operations. The sort key must be in an existing index for this collection.

Field: limit
Type: number
Description: Specifies a maximum number of documents for the input into the map function.

Field: finalize
Type: function
Description: Optional. Follows the reduce method and modifies the output. See Requirements for the finalize Function for more information.

Field: scope
Type: document
Description: Specifies global variables that are accessible in the map, reduce and finalize functions.

Field: jsMode
Type: boolean
Description: Specifies whether to convert intermediate data into BSON format between the execution of the map and reduce functions. Defaults to false.

If false:

  • Internally, MongoDB converts the JavaScript objects emitted by the map function to BSON objects. These BSON objects are then converted back to JavaScript objects when calling the reduce function.
  • The map-reduce operation places the intermediate BSON objects in temporary, on-disk storage. This allows the map-reduce operation to execute over arbitrarily large data sets.

If true:

  • Internally, the JavaScript objects emitted during the map function remain as JavaScript objects. There is no need to convert the objects for the reduce function, which can result in faster execution.
  • You can only use jsMode for result sets with fewer than 500,000 distinct key arguments to the mapper’s emit() function.

Field: verbose
Type: boolean
Description: Specifies whether to include the timing information in the result information. The verbose defaults to true to include the timing information.

