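MapReduce is a flexible and powerful tool for data aggregation.
MongoDB provides MapReduce as well, with the map and reduce functions written in JavaScript. MapReduce in MongoDB runs in the following stages: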
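1. Map: apply an operation to every document in the collection.
2. Shuffle: group the emitted values by key, producing a list of values (one or more) for each distinct key.
3. Reduce: process the elements of a value list until only one element remains. The result can be fed back into the shuffle stage and reduced again,
until every key maps to a single value list holding exactly one element. That is the MapReduce result.
4. Finalize: this step is optional. Once the final MapReduce result is available, it applies some "trimming" to the data.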
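The four stages can be imitated in plain JavaScript to make the data flow visible. This is only a sketch, not MongoDB's implementation: runMapReduce is a hypothetical helper, emit is passed to the map function as an argument (in MongoDB it is available globally), and keys are compared through JSON.stringify.

```javascript
// Plain-JavaScript imitation of the map, shuffle, reduce and finalize stages.
// runMapReduce is a hypothetical helper, not part of the MongoDB API.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  // Map: call mapFn with `this` bound to each document and collect emitted pairs.
  const emitted = [];
  const emit = (key, value) => emitted.push({ key, value });
  docs.forEach((doc) => mapFn.call(doc, emit));

  // Shuffle: group the emitted values by key (keys serialized for comparison).
  const groups = new Map();
  for (const { key, value } of emitted) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }

  // Reduce: collapse each value list to one value.
  // A list with a single value is used as is, without calling reduceFn.
  const results = [];
  for (const { key, values } of groups.values()) {
    let value = values.length > 1 ? reduceFn(key, values) : values[0];
    // Finalize (optional): post-process the reduced value.
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

const docs = [
  { user: "A", goodzid: 1, price: 1 },
  { user: "A", goodzid: 2, price: 3 },
  { user: "B", goodzid: 1, price: 1 },
];
const counts = runMapReduce(
  docs,
  function (emit) { emit(this.user, { count: 1 }); },
  (key, values) => ({ count: values.reduce((n, v) => n + v.count, 0) })
);
console.log(counts);
```

The sample run groups the three documents by user and counts them, which has the same shape as the first example later in this post.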
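In MongoDB, the map function calls emit to hand key/value pairs to MapReduce.
The reduce function takes two arguments: key and emits. key is the key passed to emit; emits is an array whose elements are the values passed to emit for that key.
The return value of reduce may be passed to reduce again, so it must have the same structure as the elements of emits.
Inside the map function, the this keyword refers to the document currently being mapped.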
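The requirement that a reduce result has the same structure as the emitted values can be checked by hand: reducing a partial result together with further values should give the same answer as reducing everything at once. A minimal sketch in plain JavaScript, where countReduce is an illustrative name and not something MongoDB defines:

```javascript
// A counting reduce whose return value has the same shape as the emitted values.
const countReduce = function (key, values) {
  let cnt = 0;
  values.forEach(function (val) {
    cnt += val.count;
  });
  return { count: cnt };
};

// Reduce all three emitted values in one call...
const allAtOnce = countReduce("A", [{ count: 1 }, { count: 1 }, { count: 1 }]);

// ...or reduce two of them first and feed that result back in with the third.
const partial = countReduce("A", [{ count: 1 }, { count: 1 }]);
const reReduced = countReduce("A", [partial, { count: 1 }]);

console.log(allAtOnce.count, reReduced.count); // both are 3
```

If reduce returned a bare number instead of { count: ... }, the second call would read val.count from a number and the sum would break.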
Example: insert random documents of the form {"user":"A","goodzid":1,"price":1} into a collection named test; this one means that customer A bought product 1 for a price of 1.
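One way to produce such test data is sketched below. The post does not say how the data was generated, so the user names, the goodzid range and the price range are all assumptions, and makePurchases is a hypothetical helper.

```javascript
// Build n random purchase documents of the form { user, goodzid, price }.
// User names, goodzid range (0 to 9) and price range (0.00 to 10.00) are assumptions.
function makePurchases(n) {
  const users = ["A", "B", "C"];
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({
      user: users[Math.floor(Math.random() * users.length)],
      goodzid: Math.floor(Math.random() * 10),
      price: Math.round(Math.random() * 1000) / 100,
    });
  }
  return docs;
}

const sample = makePurchases(1000);
// In the mongo shell the documents could then be loaded with:
// db.test.insertMany(sample)
```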
1. How many products did each user buy? (MapReduce on a single key)
SQL: select user,count(goodzid) from test group by user
MapReduce:
map = function () {
    emit(this.user, {
        count: 1
    });
}
reduce = function (key, values) {
    var cnt = 0;
    values.forEach(function (val) {
        cnt += val.count;
    });
    return {
        "count": cnt
    };
}
Store the MapReduce result in a collection:
db.test.mapReduce(map, reduce, {
    out: "" // fill in the name of the output collection
})
Format of the result documents after MapReduce:
{
    "_id": "A",
    "value": {
        "count": 1
    }
}
2. How many of each distinct product did each user buy? (MapReduce on a composite key)
SQL: select user,goodzid,count(*) from test group by user,goodzid
MapReduce:
map = function () {
    emit({
        user: this.user,
        goodzid: this.goodzid
    }, {
        count: 1
    });
}
reduce = function (key, values) {
    var cnt = 0;
    values.forEach(function (val) {
        cnt += val.count;
    });
    return {
        "count": cnt
    };
}
db.test.mapReduce(map, reduce, {
    out: "" // fill in the name of the output collection
})
Format of the result documents after MapReduce:
{
    "_id": {
        "user": "A",
        "goodzid": 0
    },
    "value": {
        "count": 103
    }
}
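The effect of a composite key can be reproduced in plain JavaScript: every distinct (user, goodzid) pair becomes its own group. In this sketch JSON.stringify stands in for the key comparison that MongoDB performs itself, and the sample documents are made up for illustration.

```javascript
// Made-up sample purchases; only user and goodzid matter for the grouping.
const pairPurchases = [
  { user: "A", goodzid: 0, price: 2 },
  { user: "A", goodzid: 0, price: 2 },
  { user: "A", goodzid: 1, price: 5 },
  { user: "B", goodzid: 0, price: 2 },
];

// Count one per document under the composite key { user, goodzid }.
const countsByPair = new Map();
for (const doc of pairPurchases) {
  const key = JSON.stringify({ user: doc.user, goodzid: doc.goodzid });
  countsByPair.set(key, (countsByPair.get(key) || 0) + 1);
}

for (const [key, count] of countsByPair) {
  console.log(key, count);
}
```

User A ends up with two separate groups, one per product, instead of the single group produced by keying on user alone.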
3. How many products did each user buy, and what was the total amount spent? (Reduce result with several fields)
SQL: select user,count(goodzid),sum(price) from test group by user
MapReduce:
map = function () {
    emit(this.user, {
        amount: this.price,
        count: 1
    });
}
reduce = function (key, values) {
    var res = {
        amount: 0,
        count: 0
    };
    values.forEach(function (val) {
        res.amount += val.amount;
        res.count += val.count;
    });
    return res;
}
db.test.mapReduce(map, reduce, {
    out: "" // fill in the name of the output collection
})
Format of the result documents after MapReduce:
{
    "_id": "A",
    "value": {
        "amount": 1,
        "count": 1
    }
}
4. The amount returned in 3 is a float and should be rounded to two decimal places, and the average product price is also needed. (Using finalize to process the reduce results)
SQL: select user,cast(sum(price) as decimal(10, 2)) as amount,count(goodzid) as [count],
cast((sum(price)/count(goodzid)) as decimal(10,2)) as avgPrice from test group by user
MapReduce:
map = function () {
    emit(this.user, {
        amount: this.price,
        count: 1,
        avgPrice: 0
    });
}
reduce = function (key, values) {
    var res = {
        amount: 0,
        count: 0,
        avgPrice: 0
    };
    values.forEach(function (val) {
        res.amount += val.amount;
        res.count += val.count;
    });
    return res;
}
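finalizeFun = function (key, reduceResult) {
    // Compute the average from the unrounded total before formatting it.
    var avg = reduceResult.amount / reduceResult.count;
    // toFixed returns a string, so amount and avgPrice are stored as strings.
    reduceResult.amount = reduceResult.amount.toFixed(2);
    reduceResult.avgPrice = avg.toFixed(2);
    return reduceResult;
}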
db.test.mapReduce(map, reduce, {
    out: "", // fill in the name of the output collection
    finalize: finalizeFun
})
Format of the result documents after MapReduce (toFixed(2) yields strings with two decimal places):
{
    "_id": "A",
    "value": {
        "amount": "1.00",
        "count": 1,
        "avgPrice": "1.00"
    }
}
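A worked example of the finalize step on a single reduced value, in plain JavaScript. The input numbers are made up, and finalizeSketch is an illustrative stand-in that computes the average from the unrounded total rather than a copy of any MongoDB API.

```javascript
// Stand-in finalize: round the total to two decimals and add the average price.
const finalizeSketch = function (key, reduceResult) {
  const avg = reduceResult.amount / reduceResult.count;
  return {
    amount: reduceResult.amount.toFixed(2), // string with two decimals
    count: reduceResult.count,
    avgPrice: avg.toFixed(2), // string with two decimals
  };
};

// Made-up reduce result: two purchases totalling 7.5.
const finalized = finalizeSketch("A", { amount: 7.5, count: 2, avgPrice: 0 });
console.log(finalized); // { amount: '7.50', count: 2, avgPrice: '3.75' }
```

Because toFixed returns strings, amount and avgPrice are stored as text in the output collection; a numeric result would need something like Math.round(x * 100) / 100 instead.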
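5. Count each user's purchases of goodzid items whose unit price is greater than 6. (MapReduce on a filtered subset of the data)
db.test.mapReduce(map, reduce, {
    query: {
        price: {
            "$gt": 6
        }
    },
    out: "" // fill in the name of the output collection
})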
The map and reduce functions are omitted here.
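The query option restricts which documents reach the map function. The sketch below imitates that in plain JavaScript with made-up data, assuming the omitted map and reduce are the per-user counting functions from example 1.

```javascript
// Made-up purchases; only two of user A's three purchases cost more than 6.
const pricedPurchases = [
  { user: "A", goodzid: 1, price: 7 },
  { user: "A", goodzid: 2, price: 5 },
  { user: "B", goodzid: 3, price: 9 },
  { user: "A", goodzid: 3, price: 6.5 },
];

// query: { price: { $gt: 6 } } keeps only documents with price > 6.
const selected = pricedPurchases.filter((doc) => doc.price > 6);

// Per-user count over the filtered subset (the role of map and reduce here).
const perUser = {};
for (const doc of selected) {
  perUser[doc.user] = (perUser[doc.user] || 0) + 1;
}
console.log(perUser); // { A: 2, B: 1 }
```

Filtering with query before the map stage is cheaper than testing the price inside the map function, since excluded documents are never mapped.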