MongoDB Aggregation Framework with Mapper-Reducer Program

MongoDB is the first database which comes to mind when we have work on unstructured data and manipulate the shape of data quickly and efficiently. For this MongoDB comes with a powerful framework which the Aggregation Framework to manipulate data effectively in the server itself.
What is Aggregation Framework ?
The Aggregation framework is just a way to query documents in a collection in MongoDB. This framework exists because when you start working with and manipulating data, you often need to crunch collections together, modify them, pluck out fields, rename fields, concatinate them together, group documents by field, explode array of fields in different documents and so on.
The simple query set in MongoDB only allows you to retrieve full or parts of individual documents. They don’t really allow you to manipulate the documents on the server and then return them to your application. This is where the aggregation framework from MongoDB comes in. It’s nothing external, as aggregation comes baked into MongoDB.
What is Map Reduce ?
MapReduce is a programming paradigm that works on a big data over distributed system. It analysis data and produce aggregated results. Key / values pairs have declared in the map function which we use this values to accumulate data. Later in reduce function we use this accumulated data, accumulated in the map function, to convert them into the aggregated results. The Mapper and Reducer Programs are written in JavaScript language.
Let’s see the example for this !
Example:
Create a sample collection orders
with these documents:
db.orders.insertMany([ { _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" }, { _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" }, { _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" }, { _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }, { _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A"}, { _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" }, { _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }, { _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" }, { _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }, { _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }])
Return the Total Price Per Customer
Perform the map-reduce operation on the orders
collection to group by the cust_id
, and calculate the sum of the price
for each cust_id
:
- Define the map function to process each input document:
- In the function,
this
refers to the document that the map-reduce operation is processing. - The function maps the
price
to thecust_id
for each document and emits thecust_id
andprice
.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
2. Define the corresponding reduce function with two arguments keyCustId
and valuesPrices
:
- The
valuesPrices
is an array whose elements are theprice
values emitted by the map function and grouped bykeyCustId
. - The function reduces the
valuesPrice
array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) { return Array.sum(valuesPrices);};
3. Perform map-reduce on all documents in the orders
collection using the mapFunction1
map function and the reduceFunction1
reduce function:
db.orders.mapReduce( mapFunction1, reduceFunction1, { out: "map_reduce_example" })
4. This operation outputs the results to a collection named map_reduce_example
. If the map_reduce_example
collection already exists, the operation will replace the contents with the results of this map-reduce operation.
Query the map_reduce_example
collection to verify the results:
db.map_reduce_example.find().sort( { _id: 1 }
The operation returns these documents:
{ "_id" : "Ant O. Knee", "value" : 95 }{ "_id" : "Busby Bee", "value" : 125 }{ "_id" : "Cam Elot", "value" : 60 }{ "_id" : "Don Quis", "value" : 155 }
Aggregation Alternative:
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate([ { $group: { _id: "$cust_id", value: { $sum: "$price" } } }, { $out: "agg_alternative_1" }])
- The
$group
stage groups by thecust_id
and calculates thevalue
field (See also$sum
). Thevalue
field contains the totalprice
for eachcust_id
. - The stage output the following documents to the next stage:
{ "_id" : "Don Quis", "value" : 155 }{ "_id" : "Ant O. Knee", "value" : 95 }{ "_id" : "Cam Elot", "value" : 60 }{ "_id" : "Busby Bee", "value" : 125 }
3. Then, the $out
writes the output to the collection agg_alternative_1
. Alternatively, you could use $merge
instead of $out
.
4. Query the agg_alternative_1
collection to verify the results:
db.agg_alternative_1.find().sort( { _id: 1 } )
5. The operation returns the following documents:
{ "_id" : "Ant O. Knee", "value" : 95 }{ "_id" : "Busby Bee", "value" : 125 }{ "_id" : "Cam Elot", "value" : 60 }{ "_id" : "Don Quis", "value" : 155 }
Hope you would get some insights from this blog.
Thanks !!