MongoDB Aggregation Framework with Mapper-Reducer Program

Jyoti Pawar
5 min read · Jun 13, 2022

MongoDB is often the first database that comes to mind when we have to work with unstructured data and reshape it quickly and efficiently. For this, MongoDB comes with a powerful tool, the Aggregation Framework, which manipulates data on the server itself.

What is the Aggregation Framework?

The Aggregation Framework is a way to query and transform documents in a collection in MongoDB. It exists because when you start working with and manipulating data, you often need to crunch collections together, modify them, pluck out fields, rename fields, concatenate them, group documents by field, explode an array field into separate documents, and so on.

Simple queries in MongoDB only let you retrieve whole documents or parts of individual documents. They don’t really allow you to manipulate the documents on the server before returning them to your application. This is where the Aggregation Framework comes in. It’s nothing external; aggregation comes baked into MongoDB.
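To make the operations listed above more concrete, here is a minimal sketch of a pipeline that explodes an array, groups, renames a field, and sorts. It assumes a hypothetical orders collection whose documents carry an items array with sku and qty fields; the collection name, the field names, and the totalQty output name are assumptions for illustration only. The pipeline itself is a plain JavaScript array.

```javascript
// Sketch only: "orders", "items", "sku", "qty" and "totalQty" are assumed names.
const pipeline = [
  // Explode the items array: one output document per array element.
  { $unwind: "$items" },
  // Group the exploded documents by SKU and add up the quantities.
  { $group: { _id: "$items.sku", totalQty: { $sum: "$items.qty" } } },
  // Rename _id to sku and keep only the fields we care about.
  { $project: { _id: 0, sku: "$_id", totalQty: 1 } },
  // Sort by the renamed field so the output order is predictable.
  { $sort: { sku: 1 } }
];

// In mongosh this would run on the server as:
//   db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Each stage receives the documents produced by the stage before it, so the work stays on the server and only the final result comes back to the application.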

What is MapReduce?

MapReduce is a programming paradigm for processing big data over a distributed system. It analyzes data and produces aggregated results. The map function emits key/value pairs, and the values emitted for each key are accumulated. The reduce function then converts the values accumulated for each key into the aggregated result. In MongoDB, the mapper and reducer programs are written in JavaScript.
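The flow described above can be sketched in plain JavaScript without a database. This is a simplified, in-memory illustration: runMapReduce, sampleDocs, and totals are hypothetical names made up for this sketch, and emit is passed in as an argument here, whereas in MongoDB it is provided to the map function by the server.

```javascript
// In-memory sketch of the map-reduce flow. runMapReduce is a hypothetical
// helper written for this post, not part of MongoDB.
function runMapReduce(docs, mapFn, reduceFn) {
  const valuesByKey = new Map();

  // emit() records one value under its key, as the map phase does.
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };

  // Map phase: call the map function once per document, with `this`
  // bound to the document.
  for (const doc of docs) {
    mapFn.call(doc, emit);
  }

  // Reduce phase: collapse each key's values into a single result.
  const results = [];
  for (const [key, values] of valuesByKey) {
    results.push({ _id: key, value: reduceFn(key, values) });
  }
  return results;
}

// Assumed sample data, small enough to check by hand.
const sampleDocs = [
  { cust_id: "Ant O. Knee", price: 25 },
  { cust_id: "Ant O. Knee", price: 70 },
  { cust_id: "Busby Bee", price: 50 }
];

const totals = runMapReduce(
  sampleDocs,
  function (emit) { emit(this.cust_id, this.price); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(totals);
```

With this sample data, totals holds one entry per customer: 95 for "Ant O. Knee" and 50 for "Busby Bee".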

Let’s look at an example.

Example:

Create a sample collection orders with these documents:

db.orders.insertMany([
   { _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
   { _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
   { _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
   { _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])

Return the Total Price Per Customer

Perform the map-reduce operation on the orders collection to group by the cust_id, and calculate the sum of the price for each cust_id:

  1. Define the map function to process each input document:
  • In the function, this refers to the document that the map-reduce operation is processing.
  • The function maps the price to the cust_id for each document and emits the cust_id and price.
var mapFunction1 = function() {
   emit(this.cust_id, this.price);
};

2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:

  • The valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
  • The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
   return Array.sum(valuesPrices);
};

3. Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:

db.orders.mapReduce(
   mapFunction1,
   reduceFunction1,
   { out: "map_reduce_example" }
)

4. This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.

Query the map_reduce_example collection to verify the results:

db.map_reduce_example.find().sort( { _id: 1 } )

The operation returns these documents:

{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
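The reduce function in step 2 relies on Array.sum, which is not part of standard JavaScript; MongoDB supplies it in its own JavaScript environment. The sketch below, using the assumed name reduceSum, writes the same sum with the standard Array.prototype.reduce. It also checks a property MongoDB expects from reduce functions: because reduce may be called more than once for the same key, feeding an earlier result back in must give the same answer as reducing all values at once.

```javascript
// reduceSum is an assumed name; it mirrors reduceFunction1 using only
// standard JavaScript instead of the Array.sum helper.
const reduceSum = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce((total, price) => total + price, 0);
};

// Prices of the three "Don Quis" orders from the sample collection.
const direct = reduceSum("Don Quis", [75, 55, 25]);

// Reduce two values first, then re-reduce that partial result with the third.
const partial = reduceSum("Don Quis", [75, 55]);
const rereduced = reduceSum("Don Quis", [partial, 25]);

console.log(direct, rereduced); // both are 155
```

A plain sum passes this check, which is one reason totals and counts are a natural fit for map-reduce.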

Aggregation Alternative:

Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:

db.orders.aggregate([
   { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
   { $out: "agg_alternative_1" }
])
  1. The $group stage groups by the cust_id and calculates the value field (see also $sum). The value field contains the total price for each cust_id.
  2. The stage outputs the following documents to the next stage:
{ "_id" : "Don Quis", "value" : 155 }
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Busby Bee", "value" : 125 }

3. Then, the $out writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.

4. Query the agg_alternative_1 collection to verify the results:

db.agg_alternative_1.find().sort( { _id: 1 } )

5. The operation returns the following documents:

{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
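Step 3 mentions $merge as an alternative to $out. A minimal sketch of that variant follows; the variable name pipelineWithMerge and the chosen whenMatched and whenNotMatched settings are assumptions for illustration, not something the example above requires. Unlike $out, which replaces the target collection, $merge writes the results into a collection that may already hold other documents.

```javascript
// Sketch of the same grouping, written with $merge instead of $out.
// pipelineWithMerge is an assumed name; the merge options are one possible choice.
const pipelineWithMerge = [
  // Same grouping as before: total price per customer.
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  {
    $merge: {
      into: "agg_alternative_1",  // target collection
      on: "_id",                  // match existing documents by _id
      whenMatched: "replace",     // overwrite a customer's previous total
      whenNotMatched: "insert"    // add customers not seen before
    }
  }
];

// In mongosh this would run as:
//   db.orders.aggregate(pipelineWithMerge)
console.log(JSON.stringify(pipelineWithMerge[1]));
```

With these settings, rerunning the pipeline updates each customer's total in place and leaves any other documents in agg_alternative_1 untouched.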

I hope you got some insights from this blog.

Thanks!
