DATA AGGREGATION FRAMEWORK
Ankit Kansal & Nayan Naik
Data aggregation is a powerful technique provided by MongoDB for performing aggregate analysis of data. It is broadly similar to the GROUP BY clause in Oracle.
The MongoDB developer zone provides some guidelines which you must follow to obtain the desired output.
Data inside an aggregation function flows from left to right.
db.collection_name.aggregate(<set of operators required>)
Let's understand how the aggregation framework works with the help of the operators listed above.
$project is the projection operator of the aggregation framework. Its main job is to pass on only the selected fields of each document.
In this query, $project selects the name field from the things collection by writing name:1. Since the _id field is selected by default, _id:0 is written to keep it out of the output.
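The query described here can be sketched as follows. It is a reconstruction from the description only, and the variable name projectNames is my own, chosen for illustration.

```javascript
// Reconstruction of the $project query described in the text.
// name: 1 keeps the name field; _id: 0 suppresses the default _id field.
const projectNames = [
  { $project: { name: 1, _id: 0 } }
];

// In the mongo shell it would be run against the things collection as:
//   db.things.aggregate(projectNames)
```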
NOTE:- The aggregation framework works on the principle of pipelines. Once you have selected some fields and moved on to the next stage, only those selected fields are available to operate on; the other fields of the collection are no longer visible.
In the above figure the query projects only the names from the collection. When I later apply the $match (WHERE) operator to put a condition on the data, the output received is empty, because the field used in the condition was not projected. This shows how the aggregation framework works as a pipeline.
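A small sketch of this pipeline behaviour, under stated assumptions: the sample documents and the age field are hypothetical, and the map/filter lines are only a plain JavaScript imitation of the two stages, not MongoDB itself.

```javascript
// Hypothetical sample documents standing in for the things collection.
const things = [
  { _id: 1, name: "alpha", age: 25 },
  { _id: 2, name: "beta", age: 30 }
];

// The kind of pipeline described in the text: project only name,
// then try to filter on a field (age, assumed) that was not projected.
const projectThenMatch = [
  { $project: { name: 1, _id: 0 } },
  { $match: { age: { $gt: 20 } } }
];
// In the mongo shell: db.things.aggregate(projectThenMatch)

// Plain JavaScript imitation of the two stages, to show why nothing matches.
const projected = things.map((doc) => ({ name: doc.name }));
const matched = projected.filter((doc) => doc.age > 20); // age no longer exists here

console.log(projected); // two documents that contain only name
console.log(matched);   // empty array
```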
The $match operator filters documents based on defined conditions. It is broadly similar to the WHERE clause of SQL.
In the previous example I used the $match operator together with the $project operator. As described earlier, $match filters the data on the required condition, and $project then selects only the desired fields.
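Putting the filter before the projection might look like the sketch below. The age field and its threshold are assumptions made for illustration; only the things collection and the name field come from the text.

```javascript
// Filter first, while every field is still available, then project.
const matchThenProject = [
  { $match: { age: { $gt: 20 } } },   // age is a hypothetical field
  { $project: { name: 1, _id: 0 } }   // keep only the name field
];

// In the mongo shell: db.things.aggregate(matchThenProject)
```

Because $match runs first, it still sees every field of the original documents.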
LET'S TAKE SOME SCENARIOS WHICH WILL COVER MOST OF THE OPERATORS:-
When you use the _id field of $group, which names the column on which the aggregation is performed, you must put the "$" symbol before the field name; it tells MongoDB to take the data from that field. Keep in mind that when you group by two columns, each column must be given an alias.
For this scenario I have a collection named employee, which resembles the emp table of Oracle.
The $group operator is similar to the GROUP BY clause in Oracle. As we already know, $project is used for selecting columns from the collection. The _id field specifies the column on which the group by operation is performed (such as group by deptno). count is a field name used to display the values generated by the $sum operator for each group. $sum adds the value specified in the clause once for each member of a group. In my example I have used 1, so 1 is added for each member of a group; had I used 2 instead, 2 would be added for each member. Here $sum with 1 behaves like the count(empno) function in Oracle.
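A minimal sketch of such a query, reconstructed from the description. The variable name countByDept is my own, and the exact shape of the original query is an assumption.

```javascript
// Count employees per department, roughly:
//   SELECT deptno, count(empno) FROM emp GROUP BY deptno
const countByDept = [
  { $project: { deptno: 1 } },                        // pass on the grouping column
  { $group: { _id: "$deptno", count: { $sum: 1 } } }  // add 1 per member of each group
];

// In the mongo shell: db.employee.aggregate(countByDept)
```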
In this scenario the same employee collection is used, and the $sort operator is used for sorting the result.
If you want to perform an operation on data generated inside the aggregation function, it must be performed in the stage after the one in which the data is produced. For example, the count data is produced within the $group operator, so any function applied to it must be in the next stage. In this scenario I want to sort the result by the generated count, which is why the $sort operator is applied in the next stage.
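As a sketch, the sort on the generated count could be written like this. The ascending direction (1) is an assumption, since the text does not say which order was used.

```javascript
// count only exists after $group, so $sort must come in the following stage.
const countByDeptSorted = [
  { $group: { _id: "$deptno", count: { $sum: 1 } } },
  { $sort: { count: 1 } }   // 1 = ascending (assumed), -1 = descending
];

// In the mongo shell: db.employee.aggregate(countByDeptSorted)
```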
The same employee collection is used in this case.
1) First, the $skip operator is used to skip the top rows generated.
2) Second, the $limit operator is used to limit the number of rows to be processed further.
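The two steps above can be sketched as the following pipeline. The numbers 2 and 3 are hypothetical values picked for illustration, and the preceding $group and $sort stages are assumed from the earlier scenarios.

```javascript
const skipAndLimit = [
  { $group: { _id: "$deptno", count: { $sum: 1 } } },
  { $sort: { count: 1 } },
  { $skip: 2 },    // drop the first 2 rows of the sorted result
  { $limit: 3 }    // pass on at most 3 of the remaining rows
];

// In the mongo shell: db.employee.aggregate(skipAndLimit)
```

The order matters: with $skip before $limit, the limit applies to what is left after skipping.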
This scenario helps you understand how a double group by works in MongoDB, like group by deptno,jobid in Oracle. To achieve similar functionality in MongoDB, specify both fields as a sub-document of _id in the $group stage, giving each one an alias.
Note:- In this case I have sorted the data by deptno, which is a field of the _id sub-document. So we have to use the field name "_id.deptno" to sort the final data. Nested documents are accessed using the . operator, and the "" quotes are required. The negative value (-1) signifies that the data should be sorted in descending order.
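A sketch of a double group by with the sort described in the note. The field names deptno and jobid come from the text; the count field and the variable name are assumptions.

```javascript
// Roughly: SELECT deptno, jobid, count(*) FROM emp GROUP BY deptno, jobid
const countByDeptAndJob = [
  {
    $group: {
      // each grouping column gets an alias inside the _id sub-document
      _id: { deptno: "$deptno", jobid: "$jobid" },
      count: { $sum: 1 }
    }
  },
  // the quoted dotted name reaches into the _id sub-document; -1 = descending
  { $sort: { "_id.deptno": -1 } }
];

// In the mongo shell: db.employee.aggregate(countByDeptAndJob)
```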
This scenario covers the usage of the $unwind operator. $unwind is used only when a field has an array as its data type; it separates the values present in the array. For each element in the array a separate document is created with the same _id (ObjectId).
Output Received After the query:-
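A minimal sketch of an $unwind query and the shape of its result. The posts collection, its documents and the tags field are hypothetical, and the flatMap line is only a plain JavaScript imitation of what $unwind does.

```javascript
// Hypothetical document with an array field.
const posts = [
  { _id: 1, title: "intro", tags: ["mongodb", "aggregation"] }
];

// The $unwind stage itself: the array field is named with a "$" prefix.
const unwindTags = [{ $unwind: "$tags" }];
// In the mongo shell: db.posts.aggregate(unwindTags)

// Plain JavaScript imitation: one output document per array element,
// each keeping the _id and other fields of the original document.
const unwound = posts.flatMap((doc) =>
  doc.tags.map((tag) => ({ ...doc, tags: tag }))
);

console.log(unwound);
// two documents, both with _id 1: one with tags "mongodb", one with tags "aggregation"
```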
In this scenario I'll show you the usage of the $substr operator in MongoDB. As in other technologies, $substr is used to extract data from a selected field, but currently it is supported only for string fields.
By writing this type of query you can extract data from the selected field, and a group by operation can later be performed on it. The first integer in the syntax specifies the position in the string to start from (counting from 0), and the next integer tells how many characters to extract. By writing this query and then applying the $group operator, you can identify and count the people who have the same aliases.
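A sketch of such a query, assuming the alias is the first three characters of a hypothetical name field in the employee collection.

```javascript
const countByAlias = [
  // $substr: [ string, start (0-based), number of characters ]
  { $project: { alias: { $substr: ["$name", 0, 3] } } },
  // count how many people share each extracted alias
  { $group: { _id: "$alias", count: { $sum: 1 } } }
];

// In the mongo shell: db.employee.aggregate(countByAlias)
// Note: newer MongoDB releases deprecate $substr in favour of
// $substrBytes and $substrCP; check the version you are running.
```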
In this scenario we will see the usage of the $subtract operator. This operator subtracts the value of one field from another.
It returns the difference of two numbers: the second number is subtracted from the first.
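For illustration, a $subtract projection might look like the sketch below. The sal and comm fields are hypothetical, borrowed from the Oracle emp table that the employee collection is said to resemble.

```javascript
const salaryMinusCommission = [
  {
    $project: {
      empno: 1,
      // second value is subtracted from the first: sal - comm
      difference: { $subtract: ["$sal", "$comm"] }
    }
  }
];

// In the mongo shell: db.employee.aggregate(salaryMinusCommission)
```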
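Similarly, you can add values: inside $project the arithmetic counterpart of $subtract is $add, while $sum is the accumulator used with $group.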
$strcasecmp:- This operator compares two strings in the document without regard to case. If the first string is greater than the second, the result is positive (1); if the first string is less than the second, the result is negative (-1). If both strings are the same, the output is 0. The comparison is by string order, not by length.
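A sketch of how $strcasecmp could be used, together with a rough plain JavaScript imitation of its result. The ename and job field names are hypothetical, and the helper function is my own approximation for simple ASCII strings, not MongoDB's implementation.

```javascript
const compareFields = [
  { $project: { result: { $strcasecmp: ["$ename", "$job"] } } }
];
// In the mongo shell: db.employee.aggregate(compareFields)

// Approximate imitation: compare upper-cased strings by their ordering.
function strcasecmpLike(first, second) {
  const a = first.toUpperCase();
  const b = second.toUpperCase();
  if (a > b) return 1;
  if (a < b) return -1;
  return 0;
}

console.log(strcasecmpLike("abc", "ABC"));      // 0, case is ignored
console.log(strcasecmpLike("apple", "Banana")); // -1, "APPLE" sorts before "BANANA"
console.log(strcasecmpLike("b", "abc"));        // 1, even though "b" is shorter
```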