1. MongoDB Change Streams

Change streams can benefit architectures with dependent business systems by informing downstream systems once data changes are durable. For example, change streams can save developers time when implementing Extract, Transform, and Load (ETL) services, cross-platform synchronization, collaboration functionality, and notification services, or any other real-time, event-driven data integration.

    # PyMongo: open a change stream and block until the next change arrives
    changeStream = db.inventory.watch()
    document = next(changeStream)

The five change operation types are insert, delete, replace, update, and invalidate.
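As a sketch, a client could dispatch on the operationType field of each change event. The handler logic and the sample event below are made up for illustration:

```python
# Dispatch change events by their operationType field.
# The sample event and handler behavior are illustrative, not from the product docs.

OPERATION_TYPES = {"insert", "delete", "replace", "update", "invalidate"}

def handle_event(event):
    op = event["operationType"]
    if op not in OPERATION_TYPES:
        raise ValueError("unexpected operation type: " + op)
    if op == "invalidate":
        # The stream can no longer be used, e.g. the collection was dropped
        return "stream invalidated"
    return "%s on %s" % (op, event["documentKey"]["_id"])

# A change event has roughly this shape:
sample = {
    "operationType": "insert",
    "documentKey": {"_id": 1},
    "fullDocument": {"_id": 1, "sku": "abc"},
}
print(handle_event(sample))  # -> insert on 1
```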

Change Streams

  1. Require only read access to the subscribed collection, on any node
  2. Come with a defined API
  3. Keep changes ordered across shards – the client receives operations in the correct order
  4. Are durable (survive failures)
  5. Are resumable via a resumeToken – pick up where you left off on the client side,
    but the client needs to persist the resumeToken:
    changeStream = db.inventory.watch(resume_after=resumeToken)
  6. Offer the power of aggregation to target specific operation types
  7. Provide a documentKey for identifying documents; on sharded collections, uniqueness is enforced only at the shard level


var coll = client.db('demo').collection('peeps');

var cursor = coll.aggregate([
   // Open a change stream; 'updateLookup' also fetches the full document on updates
   { $changeStream: { fullDocument: 'updateLookup' } },
   // Only pass through updates and replacements
   { $match: { operationType: { $in: ['update', 'replace'] } } },
   // Strip a sensitive field before it reaches the client
   { $project: { secret: 0 } }
]);

2. MongoDB Stitch

MongoDB Stitch is a backend as a service (BaaS) with declarative access control that offers ‘composability’ of data integration with popular third-party systems. So far it is available only on MongoDB Atlas. Stitch is an exciting new feature that could put MongoDB back on the map against other easy-to-use BaaS systems such as Firebase or Apigee.

Stitch allows building REST APIs and microservices declaratively. SDKs exist for JavaScript, Swift, and Java. The access-control layer lets you define which fields a user can read or write with simple JSON rules. Data privacy and compliance can be implemented this way, both for read-through data access and for data aggregations.
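For illustration, a field-level rule might look roughly like this. This is a hypothetical example loosely modeled on Stitch's documented %%user expansions, not copied from the product:

```json
{
  "read": { "owner_id": "%%user.id" },
  "write": { "owner_id": "%%user.id" },
  "fields": {
    "secret": { "read": false, "write": false }
  }
}
```

The intent: a user may read or write only documents whose owner_id matches their own id, and the secret field is never exposed.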

Built-in integrations make it simple for your app to use leading third-party services and auth APIs, e.g. AWS, Facebook, Google, Twilio, and others. Your Stitch microservices can thus include external integrations. Think of data orchestration, or of safely providing a new service endpoint for new applications.

The Stitch execution cycle consists of:

  1. A request from the client is received
  2. The request is parsed and Stitch applies security rules
  3. Stitch orchestrates the database access and services
  4. Stitch aggregates data and applies rules
  5. The client receives the result

An API defined in Stitch carries the calling context, such as the client IP address or user authentication, which can be used in the audit log for security.

Pricing is based on the amount of data transferred. Up to 25 GB per month is free; 100 GB per month costs $75.


3. Aggregations in MongoDB

MongoDB has a powerful aggregation pipeline that can take on a wide range of analytics tasks.

Input stream { } { } { }  -->  Result { } { }

Among the better-known aggregation stages are $match, $unwind, $group, $sort, $skip, $limit, and $project.

https://docs.mongodb.com/manual/reference/operator/aggregation/ documents over 120 aggregation operators. According to Asya Kamsky, they can be categorized by their action into the following seven categories:

  1. Group and Transform ($group, $count)
  2. Aliases ($bucket)
  3. Special Input/Output ($facet, $collStats)
  4. Re-order ($sort)
  5. Transform ($project, $lookup)
  6. Decrease ($skip, $limit)
  7. Increase ($unwind)

The aggregation pipeline is a toolbox with tools for nearly everything: operations on arrays, nested objects, strings, and dates, plus logic, math, and statistics operations. There is a considerable learning curve, but the MongoDB University courses get you started for free.
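To illustrate a few of these expression operators together, here is a sketch of a pipeline in PyMongo style. The collection layout (an 'orders' collection with an items array and a created date) is an assumption; the stage and expression operators are standard:

```python
# A pipeline mixing array, string, and date expression operators.
# The 'orders' collection and its fields are made up for this sketch.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$project": {
        # Array operator: keep only expensive line items
        "bigItems": {"$filter": {
            "input": "$items",
            "as": "it",
            "cond": {"$gte": ["$$it.price", 100]},
        }},
        # String operator: build a display name
        "label": {"$concat": ["$customer.first", " ", "$customer.last"]},
        # Date operator: format the order date
        "day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created"}},
    }},
]
# With a live connection: results = db.orders.aggregate(pipeline)
```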

Data aggregations and transformations (ETL) are absolutely possible in Mongo. Even switching to the more popular ELT approach with data transformed in the same location after it is loaded is conceivable.

However, analytics in MongoDB is not as widespread as data warehousing in SQL. I suspect the reasons are the extra verbosity of MongoDB's JavaScript syntax (too many curly braces), the learning curve, and the fact that MongoDB is only one of many document databases. Document databases are traditionally good for semi-structured or unstructured data, whereas fully structured data is a perfect fit for a SQL data warehouse.



Analytic question: “Find the top 3 books in English with a subject beginning with A, B, or C”

   db.books.aggregate([
      { $unwind: "$subjects" },
      { $match: { language: "English", subjects: /^[ABC]/ } },
      { $group: { _id: "$subjects", count: { $sum: 1 } } },
      { $sort: { count: -1 } },
      { $limit: 3 }
   ]);


4. MongoDB Compass

Finally, MongoDB provides a GUI for its database to visually explore data, run ad hoc queries, edit data with full CRUD support, performance-tune queries, and more. It is available on Mac, Linux, and Windows and has an extensible plugin architecture. There is a free community edition and a commercial version.


5. MongoDB Spark connector

Spark, the successor to Hadoop, provides a framework for massively parallel processing. MongoDB now offers a Spark connector that allows heavyweight analytics directly on data from MongoDB. Spark is known for its parallelism, stream processing, caching layer, performance, interactive shell, APIs in Java, Scala, Python, and R, and its wide adoption.

While MongoDB comes with a powerful aggregation framework, the bridge to Spark makes it possible to do much more with the data, faster. With the Spark connector, an analytics pipeline could look like this:

                    Mongo —> Spark —> Mongo (result)
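A PySpark sketch of this round trip. The URI, database, and collection names are assumptions, and the Spark calls are commented out because they need a running Spark session plus the connector on the classpath (the data source is named "mongodb" in connector 10.x; earlier versions used "mongo"):

```python
# Sketch of a Mongo -> Spark -> Mongo round trip with the MongoDB Spark connector.

def mongo_read_options(uri, database, collection):
    """Build the options dict passed to spark.read / df.write (pure helper)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }

# With a live Spark session and MongoDB (sketch, names are hypothetical):
# df = (spark.read.format("mongodb")
#       .options(**mongo_read_options("mongodb://localhost:27017", "demo", "peeps"))
#       .load())
# result = df.groupBy("city").count()          # the heavy lifting happens in Spark
# (result.write.format("mongodb")              # write the result back to Mongo
#  .options(**mongo_read_options("mongodb://localhost:27017", "demo", "peeps_by_city"))
#  .mode("overwrite").save())
```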


For instance, Spark comes with the machine learning library MLlib.

MLlib has distributed algorithms for the classic ML problems of classification, regression, decision trees, recommenders, clustering, linear algebra, and many more.

The MongoDB Spark connector thus opens the door to applying many powerful ML algorithms to your MongoDB data.



