Helpful MongoDB queries

When I started working with MongoDB, I decided to keep helpful queries in a notebook. Maybe you'll find some of them handy too. Let's dive in!

db.isMaster

Almost daily, I find myself manipulating data on one of our test environments. However, in our set-up, we can only write to the primary instance in our replica set.

That's where db.isMaster() comes into play. The returned document contains the address of the primary member of the replica set. Then I connect to that member and work my magic, all error-free!

Getting all keys from a collection

While distinct gets the distinct values for a specific key, there is no standardized way of getting all the keys that exist within a collection. As usual, Stackoverflow had the answer:


db.collection.aggregate([
  {"$project":{"arrayofkeyvalue":{"$objectToArray":"$$ROOT"}}},
  {"$unwind":"$arrayofkeyvalue"},
  {"$group":{"_id":null,"allkeys":{"$addToSet":"$arrayofkeyvalue.k"}}}
])

This query leverages the power of the aggregation framework: processing data by sequentially applying different operations to it. Let's break it down to see how it works.

Assume we want to get all the keys from the following data set:


[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "name": "Winter T-shirt",
    "size": "L",
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "name": "Fleece sweater",
    "brand": "Zelia",
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "name": "Jeans",
    "color": "grey",
  }
]

In step 1, the objectToArray method is used to create an array of documents (1 for each original document). Every document in this new array has an id and arrayofkeyvalue, which contains two keys (k and v) per key-value pair in the original data.

Query (step 1):


db.collection.aggregate([
  {
    "$project": {
      "arrayofkeyvalue": {
        "$objectToArray": "$$ROOT"
      }
    }
  }
])

Output (step 1):


[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "arrayofkeyvalue": [
      {
        "k": "_id",
        "v": ObjectId("5a934e000102030405000000")
      },
      {
        "k": "name",
        "v": "Winter T-shirt"
      },
      {
        "k": "size",
        "v": "L"
      }
    ]
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "arrayofkeyvalue": [
      {
        "k": "_id",
        "v": ObjectId("5a934e000102030405000001")
      },
      {
        "k": "name",
        "v": "Fleece sweater"
      },
      {
        "k": "brand",
        "v": "Zelia"
      }
    ]
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "arrayofkeyvalue": [
      {
        "k": "_id",
        "v": ObjectId("5a934e000102030405000002")
      },
      {
        "k": "name",
        "v": "Jeans"
      },
      {
        "k": "color",
        "v": "grey"
      }
    ]
  }
]

In the next step, the output of step 1 is flattened with unwind into an array of documents that has the size of the amount of existing keys.

Query (step 2):


db.collection.aggregate([
  {
    "$project": {
      "arrayofkeyvalue": {
        "$objectToArray": "$$ROOT"
      }
    }
  },
  {
    "$unwind": "$arrayofkeyvalue"
  }
])

Output (step 2):


[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "arrayofkeyvalue": {
      "k": "_id",
      "v": ObjectId("5a934e000102030405000000")
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "arrayofkeyvalue": {
      "k": "name",
      "v": "Winter T-shirt"
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "arrayofkeyvalue": {
      "k": "size",
      "v": "L"
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "arrayofkeyvalue": {
      "k": "_id",
      "v": ObjectId("5a934e000102030405000001")
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "arrayofkeyvalue": {
      "k": "name",
      "v": "Fleece sweater"
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "arrayofkeyvalue": {
      "k": "brand",
      "v": "Zelia"
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "arrayofkeyvalue": {
      "k": "_id",
      "v": ObjectId("5a934e000102030405000002")
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "arrayofkeyvalue": {
      "k": "name",
      "v": "Jeans"
    }
  },
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "arrayofkeyvalue": {
      "k": "color",
      "v": "grey"
    }
  }
]

See how we're getting there? In the last step, we're adding all the k values (meaning: the keys from the original data) to a set using addToSet.

If you're not familiar with a set, just think of it (for our use case) as a deduplicated array.

Query (step 3):


db.things.aggregate([
  {"$project":{"arrayofkeyvalue":{"$objectToArray":"$$ROOT"}}},
  {"$unwind":"$arrayofkeyvalue"},
  {"$group":{"_id":null,"allkeys":{"$addToSet":"$arrayofkeyvalue.k"}}}
])

Output (step 3):


[
  {
    "_id": null,
    "allkeys": [
      "name",
      "size",
      "color",
      "brand",
      "_id"
    ]
  }
]

Easy, right?

Querying dates

At the start of working with MongoDB, one of the hardest things was querying for dates. Not just because I wasn't yet accustomed to the date format, but I also found myself confused between lt (less than, which I seemed to often confuse for "larger than") and gt (greater than).

That's why this simple example will live forever in my notebook:


db.collection.find({
  "created": { $lt: new ISODate("2020-03-11T18:00:00Z") }
})

Finding arrays with a minimum size

The MongoDB $size operator has an important limitation: you can only use it to query for equality. However, you can harness array indexes just like in Javascript.

If for example you want to query for a document that has at least 5 values (mind the zero-based indexing!) in the tasks array:


db.collection.find({
  "tasks.4": {$exists: true}
})

Final notes on MongoDB

After a couple of years of working with and learning about MongoDB, I find it incredibly efficient and enjoyable. The documentation is top-notch, MongoDB university provides excellent courses and managed tools such as MongoDB Atlas are great and they have a nice free tier too!

To get the most out of it, I use:

  • Robo3T, formerly known as Robomongo, for manual data validation and executing scripts. It's rather minimal and thus dead simple to work with.
  • MongoDB Compass for creating and debugging aggregation queries or whenever I need a bit more than what Robo3T offers.

Note: I'm not sponsored or endorsed by MongoDB in any way.

⇤ Return to blog overview