Franck Pachot

Posted on Feb 14

Cross join in MongoDB

#database #javascript #mongodb #tutorial

Relational database joins are, conceptually, a cartesian product followed by a filter (the join condition). Without that condition, you get a cross join that returns every possible combination. In MongoDB, you can model the same behavior at read time using $lookup, or at write time by embedding documents.

Example

Define two collections: one for clothing sizes and one for gender-specific fits:

db.sizes.insertMany([  
  { code: "XS", neckCm: { min: 31, max: 33 } },  
  { code: "S",  neckCm: { min: 34, max: 36 } },  
  { code: "M",  neckCm: { min: 37, max: 39 } },  
  { code: "L",  neckCm: { min: 40, max: 42 } },  
  { code: "XL", neckCm: { min: 43, max: 46 } }  
]);

db.fits.insertMany([
  {
    code: "MEN",
    description: "Straight cut, broader shoulders, narrower hips"
  },
  {
    code: "WOMEN",
    description: "Tapered waist, narrower shoulders, wider hips"
  }
]);

Each collection stores independent characteristics, and every size applies to every fit. The goal is to generate all valid product variants.

Cross join on read: $lookup + $unwind

In order to add all sizes to each body shape, use a $lookup without filter condition and, as it adds them as an embedded array, use $unwind to get one document per combination:

db.sizes.aggregate([
  {
    $lookup: {
      from: "fits",
      pipeline: [],
      as: "fit"
    }
  },
  { $unwind: "$fit" },
  { $sort: { "fit.code": 1, code: 1 } },
  {
    $project: {
      _id: 0,
      code: { $concat: ["$fit.code", "-", "$code"] }
    }
  }
]);

Here is the result:

Application-side

For such small static reference collections, the application may simply read both and join with loops:

const sizes = db.sizes.find({}, { code: 1, _id: 0 }).sort({ code: 1 }).toArray();
const fits  = db.fits.find({},  { code: 1, _id: 0 }).sort({ code: 1 }).toArray();

for (const fit of fits) {
  for (const size of sizes) {
    print(`${fit.code}-${size.code}`);
  }
}

While it's good to keep the reference in a database, such static data can stay in cache in the application.

Cross join on write: embed the many-to-many

Because sizes are inherently tied to body shapes (no size exists without a body shape), embedding them in the fits documents is often a better model:

db.fits.aggregate([
  {
    $lookup: {
      from: "sizes",
      pipeline: [
        { $project: { _id: 0, code: 1, neckCm:1 } },
        { $sort: { code: 1 } }
      ],
      as: "sizes"
    }
  },
  {
    $merge: {
      into: "fits",
      on: "_id",
      whenMatched: "merge",
      whenNotMatched: "discard"
    }
  }
]);

Here is the new shape of the single collection:

Once embedded, the query becomes straightforward, simply unwind the embedded array:

db.fits.aggregate([
  { $unwind: "$sizes" },
  {
    $project: {
      _id: 0,
      code: {
        $concat: ["$code", "-", "$sizes.code"]
      }
    }
  }
]);

You may embed only the fields required, like the size code, or all fields like I did here with the neck size, and then remove the size collection.

Although this may duplicate the values for each body shape, it only requires using updateMany() instead of updateOne() when updating it. For example, the following updates one size (the array item i identified by arrayFilters and updated with $set) in the whole collection (the empty {} as a document filter):

db.fits.updateMany(
  {},  
  { $set: { "sizes.$[i].neckCm": { min: 38, max: 40 } } }, 
  { arrayFilters: [ { "i.code": "M" }  ] }  
);

Duplication has the advantage of returning all required information in a single read, without joins or multiple queries, and it is not problematic for updates since it can be handled with a single bulk update operation. Unlike relational databases—where data can be modified through ad‑hoc SQL and business rules must therefore be enforced at the database level—MongoDB applications are typically domain‑driven, with clear ownership of data and a single responsibility for performing updates.

In that context, consistency is maintained by the application's service rather than by cross‑table constraints. This approach also lets business rules evolve, such as defining different sizes for men and women, without changing the data model.

Conclusion

In a fully normalized relational model, all relationships use the same pattern: a one-to-many relationship between two tables, enforced by a primary (or unique) key on one side and a foreign key on the other. This holds regardless of cardinality (many can be three or one million), lifecycle rules (cascade deletes or updates), ownership (shared or exclusive parent), navigation direction (and access patterns). Even many-to-many relationships are just two one-to-many relationships via a junction table.

MongoDB exposes these same concepts as modeling choices—handled at read time with $lookup, at write time through embedding, or in the application—instead of enforcing a single normalized representation. The choice depends on the domain data and access patterns.

DEV Community