MongoDB has the join-like $lookup aggregation operator in versions >= 3.2. Mongoose has a more powerful alternative called populate()
, which lets you reference documents in other collections.
Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, plain object, multiple plain objects, or all objects returned from a query. Let's look at some examples.
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const personSchema = Schema({
_id: Schema.Types.ObjectId,
name: String,
age: Number,
stories: [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});
const storySchema = Schema({
author: { type: Schema.Types.ObjectId, ref: 'Person' },
title: String,
fans: [{ type: Schema.Types.ObjectId, ref: 'Person' }]
});
const Story = mongoose.model('Story', storySchema);
const Person = mongoose.model('Person', personSchema);
So far we've created two Models. Our Person
model has its stories
field set to an array of ObjectId
s. The ref
option is what tells Mongoose which model to use during population, in our case the Story
model. All _id
s we store here must be document _id
s from the Story
model.
Note: ObjectId
, Number
, String
, and Buffer
are valid for use as refs. However, you should use ObjectId
unless you are an advanced user and have a good reason for doing so.
Saving refs to other documents works the same way you normally save properties, just assign the _id
value:
const author = new Person({
_id: new mongoose.Types.ObjectId(),
name: 'Ian Fleming',
age: 50
});
author.save(function (err) {
if (err) return handleError(err);
const story1 = new Story({
title: 'Casino Royale',
author: author._id // assign the _id from the person
});
story1.save(function (err) {
if (err) return handleError(err);
// thats it!
});
});
So far we haven't done anything much different. We've merely created a Person
and a Story
. Now let's take a look at populating our story's author
using the query builder:
Story.
findOne({ title: 'Casino Royale' }).
populate('author').
exec(function (err, story) {
if (err) return handleError(err);
console.log('The author is %s', story.author.name);
// prints "The author is Ian Fleming"
});
Populated paths are no longer set to their original _id
, their value is replaced with the mongoose document returned from the database by performing a separate query before returning the results.
Arrays of refs work the same way. Just call the populate method on the query and an array of documents will be returned in place of the original _id
s.
You can manually populate a property by setting it to a document. The document must be an instance of the model your ref
property refers to.
Story.findOne({ title: 'Casino Royale' }, function(error, story) {
if (error) {
return handleError(error);
}
story.author = author;
console.log(story.author.name); // prints "Ian Fleming"
});
Mongoose populate doesn't behave like conventional SQL joins. When there's no document, story.author
will be null
. This is analogous to a left join in SQL.
await Person.deleteMany({ name: 'Ian Fleming' });
const story = await Story.findOne({ title: 'Casino Royale' }).populate('author');
story.author; // `null`
If you have an array of authors
in your storySchema
, populate()
will give you an empty array instead.
const storySchema = Schema({
authors: [{ type: Schema.Types.ObjectId, ref: 'Person' }],
title: String
});
// Later
const story = await Story.findOne({ title: 'Casino Royale' }).populate('authors');
story.authors; // `[]`
What if we only want a few specific fields returned for the populated documents? This can be accomplished by passing the usual field name syntax as the second argument to the populate method:
Story.
findOne({ title: /casino royale/i }).
populate('author', 'name'). // only return the Persons name
exec(function (err, story) {
if (err) return handleError(err);
console.log('The author is %s', story.author.name);
// prints "The author is Ian Fleming"
console.log('The authors age is %s', story.author.age);
// prints "The authors age is null'
});
What if we wanted to populate multiple paths at the same time?
Story.
find(...).
populate('fans').
populate('author').
exec();
If you call populate()
multiple times with the same path, only the last one will take effect.
// The 2nd `populate()` call below overwrites the first because they
// both populate 'fans'.
Story.
find().
populate({ path: 'fans', select: 'name' }).
populate({ path: 'fans', select: 'email' });
// The above is equivalent to:
Story.find().populate({ path: 'fans', select: 'email' });
What if we wanted to populate our fans array based on their age and select just their names?
Story.
find(...).
populate({
path: 'fans',
match: { age: { $gte: 21 } },
// Explicitly exclude `_id`, see http://bit.ly/2aEfTdB
select: 'name -_id'
}).
exec();
Populate does support a limit
option, however, it currently does not limit on a per-document basis. For example, suppose you have 2 stories:
Story.create([
{ title: 'Casino Royale', fans: [1, 2, 3, 4, 5, 6, 7, 8] },
{ title: 'Live and Let Die', fans: [9, 10] }
]);
If you were to populate()
using the limit
option, you would find that the 2nd story has 0 fans:
const stories = Story.find().sort({ name: 1 }).populate({
path: 'fans',
options: { limit: 2 }
});
stories[0].fans.length; // 2
stories[1].fans.length; // 0
That's because, in order to avoid executing a separate query for each document, Mongoose instead queries for fans using numDocuments * limit
as the limit. As a workaround, you should populate each document individually:
const stories = await Story.find().sort({ name: 1 });
for (const story of stories) {
await story.
populate({ path: 'fans', options: { limit: 2 } }).
execPopulate();
}
stories[0].fans.length; // 2
stories[1].fans.length; // 2
We may find however, if we use the author
object, we are unable to get a list of the stories. This is because no story
objects were ever 'pushed' onto author.stories
.
There are two perspectives here. First, you may want the author
know which stories are theirs. Usually, your schema should resolve one-to-many relationships by having a parent pointer in the 'many' side. But, if you have a good reason to want an array of child pointers, you can push()
documents onto the array as shown below.
author.stories.push(story1); author.save(callback);
This allows us to perform a find
and populate
combo:
Person.
findOne({ name: 'Ian Fleming' }).
populate('stories'). // only works if we pushed refs to children
exec(function (err, person) {
if (err) return handleError(err);
console.log(person);
});
It is debatable that we really want two sets of pointers as they may get out of sync. Instead we could skip populating and directly find()
the stories we are interested in.
Story.
find({ author: author._id }).
exec(function (err, stories) {
if (err) return handleError(err);
console.log('The stories are an array: ', stories);
});
The documents returned from query population become fully functional, remove
able, save
able documents unless the lean option is specified. Do not confuse them with sub docs. Take caution when calling its remove method because you'll be removing it from the database, not just the array.
If we have an existing mongoose document and want to populate some of its paths, mongoose >= 3.6 supports the document#populate() method.
If we have one or many mongoose documents or even plain objects (like mapReduce output), we may populate them using the Model.populate() method available in mongoose >= 3.6. This is what document#populate()
and query#populate()
use to populate documents.
Say you have a user schema which keeps track of the user's friends.
var userSchema = new Schema({
name: String,
friends: [{ type: ObjectId, ref: 'User' }]
});
Populate lets you get a list of a user's friends, but what if you also wanted a user's friends of friends? Specify the populate
option to tell mongoose to populate the friends
array of all the user's friends:
User.
findOne({ name: 'Val' }).
populate({
path: 'friends',
// Get friends of friends - populate the 'friends' array for every friend
populate: { path: 'friends' }
});
Let's say you have a schema representing events, and a schema representing conversations. Each event has a corresponding conversation thread.
var eventSchema = new Schema({
name: String,
// The id of the corresponding conversation
// Notice there's no ref here!
conversation: ObjectId
});
var conversationSchema = new Schema({
numMessages: Number
});
Also, suppose that events and conversations are stored in separate MongoDB instances.
var db1 = mongoose.createConnection('localhost:27000/db1');
var db2 = mongoose.createConnection('localhost:27001/db2');
var Event = db1.model('Event', eventSchema);
var Conversation = db2.model('Conversation', conversationSchema);
In this situation, you will not be able to populate()
normally. The conversation
field will always be null, because populate()
doesn't know which model to use. However, you can specify the model explicitly.
Event.
find().
populate({ path: 'conversation', model: Conversation }).
exec(function(error, docs) { /* ... */ });
This is known as a "cross-database populate," because it enables you to populate across MongoDB databases and even across MongoDB instances.
Mongoose can also populate from multiple collections based on the value of a property in the document. Let's say you're building a schema for storing comments. A user may comment on either a blog post or a product.
const commentSchema = new Schema({
body: { type: String, required: true },
on: {
type: Schema.Types.ObjectId,
required: true,
// Instead of a hardcoded model name in `ref`, `refPath` means Mongoose
// will look at the `onModel` property to find the right model.
refPath: 'onModel'
},
onModel: {
type: String,
required: true,
enum: ['BlogPost', 'Product']
}
});
const Product = mongoose.model('Product', new Schema({ name: String }));
const BlogPost = mongoose.model('BlogPost', new Schema({ title: String }));
const Comment = mongoose.model('Comment', commentSchema);
The refPath
option is a more sophisticated alternative to ref
. If ref
is just a string, Mongoose will always query the same model to find the populated subdocs. With refPath
, you can configure what model Mongoose uses for each document.
const book = await Product.create({ name: 'The Count of Monte Cristo' });
const post = await BlogPost.create({ title: 'Top 10 French Novels' });
const commentOnBook = await Comment.create({
body: 'Great read',
on: book._id,
onModel: 'Product'
});
const commentOnPost = await Comment.create({
body: 'Very informative',
on: post._id,
onModel: 'BlogPost'
});
// The below `populate()` works even though one comment references the
// 'Product' collection and the other references the 'BlogPost' collection.
const comments = await Comment.find().populate('on').sort({ body: 1 });
comments[0].on.name; // "The Count of Monte Cristo"
comments[1].on.title; // "Top 10 French Novels"
An alternative approach is to define separate blogPost
and product
properties on commentSchema
, and then populate()
on both properties.
const commentSchema = new Schema({
body: { type: String, required: true },
product: {
type: Schema.Types.ObjectId,
required: true,
ref: 'Product'
},
blogPost: {
type: Schema.Types.ObjectId,
required: true,
ref: 'BlogPost'
}
});
// ...
// The below `populate()` is equivalent to the `refPath` approach, you
// just need to make sure you `populate()` both `product` and `blogPost`.
const comments = await Comment.find().
populate('product').
populate('blogPost').
sort({ body: 1 });
comments[0].product.name; // "The Count of Monte Cristo"
comments[1].blogPost.title; // "Top 10 French Novels"
Defining separate blogPost
and product
properties works for this simple example. But, if you decide to allow users to also comment on articles or other comments, you'll need to add more properties to your schema. You'll also need an extra populate()
call for every property, unless you use mongoose-autopopulate. Using refPath
means you only need 2 schema paths and one populate()
call regardless of how many models your commentSchema
can point to.
So far you've only populated based on the _id
field. However, that's sometimes not the right choice. In particular, arrays that grow without bound are a MongoDB anti-pattern. Using mongoose virtuals, you can define more sophisticated relationships between documents.
const PersonSchema = new Schema({
name: String,
band: String
});
const BandSchema = new Schema({
name: String
});
BandSchema.virtual('members', {
ref: 'Person', // The model to use
localField: 'name', // Find people where `localField`
foreignField: 'band', // is equal to `foreignField`
// If `justOne` is true, 'members' will be a single doc as opposed to
// an array. `justOne` is false by default.
justOne: false,
options: { sort: { name: -1 }, limit: 5 } // Query options, see http://bit.ly/mongoose-query-options
});
const Person = mongoose.model('Person', PersonSchema);
const Band = mongoose.model('Band', BandSchema);
/**
* Suppose you have 2 bands: "Guns N' Roses" and "Motley Crue"
* And 4 people: "Axl Rose" and "Slash" with "Guns N' Roses", and
* "Vince Neil" and "Nikki Sixx" with "Motley Crue"
*/
Band.find({}).populate('members').exec(function(error, bands) {
/* `bands.members` is now an array of instances of `Person` */
});
Keep in mind that virtuals are not included in toJSON()
output by default. If you want populate virtuals to show up when using functions that rely on JSON.stringify()
, like Express' res.json()
function, set the virtuals: true
option on your schema's toJSON
options.
// Set `virtuals: true` so `res.json()` works
const BandSchema = new Schema({
name: String
}, { toJSON: { virtuals: true } });
If you're using populate projections, make sure foreignField
is included in the projection.
Band.
find({}).
populate({ path: 'members', select: 'name' }).
exec(function(error, bands) {
// Won't work, foreign field `band` is not selected in the projection
});
Band.
find({}).
populate({ path: 'members', select: 'name band' }).
exec(function(error, bands) {
// Works, foreign field `band` is selected
});
Populate virtuals also support counting the number of documents with matching foreignField
as opposed to the documents themselves. Set the count
option on your virtual:
const PersonSchema = new Schema({
name: String,
band: String
});
const BandSchema = new Schema({
name: String
});
BandSchema.virtual('numMembers', {
ref: 'Person', // The model to use
localField: 'name', // Find people where `localField`
foreignField: 'band', // is equal to `foreignField`
count: true // And only get the number of docs
});
// Later
const doc = await Band.findOne({ name: 'Motley Crue' }).
populate('numMembers');
doc.numMembers; // 2
You can populate in either pre or post hooks. If you want to always populate a certain field, check out the mongoose-autopopulate plugin.
// Always attach `populate()` to `find()` calls
MySchema.pre('find', function() {
this.populate('user');
});
// Always `populate()` after `find()` calls. Useful if you want to selectively populate
// based on the docs found.
MySchema.post('find', async function(docs) {
for (let doc of docs) {
if (doc.isPublic) {
await doc.populate('user').execPopulate();
}
}
});
// `populate()` after saving. Useful for sending populated data back to the client in an
// update API endpoint
MySchema.post('save', function(doc, next) {
doc.populate('user').execPopulate().then(function() {
next();
});
});
Now that we've covered populate()
, let's take a look at discriminators.
© 2010 LearnBoost
Licensed under the MIT License.
https://mongoosejs.com/docs/populate.html