Mongo Aggregation Cursor & Counting
Solution 1:
This possibly deserves a full explanation for those who might search for this, so adding one for posterity.
Specifically, what is returned is an event stream for node.js, which effectively wraps the stream.Readable interface with a couple of convenience methods. A .count() is not one of them at present and, considering the current interface used, would not make much sense.
As with the result returned from the .stream() method available to cursor objects, a "count" would not make much sense here when you consider the implementation: it is meant to be processed as a "stream", where eventually you are going to reach an "end" but otherwise just want to keep processing until you get there.
If you considered the standard "Cursor" interface from the driver, there are some solid reasons why the aggregation cursor is not the same:
Cursors allow "modifier" actions to be processed prior to execution. These fall into the categories of .sort(), .limit() and .skip(). All of these actually have counterpart directives in the aggregation framework that are specified in the pipeline. As pipeline stages they can appear "anywhere", and not just as a post-processing option to a simple query, so it would not make much sense to offer the same "cursor" processing.
Other cursor modifiers include specials like .hint(), .min() and .max(), which are alterations to "index selection" and processing. Whilst these could be of use to the aggregation pipeline, there is currently no simple way to include them in query selection. Mostly, the logic from the previous point overrides any point of using the same type of interface for a "Cursor".
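To illustrate the first point above, here is a minimal sketch of a pipeline in which the counterparts of .sort(), .skip() and .limit() appear as ordinary stages, including in the middle of the pipeline. The stage operators are MongoDB's own, but the field names (status, created, owner) are made up for the example.

```javascript
// Hypothetical field names; only the $-prefixed stage operators are MongoDB's.
const pipeline = [
  { $match: { status: 'active' } },
  { $sort: { created: -1 } },   // counterpart of cursor.sort()
  { $skip: 20 },                // counterpart of cursor.skip()
  { $limit: 10 },               // counterpart of cursor.limit()
  { $group: { _id: '$owner', total: { $sum: 1 } } },
  { $sort: { total: -1 } }      // a second sort, applied after grouping
];

// List the operator used by each stage, in order.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(' -> '));
```

Because a $sort or $limit can sit before or after a $group, a cursor-style modifier applied "at the end" would only cover one of many possible positions.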
The other consideration is what you actually want to do with a cursor and why you "want" one returned. Since a cursor is usually a "one way trip", in the sense that it is usually only processed until an end is reached, and in usable "batches", it is a reasonable conclusion that the "count" only actually comes at the end, when that "queue" is finally depleted.
While it is true that the standard "cursor" implementation holds some tricks, the main reason is that this just extends a "meta data" concept, as the query profiling engine must "scan" a certain number of documents in order to determine which items to return in the result.
The aggregation framework plays with this concept a little, though. Not only are there the same results as would be processed through the standard query profiler, but there are also additional stages. Any of these stages has the potential to "modify" the resulting "count" that would actually be returned in the "stream" to be processed.
Again, you could look at this from an academic point of view and say, "Sure, the query engine should keep the 'meta data' for the count, but can we not track what is modified after?" This would be a fair argument, and pipeline operators such as $match and $group or $unwind, and possibly even $project and the new $redact, could all be considered a reasonable case for keeping their own track of the "documents processed" in each pipeline stage and updating that in the "meta data" that could possibly be returned to explain the full pipeline result count.
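As a rough sketch of what such per-stage tracking would have to record, the following uses plain arrays as stand-ins for $match and $unwind. It does not touch MongoDB at all, and the documents and the stage implementations are invented for the example. It only shows that the number of documents can shrink at one stage and grow at the next.

```javascript
// Hypothetical documents standing in for a collection.
const docs = [
  { _id: 1, tags: ['a', 'b'], active: true },
  { _id: 2, tags: ['c'], active: false },
  { _id: 3, tags: ['a', 'b', 'c'], active: true }
];

// Plain-array imitations of two pipeline stages (not the server's code).
const stages = [
  // like { $match: { active: true } }: can only reduce the count
  { name: '$match', run: (input) => input.filter((d) => d.active) },
  // like { $unwind: '$tags' }: one output document per array element
  {
    name: '$unwind',
    run: (input) =>
      input.flatMap((d) => d.tags.map((tag) => ({ ...d, tags: tag })))
  }
];

// Run the stages in order, recording the document count after each one.
const stageCounts = [];
let current = docs;
for (const stage of stages) {
  current = stage.run(current);
  stageCounts.push({ stage: stage.name, documents: current.length });
}

console.log(stageCounts);
// [ { stage: '$match', documents: 2 }, { stage: '$unwind', documents: 5 } ]
```

Three input documents become two after the match and five after the unwind, so a count taken from the initial query scan says little about what finally arrives in the stream.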
The last argument is reasonable, but consider also that at the present time a "Cursor" for aggregation pipeline results is a new concept for MongoDB. It could be fairly argued that all "reasonable" expectations at the first design point would have been that "most" results from combining documents would not be of a size that ran into the BSON limitations. But as usage expands, perceptions are altered and things change to adapt.
So this "could" possibly be changed, but it is not how it is "currently" implemented. While .count() on a standard cursor implementation has access to the "meta data" where the scanned number is recorded, any such method on the current implementation would result in retrieving all of the cursor results, just as .itcount() does in the shell.
Process the "cursor" items by counting on the "data" event and emitting something (possibly a JSON stream generator) as the "count" at the end. Any use case that requires a count "up-front" would not seem like a valid use for a cursor anyway, as surely the output would then be a whole document of a reasonable size.
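A minimal sketch of that approach follows. It assumes only that the aggregation result behaves like an object-mode stream.Readable. Here Readable.from() with made-up documents stands in for the real result stream, and countStream is a hypothetical helper name rather than anything provided by the driver.

```javascript
const { Readable } = require('stream');

// Count documents on each "data" event and resolve with the total once the
// stream reaches its "end". `source` is any object-mode Readable;
// `onDocument` is an optional per-document callback.
function countStream(source, onDocument) {
  return new Promise((resolve, reject) => {
    let count = 0;
    source.on('data', (doc) => {
      count += 1;
      if (onDocument) onDocument(doc);
    });
    source.on('end', () => resolve(count));
    source.on('error', reject);
  });
}

// Stand-in for the aggregation result stream (hypothetical documents).
const fakeCursor = Readable.from([
  { _id: 'a', total: 3 },
  { _id: 'b', total: 5 },
  { _id: 'c', total: 1 }
]);

const counted = countStream(fakeCursor, (doc) => {
  // Process each document here as it arrives.
});

// The count is only known at the end; emit it as a small JSON document.
counted.then((count) => console.log(JSON.stringify({ count })));
```

The total is available only once the stream has been fully consumed, which is the same trade-off .itcount() makes in the shell.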