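MongoDB Source Code Analysis (7): Queries, Part 3: How mongod Produces a Cursor (Continued)
The previous article covered how mongod selects a QueryPlan. It ran long, so this article picks up where it left off
and walks through how a plan is actually executed.
With QueryPlanSet::make covered in the previous article, we return to MultiPlanScanner::init.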
- // if _or == false, don't use or clauses for index selection
- if ( !_or ) {
- ++_i;// if the query begins with $or, e.g. {$or:[{a:1},{b:1}]}, this frsp is of no use; see FieldRangeSet::handleMatchField
- auto_ptr<FieldRangeSetPair> frsp( new FieldRangeSetPair( _ns.c_str(), _query, true ) );
- updateCurrentQps( QueryPlanSet::make( _ns.c_str(), frsp, auto_ptr<FieldRangeSetPair>(),
- _query, order, _parsedQuery, _hint,
- _recordedPlanPolicy,
- min, max, true ) );
- }
- else {// next, look at handleBeginningOfClause
- BSONElement e = _query.getField( "$or" );
- massert( 13268, "invalid $or spec",
- e.type() == Array && e.embeddedObject().nFields() > 0 );
- handleBeginningOfClause();
- }
- void MultiPlanScanner::handleBeginningOfClause() {
- assertHasMoreClauses();
- ++_i;// run QueryPlanSet::make on the first $or clause to get its plans, exactly as for an ordinary query
- auto_ptr<FieldRangeSetPair> frsp( _org->topFrsp() );// for {$or:[{x:1},{y:1}]} the first clause is {x:1}; once its QueryPlanSet finishes, a new QueryPlanSet is created to run {y:1}, as shown later
- auto_ptr<FieldRangeSetPair> originalFrsp( _org->topFrspOriginal() );
- updateCurrentQps( QueryPlanSet::make( _ns.c_str(), frsp, originalFrsp, _query,
- BSONObj(), _parsedQuery, _hint, _recordedPlanPolicy,
- BSONObj(), BSONObj(),
- // 'Special' plans are not supported within $or.
- false ) );
- }
- shared_ptr<Cursor> CursorGenerator::generate() {
- setArgumentsHint();
- shared_ptr<Cursor> cursor = shortcutCursor();
- if ( cursor ) {
- return cursor;
- }
- setMultiPlanScanner();
- cursor = singlePlanCursor();// the actual cursor is produced here, but only when there is a single plan
- if ( cursor ) {
- return cursor;
- }
- return newQueryOptimizerCursor( _mps, _planPolicy, isOrderRequired(), explain() );
- }
- shared_ptr<Cursor> CursorGenerator::singlePlanCursor() {
- const QueryPlan *singlePlan = _mps->singlePlan();// returns a plan only when there really is a single plan, $or is not in effect, and there are no extra plans (see QueryPlanSet::hasPossiblyExcludedPlans)
- if ( !singlePlan || ( isOrderRequired() && singlePlan->scanAndOrderRequired() ) ) {
- return shared_ptr<Cursor>();
- }// the _planPolicy passed in earlier is "any", which permits every plan
- if ( !_planPolicy.permitPlan( *singlePlan ) ) {
- return shared_ptr<Cursor>();
- }
- if ( _singlePlanSummary ) {
- *_singlePlanSummary = singlePlan->summary();
- }// create the Cursor that corresponds to the QueryPlan
- shared_ptr<Cursor> single = singlePlan->newCursor();
- if ( !_query.isEmpty() && !single->matcher() ) {// attach the matcher for the query
- single->setMatcher( singlePlan->matcher() );
- }
- if ( singlePlan->keyFieldsOnly() ) {
- single->setKeyFieldsOnly( singlePlan->keyFieldsOnly() );
- }
- if ( _simpleEqualityMatch ) {
- if ( singlePlan->exactKeyMatch() && !single->matcher()->needRecord() ) {
- *_simpleEqualityMatch = true;
- }
- }
- return single;
- }
- shared_ptr<Cursor> QueryPlan::newCursor( const DiskLoc &startLoc ) const {
- if ( _type ) {//index plugin
- // hopefully safe to use original query in these contexts - don't think we can mix type with $or clause separation yet
- int numWanted = 0;// e.g. a geospatial index
- if ( _parsedQuery ) {
- // SERVER-5390
- numWanted = _parsedQuery->getSkip() + _parsedQuery->getNumToReturn();
- }
- return _type->newCursor( _originalQuery , _order , numWanted );
- }
- if ( _utility == Impossible ) {// as analysed earlier, an Impossible plan yields a cursor that returns no data; this is where it is created
- // Dummy table scan cursor returning no results. Allowed in --notablescan mode.
- return shared_ptr<Cursor>( new BasicCursor( DiskLoc() ) );
- }
- if ( willScanTable() ) {// full table scan: build a BasicCursor or a ReverseCursor depending on _order
- checkTableScanAllowed();
- return findTableScan( _frs.ns(), _order, startLoc );
- }
- massert( 10363 , "newCursor() with start location not implemented for indexed plans", startLoc.isNull() );
- if ( _startOrEndSpec ) {// build a BtreeCursor over the index and pass in the range bounds
- // we are sure to spec _endKeyInclusive
- return shared_ptr<Cursor>( BtreeCursor::make( _d, _idxNo, *_index, _startKey, _endKey, _endKeyInclusive, _direction >= 0 ? 1 : -1 ) );
- }
- else if ( _index->getSpec().getType() ) {
- return shared_ptr<Cursor>( BtreeCursor::make( _d, _idxNo, *_index, _frv->startKey(), _frv->endKey(), true, _direction >= 0 ? 1 : -1 ) );
- }
- else {
- return shared_ptr<Cursor>( BtreeCursor::make( _d, _idxNo, *_index, _frv,
- independentRangesSingleIntervalLimit(),
- _direction >= 0 ? 1 : -1 ) );
- }
- }
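When singlePlanCursor does not return a cursor, control reaches newQueryOptimizerCursor. Before analysing this function, here is an outline of what it does.
The function handles the multi-plan case and the case where the $or operator is in effect.
1. When _or is false, one QueryOptimizerCursorOp is created per plan. The op stores its plan and, when it is
initialized, creates a cursor from that plan. QueryOptimizerCursorImpl works on these QueryOptimizerCursorOp
objects: on each iteration it picks the op that has scanned the fewest objects so far and advances it.
Advancing an op means advancing its cursor and then running the match. Because the ops therefore stay level
in the number of objects scanned, the plans effectively take turns. As soon as one op completes, that is,
its cursor can return no more documents, the query ends. The first plan to finish is taken as the best one,
because it scanned the fewest objects.
2. When the $or clause is in effect, take db.coll.find({$or:[{a:1,c:1},{b:1,d:1}]}) as an example. As analysed earlier, the first condition to run is
{a:1,c:1}: plans are generated for it and executed as in step 1. When that finishes, the procedure of step 1 is
repeated for {b:1,d:1}.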
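The turn-taking in step 1 can be sketched as follows. This is only an illustration under stated assumptions: `PlanOp`, `FewestScannedFirst` and `runPlans` are hypothetical names, each plan is reduced to a counter, and a plan is treated as complete once it has scanned `total` objects. The real QueryOptimizerCursorOp also performs matching and cursor management.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for QueryOptimizerCursorOp: one candidate plan.
struct PlanOp {
    int id;              // which plan this op runs
    long long nscanned;  // objects scanned so far
    long long total;     // objects to scan before the cursor is exhausted
    bool complete() const { return nscanned >= total; }
};

// Puts the op with the fewest scanned objects on top of the queue;
// on a tie the plan with the lower id (the earlier plan) wins.
struct FewestScannedFirst {
    bool operator()(const PlanOp& a, const PlanOp& b) const {
        if (a.nscanned != b.nscanned) return a.nscanned > b.nscanned;
        return a.id > b.id;
    }
};

// Runs the plans in turn and returns the id of the first plan to finish.
int runPlans(const std::vector<PlanOp>& plans) {
    assert(!plans.empty());
    std::priority_queue<PlanOp, std::vector<PlanOp>, FewestScannedFirst> queue;
    for (const PlanOp& p : plans) {
        if (p.complete()) return p.id;    // a plan may already be done after init
        queue.push(p);
    }
    for (;;) {
        PlanOp op = queue.top();
        queue.pop();
        ++op.nscanned;                    // advance this plan's cursor by one object
        if (op.complete()) return op.id;  // first finisher is taken as the best plan
        queue.push(op);                   // otherwise it rejoins the queue
    }
}
```

With three plans that need 100, 5 and 50 scans, the second plan finishes first and is returned, which mirrors the "first plan to finish wins" rule described above.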
Now for the code.
- shared_ptr<Cursor> newQueryOptimizerCursor( auto_ptr<MultiPlanScanner> mps,
- const QueryPlanSelectionPolicy &planPolicy,
- bool requireOrder, bool explain ) {
- shared_ptr<QueryOptimizerCursorImpl> ret
- ( QueryOptimizerCursorImpl::make( mps, planPolicy, requireOrder, explain ) );
- return ret;
- }
- QueryOptimizerCursorImpl( auto_ptr<MultiPlanScanner> &mps,
- const QueryPlanSelectionPolicy &planPolicy,
- bool requireOrder ) :
- _requireOrder( requireOrder ),
- _mps( mps ),
- _initialCandidatePlans( _mps->possibleInOrderPlan(), _mps->possibleOutOfOrderPlan() ),
- _originalOp( new QueryOptimizerCursorOp( _nscanned, planPolicy, _requireOrder,
- !_initialCandidatePlans.hybridPlanSet() ) ),
- _currOp(),
- _completePlanOfHybridSetScanAndOrderRequired(),
- _nscanned() {
- }
- void init( bool explain ) {
- _mps->initialOp( _originalOp );
- if ( explain ) {
- _explainQueryInfo = _mps->generateExplainInfo();
- }
- shared_ptr<QueryOp> op = _mps->nextOp();
- rethrowOnError( op );
- if ( !op->complete() ) {
- _currOp = dynamic_cast<QueryOptimizerCursorOp*>( op.get() );
- }
- }
- shared_ptr<QueryOp> MultiPlanScanner::nextOp() {
- verify( !doneOps() );// below, _or is true when the query contains $or and the $or is meaningful; the previous article explains when that is the case
- shared_ptr<QueryOp> ret = _or ? nextOpOr() : nextOpSimple();
- if ( ret->error() || ret->complete() ) {
- _doneOps = true;
- }
- return ret;
- }
- shared_ptr<QueryOp> MultiPlanScanner::nextOpSimple() {
- return iterateRunner( *_baseOp );
- }
- shared_ptr<QueryOp> MultiPlanScanner::nextOpOr() {
- shared_ptr<QueryOp> op;
- do {
- op = nextOpSimple();
- if ( !op->completeWithoutStop() ) {
- return op;
- }// recall that for {$or:[{a:1},{b:1}]} only {a:1} was passed in at first; once {a:1} finishes, {b:1} starts here by the same process: a new QueryPlanSet is created and run
- handleEndOfClause( op->queryPlan() );
- _baseOp = op;
- } while( mayHandleBeginningOfClause() );
- return op;
- }
- shared_ptr<QueryOp> MultiPlanScanner::iterateRunner( QueryOp &originalOp, bool retried ) {
- if ( _runner ) {// null on the first call, before initialization
- return _runner->next();
- }
- _runner.reset( new QueryPlanSet::Runner( *_currentQps, originalOp ) );// simply constructs the runner object
- shared_ptr<QueryOp> op = _runner->next();
- if ( op->error() &&
- _currentQps->prepareToRetryQuery() ) {
- // Avoid an infinite loop here - this should never occur.
- verify( !retried );
- _runner.reset();
- return iterateRunner( originalOp, true );
- }
- return op;
- }
- shared_ptr<QueryOp> QueryPlanSet::Runner::next() {
- if ( _ops.empty() ) {// not initialized yet
- shared_ptr<QueryOp> initialRet = init();
- if ( initialRet ) {// initialization hit an error, or one of the plans has already completed
- _done = true;
- return initialRet;
- }
- }
- shared_ptr<QueryOp> ret;
- do {
- ret = _next();// pick a QueryOp; each QueryOp corresponds to one plan
- } while( ret->error() && !_queue.empty() );
- if ( _queue.empty() ) {
- _done = true;
- }
- return ret;
- }
nextOpSimple->iterateRunner->QueryPlanSet::Runner::next->QueryPlanSet::Runner::init
- shared_ptr<QueryOp> QueryPlanSet::Runner::init() {
- if ( _plans._plans.size() > 1 )// the multi-plan query set up earlier
- log(1) << " running multiple plans" << endl;
- for( PlanSet::iterator i = _plans._plans.begin(); i != _plans._plans.end(); ++i ) {
- shared_ptr<QueryOp> op( _op.createChild() );
- op->setQueryPlan( i->get() );// one QueryOptimizerCursorOp per plan
- _ops.push_back( op );
- }
- // Initialize ops.
- for( vector<shared_ptr<QueryOp> >::iterator i = _ops.begin(); i != _ops.end(); ++i ) {
- initOp( **i );// not shown here: this calls QueryOptimizerCursorOp::init, which creates the cursor from the plan above and builds the matcher; the matcher is the subject of the next article
- if ( _explainClauseInfo ) {
- _explainClauseInfo->addPlanInfo( (*i)->generateExplainInfo() );
- }
- }
- // See if an op has completed.
- for( vector<shared_ptr<QueryOp> >::iterator i = _ops.begin(); i != _ops.end(); ++i ) {
- if ( (*i)->complete() ) {// if any plan has completed, the work is done
- return *i;
- }
- }
- // Put runnable ops in the priority queue.
- for( vector<shared_ptr<QueryOp> >::iterator i = _ops.begin(); i != _ops.end(); ++i ) {
- if ( !(*i)->error() ) {// build the queue of QueryOps; note that _queue is declared as our_priority_queue<OpHolder> _queue, a priority queue that always yields the QueryOp with the fewest scanned objects
- _queue.push( *i );
- }
- }
- if ( _queue.empty() ) {
- return _ops.front();
- }
- return shared_ptr<QueryOp>();
- }
- shared_ptr<QueryOp> QueryPlanSet::Runner::_next() {
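- OpHolder holder = _queue.pop();// the priority queue yields the plan that has scanned the fewest objects; on a tie,
- QueryOp &op = *holder._op;// the first plan is chosen
- nextOp( op );// calls QueryOptimizerCursorOp::nextOp, which records the number of objects scanned, advances the cursor if appropriate, and checks whether the cursor is exhausted
- if ( op.complete() ) {// the first plan to finish is cached so that later queries can go straight to it and save plan-selection time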
- if ( _plans._mayRecordPlan && op.mayRecordPlan() ) {
- op.queryPlan().registerSelf( op.nscanned(), _plans.characterizeCandidatePlans() );
- }
- _done = true;
- return holder._op;
- }
- if ( _plans.hasPossiblyExcludedPlans() &&
- op.nscanned() > _plans._oldNScanned * 10 ) {
- verify( _plans.nPlans() == 1 && _plans.firstPlan()->special().empty() );
- holder._offset = -op.nscanned();
- _plans.addFallbackPlans();
- PlanSet::iterator i = _plans._plans.begin();
- ++i;
- for( ; i != _plans._plans.end(); ++i ) {
- shared_ptr<QueryOp> op( _op.createChild() );
- op->setQueryPlan( i->get() );
- _ops.push_back( op );
- initOp( *op );
- if ( op->complete() )
- return op;
- _queue.push( op );
- }
- _plans._usingCachedPlan = false;
- }// push this QueryOp back into the priority queue, which is what makes the plans take turns
- _queue.push( holder );
- return holder._op;
- }
- void init( bool explain ) {
- shared_ptr<QueryOp> op = _mps->nextOp();// this is the op returned above
- if ( !op->complete() ) {// not finished yet, so make it the current QueryOptimizerCursorOp
- _currOp = dynamic_cast<QueryOptimizerCursorOp*>( op.get() );
- }
- }
- virtual bool ok() { return _takeover ? _takeover->ok() : !currLoc().isNull(); }// _takeover has not been set yet, so it is null here
- virtual DiskLoc currLoc() { return _takeover ? _takeover->currLoc() : _currLoc(); }
- DiskLoc _currLoc() const {// delegates to the current QueryOptimizerCursorOp obtained above
- return _currOp ? _currOp->currLoc() : DiskLoc();
- }// _c is the actual cursor; its currLoc tells whether there is still data
- DiskLoc QueryOptimizerCursorOp::currLoc() const { return _c ? _c->currLoc() : DiskLoc(); }
- virtual bool advance() {return _advance( false );}
- bool _advance( bool force ) {
- if ( _takeover ) {// still empty at this point
- return _takeover->advance();
- }
- if ( !force && !ok() ) {
- return false;
- }
- _currOp = 0;// nextOp was analysed above; this picks the next QueryOp in the rotation
- shared_ptr<QueryOp> op = _mps->nextOp();
- // Avoiding dynamic_cast here for performance. Soon we won't need to
- // do a cast at all.
- QueryOptimizerCursorOp *qocop = (QueryOptimizerCursorOp*)( op.get() );
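- if ( !op->complete() ) {// the query has not finished yet
- // The 'qocop' will be valid until we call _mps->nextOp() again. We return 'current' values from this op.
- _currOp = qocop;// switch the current QueryOp
- }// one pass finds at most 101 matching documents; beyond that the QueryOp is marked complete, which does not mean the data is exhausted
- else if ( op->stopRequested() ) {// the cursor is still valid, but the cap of 101 documents has been reached
- if ( qocop->cursor() ) {// a query returns a limited number of documents at a time rather than all of them at once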
- _takeover.reset( new MultiCursor( _mps,
- qocop->cursor(),
- op->queryPlan().matcher(),
- qocop->explainInfo(),
- *op,
- _nscanned - qocop->cursor()->nscanned() ) );
- }
- }
- else {
- if ( _initialCandidatePlans.hybridPlanSet() ) {
- _completePlanOfHybridSetScanAndOrderRequired =
- op->queryPlan().scanAndOrderRequired();
- }
- }
- return ok();
- }
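That covers the most complicated part of the execution flow. Space is limited, so some functions were not explored in depth; interested readers can study them on their own or raise them
for discussion. The next article covers document matching.
Original link: http://blog.csdn.net/yhjj0108/article/details/8260518
Author: yhjj0108 (杨浩)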
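MongoDB Source Code Analysis (8): Queries, Part 4: Document Matching in mongod
The previous two articles covered how a cursor is produced; this one continues with document matching. The flow of a query is simple:
fetch a document and compare it against the conditions. If it matches and sharding is enabled, also check that the document belongs to the
current chunkManager. Finally, check whether the document has already been recorded and skip it if so; if a sort is required, the matching
documents are sorted at this point.
When I started using MongoDB I had two questions about queries. 1. How does MongoDB match deeply nested conditions such as {x:{y:{z:1}}}?
2. How exactly does $elemMatch match? This article answers both. First, consider the three forms of matching described in the official MongoDB
documentation.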
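For a query such as {author:{name:"jane"}}, MongoDB treats author as the field to match and {name:"jane"} as the value to match. The comparison
is equality in a memcmp-like sense: the content must be identical and the fields must appear in the same order, otherwise the document does not match.
For {"author.name":"jane"}, the whole string author.name is the name of the field to match. Matching descends recursively until it
reaches the name field and then compares its value with "jane".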
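The difference between the two forms can be sketched with a toy document model. Everything below is an assumption made for illustration: `Node`, `makeLeaf`, `makeObj`, `matchesWhole` and `matchesDottedPath` are hypothetical names, values are plain strings, and none of it is the BSON API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature document: a node is either a leaf string or an
// ordered list of named sub-nodes. This is not BSON, only an illustration.
struct Node;
using NodePtr = std::shared_ptr<Node>;
using Fields = std::vector<std::pair<std::string, NodePtr>>;
struct Node {
    std::string leaf;  // value when the node is a leaf
    Fields fields;     // ordered sub-fields when the node is an embedded document
};

NodePtr makeLeaf(const std::string& v) { auto n = std::make_shared<Node>(); n->leaf = v; return n; }
NodePtr makeObj(const Fields& f) { auto n = std::make_shared<Node>(); n->fields = f; return n; }

// Deep, order-sensitive equality: the analogue of comparing embedded objects whole.
bool sameValue(const Node& a, const Node& b) {
    if (a.leaf != b.leaf || a.fields.size() != b.fields.size()) return false;
    for (std::size_t i = 0; i < a.fields.size(); ++i) {
        if (a.fields[i].first != b.fields[i].first) return false;
        if (!sameValue(*a.fields[i].second, *b.fields[i].second)) return false;
    }
    return true;
}

// Returns the node stored directly under `name`, or nullptr when absent.
const Node* getField(const Node& doc, const std::string& name) {
    for (const auto& f : doc.fields)
        if (f.first == name) return f.second.get();
    return nullptr;
}

// Dotted-path lookup: split at the first '.', descend, and recurse on the rest.
const Node* getDotted(const Node& doc, const std::string& path) {
    assert(!path.empty());
    const std::string::size_type dot = path.find('.');
    if (dot == std::string::npos) return getField(doc, path);
    const Node* sub = getField(doc, path.substr(0, dot));
    return sub ? getDotted(*sub, path.substr(dot + 1)) : nullptr;
}

// {author:{name:"jane"}} style: the whole embedded object must be identical.
bool matchesWhole(const Node& doc, const std::string& field, const Node& wanted) {
    const Node* got = getField(doc, field);
    return got && sameValue(*got, wanted);
}

// {"author.name":"jane"} style: only the leaf reached by the path is compared.
bool matchesDottedPath(const Node& doc, const std::string& path, const std::string& wanted) {
    const Node* got = getDotted(doc, path);
    return got && got->fields.empty() && got->leaf == wanted;
}
```

For a stored document whose author is {name:"jane", id:"1"}, the dotted form matches "jane", while the whole-object form {name:"jane"} does not, and neither does the same pair of fields in the opposite order.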
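In the array case, only $elemMatch gives the intended result, because $elemMatch builds a sub-query and matches the inside of each element within that sub-query,
as discussed below.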
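Why $elemMatch behaves differently can be sketched as follows. This is a simplified illustration, not the Matcher code: `Elem`, `Cond`, `elemMatch` and `eachCondSomewhere` are hypothetical names, and array elements are reduced to two integer fields.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

struct Elem { int a; int b; };                   // hypothetical array element {a:..., b:...}
using Cond = std::function<bool(const Elem&)>;   // one condition on an element

// $elemMatch style: a single element must satisfy every condition.
bool elemMatch(const std::vector<Elem>& arr, const std::vector<Cond>& conds) {
    assert(!conds.empty());
    return std::any_of(arr.begin(), arr.end(), [&](const Elem& e) {
        return std::all_of(conds.begin(), conds.end(),
                           [&](const Cond& c) { return c(e); });
    });
}

// Separate dotted conditions: each condition may be met by a different element.
bool eachCondSomewhere(const std::vector<Elem>& arr, const std::vector<Cond>& conds) {
    assert(!conds.empty());
    return std::all_of(conds.begin(), conds.end(), [&](const Cond& c) {
        return std::any_of(arr.begin(), arr.end(),
                           [&](const Elem& e) { return c(e); });
    });
}
```

With the conditions a == 1 and b > 1, the array [{a:1,b:0},{a:2,b:5}] passes the second function but not the first, because no single element satisfies both conditions.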
Now for the code.
Document matching is implemented by the CoveredIndexMatcher class. First, let us see how an instance of it is created.
- shared_ptr<Cursor> CursorGenerator::singlePlanCursor() {
- shared_ptr<Cursor> single = singlePlan->newCursor();
- if ( !_query.isEmpty() && !single->matcher() ) {
- single->setMatcher( singlePlan->matcher() );// creates the concrete matcher object and attaches it
- }
- }
- string queryWithQueryOptimizer() {
- for( ; cursor->ok(); cursor->advance() ) {
- if ( pq.getMaxScan() && cursor->nscanned() > pq.getMaxScan() ) {// stop once more objects have been scanned than maxScan allows
- break;
- }// the actual comparison
- if ( !queryResponseBuilder->addMatch() ) {
- continue;
- }
- }
- }
- bool QueryResponseBuilder::addMatch() {
- MatchDetails details;
- if ( _parsedQuery.getFields() && _parsedQuery.getFields()->getArrayOpType() == Projection::ARRAY_OP_POSITIONAL ) {
- // field projection specified, and contains an array operator
- details.requestElemMatchKey();
- }// the part we care about: the matching itself
- if ( !currentMatches( details ) ) {//for query match
- return false;
- }
- if ( !chunkMatches() ) {// chunk match, used when the server runs with sharding enabled
- return false;
- }
- bool orderedMatch = false;
- bool match = _builder->handleMatch( orderedMatch, details );//for field projection
- return match;
- }
- bool QueryResponseBuilder::currentMatches( MatchDetails& details ) {
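- if ( _cursor->currentMatches( &details ) ) {// this ends up in the cursor's own match function; the previous article showed that there are many cursor types, so we follow the most complex one, QueryOptimizerCursorImpl; the others match in much the same way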
- return true;
- }
- }
- virtual bool QueryOptimizerCursorImpl::currentMatches( MatchDetails *details = 0 ) {
- if ( _takeover ) {
- return _takeover->currentMatches( details );
- }// simply forwards the request to the QueryOp's currentMatches
- return _currOp->currentMatches( details );
- }
- bool QueryOptimizerCursorOp::currentMatches( MatchDetails *details ) {
- if ( !_c || !_c->ok() ) {// the cursor has no more data, so nothing can match; return false
- _matchCounter.setMatch( false );
- return false;
- }// the match is forwarded once more, to the queryPlan's matcher, which is the CoveredIndexMatcher created in singlePlanCursor above
- bool match = queryPlan().matcher()->matchesCurrent( _c.get(), details );
- // Cache the match, so we can count it in mayAdvance().
- bool newMatch = _matchCounter.setMatch( match );// record the result of this match
- return match;
- }
- CoveredIndexMatcher::CoveredIndexMatcher( const BSONObj &jsobj,
- const BSONObj &indexKeyPattern ) :
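- _docMatcher( new Matcher( jsobj ) ),// the docMatcher covers the whole query
- _keyMatcher( *_docMatcher, indexKeyPattern ) {// the keyMatcher is built from the docMatcher and the index pattern and covers only the indexed fields
- init();// decides whether matching must load the actual record, which is needed when the query refers to a field outside the index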
- }
- Matcher::Matcher(const BSONObj &jsobj, bool nested) :
- _where(0), _jsobj(jsobj), _haveSize(), _all(), _hasArray(0), _haveNeg(), _atomic(false) {
- // iterate over the query once more
- BSONObjIterator i(_jsobj);
- while ( i.more() ) {
- parseMatchExpressionElement( i.next(), nested );
- }
- }
- void Matcher::parseMatchExpressionElement( const BSONElement &e, bool nested ) {
- if ( parseClause( e ) ) {// clauses that begin with $and, $or or $nor are parsed into list< shared_ptr< Matcher > > objects: _andMatchers, _orMatchers and _norMatchers
- return;
- }
- const char *fn = e.fieldName();// JavaScript queries, which MongoDB supports
- if ( str::equals(fn, "$where") ) {// e.g. {$where: "this.a > 3"}
- parseWhere(e);
- return;
- }
- if ( e.type() == RegEx ) {// regular-expression match
- addRegex( fn, e.regex(), e.regexFlags() );
- return;
- }
- // greater than / less than...
- // e.g., e == { a : { $gt : 3 } },{x:{$elemMatch:{a:1,b:{$gt:1}}}}
- // or
- // { a : { $in : [1,2,3] } }
- if ( e.type() == Object ) {
- // support {$regex:"a|b", $options:"imx"}
- const char* regex = NULL;
- const char* flags = "";
- // e.g., fe == { $gt : 3 }
- BSONObjIterator j(e.embeddedObject());
- bool isOperator = false;
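- while ( j.more() ) {// walk the object and build ElementMatcher objects; each takes three arguments: the element to match, the comparison operator, and whether the match is negated
- BSONElement fe = j.next();// compare addBasic(elem,op,isnot) below, where elem would be e.g. {x:"foo"}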
- const char *fn = fe.fieldName();
- if ( fn[0] == '$' && fn[1] ) {
- isOperator = true;
- if ( fn[1] == 'n' && fn[2] == 'o' && fn[3] == 't' && fn[4] == 0 ) {
- _haveNeg = true;
- switch( fe.type() ) {
- case Object: {
- BSONObjIterator k( fe.embeddedObject() );
- uassert( 13030, "$not cannot be empty", k.more() );
- while( k.more() ) {
- addOp( e, k.next(), true, regex, flags );
- }
- break;
- }
- case RegEx:
- addRegex( e.fieldName(), fe.regex(), fe.regexFlags(), true );
- break;
- default:
- uassert( 13031, "invalid use of $not", false );
- }
- }
- else {// operators such as {a:{$all:[10,20,30]}} are handled the same way, by building an ElementMatcher; that code is not reproduced here
- if ( !addOp( e, fe, false, regex, flags ) ) {
- isOperator = false;
- break;
- }
- }
- }
- else {
- isOperator = false;
- break;
- }
- }
- if (regex) {
- addRegex(e.fieldName(), regex, flags);
- }
- if ( isOperator )
- return;
- }
- if ( e.type() == Array ) {
- _hasArray = true;
- }
- else if( *fn == '$' ) {
- if( str::equals(fn, "$atomic") || str::equals(fn, "$isolated") ) {
- uassert( 14844, "$atomic specifier must be a top level field", !nested );
- _atomic = e.trueValue();
- return;
- }
- }
- // normal, simple case e.g. { a : "foo" }
- addBasic(e, BSONObj::Equality, false);
- }
- ElementMatcher::ElementMatcher( BSONElement e , int op, bool isNot )
- : _toMatch( e ) , _compareOp( op ), _isNot( isNot ), _subMatcherOnPrimitives(false) {
- else if ( op == BSONObj::opELEM_MATCH ) {//{$elemMatch:{x:1,b:{$gt:1}}}
- BSONElement m = e;
- BSONObj x = m.embeddedObject();// the ElementMatcher builds a child Matcher internally to carry out the $elemMatch comparison
- if ( x.firstElement().getGtLtOp() == BSONObj::Equality &&
- !str::equals( x.firstElement().fieldName(), "$not" ) ) {
- _subMatcher.reset( new Matcher( x ) );
- _subMatcherOnPrimitives = false;
- }
- else {
- // meant to act on primitives
- _subMatcher.reset( new Matcher( BSON( "" << x ) ) );
- _subMatcherOnPrimitives = true;
- }
- }
- }
QueryResponseBuilder::addMatch->QueryResponseBuilder::currentMatches->CoveredIndexMatcher::matchesCurrent
- bool CoveredIndexMatcher::matchesCurrent( Cursor * cursor , MatchDetails * details ) const {
- // bool keyUsable = ! cursor->isMultiKey() && check for $orish like conditions in matcher SERVER-1264
- return matches( cursor->currKey() , cursor->currLoc() , details ,
- !cursor->indexKeyPattern().isEmpty() // unindexed cursor
- && !cursor->isMultiKey() // multikey cursor
- );
- }
- bool CoveredIndexMatcher::matches( const BSONObj& key, const DiskLoc& recLoc,
- MatchDetails* details, bool keyUsable ) const {
- if ( keyUsable ) {//index usable
- if ( !_keyMatcher.matches(key, details ) ) {
- return false;
- }// _needRecord is set in CoveredIndexMatcher::init; when every field in the query is covered by the index, the full document need not be loaded and the key match alone decides the result
- bool needRecordForDetails = details && details->needRecord();
- if ( !_needRecord && !needRecordForDetails ) {
- return true;
- }
- }
- BSONObj obj = recLoc.obj();// fetch the BSON object of this record from the memory-mapped file
- bool res =
- _docMatcher->matches( obj, details ) &&// the actual matching
- !isOrClauseDup( obj );// related to MultiCursor; not analysed here
- return res;
- }
- bool Matcher::matches(const BSONObj& jsobj , MatchDetails * details ) const {
- for ( unsigned i = 0; i < _basics.size(); i++ ) {// match each ElementMatcher in turn
- const ElementMatcher& bm = _basics[i];
- const BSONElement& m = bm._toMatch;
- // -1=mismatch. 0=missing element. 1=match
- int cmp = matchesDotted(m.fieldName(), m, jsobj, bm._compareOp, bm , false , details );
- if ( cmp == 0 && bm._compareOp == BSONObj::opEXISTS ) {// the field is missing and the condition is an $exists test; a condition such as {x:{$exists:false}} is satisfied by a missing field
- // If missing, match cmp is opposite of $exists spec.
- cmp = -retExistsFound(bm);
- }
- if ( bm._isNot )
- cmp = -cmp;
- if ( cmp < 0 )
- return false;
- if ( cmp == 0 ) {
- /* missing is ok iff we were looking for null */
- if ( m.type() == jstNULL || m.type() == Undefined ||
- ( ( bm._compareOp == BSONObj::opIN || bm._compareOp == BSONObj::NIN ) && bm._myset->count( staticNull.firstElement() ) > 0 ) ) {
- if ( bm.negativeCompareOp() ^ bm._isNot ) {
- return false;
- }
- }
- else {
- if ( !bm._isNot ) {
- return false;
- }
- }
- }
- }
- for (vector<RegexMatcher>::const_iterator it = _regexs.begin();
- it != _regexs.end();
- ++it) {
- BSONElementSet s;
- if ( !_constrainIndexKey.isEmpty() ) {
- BSONElement e = jsobj.getFieldUsingIndexNames(it->_fieldName, _constrainIndexKey);
- // Should only have keys nested one deep here, for geo-indices
- // TODO: future indices may nest deeper?
- if( e.type() == Array ){
- BSONObjIterator i( e.Obj() );
- while( i.more() ){
- s.insert( i.next() );
- }
- }
- else if ( !e.eoo() )
- s.insert( e );
- }
- else {
- jsobj.getFieldsDotted( it->_fieldName, s );
- }
- bool match = false;// regular-expression matching
- for( BSONElementSet::const_iterator i = s.begin(); i != s.end(); ++i )
- if ( regexMatches(*it, *i) )
- match = true;
- if ( !match ^ it->_isNot )
- return false;
- }// every $and, $or and $nor ultimately comes down to basic ElementMatcher comparisons, so matchesDotted is analysed in detail below
- if ( _andMatchers.size() > 0 ) {// $and: a single failing condition returns false
- for( list< shared_ptr< Matcher > >::const_iterator i = _andMatchers.begin();
- i != _andMatchers.end(); ++i ) {
- // SERVER-3192 Track field matched using details the same as for
- // top level fields, at least for now.
- if ( !(*i)->matches( jsobj, details ) ) {
- return false;
- }
- }
- }
- if ( _orMatchers.size() > 0 ) {
- bool match = false;// $or: one successful match is enough
- for( list< shared_ptr< Matcher > >::const_iterator i = _orMatchers.begin();
- i != _orMatchers.end(); ++i ) {
- // SERVER-205 don't submit details - we don't want to track field
- // matched within $or
- if ( (*i)->matches( jsobj ) ) {
- match = true;
- break;
- }
- }
- if ( !match ) {
- return false;
- }
- }
- if ( _norMatchers.size() > 0 ) {
- for( list< shared_ptr< Matcher > >::const_iterator i = _norMatchers.begin();
- i != _norMatchers.end(); ++i ) {
- // SERVER-205 don't submit details - we don't want to track field
- // matched within $nor
- if ( (*i)->matches( jsobj ) ) {
- return false;
- }
- }
- }// runs the JavaScript $where match; the author has not analysed this code
- if ( _where ) {
- return _where->exec( jsobj );
- }
- return true;
- }
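The way Matcher::matches combines its parts can be condensed into a small sketch. It is a simplification under stated assumptions: `Doc`, `Pred`, `MiniMatcher` and `fieldEquals` are hypothetical names, documents are flat maps of integers, and the regex and $where steps are left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Doc = std::map<std::string, int>;        // hypothetical flat document
using Pred = std::function<bool(const Doc&)>;  // one basic condition

struct MiniMatcher {
    std::vector<Pred> basics;                               // all must hold
    std::vector<std::shared_ptr<MiniMatcher>> andMatchers;  // all must match
    std::vector<std::shared_ptr<MiniMatcher>> orMatchers;   // if present, at least one must match
    std::vector<std::shared_ptr<MiniMatcher>> norMatchers;  // none may match

    bool matches(const Doc& d) const {
        for (const Pred& p : basics) {
            assert(static_cast<bool>(p));  // an empty std::function must not be called
            if (!p(d)) return false;
        }
        for (const auto& m : andMatchers)
            if (!m->matches(d)) return false;
        if (!orMatchers.empty()) {
            bool any = false;
            for (const auto& m : orMatchers)
                if (m->matches(d)) { any = true; break; }
            if (!any) return false;
        }
        for (const auto& m : norMatchers)
            if (m->matches(d)) return false;
        return true;
    }
};

// Helper: a matcher holding the single condition "field == value".
std::shared_ptr<MiniMatcher> fieldEquals(const std::string& field, int value) {
    auto m = std::make_shared<MiniMatcher>();
    m->basics.push_back([field, value](const Doc& d) {
        auto it = d.find(field);
        return it != d.end() && it->second == value;
    });
    return m;
}
```

The order of the checks follows the listing above: basic conditions first, then the $and, $or and $nor sub-matchers, each of which recurses into the same function.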
- int Matcher::matchesDotted(const char *fieldName, const BSONElement& toMatch, const BSONObj& obj, int compareOp, const ElementMatcher& em , bool isArr, MatchDetails * details ) const {
- BSONElement e;
- bool indexed = !_constrainIndexKey.isEmpty();
- if ( indexed ) {// the keyMatcher path comes through here
- e = obj.getFieldUsingIndexNames(fieldName, _constrainIndexKey);
- if( e.eoo() ) {
- cout << "obj: " << obj << endl;
- cout << "fieldName: " << fieldName << endl;
- cout << "_constrainIndexKey: " << _constrainIndexKey << endl;
- verify( !e.eoo() );
- }
- }
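- else {// ordinary document matching comes here
- const char *p = strchr(fieldName, '.');
- if ( p ) {// for a condition such as db.coll.find({"x.y":1}), first locate the value of field y inside the document, then match against it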
- string left(fieldName, p-fieldName);
- BSONElement se = obj.getField(left.c_str());
- if ( se.eoo() )
- ;
- else if ( se.type() != Object && se.type() != Array )
- ;
- else {// the left-hand component (x in the example) is an object or an array
- BSONObj eo = se.embeddedObject();
- return matchesDotted(p+1, toMatch, eo, compareOp, em, se.type() == Array , details );
- }
- }
- // An array was encountered while scanning for components of the field name.
- if ( isArr ) {// for an array, one matching element is enough
- BSONObjIterator ai(obj);
- bool found = false;// e.g. document {x:[1,2,3,4,5]} and condition {x:4}: loop over the elements, and the array matches as soon as one value matches
- while ( ai.moreWithEOO() ) {
- BSONElement z = ai.next();
- if( strcmp(z.fieldName(),fieldName) == 0 ) {
- if ( compareOp == BSONObj::opEXISTS ) {
- return retExistsFound( em );
- }
- if ( compareOp != BSONObj::opELEM_MATCH &&
- valuesMatch(z, toMatch, compareOp, em) ) {// valuesMatch is essentially a memcmp-like BSONElement comparison;
- // "field.<n>" array notation was used (read the code for details)
- if ( details )
- details->setElemMatchKey( z.fieldName() );
- return 1;
- }
- }
- if ( z.type() == Object ) {// an object: recurse into it
- BSONObj eo = z.embeddedObject();
- int cmp = matchesDotted(fieldName, toMatch, eo, compareOp, em, false, details );
- if ( cmp > 0 ) {
- if ( details )
- details->setElemMatchKey( z.fieldName() );
- return 1;
- }
- else if ( cmp < 0 ) {
- found = true;
- }
- }
- }
- return found ? -1 : 0;
- }
- if( p ) {
- // Left portion of field name was not found or wrong type.
- return 0;
- }
- else {
- e = obj.getField(fieldName);
- }
- }
- if ( compareOp == BSONObj::opEXISTS ) {
- if( e.eoo() ) {
- return 0;
- } else {
- return retExistsFound( em );
- }
- }
- else if ( ( e.type() != Array || indexed || compareOp == BSONObj::opSIZE ) &&
- compareOp != BSONObj::opELEM_MATCH &&
- valuesMatch(e, toMatch, compareOp, em ) ) {
- return 1;
- }// for an array, one element satisfying the condition is enough
- else if ( e.type() == Array && compareOp != BSONObj::opSIZE ) {
- BSONObjIterator ai(e.embeddedObject());
- while ( ai.more() ) {
- BSONElement z = ai.next();
- bool match = false;// $elemMatch: run the sub-matcher on each element of the array; the array matches when at least one element satisfies the whole $elemMatch condition, and that is how $elemMatch is implemented
- if ( compareOp == BSONObj::opELEM_MATCH ) {
- if ( em._subMatcherOnPrimitives ) {
- match = em._subMatcher->matches( z.wrap( "" ) );
- }
- else {
- match = ( z.isABSONObj() && em._subMatcher->matches( z.embeddedObject() ) );
- }
- }
- else {
- match = valuesMatch( z, toMatch, compareOp, em );
- }
- if ( match ) {
- if ( details ) {
- details->setElemMatchKey( z.fieldName() );
- }
- return 1;
- }
- }
- // match an entire array to itself
- if ( compareOp == BSONObj::Equality && e.woCompare( toMatch , false ) == 0 ) {
- return 1;
- }
- if ( compareOp == BSONObj::opIN && valuesMatch( e, toMatch, compareOp, em ) ) {
- return 1;
- }
- }
- else if ( e.eoo() ) {
- return 0;
- }
- return -1;
- }
- bool QueryResponseBuilder::addMatch() {
- bool match = _builder->handleMatch( orderedMatch, details );//for field projection
- return match;
- }
1. The query results need no sorting: OrderedBuildStrategy.
2. The query results need sorting: ReorderBuildStrategy.
3. Among multiple plans, some need a sort and some do not: HybridBuildStrategy.
Here we analyse ReorderBuildStrategy.
- bool ReorderBuildStrategy::handleMatch( bool &orderedMatch, MatchDetails& details ) {
- orderedMatch = false;// the location of this result has already been recorded, which happens with a multikey index or a multi-plan scan; filter it out here
- if ( _cursor->getsetdup( _cursor->currLoc() ) ) {
- return false;
- }
- _handleMatchNoDedup();
- return true;
- }
- void ReorderBuildStrategy::_handleMatchNoDedup() {
- DiskLoc loc = _cursor->currLoc();// scanAndOrder adds the result to a sorted map, so the results come out in the required order; the sort can hold at most 32MB of documents in total and reports an error beyond that
- _scanAndOrder->add( current( false ), _parsedQuery.showDiskLoc() ? &loc : 0 );
- }
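The two steps above, dropping duplicates by location and collecting matches in sorted order, can be sketched like this. It is only an illustration: `ReorderSketch` is a hypothetical name, a `long long` stands in for DiskLoc, an `int` stands in for the sort key, and the 32MB size check is left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct ReorderSketch {
    std::set<long long> seen;              // locations already handled (stand-in for DiskLoc)
    std::multimap<int, std::string> best;  // sort key -> document, kept ordered by key

    // Returns false when the location was seen before, true when the document was kept.
    bool handleMatch(long long loc, int sortKey, const std::string& doc) {
        if (!seen.insert(loc).second) return false;  // same test-and-set idea as getsetdup
        best.insert(std::make_pair(sortKey, doc));
        return true;
    }

    // Documents in sort-key order, as they would later be handed to the client.
    std::vector<std::string> inOrder() const {
        std::vector<std::string> out;
        for (const auto& kv : best) out.push_back(kv.second);
        assert(out.size() == best.size());
        return out;
    }
};
```

Feeding the same location twice keeps only the first copy, and the surviving documents come back ordered by key regardless of arrival order.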
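Once sorting is done, field projection is applied as the data is taken out. For example, db.coll.find({x:1},{_id:0}) hides the value of _id,
and that projection is handled here. Finally, let us look at it.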
- string queryWithQueryOptimizer() {
- int nReturned = queryResponseBuilder->handoff( result );
- }
QueryResponseBuilder::handoff->ReorderBuildStrategy::rewriteMatches->ScanAndOrder::fill:
- void ScanAndOrder::fill( BufBuilder& b, const ParsedQuery *parsedQuery, int& nout ) const {
- for ( BestMap::const_iterator i = _best.begin(); i != _best.end(); i++ ) {
- n++;
- if ( n <= _startFrom )
- continue;
- const BSONObj& o = i->second;// the projection itself: only the fields the query asks for are copied out; the details are left to the reader
- fillQueryResultFromObj( b, projection, o, details.get() );
- nFilled++;
- if ( nFilled >= _limit )
- break;
- }
- nout = nFilled;
- }
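The skip-and-limit loop in ScanAndOrder::fill can be written out in full as a sketch. The names are assumptions: `fillSketch` is hypothetical, `startFrom` and `limit` play the roles of _startFrom and _limit, and the projection step is reduced to copying the document.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Copies at most `limit` documents out of the sorted map after skipping
// the first `startFrom` entries.
std::vector<std::string> fillSketch(const std::multimap<int, std::string>& best,
                                    int startFrom, int limit) {
    assert(startFrom >= 0 && limit > 0);
    std::vector<std::string> out;
    int n = 0;
    for (const auto& kv : best) {
        ++n;
        if (n <= startFrom) continue;  // skip the first startFrom results
        out.push_back(kv.second);      // the real code applies the projection here
        if (out.size() >= static_cast<std::size_t>(limit)) break;
    }
    return out;
}
```

For a map holding a, b, c, d in key order, skipping 1 and limiting to 2 yields b and c.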
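This source code takes patience and care: keep reading and keep debugging, and it eventually falls into place.
Original article: MongoDB Source Code Analysis (8): Queries, Part 4: Document Matching in mongod
Author: yhjj0108 (杨浩)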
MongoDB Source Code Analysis (9): Storage Management in MongoDB
1. The maximum number of collections MongoDB can hold.
The MongoDB website (Using a Large Number of Collections) states:
By default MongoDB has a limit of approximately 24,000 namespaces per database. Each namespace is 628 bytes, the .ns file is 16MB by default.
Each collection counts as a namespace, as does each index. Thus if every collection had one index, we can create up to 12,000 collections. The --nssize parameter allows you to increase this limit (see below).
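The arithmetic behind these figures can be checked with a few lines. This is a rough sketch: `rawNamespaceSlots` and `maxCollections` are hypothetical helper names, and the raw division is only an upper bound. The quoted documentation gives roughly 24,000 usable namespaces and does not explain the difference.

```cpp
#include <cassert>

// Upper bound on namespace entries: file size divided by entry size.
long long rawNamespaceSlots(long long nsFileBytes, long long bytesPerNamespace) {
    assert(nsFileBytes >= 0 && bytesPerNamespace > 0);
    return nsFileBytes / bytesPerNamespace;
}

// Each collection uses one namespace for itself plus one per index.
long long maxCollections(long long usableNamespaces, int indexesPerCollection) {
    assert(usableNamespaces >= 0 && indexesPerCollection >= 0);
    return usableNamespaces / (1 + indexesPerCollection);
}
```

A 16MB file divided into 628-byte entries gives 26,715 slots at most, and 24,000 usable namespaces with one index per collection give the 12,000 collections mentioned in the quote.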
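MongoDB's storage files come in two kinds: xxx.ns, which holds collection information, and xxx.0, xxx.1 and so on, which hold the actual data.
The xxx.ns file is 16MB by default and stores the header information of each collection; for details see
MongoDB Source Code Analysis (5): Queries, Part 2: Database Loading in mongod.
The data files xxx.0, xxx.1, ..., xxx.n start at 64MB by default, and each file is twice the size of the previous one, up to 2GB. On a
32-bit system a single file is at most 512MB. With --smallfiles, the sizes above are divided by 4, giving 16MB, 32MB, ..., 512MB.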
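The size progression can be sketched as a small function. It follows the description in the text only: `dataFileSizeMB` is a hypothetical name, sizes are in megabytes, the cap is taken as 2048MB, and the 32-bit limit is not modelled.

```cpp
#include <cassert>

// Size in MB of data file number fileNo (xxx.0 is fileNo 0), as described in the text:
// start at 64MB, double per file up to 2GB, and divide by 4 with --smallfiles.
long long dataFileSizeMB(int fileNo, bool smallFiles) {
    assert(fileNo >= 0);
    long long mb = 64;
    for (int i = 0; i < fileNo && mb < 2048; ++i) mb *= 2;
    return smallFiles ? mb / 4 : mb;
}
```

File 0 is 64MB and file 5 reaches the 2GB cap; with --smallfiles the same files are 16MB and 512MB.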
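3. How MongoDB allocates storage space internally.
MongoDB allocates space in units of extents. When data is stored, the collection first checks whether it still has free space;
if so, space is allocated from it, and if not, a new extent is requested. The extents of a collection form a doubly linked list, and the namespace stores its head
and tail.
Apart from its header, the space of a newly obtained extent is added at the head of a deleteList, which is organized into buckets by size.
The free blocks in each bucket have sizes from the bucket's labelled size up to one less than the next bucket's size; for example, the bucket labelled 32 holds
free blocks of length [32,63].
To allocate space, the collection finds the bucket that corresponds to the requested size and walks it until it finds a
usable block. After the allocation, the remainder of that block is added to the appropriate bucket as a new entry. If the remainder is smaller than 24 bytes or
smaller than 1/8 of the allocated size, it is not returned to a bucket. Instead it is folded into the allocated document, where a later update
may make use of it.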
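The bucket rule and the remainder rule can be sketched as two helpers. Both names are hypothetical, and the sketch assumes that bucket labels double from 32 upward (32, 64, 128, ...), which the [32,63] example suggests but the text does not state in full.

```cpp
#include <cassert>

// Label of the bucket that holds a free block of the given size, assuming
// labels double from 32 upward: 32 holds [32,63], 64 holds [64,127], and so on.
int bucketLowerBound(int size) {
    assert(size >= 32);
    int bound = 32;
    while (bound <= size / 2) bound *= 2;
    return bound;
}

// After an allocation, the leftover goes back into a bucket only if it is at
// least 24 bytes and at least 1/8 of the allocated size; otherwise it stays
// attached to the allocated document.
bool remainderGoesBackToBucket(int allocated, int remainder) {
    assert(allocated > 0 && remainder >= 0);
    return remainder >= 24 && remainder >= allocated / 8;
}
```

A 63-byte block belongs to the bucket labelled 32 and a 64-byte block to the next one; a 100-byte leftover from a 1000-byte allocation is below 1/8 and therefore stays with the document.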
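The smallest unit of storage in MongoDB is the document. Each time a document is inserted, space is allocated for it, and besides the
space the document itself occupies, the allocation includes a document header, as follows: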
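- int _lengthWithHeaders;// total size of the document plus its header
- int _extentOfs;// offset of the extent that contains this document
- int _nextOfs;// offset of the next document
- int _prevOfs;// offset of the previous document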
- len = a * 16;
- if(len < 1000)
- len *= 64;
- if(len > 1000000000)
- len = 1000000000;
- len = len & 0xffffff00;// align down to a multiple of 256 bytes
- int y = prelen < 4000000 ? prelen * 4.0 : prelen * 1.35;// normally sized from the previous extent: 4 times its size if it was under 4MB, otherwise 1.35 times
- len = y > len ? y : len;
- if(len > 0x7ff00000)// the cap is the size of one data file; with --smallfiles it is divided by 4
- len = 0x7ff00000;
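The two fragments above can be assembled into complete functions as a sketch. The function names and parameter names are assumptions: `recLen` plays the role of `a`, `prevLen` the role of `prelen`, and the arithmetic follows the fragments as written rather than any particular MongoDB release.

```cpp
#include <cassert>

// First fragment: initial extent size derived from a record length.
int initialExtentSize(int recLen) {
    assert(recLen > 0);
    long long len = static_cast<long long>(recLen) * 16;
    if (len < 1000) len *= 64;
    if (len > 1000000000) len = 1000000000;
    return static_cast<int>(len & 0xffffff00LL);  // align down to 256 bytes
}

// Second fragment: size of a follow-up extent, grown from the previous extent.
int followupExtentSize(int wanted, int prevLen) {
    assert(wanted >= 0 && prevLen >= 0);
    // 4x growth while the previous extent is under 4,000,000 bytes, 1.35x afterwards.
    long long y = static_cast<long long>(prevLen < 4000000 ? prevLen * 4.0 : prevLen * 1.35);
    long long len = y > wanted ? y : wanted;
    if (len > 0x7ff00000LL) len = 0x7ff00000LL;   // never larger than one data file
    return static_cast<int>(len);
}
```

A 64-bit intermediate is used so that the 1.35x growth of a large extent cannot overflow before the cap is applied. For example, a previous extent of 1,000,000 bytes leads to a 4,000,000-byte follow-up, and very large extents are clamped to 0x7ff00000.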