在上周尽管磕磕碰碰地完成了一次查mongo询任务,但总想一谈工具背后的真相,即为什么Mongodb不支持sql?虽然我知道它是NoSQL系列的, 但还是好奇着它的查询是怎么实现的. 上网找了mongodb的代码,最新版的是4.0.5. https://fastdl.mongodb.org/src/mongodb-src-r4.0.5.zip
匆匆一瞥,尽管Mongodb的源码有超过一万个C/C++的文件,超过三百万行代码.但是带着问题找源码还是比较容易的,何况我只是对它的查询感到好奇而已.
准备
进入了源代码src/mongo, 这时c/c++代码已经剩下不到4千个文件了.
它是一个数据库服务器, 于是直奔db目录. 大概浏览一下文件,
发现了代码入口 dbmain.cpp.
从这里开始,不打算按部就班地阅读代码,那样工作量太大.
一般服务器的工作流程
首先, Mongodb作为一个数据库服务器,也符合一般服务器的工作方式.
服务器工作流程在mongodb中的实现
顺着这个逻辑,很容易找到mongdo的执行流程
在src/mongo/db/目录下
最后的代码是:
ExitCode _initAndListen(int listenPort) {
Client::initThread("initandlisten");
...
start = serviceContext->getTransportLayer()->start();
if (!start.isOK()) {
error() << "Failed to start the listener: " << start.toString();
return EXIT_NET_ERROR;
}
start = serviceContext->getServiceEntryPoint()->start();
if (!start.isOK()) {
error() << "Failed to start the service entry point: " << start;
return EXIT_NET_ERROR;
}
....
return waitForShutdown();
和预期的不一样,没有找到循环语句. 似乎线索在这里中断了. 但其实并不怎么影响代码阅读, 回过头来检查代码
int mongoDbMain(int argc, char* argv[], char** envp) {
....
//在这里设置了ServiceEntryPointMongod 作为ServiceEntryPoint
service->setServiceEntryPoint(std::make_unique<ServiceEntryPointMongod>(service));
...
ExitCode exitCode = initAndListen(serverGlobalParams.port);
exitCleanly(exitCode);
return 0;
}
ExitCode _initAndListen(int listenPort) {
....
auto tl = //在这里创建TransportLayer
transport::TransportLayerManager::createWithConfig(&serverGlobalParams, serviceContext);
auto res = tl->setup();
if (!res.isOK()) {
error() << "Failed to set up listener: " << res;
return EXIT_NET_ERROR;
}
serviceContext->setTransportLayer(std::move(tl));//在这里设置了TransportLayer
std::unique_ptr<TransportLayer> TransportLayerManager::createWithConfig(
...
//在这里创建了TransportLayerASIO
auto transportLayerASIO = stdx::make_unique<transport::TransportLayerASIO>(opts, sep);
transportLayer = std::move(transportLayerASIO);
.....
std::vector<std::unique_ptr<TransportLayer>> retVector;
retVector.emplace_back(std::move(transportLayer));
return stdx::make_unique<TransportLayerManager>(std::move(retVector));
}
于是在src/mongo/transport/transport_layer_asio.cpp 里发现消息循环
Status TransportLayerASIO::start() {
.....
_listenerThread = stdx::thread([this] {
setThreadName("listener");
while (_running.load()) {//消息循环在这里
_acceptorReactor->run();
}
});
....
return Status::OK();
}
void TransportLayerASIO::_acceptConnection(GenericAcceptor& acceptor) {
....
try {
std::shared_ptr<ASIOSession> session(
//在这里创建新的Session,代表一个客户端连接
new ASIOSession(this, std::move(peerSocket), true));
_sep->startSession(std::move(session));
} catch (const DBException& e) {
warning() << "Error accepting new connection " << e;
}
_acceptConnection(acceptor);
};
acceptor.async_accept(*_ingressReactor, std::move(acceptCb));
}
在这里处理客户端请求
DbResponse ServiceEntryPointMongod::handleRequest(OperationContext* opCtx, const Message& m) {
return ServiceEntryPointCommon::handleRequest(opCtx, m, Hooks{});
}
在这里解析客户端命令
//service_entry_point_common.cpp
DbResponse ServiceEntryPointCommon::handleRequest(OperationContext* opCtx,
const Message& m,
const Hooks& behaviors) {
.....
DbMessage dbmsg(m);
.....
DbResponse dbresponse;
if (op == dbMsg || op == dbCommand || (op == dbQuery && isCommand)) {
dbresponse = receivedCommands(opCtx, m, behaviors); //这是一个非常重要的函数,查询命令, aggregate,map/reduce 命令都会经过这里
}
....
//service_entry_point_common.cpp
DbResponse receivedCommands(OperationContext* opCtx,
const Message& message,
const ServiceEntryPointCommon::Hooks& behaviors) {
....
// 获得command
c = CommandHelpers::findCommand(request.getCommandName()
....
//执行真正的命令
execCommandDatabase(opCtx, c, request, replyBuilder.get(), behaviors);
....
回顾一下:
- MongoDB的源代码中,将网络传输层包装了起来. 抽象成了TransportLayer, 目前只有一个具体的子类TransportLayerASIO. 以后也许可能会增加新的传输层.
- 对协议的命令抽象成了ServiceEntryPoint, DB服务器用到的是ServiceEntryPointMongod. 处理请求的函数是ServiceEntryPointMongod::handleRequest,它直接转手给了ServiceEntryPointCommon::handleRequest.
也就是, ServiceEntryPointCommon::handleRequest将是查询命令故事开始的地方.
查询语句find的处理:
还是一样的不打算按部就班地阅读代码. 通过简单的list命令发现了commands目录,有理由相信所有命令的处理都在这里.
发现find_cmd.cpp
//find_cmd.cpp
class FindCmd : public BasicCommand {
public:
//看上去像是注册了find命令.
//另外发现run函数,有理由相信run是find的处理函数
FindCmd() : BasicCommand("find") {}
....
}
run 函数前的注释说明了find的执行过程.从注释中可以知道,find命令按条件过滤完数据(文档)后, 没有做其他处理,直接返回给了客户端.
也就是说,没法实现SQL语句的group by, Sum等复杂用法.
也可以顺着代码大致看一眼.
//find_cmd.cpp
/**
* Runs a query using the following steps:
* --Parsing.
* --Acquire locks.
* --Plan query, obtaining an executor that can run it.
* --Generate the first batch.
* --Save state for getMore, transferring ownership of the executor to a ClientCursor.
* --Generate response to send to the client.
*/
bool run(OperationContext* opCtx,
const std::string& dbname,
const BSONObj& cmdObj,
BSONObjBuilder& result) override {
......
//从这里才拿到Request,也就是查询还没开始.
const QueryRequest& originalQR = exec->getCanonicalQuery()->getQueryRequest();
// Stream query results, adding them to a BSONArray as we go.
CursorResponseBuilder firstBatch(/*isInitialResponse*/ true, &result);
BSONObj obj;
PlanExecutor::ExecState state = PlanExecutor::ADVANCED;
long long numResults = 0;
while (!FindCommon::enoughForFirstBatch(originalQR, numResults) &&
PlanExecutor::ADVANCED == (state = exec->getNext(&obj, NULL))) {
// If we can't fit this result inside the current batch, then we stash it for later.
if (!FindCommon::haveSpaceForNext(obj, numResults, firstBatch.bytesUsed())) {
exec->enqueue(obj);
break;
}
// Add result to output buffer.
firstBatch.append(obj);//将文档加入firstBatch!!!!!,值得关注firstBatch的动向
numResults++;
}
..........
// Generate the response object to send to the client.
//已经将firstBatch发送给客户端了,中间没什么处理.
//也就是只能在exec->getNext里根据find参数过滤文档了
firstBatch.done(cursorId, nss.ns());
return true;
}
其实至此已经能远远地看清MongoDB find命令的执行过程了. 也能看到它的查询处理比较简单,仅仅只是执行了简单的文档过滤之后就将结果返回给客户端了.没有笛卡尔乘积的相关代码,做不了复杂的关联查询.
只不过我们要更严谨一点. 要弄清
- 命令的注册过程,确定find命令一定是由find_cmd.cpp来完成的.
- 从网络消息到find命令处理的中间步骤过程
- exec->getNext真的执行了过滤操作?
下面开始填坑过程:
find命令的注册:
//find_cmd.cpp
class FindCmd : public BasicCommand {
public:
FindCmd() : BasicCommand("find") {}
....
} findCmd;// 这里生成了一个全局对象findCmd, 同时会调用FindCmd的构造函数,
//进而调用BasicCommand的构造函数, 再调用Command的构造函数,完成命令的注册
// commands.h
class BasicCommand : public Command {
// commands.h
/**
* Serves as a base for server commands. See the constructor for more details.
*/
class Command {
public:
...
/**
* Constructs a new command and causes it to be registered with the global commands list. It is
* not safe to construct commands other than when the server is starting up.
*
* @param oldName an optional old, deprecated name for the command
*/
Command(StringData name, StringData oldName = StringData());
// commands.cpp
Command::Command(StringData name, StringData oldName)
...
//在这里注册了命令与处理函数
globalCommandRegistry()->registerCommand(this, name, oldName);
}
// commands.cpp
void CommandRegistry::registerCommand(Command* command, StringData name, StringData oldName) {
for (StringData key : {name, oldName}) {
...
auto hashedKey = CommandMap::HashedKey(key);
...
_commands[hashedKey] = command;
}
}
从网络请求到find的处理, find命令的解析:
// service_entry_point_common.cpp
DbResponse ServiceEntryPointCommon::handleRequest(OperationContext* opCtx,
const Message& m,
const Hooks& behaviors) {
...
if (op == dbMsg || op == dbCommand || (op == dbQuery && isCommand)) {
dbresponse = receivedCommands(opCtx, m, behaviors);
...
// service_entry_point_common.cpp
DbResponse receivedCommands(OperationContext* opCtx,
const Message& message,
const ServiceEntryPointCommon::Hooks& behaviors) {
auto replyBuilder = rpc::makeReplyBuilder(rpc::protocolForMessage(message));
[&] {
...
//通过"find", 获得全局对象findCmd
c = CommandHelpers::findCommand(request.getCommandName())
...
}();//挺有意思的C++11语法, 匿名函数调用
//src/mongo/db/commands.cpp
Command* CommandHelpers::findCommand(StringData name) {
return globalCommandRegistry()->findCommand(name);
}
.....
Command* CommandRegistry::findCommand(StringData name) const {
auto it = _commands.find(name);
if (it == _commands.end())
return nullptr;
return it->second;
}
从网络请求到find的处理, find命令的调用:
// service_entry_point_common.cpp
DbResponse receivedCommands(OperationContext* opCtx,
const Message& message,
const ServiceEntryPointCommon::Hooks& behaviors) {
auto replyBuilder = rpc::makeReplyBuilder(rpc::protocolForMessage(message));
[&] {
...
//通过"find", 获得全局对象findCmd
c = CommandHelpers::findCommand(request.getCommandName())
...
//在这里执行命令
runCommandImpl(opCtx,
invocation.get(),
request,
replyBuilder,
startOperationTime,
behaviors,
&extraFieldsBuilder,
sessionOptions)) ;
...
}()
}
bool runCommandImpl(OperationContext* opCtx,
CommandInvocation* invocation,
const OpMsgRequest& request,
rpc::ReplyBuilderInterface* replyBuilder,
LogicalTime startOperationTime,
const ServiceEntryPointCommon::Hooks& behaviors,
BSONObjBuilder* extraFieldsBuilder,
const boost::optional<OperationSessionInfoFromClient>& sessionOptions) {
....
// 从这里进入FindCmd::run()
invocation->run(opCtx, &crb);
..
}
数据的查询过滤,以下开启gdb大法,一路跟踪下去了:
// commands/find_cmd.cpp
bool run(OperationContext* opCtx,
const std::string& dbname,
const BSONObj& cmdObj,
BSONObjBuilder& result) override {
...
// 注意: exec->getNext
while (!FindCommon::enoughForFirstBatch(originalQR, numResults) &&
PlanExecutor::ADVANCED == (state = exec->getNext(&obj, NULL))) {
// If we can't fit this result inside the current batch, then we stash it for later.
if (!FindCommon::haveSpaceForNext(obj, numResults, firstBatch.bytesUsed())) {
exec->enqueue(obj);
break;
}
// Add result to output buffer.
firstBatch.append(obj);
numResults++;
}
...
}
// query/plan_executor.cpp
PlanExecutor::ExecState PlanExecutor::getNext(BSONObj* objOut, RecordId* dlOut) {
...
ExecState state = getNextImpl(objOut ? &snapshotted : NULL, dlOut);
...
return state;
}
...
PlanExecutor::ExecState PlanExecutor::getNextImpl(Snapshotted<BSONObj>* objOut, RecordId* dlOut) {
.....
// _root->work是真正干活的地方
PlanStage::StageState code = _root->work(&id);
.....
}
// exec/plan_stage.cpp
PlanStage::StageState PlanStage::work(WorkingSetID* out) {
...
StageState workResult = doWork(out);
...
return workResult;
}
// exec/collection_scan.cpp
PlanStage::StageState CollectionScan::doWork(WorkingSetID* out) {
....
record = _cursor->next();
...
return returnIfMatches(member, id, out);
}
........
PlanStage::StageState CollectionScan::returnIfMatches(WorkingSetMember* member,
WorkingSetID memberID,
WorkingSetID* out) {
++_specificStats.docsTested;
if (Filter::passes(member, _filter)) {
//找到了
return PlanStage::ADVANCED;
} else if (_endCondition && Filter::passes(member, _endCondition.get())) {
//文件结尾
return PlanStage::IS_EOF;
} else {
// 没找到. 找下一条吧.
return PlanStage::NEED_TIME;
}
}