Mongo and pymongo source code analysis: find, sort, aggregate

hpulfc

Categories: databases, mongo, pymongo. Tags: pymongo, mongo, find, sort, analysis

Article link: https://blog.csdn.net/hpulfc/article/details/79611326

The life of a single query

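Put simply, pymongo is the Python package for MongoDB. It wraps commands, sends them over a socket to the mongo server, receives the results, wraps those results, and hands them back to the user as a cursor object. The cursor implements a next method and also binds __next__ to it, so the user can iterate with the normal Python iterator protocol and get the query results back one document at a time.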

Now for the detailed analysis:

First, this is how we typically use it:

import pymongo
mc = pymongo.MongoClient("remote_uri")

This first creates a mongo client; what comes back is a MongoClient object:

class MongoClient(common.BaseObject):
    HOST = "localhost"
    PORT = 27017

    def __init__(
            self,
            host=None,
            port=None,
            document_class=dict,
            tz_aware=False,
            connect=True,
            **kwargs):
......
......

Next we use db = mc["database_name"] to get the corresponding database object, which is an instance of Database:

-- mongo_client.py
    def __getattr__(self, name):
        """Get a database by name.

        Raises :class:`~pymongo.errors.InvalidName` if an invalid
        database name is used.

        :Parameters:
          - `name`: the name of the database to get
        """
        if name.startswith('_'):
            raise AttributeError(
                "MongoClient has no attribute %r. To access the %s"
                " database, use client[%r]." % (name, name, name))
        return self.__getitem__(name)

    def __getitem__(self, name):
        """Get a database by name.

        Raises :class:`~pymongo.errors.InvalidName` if an invalid
        database name is used.

        :Parameters:
          - `name`: the name of the database to get
        """
        return database.Database(self, name)  # <-- this is the key line

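In the same way, a collection is obtained with db["collection_name"], and the returned object is an instance of Collection. This works just like the previous step; try finding it in the source yourself.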
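The lookup chain described above can be sketched with toy classes. This is only an illustration of the `__getattr__` / `__getitem__` delegation pattern, not pymongo's code: `ToyClient`, `ToyDatabase` and `ToyCollection` are hypothetical names, and the database and collection names are made up.

```python
class ToyCollection(object):
    """Hypothetical stand-in for a collection object."""

    def __init__(self, database, name):
        self.database = database
        self.name = name


class ToyDatabase(object):
    """Hypothetical stand-in for a database object."""

    def __init__(self, client, name):
        self.client = client
        self.name = name

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError("ToyDatabase has no attribute %r" % name)
        return self[name]

    def __getitem__(self, name):
        # Subscript access builds the child object.
        return ToyCollection(self, name)


class ToyClient(object):
    """Hypothetical stand-in for a client object."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError("ToyClient has no attribute %r" % name)
        return self[name]

    def __getitem__(self, name):
        return ToyDatabase(self, name)


client = ToyClient()
coll = client["shop"]["orders"]   # subscript style
same = client.shop.orders         # attribute style, routed through __getattr__
print(coll.database.name, coll.name)
```

Both access styles end up in `__getitem__`, which is why the attribute form rejects names starting with an underscore and points the user at the subscript form instead.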

Once we have the Collection instance, the familiar methods are available, for example:

def find(self, *args, **kwargs):
    return Cursor(self, *args, **kwargs)
def find_one(self, filter=None, *args, **kwargs): ...
def insert_one(self, document): ...
def save(self, to_save, manipulate=True, check_keys=True, **kwargs): ...
There are many more; I won't paste them all.

As the code above shows, the commonly used find method returns a Cursor object.

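Now for the important part:
Many common functions, such as sort, count, distinct, explain, min, max, ..., are methods of Cursor. You can read their implementations on your own; here I only analyze the sort function and how data is fetched through the cursor.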

First, let's see what the sort function does:

    def sort(self, key_or_list, direction=None):
        """Sorts this cursor's results.
        """
        self.__check_okay_to_chain()
        keys = helpers._index_list(key_or_list, direction)
        self.__ordering = helpers._index_document(keys)
        return self

As the code shows, sort does a few things and then returns self, so what comes back is still the same Cursor instance. Before returning, it performs three steps:

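1. Check whether more options can still be chained onto this cursor; this may raise an InvalidOperation exception.
2. Normalize the arguments into a list of sort keys and their directions.
3. Convert that list into an ordered document and store it as the cursor's sort specification (self.__ordering). Despite the helper's name, no index is built on the collection here.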
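The three steps above can be sketched as a toy. This is a simplified illustration under my own assumptions, not pymongo's implementation: `index_list` and `ToyCursor` are hypothetical, `OrderedDict` is my substitute for the ordered mapping type the library uses, and `RuntimeError` stands in for InvalidOperation.

```python
from collections import OrderedDict

ASCENDING = 1
DESCENDING = -1


def index_list(key_or_list, direction=None):
    """Normalize sort arguments to a list of (key, direction) pairs."""
    if direction is not None:
        return [(key_or_list, direction)]
    if isinstance(key_or_list, str):
        return [(key_or_list, ASCENDING)]
    return list(key_or_list)


class ToyCursor(object):
    """Hypothetical cursor that only records its sort order."""

    def __init__(self):
        self.ordering = None
        self.started = False  # would become True once a query was sent

    def sort(self, key_or_list, direction=None):
        # Step 1: refuse to change options on a cursor already in use.
        if self.started:
            raise RuntimeError("cannot set options after executing query")
        # Step 2: normalize to (key, direction) pairs.
        keys = index_list(key_or_list, direction)
        # Step 3: store the pairs as an ordered mapping; nothing is sent yet.
        self.ordering = OrderedDict(keys)
        # Returning self is what makes chained calls possible.
        return self


cursor = ToyCursor().sort([("age", DESCENDING), ("name", ASCENDING)])
print(list(cursor.ordering.items()))
```

The point of the sketch is that sort only records state on the cursor; no request leaves the process until the cursor is iterated.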

After these three steps, no results have actually been fetched yet. The documents are really fetched by the following method:

    def next(self):
        """Advance the cursor."""
        if self.__empty:
            raise StopIteration
        _db = self.__collection.database
        if len(self.__data) or self._refresh():
            if self.__manipulate:
                return _db._fix_outgoing(self.__data.popleft(),
                                         self.__collection)
            else:
                return self.__data.popleft()
        else:
            raise StopIteration

    __next__ = next

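At the end of this function, __next__ is bound to the same function as next, so values can be fetched through Python's iterator protocol (Python 2 looks for next, Python 3 looks for __next__).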
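A minimal sketch of that aliasing trick, using a hypothetical `Countdown` class that has nothing to do with pymongo:

```python
class Countdown(object):
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def next(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # One implementation, two names: Python 3 calls __next__,
    # Python 2 calls next.
    __next__ = next


print(list(Countdown(3)))  # [3, 2, 1]
```

Because the alias is a class attribute pointing at the same function, there is only one method body to maintain for both Python versions.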
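Looking over this function, the container the data comes from is self.__data. Searching the same source file shows that it is a deque object (as an aside, a double-ended queue supports inserting and removing at both ends). Searching further shows that this variable is also assigned in the following method: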

    def __send_message(self, operation):
        """Send a query or getmore operation and handles the response.
        """
        client = self.__collection.database.client

        if operation:
            kwargs = {
                "read_preference": self.__read_preference,
                "exhaust": self.__exhaust,
            }
            if self.__address is not None:
                kwargs["address"] = self.__address

            try:
                response = client._send_message_with_response(operation,
                                                              **kwargs)
                self.__address = response.address
                if self.__exhaust:
                    # 'response' is an ExhaustResponse.
                    self.__exhaust_mgr = _SocketManager(response.socket_info,
                                                        response.pool)

                data = response.data
            except AutoReconnect:
               
                self.__killed = True
                raise
        else:
            # Exhaust cursor - no getMore message.
            try:
                data = self.__exhaust_mgr.sock.receive_message(1, None)
            except ConnectionFailure:
                self.__die()
                raise

        try:
            doc = helpers._unpack_response(response=data,
                                           cursor_id=self.__id,
                                           codec_options=self.__codec_options)
        except OperationFailure:
            .......
        except NotMasterError:
            .......
        self.__id = doc["cursor_id"]
        if self.__id == 0:
            self.__killed = True

        self.__retrieved += doc["number_returned"]
        self.__data = deque(doc["data"])  # <-- here

        ......

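Searching for __send_message shows that it is used by _refresh, and _refresh in turn shows up in the next method. So data is fetched mainly through __send_message, via one of two channels: sending an operation (a query or getMore), or, for an exhaust cursor, reading directly from the socket.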
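The interplay between next, the refresh step and the deque buffer can be sketched as follows. This is a toy under stated assumptions: `ToyBatchCursor` is hypothetical, an in-memory list plays the role of the server, and `round_trips` is a counter I added only to make the batching visible.

```python
from collections import deque


class ToyBatchCursor(object):
    """Hypothetical cursor that pulls documents in batches."""

    def __init__(self, documents, batch_size=2):
        self._source = list(documents)  # stands in for the server
        self._batch_size = batch_size
        self._offset = 0
        self._data = deque()            # local buffer of fetched documents
        self.round_trips = 0

    def _send_message(self):
        """Stand-in for the network call: fetch the next batch."""
        end = self._offset + self._batch_size
        batch = self._source[self._offset:end]
        self._offset += len(batch)
        self.round_trips += 1
        self._data = deque(batch)

    def _refresh(self):
        """Refill the buffer if more data exists; return the buffer size."""
        if self._offset < len(self._source):
            self._send_message()
        return len(self._data)

    def __iter__(self):
        return self

    def next(self):
        # Serve from the buffer first; go back to the source only when empty.
        if len(self._data) or self._refresh():
            return self._data.popleft()
        raise StopIteration

    __next__ = next


cursor = ToyBatchCursor([{"n": i} for i in range(5)], batch_size=2)
docs = list(cursor)
print(docs, cursor.round_trips)
```

With five documents and a batch size of two, the caller sees five items one at a time, while the simulated source is only contacted three times.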

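Looking at __send_message, the data is obtained by calling _send_message_with_response on the MongoClient instance. The client in turn, by way of a callback, calls send_message_with_response on a Server instance, which returns a Response object (built by reading the contents of the socket and wrapping them). Its code is as follows: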

class Response(object):
    __slots__ = ('_data', '_address')

    def __init__(self, data, address):
        """Represent a response from the server.

        :Parameters:
          - `data`: Raw BSON bytes.
          - `address`: (host, port) of the source server.
        """
        self._data = data
        self._address = address

    @property
    def data(self):
        """Server response's raw BSON bytes."""
        return self._data

    @property
    def address(self):
        """(host, port) of the source server."""
        return self._address

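The data property is what matters: the document data is stored there as raw BSON bytes, which helpers._unpack_response then decodes. At this point we have found the real data!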

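To sum up: we instantiate the relevant classes, then mainly use the methods of the Collection object to get a Cursor object and perform various operations on the documents. This article mainly analyzed how find fetches its data, then briefly covered the steps inside the commonly used sort method, to deepen our understanding of pymongo. The main classes involved are: MongoClient, Collection, Cursor, deque, Server, SocketInfo (fetches the information), and Response. Finally, if you are interested, take a look at the aggregate method on Collection; it differs from find in some ways, but the overall implementation principle is similar. That's all~~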
