xerces-c++内存管理策略&为何耗费大量内存

本文结构

xerces-c++是一XML解析器。在讲其内存管理策略之前,需要先讲一下一个奇怪的new用法,之后会继续介绍它的内存管理策略,最后会说明它是如何耗费大量内存的。

1) 奇怪的new语句

DOMAttr *DOMDocumentImpl::createAttribute(const XMLCh *nam)
{
    if(!nam || !isXMLName(nam))
        throw DOMException(DOMException::INVALID_CHARACTER_ERR,0, getMemoryManager());
    return new (this, DOMMemoryManager::ATTR_OBJECT) DOMAttrImpl(this,nam);
}

placement new的用法一般为 new (address) (type) initializer 的形式,address是一个指针指向已经存在的一块内存,而上面的代码却有两个参数紧跟在new后面。于是查了下官方文档,还真有这种用法:

new expression
C++ C++ language Expressions
Creates and initializes objects with dynamic storage duration, that is, objects whose lifetime is not necessarily limited by the scope in which they were created.

Syntax
::(optional) new ( type ) initializer(optional) (1)
::(optional) new new-type initializer(optional) (2)
::(optional) new (placement-params) ( type ) initializer(optional) (3)
::(optional) new (placement-params) new-type initializer(optional) (4)

官方文档还有一个例子与xerces代码很像:
new(2,f) T; // calls operator new(sizeof(T), 2, f)
都是相当于下面两步:

  1. 调用重载的new函数分配内存
inline void * operator new(size_t amt, DOMDocumentImpl *doc, DOMMemoryManager::NodeObjectType type)
{
    void *p = doc->allocate(amt, type);
    return p;
}
  1. 调用构造函数初始化上面分配的内存
DOMAttrImpl::DOMAttrImpl(DOMDocument *ownerDoc, const XMLCh *aName)
    : fNode(ownerDoc), fParent (ownerDoc), fSchemaType(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(aName);
    fNode.isSpecified(true);
}

所以,看着怪异,其实和一般的placement new没什么大的区别。

2) xerces-c++内存管理策略

通过上面被重载的new函数可以看到它调用了DOMDocumentImpl::allocate(amt, type), 而后者又会调用到:

void* DOMDocumentImpl::allocate(XMLSize_t amount)
{
  //	Align the request size so that suballocated blocks
  //	beyond this one will be maintained at the same alignment.
  amount = XMLPlatformUtils::alignPointerForNewBlockAllocation(amount);

  // If the request is for a largish block, hand it off to the system
  //   allocator.  The block still must be linked into the list of
  //   allocated blocks so that it will be deleted when the time comes.
  if (amount > kMaxSubAllocationSize)
  {
    //	The size of the header we add to our raw blocks
    XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

    //	Try to allocate the block
    void* newBlock;
    newBlock = fMemoryManager->allocate(sizeOfHeader + amount);

    //	Link it into the list beyond current block, as current block
    //	is still being subdivided. If there is no current block
    //	then track that we have no bytes to further divide.
    if (fCurrentBlock)
    {
      *(void **)newBlock = *(void **)fCurrentBlock;
      *(void **)fCurrentBlock = newBlock;
    }
    else
    {
      *(void **)newBlock = 0;
      fCurrentBlock = newBlock;
      fFreePtr = 0;
      fFreeBytesRemaining = 0;
    }

    void *retPtr = (char*)newBlock + sizeOfHeader;
    return retPtr;
  }

  //	It's a normal (sub-allocatable) request.
  //	Are we out of room in our current block?
  if (amount > fFreeBytesRemaining)
  {
    // Request doesn't fit in the current block.
    // The size of the header we add to our raw blocks
    XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

    // Get a new block from the system allocator.
    void* newBlock;
    newBlock = fMemoryManager->allocate(fHeapAllocSize);

    *(void **)newBlock = fCurrentBlock;
    fCurrentBlock = newBlock;
    fFreePtr = (char *)newBlock + sizeOfHeader;
    fFreeBytesRemaining = fHeapAllocSize - sizeOfHeader;

    if(fHeapAllocSize<kMaxHeapAllocSize)
      fHeapAllocSize*=2;
  }

  //	Subdivide the request off current block
  void *retPtr = fFreePtr;
  fFreePtr += amount;
  fFreeBytesRemaining -= amount;

  return retPtr;
}

这便是内存分配的核心代码,逻辑不复杂,可以总结为以下几点:

  1. 如果要分配的内存大于kMaxSubAllocationSize(0x0100)直接走原始的系统new函数。
  2. 否则上次分配的大块内存还有剩余且大于等于需要的,则用剩余的。
  3. 剩余的不够则新分配一大块内存,大小为fHeapAllocSize。
  4. 这些大块内存会组成链表,fCurrentBlock是头指针,fFreeBytesRemaining是当前大块内存剩余未用的字节数。

DOMDocumentImpl是对外的接口,要想创建节点(Node)就必须通过一系列的createXXX来创建(工厂模式?),比如createAttribute,createElement,而这些create函数都会走allocate函数。也就是说每个Attribute/Element实例都来自链表上的大块内存。这个策略让我想起了《Effecive C++》也有类似的代码。
Item 10: Write operator delete if you write operator new
这些节点中途不会释放,直到最后要释放整个Document时才一起释放。请参考以下代码:

DOMDocumentImpl::~DOMDocumentImpl()
{
...
//  Delete the heap for this document.  This uncerimoniously yanks the storage
    //      out from under all of the nodes in the document.  Destructors are NOT called.
    this->deleteHeap();``
 }`
 void DOMDocumentImpl::deleteHeap()
{
    while (fCurrentBlock != 0)
    {
        void *nextBlock = *(void **)fCurrentBlock;
        fMemoryManager->deallocate(fCurrentBlock);
        fCurrentBlock = nextBlock;
    }
}

3) xerces-c++为何耗费内存?

根本原因是描述节点、属性等的数据结构太大,可以想像成重型卡车(每个节点或属性)只拉一袋大米。
既然所有的节点、属性等类的实例内存分配都走allocate,那我们就让它打印出为哪个类分配了多少字节,看看每辆卡车自身多重?

void * DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)
{
    static std::map<int, std::string> maps = {
        {DOMMemoryManager::NodeObjectType::ATTR_OBJECT , "ATTR_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ATTR_NS_OBJECT , "ATTR_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::CDATA_SECTION_OBJECT , "CDATA_SECTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::COMMENT_OBJECT , "COMMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_FRAGMENT_OBJECT , "DOCUMENT_FRAGMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_TYPE_OBJECT , "DOCUMENT_TYPE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_OBJECT , "ELEMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_NS_OBJECT , "ELEMENT_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_OBJECT , "ENTITY_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_REFERENCE_OBJECT , "ENTITY_REFERENCE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::NOTATION_OBJECT , "NOTATION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::PROCESSING_INSTRUCTION_OBJECT , "PROCESSING_INSTRUCTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::TEXT_OBJECT , "TEXT_OBJECT"}
    };
    std::cout<<"New for "<<maps[type]<<" size=0x"<<std::hex<<amount<<std::endl;
    if (!fRecycleNodePtr)
        return allocate(amount);

    DOMNodePtr* ptr = fRecycleNodePtr->operator[](type);
    if (!ptr || ptr->empty())
        return allocate(amount);

    return (void*) ptr->pop();
}

一个简单的XML及日志:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<TopNode>
  <SectionOfDataA>
    <TestData>MEMORY COST 1 1</TestData>
    <TestData>MEMORY COST 2 1</TestData>
  </SectionOfDataA>
</TopNode>

New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38

可见DOMElementImpl的SIZE为0x68, DOMTextImpl为0x38, 这得顶多少个字符串!
打开DOMElementImpl.hpp便可以看到这个类包含多少member,这还不包括父类的。

// New data introduced in DOM Level 3
    const XMLCh*          fInputEncoding;
    const XMLCh*          fXmlEncoding;
    bool                  fXmlStandalone;
    const XMLCh*          fXmlVersion;
    const XMLCh*          fDocumentURI;
    DOMConfiguration*     fDOMConfiguration;

    XMLStringPool         fUserDataTableKeys;
    RefHash2KeysTableOf<DOMUserDataRecord, PtrHasher>* fUserDataTable;


    // Per-Document heap Variables.
    //   The heap consists of one or more biggish blocks which are
    //   sub-allocated for individual allocations of nodes, strings, etc.
    //   The big blocks form a linked list, allowing them to be located for deletion.
    //
    //   There is no provision for deleting suballocated blocks, other than
    //     deleting the entire heap when the document is deleted.
    //
    //   There is no header on individual sub-allocated blocks.
    //   The header on big blocks consists only of a single back pointer to
    //    the previously allocated big block (our linked list of big blocks)
    //
    //
    //   revisit - this heap should be encapsulated into its own
    //                  class, rather than hanging naked on Document.
    //
    void*                 fCurrentBlock;
    char*                 fFreePtr;
    XMLSize_t             fFreeBytesRemaining,
                          fHeapAllocSize;

    // To recycle the DOMNode pointer
    RefArrayOf<DOMNodePtr>* fRecycleNodePtr;

    // To recycle DOMBuffer pointer
    RefStackOf<DOMBuffer>* fRecycleBufferPtr;

    // Pool of DOMNodeList for getElementsByTagName
    DOMDeepNodeListPool<DOMDeepNodeListImpl>* fNodeListPool;

    // Other data
    DOMDocumentType*      fDocType;
    DOMElement*           fDocElement;

    DOMStringPoolEntry**  fNameTable;
    XMLSize_t             fNameTableSize;

    DOMNormalizer*        fNormalizer;
    Ranges*               fRanges;
    NodeIterators*        fNodeIterators;
    MemoryManager*        fMemoryManager;   // configurable memory manager
    DOMImplementation*    fDOMImplementation;

    int                   fChanges;
    bool                  errorChecking;    // Bypass error checking.

然而,只在DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)上加log输出还不全,它仅仅统计了Node实例的分配,这一点在第二个参数type上有所体现。让我们仔细看看DOMElementImpl的构造函数:

DOMElementImpl::DOMElementImpl(DOMDocument *ownerDoc, const XMLCh *eName)
    : fNode(ownerDoc), fParent(ownerDoc), fAttributes(0), fDefaultAttributes(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(eName);           //(1)
    setupDefaultAttributes();
    if (!fDefaultAttributes) {
        fDefaultAttributes = new (docImpl) DOMAttrMapImpl(this); //(2)
        fAttributes = new (docImpl) DOMAttrMapImpl(this);   //(3)
    }
    else {
      fAttributes = new (docImpl) DOMAttrMapImpl(this, fDefaultAttributes);
    }
}

(1) 是为节点字符串分配内存,如“TopNode”,“SectionOfDataA”
(2)(3) 应该和节点的属性有关,这一点我没有验证。
docImpl->getPooledString 在大块内存的基础上又加入了防重复算法(hash的办法),此处不再细说。

最后,如果想把所有的内存分配揪出来,只需要在DOMDocumentImpl::allocate(XMLSize_t amount)内加点log即可,如下,可见耗内存之多。

New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3f10 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ea8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e88 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e68 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3e48 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e10 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3df0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3dc8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3d60 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d30 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d10 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3cf0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3cb8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3c98 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3c68 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3c00 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3be0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3bc0 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3ba0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3b68 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3b48 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3b08 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ad0 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3ab0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3a80 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3a18 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39f8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=39d8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39a0 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3980 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3940 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3908 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=38e8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=38c0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3888 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3868 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3840 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3800 this=000000000067FEF8
subdivide from old block with amount=18 old fFreeBytesRemainin=37c0 this=000000000067FEF8

4) demo

抽取了xerces-c++关于内存管理的代码,便于demo或学习使用,请移步下面的链接下载。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

深山老宅

鸡蛋不错的话,要不要激励下母鸡

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值