Reposted article on memory management: Custom memory allocation

C++: Custom memory allocation

By Tiago Costa | Published Apr 15 2013 02:07 PM in General Programming 
Peer Reviewed by (GuardianX, incertia, Josh Vega)


Fast memory allocation, combined with memory leak detection, can have a big impact on a game's performance.

C++ provides two well-known ways to allocate dynamic (heap) memory: malloc and new. They are often slow because they are general purpose and, in some implementations, may require a switch from user mode into kernel mode. They also provide no built-in memory leak detection.

Using custom allocators we can have well defined usage patterns and optimize the allocation process accordingly.

Full source code and performance tests in the attached file

Base Allocator


Every allocator in this article derives from the class Allocator, which declares two pure virtual functions (allocate and deallocate) that each allocator must define.

Allocator.h
class Allocator
{
public:
    Allocator()
    {
        _usedMemory     = 0;
        _numAllocations = 0;
    }

    virtual ~Allocator()
    {
        ASSERT(_numAllocations == 0 && _usedMemory == 0);
    }

    virtual void* allocate(u32 size, u8 alignment) = 0;

    virtual void deallocate(void* p) = 0;

    template <class T> T* allocateNew()
    {
        return new (allocate(sizeof(T), __alignof(T))) T;
    }

    template <class T> T* allocateNew(const T& t)
    {
        return new (allocate(sizeof(T), __alignof(T))) T(t);
    }

    template <class T> void deallocateDelete(T* pObject)
    {
        if(pObject != nullptr)
        {
            pObject->~T();
            deallocate(pObject);
        }
    }

    template <class T> T* allocateArray(u32 length)
    {
        if(length == 0)
            return nullptr;

        u32 headerSize = sizeof(u32) / sizeof(T);

        if(sizeof(u32) % sizeof(T) > 0)
            headerSize += 1;

        //Allocate extra space to store array length in the bytes before the array
        T* p = ((T*) allocate(sizeof(T) * (length + headerSize), __alignof(T))) + headerSize;

        *(((u32*)p) - 1) = length;

        for(u32 i = 0; i < length; i++)
            new (&p[i]) T;

        return p;
    }

    template <class T> void deallocateArray(T* pArray)
    {
        if(pArray == nullptr)
            return;

        u32 length = *(((u32*)pArray) - 1);

        for(int i = length - 1; i >= 0; i--)
            pArray[i].~T();

        //Calculate how much extra memory was allocated to store the length before the array
        u32 headerSize = sizeof(u32) / sizeof(T);

        if(sizeof(u32) % sizeof(T) > 0)
            headerSize += 1;

        deallocate(pArray - headerSize);
    }

    u32 getUsedMemory()
    {
        return _usedMemory;
    }

    u32 getNumAllocations()
    {
        return _numAllocations;
    }

protected:
    u32 _usedMemory;
    u32 _numAllocations;
};
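The allocateNew() and deallocateDelete() helpers rely on placement new and an explicit destructor call. The following is a minimal standalone sketch of that pattern, assuming a stack buffer in place of memory returned by allocate(); the Tracked type and the constructAt/destroyAt/placementNewDemo names are made up for illustration and are not part of the article's source.

```cpp
#include <cassert>
#include <new>      // placement new

// Hypothetical type that counts live instances so construction and
// destruction can be observed.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Construct a T in memory obtained elsewhere (the step allocateNew performs
// after calling allocate()).
template <class T, class Arg>
T* constructAt(void* memory, const Arg& arg) {
    return new (memory) T(arg);
}

// Run the destructor without releasing the memory (the step deallocateDelete
// performs before calling deallocate()).
template <class T>
void destroyAt(T* object) {
    if (object != nullptr)
        object->~T();
}

// Returns the number of live objects observed while the object exists.
inline int placementNewDemo() {
    alignas(Tracked) unsigned char buffer[sizeof(Tracked)];
    Tracked* t = constructAt<Tracked>(buffer, 42);
    int liveWhileConstructed = Tracked::live;
    destroyAt(t);
    return liveWhileConstructed;
}
```

Separating construction from allocation is what lets one set of helpers work with every allocator in this article.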

Memory leak detection


In the code above you can see an assert in the destructor. This is a simple way to check whether you forgot to deallocate any memory, and it adds no per-allocation overhead and takes no extra memory.

This simple method won't tell you which allocation you forgot to deallocate, but it will pinpoint the allocator in which the leak occurred, so you can find the leak faster (especially if you use Proxy Allocators as suggested later in this article).
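As a rough illustration of the idea, here is a standalone sketch of a counter checked in the destructor. CountingAllocator is a hypothetical malloc-backed class written for this example, not one of the article's allocators, and it uses the standard assert macro instead of the article's ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper around malloc/free that only counts live allocations.
class CountingAllocator {
public:
    CountingAllocator() : _numAllocations(0) {}

    ~CountingAllocator() {
        // Fires in debug builds when something allocated here was never released.
        assert(_numAllocations == 0 && "memory leak detected");
    }

    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p != nullptr)
            ++_numAllocations;
        return p;
    }

    void deallocate(void* p) {
        if (p != nullptr) {
            std::free(p);
            --_numAllocations;
        }
    }

    std::size_t numAllocations() const { return _numAllocations; }

private:
    std::size_t _numAllocations;
};
```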

Aligned Allocations


Processors access memory in word-sized blocks. When a processor accesses memory at an unaligned address, it may have to read more word-sized blocks than it would for aligned data, and then mask and shift the results to get the required data into a register.

Example:
A processor accesses memory in 4-byte words, so it can only directly access words starting at 0x00, 0x04, 0x08, 0x0C, and so on.

If it needs to access data (4 bytes) stored at the address 0x0B, it has to read two word-sized blocks (the words at 0x08 and 0x0C) because the data crosses the boundary between them:

Attached Image: alignment.jpg

If the data were stored at an aligned address such as 0x0C, the processor could read it in a single memory access:

Attached Image: alignment2.jpg
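The arithmetic behind this example can be checked with a small helper. This is only a sketch of the simplified model described above (fixed-size words, no caches); wordsTouched is a name invented here.

```cpp
#include <cassert>
#include <cstdint>

// Number of word-sized blocks touched by an access of `size` bytes starting at
// `address`, on a machine that reads memory in `wordSize`-byte words.
// Both size and wordSize must be greater than zero.
inline std::uint32_t wordsTouched(std::uintptr_t address,
                                  std::uint32_t size,
                                  std::uint32_t wordSize) {
    std::uintptr_t firstWord = address / wordSize;
    std::uintptr_t lastWord  = (address + size - 1) / wordSize;
    return static_cast<std::uint32_t>(lastWord - firstWord + 1);
}
```

For the addresses used above, wordsTouched(0x0B, 4, 4) gives 2 and wordsTouched(0x0C, 4, 4) gives 1.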

Aligned data definition


Primitive data is said to be aligned if the memory address where it is stored is a multiple of the size of the primitive.

A data aggregate is said to be aligned if each primitive element in the aggregate is aligned.
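In C++ the alignment a type requires is reported by alignof, which for primitive types usually (though not on every platform) equals their size. A minimal sketch of an alignment check, assuming power-of-two alignments; isAligned and isAlignedFor are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the address of p is a multiple of `alignment`.
// alignment must be a power of two.
inline bool isAligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Convenience overload that uses the alignment the compiler requires for T.
template <class T>
bool isAlignedFor(const void* p) {
    return isAligned(p, alignof(T));
}
```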

Implementation


To n-byte align a memory address x we need to mask off the log2(n) least significant bits from x.

Simply masking off bits returns the first n-byte aligned address at or before x, so in order to find the first aligned address at or after x we add alignment-1 to x and mask that address.

inline void* nextAlignedAddress(void* pAddress, u8 alignment)
{
    return (void*)(((uptr)pAddress + (alignment - 1)) & ~(alignment - 1));
}

It can be useful to calculate by how many bytes the address needs to be adjusted to be aligned.

inline u8 alignAdjustment(void* pAddress, u8 alignment)
{
    u8 adjustment = alignment - ((uptr)pAddress & (alignment - 1));

    if(adjustment == alignment)
        return 0; //already aligned

    return adjustment;
}

Some allocators need to store a header before each allocation. They can place it in the adjustment space, which reduces the memory overhead caused by the headers.

inline u8 alignAdjustmentWithHeader(void* pAddress, u8 alignment, u8 headerSize)
{
    u8 adjustment = alignment - ((uptr)pAddress & (alignment - 1));

    if(adjustment == alignment)
        adjustment = 0; //already aligned

    u8 neededSpace = headerSize;

    if(adjustment < neededSpace)
    {
        neededSpace -= adjustment;

        adjustment += alignment * (neededSpace / alignment);

        if(neededSpace % alignment > 0)
            adjustment += alignment;
    }

    return adjustment;
}

Note:  The alignment must be a power of 2!
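To make the arithmetic easier to check by hand, here is a sketch of the same three calculations on plain integers instead of pointers. The function names (alignForward, adjustmentFor, adjustmentWithHeader) are chosen for this example and std::uintptr_t is assumed to play the role of the article's uptr.

```cpp
#include <cassert>
#include <cstdint>

// First address >= `address` that is a multiple of `alignment`.
// alignment must be a power of two.
inline std::uintptr_t alignForward(std::uintptr_t address, std::uintptr_t alignment) {
    return (address + (alignment - 1)) & ~(alignment - 1);
}

// Bytes that must be skipped to reach the next aligned address (0 if aligned).
inline std::uintptr_t adjustmentFor(std::uintptr_t address, std::uintptr_t alignment) {
    return alignForward(address, alignment) - address;
}

// Like adjustmentFor, but the skipped space must also be able to hold a header
// of `headerSize` bytes placed directly before the aligned address.
inline std::uintptr_t adjustmentWithHeader(std::uintptr_t address,
                                           std::uintptr_t alignment,
                                           std::uintptr_t headerSize) {
    std::uintptr_t adjustment = adjustmentFor(address, alignment);
    if (adjustment < headerSize) {
        std::uintptr_t needed = headerSize - adjustment;
        // Round the missing space up to a whole number of alignment steps.
        adjustment += alignment * ((needed + alignment - 1) / alignment);
    }
    return adjustment;
}
```

For example, with a 4-byte alignment the address 0x0B needs an adjustment of 1 byte, or 9 bytes if an 8-byte header has to fit in front of the allocation.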


Linear Allocator


A Linear Allocator is the simplest and fastest type of allocator. Pointers to the start of the allocator, to the first free address and the total size of the allocator are maintained. 

Allocations


New allocations simply move the pointer to the first free address forward.

Deallocations


Individual deallocations cannot be made in linear allocators. Instead, use clear() to release all the memory used by the allocator at once.

Implementation


LinearAllocator.h
#include "Allocator.h"
#include "Types.h"

class LinearAllocator : public Allocator
{
public:
    LinearAllocator(u32 size, void* pStart);
    ~LinearAllocator();

    void* allocate(u32 size, u8 alignment);

    void deallocate(void* p);

    void clear();

private:
    LinearAllocator(const LinearAllocator&) {} //Prevent copies because it might cause errors
    LinearAllocator& operator=(const LinearAllocator&) { return *this; }

    void* _pInitialPosition;

    void* _pCurrentPosition;

    u32   _size;
};

LinearAllocator.cpp
#include "LinearAllocator.h"
#include "Debug.h"

LinearAllocator::LinearAllocator(u32 size, void* pStart)
    : Allocator(), _pInitialPosition(pStart), _pCurrentPosition(pStart), _size(size)
{
    ASSERT(size > 0);
}

LinearAllocator::~LinearAllocator()
{
    _pInitialPosition = nullptr;
    _pCurrentPosition = nullptr;

    _size             = 0;
}

void* LinearAllocator::allocate(u32 size, u8 alignment)
{
    ASSERT(size != 0);

    u8 adjustment = alignAdjustment(_pCurrentPosition, alignment);

    if(_usedMemory + adjustment + size > _size)
        return nullptr;

    uptr alignedAddress = (uptr)_pCurrentPosition + adjustment;

    _pCurrentPosition = (void*)(alignedAddress + size);

    _usedMemory += size + adjustment;
    _numAllocations++;

    return (void*)alignedAddress;
}

void LinearAllocator::deallocate(void* p)
{
    ASSERT(false && "Use clear() instead");
}

void LinearAllocator::clear()
{
    _numAllocations   = 0;
    _usedMemory       = 0;

    _pCurrentPosition = _pInitialPosition;
}
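The allocate-many, clear-once pattern can be tried in isolation with a reduced, self-contained version. FrameArena below is a hypothetical stand-in written for this example (it tracks an offset instead of a current pointer and does not derive from Allocator), so treat it as a sketch rather than the article's class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator reduced to the parts needed to show the usage.
class FrameArena {
public:
    FrameArena(void* start, std::size_t size)
        : _start(static_cast<unsigned char*>(start)), _size(size), _used(0) {}

    // alignment must be a power of two. Returns nullptr when out of space.
    void* allocate(std::size_t size, std::size_t alignment) {
        std::uintptr_t current = reinterpret_cast<std::uintptr_t>(_start) + _used;
        std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
        std::uintptr_t aligned = (current + mask) & ~mask;
        std::size_t adjustment = static_cast<std::size_t>(aligned - current);

        if (_used + adjustment + size > _size)
            return nullptr;

        _used += adjustment + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases every allocation at once.
    void clear() { _used = 0; }

    std::size_t used() const { return _used; }

private:
    unsigned char* _start;
    std::size_t    _size;
    std::size_t    _used;
};
```

A typical use is to allocate scratch data during a frame and call clear() once at the end of it.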

Stack Allocator


A Stack Allocator, like the name says, works like a stack. Along with the stack size, three pointers are maintained:
  • Pointer to the start of the stack.
  • Pointer to the top of the stack.
  • Pointer to the last allocation made. (This is optional in release builds)

Allocations


New allocations move the pointer up by the requested number of bytes plus the adjustment needed to align the address and store the  allocation header.

The  allocation header provides the following information:
  • Adjustment used in this allocation
  • Pointer to the previous allocation.

Deallocations


Note:  Memory must be deallocated in the reverse order of allocation! So if you allocate object A and then object B, you must free object B's memory before you can free object A's memory.


To deallocate memory the allocator checks if the address to the memory that you want to deallocate corresponds to the address of the last allocation made.

If so the allocator accesses the  allocation header so it also frees the memory used to align the allocation and store the  allocation header, and it replaces the pointer to the last allocation made with the one in the  allocation header.

Implementation


StackAllocator.h
#include "Allocator.h"
#include "Types.h"

class StackAllocator : public Allocator
{
public:
    StackAllocator(u32 size, void* pStart);
    ~StackAllocator();

    void* allocate(u32 size, u8 alignment);

    void deallocate(void* p);

private:
    StackAllocator(const StackAllocator&) {} //Prevent copies because it might cause errors
    StackAllocator& operator=(const StackAllocator&) { return *this; }

    struct AllocationHeader
    {
#if _DEBUG
        void* pPrevAddress;
#endif
        u8 adjustment;
    };

    void* _pInitialPosition;

#if _DEBUG
    void* _pPrevPosition;
#endif

    void* _pCurrentPosition;

    u32   _size;
};

StackAllocator.cpp
#include "StackAllocator.h"
#include "Debug.h"

StackAllocator::StackAllocator(u32 size, void* pStart)
    : Allocator(), _pInitialPosition(pStart), _pCurrentPosition(pStart), _size(size)
{
    ASSERT(size > 0);

#if _DEBUG
    _pPrevPosition = nullptr;
#endif
}

StackAllocator::~StackAllocator()
{
    _pInitialPosition = nullptr;

#if _DEBUG
    _pPrevPosition    = nullptr;
#endif

    _pCurrentPosition = nullptr;

    _size             = 0;
}

void* StackAllocator::allocate(u32 size, u8 alignment)
{
    ASSERT(size != 0);

    u8 adjustment = alignAdjustmentWithHeader(_pCurrentPosition, alignment, sizeof(AllocationHeader));

    if(_usedMemory + adjustment + size > _size)
        return nullptr;

    uptr alignedAddress = (uptr)_pCurrentPosition + adjustment;

    //Add Allocation Header
    AllocationHeader* pHeader = (AllocationHeader*)(alignedAddress - sizeof(AllocationHeader));

    pHeader->adjustment = adjustment;

#if _DEBUG
    pHeader->pPrevAddress = _pPrevPosition;

    _pPrevPosition = (void*)alignedAddress;
#endif

    _pCurrentPosition = (void*)(alignedAddress + size);

    _usedMemory += size + adjustment;
    _numAllocations++;

    return (void*)alignedAddress;
}

void StackAllocator::deallocate(void* p)
{
    //_pPrevPosition only exists in debug builds
#if _DEBUG
    ASSERT(p == _pPrevPosition);
#endif

    //Access the AllocationHeader in the bytes before p
    AllocationHeader* pHeader = (AllocationHeader*)((uptr)p - sizeof(AllocationHeader));

    _usedMemory -= (uptr)_pCurrentPosition - (uptr)p + pHeader->adjustment;

    _pCurrentPosition = (void*)((uptr)p - pHeader->adjustment);

#if _DEBUG
    _pPrevPosition = pHeader->pPrevAddress;
#endif

    _numAllocations--;
}

Note:  Keeping the previous allocations in a list-like fashion and checking them on deallocation is not mandatory, so it can be disabled in release builds. It just helps catch out-of-order deallocations, which would otherwise let memory be overwritten and cause bugs.
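The last-in, first-out rule can be illustrated with a reduced, self-contained variant. MiniStack is a hypothetical class written for this example: unlike the article's StackAllocator it stores the previous stack top (rather than the adjustment) in the header, always keeps the "last allocation" pointer, copies the header with memcpy to avoid unaligned accesses, and reports an out-of-order deallocation by returning false instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

class MiniStack {
public:
    MiniStack(void* start, std::size_t size)
        : _start(static_cast<unsigned char*>(start)), _size(size), _top(0), _last(nullptr) {}

    // alignment must be a power of two. Returns nullptr when out of space.
    void* allocate(std::size_t size, std::size_t alignment) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(_start);
        std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        // Leave room for the header, then align the address handed to the caller.
        std::uintptr_t user = (base + _top + sizeof(Header) + mask) & ~mask;
        std::size_t newTop = static_cast<std::size_t>(user - base) + size;
        if (newTop > _size)
            return nullptr;

        Header header = { _top, _last };
        std::memcpy(reinterpret_cast<void*>(user - sizeof(Header)), &header, sizeof(Header));

        _top  = newTop;
        _last = reinterpret_cast<void*>(user);
        return _last;
    }

    // Returns false (and changes nothing) unless p is the most recent allocation.
    bool deallocate(void* p) {
        if (p == nullptr || p != _last)
            return false;

        Header header;
        std::memcpy(&header, static_cast<unsigned char*>(p) - sizeof(Header), sizeof(Header));
        _top  = header.previousTop;
        _last = header.previousLast;
        return true;
    }

    std::size_t used() const { return _top; }

private:
    struct Header {
        std::size_t previousTop;   // stack top before this allocation
        void*       previousLast;  // allocation made before this one
    };

    unsigned char* _start;
    std::size_t    _size;
    std::size_t    _top;
    void*          _last;
};
```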



FreeList Allocator


The FreeList allocator allows allocations of any size to be made (inside the available memory) and deallocations in any order.

A linked-list of free blocks of memory is maintained (each free block contains information about its size and a pointer to the next free block).

Allocations


The allocator tries to find a free block large enough for the allocation to fit. If it finds multiple free blocks that meet the requirements, there are 3 simple ways to decide which free block to choose:

  • First-fit - Use the first.
  • Best-fit - Use the smallest.
  • Worst-fit - Use the largest.

Note:  The best-fit method will in most cases cause less fragmentation than the other 2 methods.



In the example implementation below I use the first-fit method.
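The three strategies differ only in which candidate they keep while scanning the free list. A small sketch over a plain list of block sizes, assuming the sizes are given in list order; FitPolicy and chooseBlock are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class FitPolicy { FirstFit, BestFit, WorstFit };

// Returns the index of the chosen free block, or -1 when no block is large
// enough. freeBlockSizes holds the size of each free block in list order.
inline int chooseBlock(const std::vector<std::size_t>& freeBlockSizes,
                       std::size_t request,
                       FitPolicy policy) {
    int chosen = -1;
    for (std::size_t i = 0; i < freeBlockSizes.size(); ++i) {
        if (freeBlockSizes[i] < request)
            continue;                       // too small for this request

        if (chosen == -1) {
            chosen = static_cast<int>(i);   // first block that fits
            if (policy == FitPolicy::FirstFit)
                break;
            continue;
        }

        std::size_t current = freeBlockSizes[static_cast<std::size_t>(chosen)];
        if (policy == FitPolicy::BestFit && freeBlockSizes[i] < current)
            chosen = static_cast<int>(i);   // smaller block that still fits
        if (policy == FitPolicy::WorstFit && freeBlockSizes[i] > current)
            chosen = static_cast<int>(i);   // larger block
    }
    return chosen;
}
```

First-fit can stop at the first match, while best-fit and worst-fit always have to walk the whole list.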

Deallocation


The allocator repeatedly searches for free blocks adjacent to the allocation being deallocated and merges them together.

If it doesn't find any adjacent free blocks, it creates a new free block and adds it to the list.
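The merging step can also be pictured on plain offsets. The sketch below is not the article's approach: it swaps the unsorted linked list for a std::map keyed by start offset, so that the neighbours of a freed range are simply the entries next to it. FreeRanges and releaseRange are hypothetical names, and releasing the same range twice is not detected.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Free ranges keyed by start offset; the mapped value is the size in bytes.
using FreeRanges = std::map<std::size_t, std::size_t>;

// Marks [offset, offset + size) as free and merges it with adjacent free ranges.
inline void releaseRange(FreeRanges& freeRanges, std::size_t offset, std::size_t size) {
    FreeRanges::iterator it = freeRanges.emplace(offset, size).first;

    // Merge with the following range if it starts exactly where this one ends.
    FreeRanges::iterator next = std::next(it);
    if (next != freeRanges.end() && it->first + it->second == next->first) {
        it->second += next->second;
        freeRanges.erase(next);
    }

    // Merge into the preceding range if it ends exactly where this one starts.
    if (it != freeRanges.begin()) {
        FreeRanges::iterator prev = std::prev(it);
        if (prev->first + prev->second == it->first) {
            prev->second += it->second;
            freeRanges.erase(it);
        }
    }
}
```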

Implementation


FreeListAllocator.h
#include "Allocator.h"
#include "Types.h"

class FreeListAllocator : public Allocator
{
public:
    FreeListAllocator(u32 size, void* pStart);
    ~FreeListAllocator();

    void* allocate(u32 size, u8 alignment);

    void deallocate(void* p);

private:
    struct AllocationHeader
    {
        u32 size;
        u32 adjustment;
    };

    struct FreeBlock
    {
        u32        size;
        FreeBlock* pNext;
    };

    FreeListAllocator(const FreeListAllocator&) {} //Prevent copies because it might cause errors
    FreeListAllocator& operator=(const FreeListAllocator&) { return *this; }

    FreeBlock* _pFreeBlocks;
};

FreeListAllocator.cpp
#include "FreeListAllocator.h"
#include "Debug.h"

FreeListAllocator::FreeListAllocator(u32 size, void* pStart)
    : Allocator(), _pFreeBlocks((FreeBlock*)pStart)
{
    ASSERT(size > sizeof(FreeBlock));

    _pFreeBlocks->size  = size;
    _pFreeBlocks->pNext = nullptr;
}

FreeListAllocator::~FreeListAllocator()
{
    _pFreeBlocks = nullptr;
}

void* FreeListAllocator::allocate(u32 size, u8 alignment)
{
    ASSERT(size != 0);

    //Check free blocks
    FreeBlock* pPrevFreeBlock = nullptr;
    FreeBlock* pFreeBlock     = _pFreeBlocks;

    while(pFreeBlock)
    {
        //Calculate adjustment needed to keep object correctly aligned
        u8 adjustment = alignAdjustmentWithHeader(pFreeBlock, alignment, sizeof(AllocationHeader));

        //Total memory taken from the block: requested size plus adjustment
        u32 totalSize = size + adjustment;

        //If allocation doesn't fit in this FreeBlock, try the next
        if(pFreeBlock->size < totalSize)
        {
            pPrevFreeBlock = pFreeBlock;
            pFreeBlock = pFreeBlock->pNext;
            continue;
        }

        ASSERT(sizeof(AllocationHeader) >= sizeof(FreeBlock));

        //If allocations in the remaining memory will be impossible
        if(pFreeBlock->size - totalSize <= sizeof(AllocationHeader))
        {
            //Increase allocation size instead of creating a new FreeBlock
            totalSize = pFreeBlock->size;

            //Unlink the block from the free list
            if(pPrevFreeBlock != nullptr)
                pPrevFreeBlock->pNext = pFreeBlock->pNext;
            else
                _pFreeBlocks = pFreeBlock->pNext;
        }
        else
        {
            //Else create a new FreeBlock containing remaining memory
            FreeBlock* pNextBlock = (FreeBlock*)((uptr)pFreeBlock + totalSize);
            pNextBlock->size  = pFreeBlock->size - totalSize;
            pNextBlock->pNext = pFreeBlock->pNext;

            //Replace the old block with the new one in the free list
            if(pPrevFreeBlock != nullptr)
                pPrevFreeBlock->pNext = pNextBlock;
            else
                _pFreeBlocks = pNextBlock;
        }

        uptr alignedAddress = (uptr)pFreeBlock + adjustment;

        AllocationHeader* pHeader = (AllocationHeader*)(alignedAddress - sizeof(AllocationHeader));
        pHeader->size       = totalSize;
        pHeader->adjustment = adjustment;

        _usedMemory += totalSize;
        _numAllocations++;

        return (void*)alignedAddress;
    }

    ASSERT(false && "Couldn't find free block large enough!");

    return nullptr;
}

void FreeListAllocator::deallocate(void* p)
{
    AllocationHeader* pHeader = (AllocationHeader*)((uptr)p - sizeof(AllocationHeader));

    u32 size = pHeader->size;

    uptr blockStart = (uptr)p - pHeader->adjustment;
    uptr blockEnd   = blockStart + size;
    u32 blockSize   = size;

    bool blockMerged = false;

    //Find adjacent free blocks and merge
    bool search = true;

    while(search)
    {
        search = false;

        FreeBlock* pPrevFreeBlock = nullptr;
        FreeBlock* pFreeBlock = _pFreeBlocks;

        while(pFreeBlock != nullptr)
        {
            if((uptr)pFreeBlock + pFreeBlock->size == blockStart)
            {
                //The free block ends where the released block starts
                pFreeBlock->size += blockSize;

                blockStart = (uptr)pFreeBlock;
                blockEnd   = blockStart + pFreeBlock->size;
                blockSize  = pFreeBlock->size;

                search = true;
                blockMerged = true;
                break;
            }
            else if(blockEnd == (uptr)pFreeBlock)
            {
                //The free block starts where the released block ends
                FreeBlock* pNewFreeBlock = (FreeBlock*)blockStart;
                pNewFreeBlock->pNext = pFreeBlock->pNext;
                pNewFreeBlock->size = blockSize + pFreeBlock->size;

                if(pFreeBlock == _pFreeBlocks)
                    _pFreeBlocks = pNewFreeBlock;
                else if(pPrevFreeBlock != pNewFreeBlock)
                    pPrevFreeBlock->pNext = pNewFreeBlock;

                blockStart = (uptr)pNewFreeBlock;
                blockEnd   = blockStart + pNewFreeBlock->size;
                blockSize  = pNewFreeBlock->size;

                search = true;
                blockMerged = true;
                break;
            }

            pPrevFreeBlock = pFreeBlock;
            pFreeBlock = pFreeBlock->pNext;
        }
    }

    if(!blockMerged)
    {
        FreeBlock* pBlock = (FreeBlock*)((uptr)p - pHeader->adjustment);
        pBlock->size      = blockSize;
        pBlock->pNext     = _pFreeBlocks;

        _pFreeBlocks      = pBlock;
    }

    _numAllocations--;
    _usedMemory -= size;
}

Pool Allocator


This allocator only allows allocations of a fixed size and alignment, which makes both allocations and deallocations fast.

Like the FreeList allocator, a linked list of free blocks is maintained, but since all blocks are the same size each free block only needs to store a pointer to the next one.

Another advantage of Pool allocators is that individual allocations do not need to be aligned: since all allocations have the same size and alignment, only the first block has to be aligned, which results in an almost non-existent memory overhead.

Note:  
The block size of the Pool Allocator must be at least sizeof(void*), because free blocks store a pointer to the next free block in the list.



Allocations


The allocator simply returns the first free block and updates the linked list.

Deallocations


The allocator simply adds the deallocated block to the free blocks linked list.

Implementation


PoolAllocator.h
#include "Allocator.h"
#include "Types.h"

class PoolAllocator : public Allocator
{
public:
    PoolAllocator(u32 objectSize, u8 objectAlignment, u32 size, void* pMem);
    ~PoolAllocator();

    void* allocate(u32 size, u8 alignment);

    void deallocate(void* p);

private:
    PoolAllocator(const PoolAllocator&) {} //Prevent copies because it might cause errors
    PoolAllocator& operator=(const PoolAllocator&) { return *this; }

    u32    _size;
    u32    _objectSize;
    u8     _objectAlignment;

    void** _pFreeList;
};

PoolAllocator.cpp
#include "PoolAllocator.h"
#include "Debug.h"

PoolAllocator::PoolAllocator(u32 objectSize, u8 objectAlignment, u32 size, void* pMem)
    : Allocator(), _size(size), _objectSize(objectSize), _objectAlignment(objectAlignment)
{
    ASSERT(objectSize >= sizeof(void*));

    //Calculate adjustment needed to keep object correctly aligned
    u8 adjustment = alignAdjustment(pMem, objectAlignment);

    _pFreeList = (void**)((uptr)pMem + adjustment);

    u32 numObjects = (size - adjustment) / objectSize;

    void** p = _pFreeList;

    //Initialize free blocks list
    for(u32 i = 0; i < numObjects - 1; i++)
    {
        *p = (void*)((uptr)p + objectSize);
        p = (void**)*p;
    }

    *p = nullptr;
}

PoolAllocator::~PoolAllocator()
{
    _pFreeList = nullptr;
}

void* PoolAllocator::allocate(u32 size, u8 alignment)
{
    ASSERT(size == _objectSize && alignment == _objectAlignment);

    if(_pFreeList == nullptr)
        return nullptr;

    void* p = _pFreeList;

    _pFreeList = (void**)(*_pFreeList);

    _usedMemory += size;
    _numAllocations++;

    return p;
}

void PoolAllocator::deallocate(void* p)
{
    *((void**)p) = _pFreeList;

    _pFreeList = (void**)p;

    _usedMemory -= _objectSize;
    _numAllocations--;
}
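The way free blocks double as list nodes can be tried in isolation with a reduced, self-contained pool. FixedPool is a hypothetical class written for this example; it assumes the caller passes memory that is already suitably aligned and a block size that is a multiple of alignof(void*).

```cpp
#include <cassert>
#include <cstddef>

class FixedPool {
public:
    // memory must be aligned for the stored objects and for void*;
    // blockSize must be >= sizeof(void*) and a multiple of alignof(void*).
    FixedPool(void* memory, std::size_t totalSize, std::size_t blockSize)
        : _freeList(nullptr) {
        assert(blockSize >= sizeof(void*));
        unsigned char* bytes = static_cast<unsigned char*>(memory);
        std::size_t count = totalSize / blockSize;
        // Push the blocks back to front so the first block ends up at the head.
        for (std::size_t i = count; i > 0; --i)
            push(bytes + (i - 1) * blockSize);
    }

    // Pops the first free block, or returns nullptr when the pool is exhausted.
    void* allocate() {
        if (_freeList == nullptr)
            return nullptr;
        void* block = _freeList;
        _freeList = *static_cast<void**>(block);
        return block;
    }

    // Puts the block back at the head of the free list.
    void deallocate(void* block) {
        if (block != nullptr)
            push(block);
    }

private:
    // A free block stores the pointer to the next free block in its first bytes.
    void push(void* block) {
        *static_cast<void**>(block) = _freeList;
        _freeList = block;
    }

    void* _freeList;
};
```

Because a released block goes to the head of the list, the next allocation reuses the most recently freed block.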

Proxy Allocator


A Proxy Allocator is a special kind of allocator. It is just used to help with memory leak and subsystem memory usage tracking.

It will simply redirect all allocations/deallocations to the allocator passed as argument in the constructor while keeping track of how many allocations it made and how much memory it is "using".

Example:
Two subsystems use the same allocator  A.
If you want to show in the debugging user interface how much memory each subsystem is using, you create a proxy allocator, that redirects all allocations/deallocations to  A, in each subsystem and track their memory usage.

It will also help in memory leak tracking because the assert in the proxy allocator destructor of the subsystem that is leaking memory will fail.
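The two-subsystem example can be sketched with a trimmed-down, self-contained set of classes. Everything here (IAllocator, HeapAllocator, Proxy) is hypothetical and simplified for illustration: the shared allocator A is backed by malloc and alignment parameters are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical interface with only what the proxy needs.
class IAllocator {
public:
    virtual ~IAllocator() {}
    virtual void* allocate(std::size_t size) = 0;
    virtual void deallocate(void* p) = 0;
    std::size_t usedMemory() const { return _usedMemory; }
    std::size_t numAllocations() const { return _numAllocations; }
protected:
    std::size_t _usedMemory = 0;
    std::size_t _numAllocations = 0;
};

// Shared allocator "A": stores each block's size in a header in front of it.
class HeapAllocator : public IAllocator {
public:
    void* allocate(std::size_t size) override {
        unsigned char* raw = static_cast<unsigned char*>(std::malloc(kHeader + size));
        if (raw == nullptr)
            return nullptr;
        *reinterpret_cast<std::size_t*>(raw) = size;  // remembered for deallocate()
        _usedMemory += size;
        ++_numAllocations;
        return raw + kHeader;
    }
    void deallocate(void* p) override {
        if (p == nullptr)
            return;
        unsigned char* raw = static_cast<unsigned char*>(p) - kHeader;
        _usedMemory -= *reinterpret_cast<std::size_t*>(raw);
        --_numAllocations;
        std::free(raw);
    }
private:
    // Header padded so the returned pointer keeps malloc's alignment.
    static const std::size_t kHeader = alignof(std::max_align_t);
};

// Forwards every call to the target and tracks only its own share.
class Proxy : public IAllocator {
public:
    explicit Proxy(IAllocator& target) : _target(target) {}
    void* allocate(std::size_t size) override {
        std::size_t before = _target.usedMemory();
        void* p = _target.allocate(size);
        if (p != nullptr) {
            _usedMemory += _target.usedMemory() - before;
            ++_numAllocations;
        }
        return p;
    }
    void deallocate(void* p) override {
        if (p == nullptr)
            return;
        std::size_t before = _target.usedMemory();
        _target.deallocate(p);
        _usedMemory -= before - _target.usedMemory();
        --_numAllocations;
    }
private:
    IAllocator& _target;
};
```

Each subsystem gets its own Proxy over the same HeapAllocator, so per-subsystem usage can be read from the proxies while the total is read from A.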

Implementation


ProxyAllocator.h
#include "Allocator.h"

class ProxyAllocator : public Allocator
{
public:
    ProxyAllocator(Allocator* pAllocator);
    ~ProxyAllocator();

    //Signatures must match Allocator::allocate/deallocate to override them
    void* allocate(u32 size, u8 alignment);

    void deallocate(void* p);

private:
    ProxyAllocator(const ProxyAllocator&) {} //Prevent copies because it might cause errors
    ProxyAllocator& operator=(const ProxyAllocator&) { return *this; }

    Allocator* _pAllocator;
};

ProxyAllocator.cpp
#include "ProxyAllocator.h"
#include "Debug.h"

ProxyAllocator::ProxyAllocator(Allocator* pAllocator)
    : Allocator(), _pAllocator(pAllocator)
{
    ASSERT(pAllocator != nullptr);
}

ProxyAllocator::~ProxyAllocator()
{
    _pAllocator = nullptr;
}

void* ProxyAllocator::allocate(u32 size, u8 alignment)
{
    ASSERT(_pAllocator != nullptr);

    _numAllocations++;

    //Measure how much memory the real allocator used for this allocation
    u32 mem = _pAllocator->getUsedMemory();

    void* p = _pAllocator->allocate(size, alignment);

    _usedMemory += _pAllocator->getUsedMemory() - mem;

    return p;
}

void ProxyAllocator::deallocate(void* p)
{
    ASSERT(_pAllocator != nullptr);

    _numAllocations--;

    u32 mem = _pAllocator->getUsedMemory();

    _pAllocator->deallocate(p);

    _usedMemory -= mem - _pAllocator->getUsedMemory();
}

Allocator Management


A large block of memory should be allocated with malloc when the program starts (and this should be the only malloc made). This large block of memory is managed by a global allocator (for example a stack allocator).

Each subsystem should then allocate the block of memory it needs to work from the global allocator, and create allocators that will manage that memory.

Example usage


  • Allocate 1GB of memory using malloc and create a FreeList allocator to manage that memory.
  • Create a Proxy allocator that redirects all allocations to the FreeList allocator.
  • Initialize the Resource Manager by passing a pointer to the Proxy allocator in the constructor.
  • Register the Proxy allocator in the memory usage tracker so it shows how much memory the Resource Manager is using.
  • Allocate 16MB of memory using the FreeList allocator and create a Linear allocator to manage that memory and register it in the memory usage tracker.
  • Use the Linear allocator to make small temporary allocations needed for game logic, etc, and clear it before the end of each frame.
  • The Resource Manager will create a Pool allocator for every ResourcePackage it loads.
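The idea of one upfront malloc that is then split between subsystems can be sketched as follows. Region and RegionCarver are hypothetical names standing in for the global allocator, the sizes are scaled down, and alignment is ignored, so this only illustrates the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// A contiguous range of bytes; begin == nullptr means "no memory".
struct Region {
    unsigned char* begin;
    std::size_t    size;
};

// Hypothetical helper that hands out consecutive sub-regions of one big block.
class RegionCarver {
public:
    explicit RegionCarver(Region whole) : _whole(whole), _offset(0) {}

    // Returns an empty region when not enough memory is left.
    Region carve(std::size_t size) {
        if (size > _whole.size - _offset)
            return Region{nullptr, 0};
        Region part = {_whole.begin + _offset, size};
        _offset += size;
        return part;
    }

    std::size_t remaining() const { return _whole.size - _offset; }

private:
    Region      _whole;
    std::size_t _offset;
};
```

Each carved region would then be handed to the constructor of the allocator that manages it, for example a pool for resources and a linear allocator for per-frame data.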

Tips & Tricks


  • Depending on the type of allocator, keep the number of individual allocations to a minimum to reduce the memory wasted by allocation headers.
  • Prefer using allocateArray() to individual allocations when it makes sense. Most allocators use extra memory in each allocation to store allocation headers, and an array only needs a single header.
  • Instead of making small size allocations from allocators with large amounts of memory available, allocate a single memory block capable of holding all the small allocations and create a new allocator to manage the memory block and make the small allocations from this block.

Performance Comparison


To test the performance of each allocator compared to malloc I wrote a program that measures how long it takes to make 20000 allocations (you can download the program at the end of the article). The tests were made in release mode and the results are averages of 3 runs.
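The benchmark program itself is not reproduced here. A comparable measurement could be set up along these lines, using std::chrono::steady_clock; averageSeconds and mallocWorkload are hypothetical helpers, and an optimizing compiler may still simplify such a synthetic workload, so the numbers should be read with care.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Runs `work` `runs` times and returns the average wall-clock time in seconds.
template <class Work>
double averageSeconds(Work work, int runs) {
    typedef std::chrono::steady_clock Clock;
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        Clock::time_point start = Clock::now();
        work();
        total += std::chrono::duration<double>(Clock::now() - start).count();
    }
    return total / runs;
}

// Example workload: `count` allocations of `size` bytes, then `count` frees.
inline void mallocWorkload(std::size_t count, std::size_t size) {
    std::vector<void*> blocks(count);
    for (std::size_t i = 0; i < count; ++i)
        blocks[i] = std::malloc(size);
    for (std::size_t i = 0; i < count; ++i)
        std::free(blocks[i]);
}
```

The same averageSeconds call can wrap a workload that uses a custom allocator, which gives two directly comparable averages.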

Malloc vs Linear Allocator


10k 16-byte allocations + 1k 256-byte allocations + 50 2 MB allocations/deallocations (allocations made using the linear allocator are deallocated in a single call to clear()).

Allocator    Time (s)
Malloc       0.639655
Linear       0.000072

Malloc vs Stack Allocator


10k 16-byte allocations + 1k 256-byte allocations + 50 2 MB allocations/deallocations

Allocator    Time (s)
Malloc       0.650435
Stack        0.000289

Malloc vs FreeList Allocator


10k 16-byte allocations + 1k 256-byte allocations + 50 2 MB allocations/deallocations

Allocator    Time (s)
Malloc       0.673865
FreeList     0.000414

Malloc vs Pool Allocator


20k 16-byte allocations/deallocations

Allocator    Time (s)
Malloc       1.454934
Pool         0.000193

Conclusion


There isn't a single best allocator - it's important to think about how the memory will be allocated/accessed/deallocated and choose the right allocator for each situation.

Full source code and performance tests in the attached file

Reference


http://bitsquid.blogspot.pt/2010/09/custom-memory-allocation-in-c.html
Game Engine Architecture, Jason Gregory 2009
http://molecularmusings.wordpress.com/2011/07/05/memory-system-part-1/

Reposted from: https://www.cnblogs.com/RobinG/archive/2013/05/05/3060539.html
