内存管理内幕--动态分配的选择、折衷和实现（三）

最新推荐文章于 2024-09-23 22:27:40 发布

kybd2006

最新推荐文章于 2024-09-23 22:27:40 发布

阅读量807

点赞数

文章标签： struct 数据结构 structure allocation pointers reference

半自动内存管理策略

引用计数

引用计数是一种 半自动（semi-automated）的内存管理技术，这表示它需要一些编程支持，但是它不需要您确切知道某一对象何时不再被使用。引用计数机制为您完成内存管理任务。

在引用计数中，所有共享的数据结构都有一个域来包含当前活动“引用”结构的次数。当向一个程序传递一个指向某个数据结构指针时，该程序会将引用计数增加 1。实质上，您是在告诉数据结构，它正在被存储在多少个位置上。然后，当您的进程完成对它的使用后，该程序就会将引用计数减少 1。结束这个动作之后，它还会检查计数是否已经减到零。如果是，那么它将释放内存。

这样做的好处是，您不必追踪程序中某个给定的数据结构可能会遵循的每一条路径。每次对其局部的引用，都将导致计数的适当增加或减少。这样可以防止在使用数据结构时释放该结构。不过，当您使用某个采用引用计数的数据结构时，您必须记得运行引用计数函数。另外，内置函数和第三方的库不会知道或者可以使用您的引用计数机制。引用计数也难以处理发生循环引用的数据结构。

要实现引用计数，您只需要两个函数 —— 一个增加引用计数，一个减少引用计数并当计数减少到零时释放内存。

一个示例引用计数函数集可能看起来如下所示：

清单 9. 基本的引用计数函数

        
/* Structure Definitions*/
/* Base structure that holds a refcount */
struct refcountedstruct
{
	int refcount;
}
/* All refcounted structures must mirror struct
 * refcountedstruct for their first variables
 */
/* Refcount maintenance functions */
/* Increase reference count */
void REF(void *data)
{
	struct refcountedstruct *rstruct;
	rstruct = (struct refcountedstruct *) data;
	rstruct->refcount++;
}
/* Decrease reference count */
void UNREF(void *data)
{
	struct refcountedstruct *rstruct;
	rstruct = (struct refcountedstruct *) data;
	rstruct->refcount--;
	/* Free the structure if there are no more users */
	if(rstruct->refcount == 0)
	{
		free(rstruct);
	}
}

REF 和 UNREF 可能会更复杂，这取决于您想要做的事情。例如，您可能想要为多线程程序增加锁，那么您可能想扩展 refcountedstruct，使它同样包含一个指向某个在释放内存之前要调用的函数的指针（类似于面向对象语言中的析构函数 —— 如果您的结构中包含这些指针，那么这是 必需的）。

当使用 REF 和 UNREF 时，您需要遵守这些指针的分配规则：

UNREF 分配前左端指针（left-hand-side pointer）指向的值。
REF 分配后左端指针（left-hand-side pointer）指向的值。

在传递使用引用计数的结构的函数中，函数需要遵循以下这些规则：

在函数的起始处 REF 每一个指针。
在函数的结束处 UNREF 第一个指针。

以下是一个使用引用计数的生动的代码示例：

清单 10. 使用引用计数的示例

        
/* EXAMPLES OF USAGE */
/* Data type to be refcounted */
struct mydata
{
	int refcount; /* same as refcountedstruct */
	int datafield1; /* Fields specific to this struct */
	int datafield2;
	/* other declarations would go here as appropriate */
};
/* Use the functions in code */
void dosomething(struct mydata *data)
{
	REF(data);
	/* Process data */
	/* when we are through */
	UNREF(data);
}
struct mydata *globalvar1;
/* Note that in this one, we don't decrease the
 * refcount since we are maintaining the reference
 * past the end of the function call through the
 * global variable
 */
void storesomething(struct mydata *data)
{
	REF(data); /* passed as a parameter */
	globalvar1 = data;
	REF(data); /* ref because of Assignment */
	UNREF(data); /* Function finished */
}

由于引用计数是如此简单，大部分程序员都自已去实现它，而不是使用库。不过，它们依赖于 malloc 和 free 等低层的分配程序来实际地分配和释放它们的内存。

在 Perl 等高级语言中，进行内存管理时使用引用计数非常广泛。在这些语言中，引用计数由语言自动地处理，所以您根本不必担心它，除非要编写扩展模块。由于所有内容都必须进行引用计数，所以这会对速度产生一些影响，但它极大地提高了编程的安全性和方便性。以下是引用计数的益处：

实现简单。
易于使用。
由于引用是数据结构的一部分，所以它有一个好的缓存位置。

不过，它也有其不足之处：

要求您永远不要忘记调用引用计数函数。
无法释放作为循环数据结构的一部分的结构。
减缓几乎每一个指针的分配。
尽管所使用的对象采用了引用计数，但是当使用异常处理（比如 try 或 setjmp()/ longjmp()）时，您必须采取其他方法。
需要额外的内存来处理引用。
引用计数占用了结构中的第一个位置，在大部分机器中最快可以访问到的就是这个位置。
在多线程环境中更慢也更难以使用。

C++ 可以通过使用 智能指针（smart pointers）来容忍程序员所犯的一些错误，智能指针可以为您处理引用计数等指针处理细节。不过，如果不得不使用任何先前的不能处理智能指针的代码（比如对 C 库的联接），实际上，使用它们的后果通实比不使用它们更为困难和复杂。因此，它通常只是有益于纯 C++ 项目。如果您想使用智能指针，那么您实在应该去阅读 Alexandrescu 撰写的 Modern C++ Design 一书中的“Smart Pointers”那一章。

内存池

内存池是另一种半自动内存管理方法。内存池帮助某些程序进行自动内存管理，这些程序会经历一些特定的阶段，而且每个阶段中都有分配给进程的特定阶段的内存。例如，很多网络服务器进程都会分配很多针对每个连接的内存 —— 内存的最大生存期限为当前连接的存在期。Apache 使用了池式内存（pooled memory），将其连接拆分为各个阶段，每个阶段都有自己的内存池。在结束每个阶段时，会一次释放所有内存。

在池式内存管理中，每次内存分配都会指定内存池，从中分配内存。每个内存池都有不同的生存期限。在 Apache 中，有一个持续时间为服务器存在期的内存池，还有一个持续时间为连接的存在期的内存池，以及一个持续时间为请求的存在期的池，另外还有其他一些内存池。因此，如果我的一系列函数不会生成比连接持续时间更长的数据，那么我就可以完全从连接池中分配内存，并知道在连接结束时，这些内存会被自动释放。另外，有一些实现允许注册 清除函数（cleanup functions），在清除内存池之前，恰好可以调用它，来完成在内存被清理前需要完成的其他所有任务（类似于面向对象中的析构函数）。

要在自己的程序中使用池，您既可以使用 GNU libc 的 obstack 实现，也可以使用 Apache 的 Apache Portable Runtime。GNU obstack 的好处在于，基于 GNU 的 Linux 发行版本中默认会包括它们。Apache Portable Runtime 的好处在于它有很多其他工具，可以处理编写多平台服务器软件所有方面的事情。要深入了解 GNU obstack 和 Apache 的池式内存实现，请参阅参考资料部分中指向这些实现的文档的链接。

下面的假想代码列表展示了如何使用 obstack：

清单 11. obstack 的示例代码

        
#include <obstack.h>
#include <stdlib.h>
/* Example code listing for using obstacks */
/* Used for obstack macros (xmalloc is
   a malloc function that exits if memory
   is exhausted */
#define obstack_chunk_alloc xmalloc
#define obstack_chunk_free free
/* Pools */
/* Only permanent allocations should go in this pool */
struct obstack *global_pool;
/* This pool is for per-connection data */
struct obstack *connection_pool;
/* This pool is for per-request data */
struct obstack *request_pool;
void allocation_failed()
{
	exit(1);
}
int main()
{
	/* Initialize Pools */
	global_pool = (struct obstack *)
		xmalloc (sizeof (struct obstack));
	obstack_init(global_pool);
	connection_pool = (struct obstack *)
		xmalloc (sizeof (struct obstack));
	obstack_init(connection_pool);
	request_pool = (struct obstack *)
		xmalloc (sizeof (struct obstack));
	obstack_init(request_pool);
	/* Set the error handling function */
	obstack_alloc_failed_handler = &allocation_failed;
	/* Server main loop */
	while(1)
	{
		wait_for_connection();
		/* We are in a connection */
		while(more_requests_available())
		{
			/* Handle request */
			handle_request();
			/* Free all of the memory allocated
			 * in the request pool
			 */
			obstack_free(request_pool, NULL);
		}
		/* We're finished with the connection, time
		 * to free that pool
		 */
		obstack_free(connection_pool, NULL);
	}
}
int handle_request()
{
	/* Be sure that all object allocations are allocated
	 * from the request pool
	 */
	int bytes_i_need = 400;
	void *data1 = obstack_alloc(request_pool, bytes_i_need);
	/* Do stuff to process the request */
	/* return */
	return 0;
}

基本上，在操作的每一个主要阶段结束之后，这个阶段的 obstack 会被释放。不过，要注意的是，如果一个过程需要分配持续时间比当前阶段更长的内存，那么它也可以使用更长期限的 obstack，比如连接或者全局内存。传递给 obstack_free() 的 NULL 指出它应该释放 obstack 的全部内容。可以用其他的值，但是它们通常不怎么实用。

使用池式内存分配的益处如下所示：

应用程序可以简单地管理内存。
内存分配和回收更快，因为每次都是在一个池中完成的。分配可以在 O(1) 时间内完成，释放内存池所需时间也差不多（实际上是 O(n) 时间，不过在大部分情况下会除以一个大的因数，使其变成 O(1)）。
可以预先分配错误处理池（Error-handling pools），以便程序在常规内存被耗尽时仍可以恢复。
有非常易于使用的标准实现。

池式内存的缺点是：