CMU 15-213 CSAPP. Ch11. Dynamic Memory Allocation

菜=原罪

Last modified 2022-11-28 01:25:59


First published 2022-11-28 01:23:29

Article link: https://blog.csdn.net/qq_34707209/article/details/125571664


CMU 15-213 CSAPP (Ch1~Ch3)
CMU 15-213 CSAPP (Ch5~Ch7)
CMU 15-213 CSAPP (Ch8)
CMU 15-213 CSAPP (Ch9)
CMU 15-213 CSAPP (Ch10)
CMU 15-213 CSAPP (Ch11)
Video link
Slides link
Course supplement
This course uses a 64-bit compiler!

Ch11. Dynamic Memory Allocation

11.1 Basic concepts

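Applications use an allocator (such as malloc) to acquire and release blocks of virtual memory from the heap at run time, with sizes that are only known once the application is running;
The allocator maintains the heap as a contiguous collection of blocks, each of which is either:
- allocated (in use), or
- free (available for future use);
Allocators come in two main kinds:
- Explicit, such as malloc and free in C: the application controls both allocation and freeing;
- Implicit, such as in Java, ML and Lisp: the application allocates, but the system's garbage collector frees memory;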

11.1.1 The malloc Package

In C, the allocator is provided by the standard C library as the "malloc package";

#include <stdlib.h>
void *malloc(size_t size);
void free(void *p);
void *calloc(size_t nmemb, size_t size); 
void *realloc(void *ptr, size_t size);  
void *reallocarray(void *ptr, size_t nmemb, size_t size);

#include <unistd.h>  
int brk(void *addr); 
void *sbrk(intptr_t increment);

/********** annotation **********/
brk() and sbrk() change the location of the program break, which defines
the end of the process's data segment (i.e., the program break is the first
location after the end of the uninitialized data segment). Increasing the
program break has the effect of allocating memory to the process; decreasing
the break deallocates memory.

brk() sets the end of the data segment to the value specified by addr, when
that value is reasonable, the system has enough memory, and the process does
not exceed its maximum data size (see setrlimit(2)).

sbrk() increments the program's data space by increment bytes. Calling sbrk()
with an increment of 0 can be used to find the current location of
the program break.
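To make the sbrk() description concrete, here is a minimal sketch of a toy "bump" allocator that grows the heap by moving the program break for every request and never reuses memory. It assumes Linux with glibc, where sbrk() is still available as a legacy interface; `bump_alloc` and `BUMP_ALIGN` are hypothetical names, and in a real program this should not be mixed with malloc, which manages the same program break.

```c
#define _DEFAULT_SOURCE            /* expose sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define BUMP_ALIGN 16              /* assumed x86-64 alignment */

/* bump_alloc (hypothetical): moves the program break forward for every
   request and never reuses memory (there is no matching free). */
static void *bump_alloc(size_t size)
{
    void *cur = sbrk(0);           /* current program break */
    if (cur == (void *)-1)
        return NULL;

    /* align the block start and round the size up to BUMP_ALIGN */
    uintptr_t start = ((uintptr_t)cur + BUMP_ALIGN - 1) & ~(uintptr_t)(BUMP_ALIGN - 1);
    size_t pad = (size_t)(start - (uintptr_t)cur);
    size = (size + BUMP_ALIGN - 1) & ~(size_t)(BUMP_ALIGN - 1);
    if (size == 0)
        size = BUMP_ALIGN;

    /* sbrk returns the old break on success, (void *)-1 (errno = ENOMEM) on failure */
    if (sbrk((intptr_t)(pad + size)) == (void *)-1)
        return NULL;
    return (void *)start;
}
```

Every call here is a system call, which is exactly the cost real allocators avoid by requesting large chunks at once and managing them in user space.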

void *malloc ( size_t size )
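- Returns a void pointer to a block of at least size bytes; on failure it returns NULL and sets errno (e.g. to ENOMEM);
- The block is 8-byte aligned on x86 and 16-byte aligned on x86-64;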
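The alignment rule is usually implemented as a "round up to a multiple of a power of two" helper. This minimal sketch (`align_up` and `malloc_is_aligned` are hypothetical names) shows that rounding and also checks the address the C library's malloc actually returns, assuming an x86-64 target where the expected alignment is 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Returns 1 if the address returned by malloc(n) is a multiple of align. */
static int malloc_is_aligned(size_t n, size_t align)
{
    void *p = malloc(n);
    int ok = p != NULL && (uintptr_t)p % align == 0;
    free(p);
    return ok;
}
```

For example, a 17-byte request rounds up to 32 bytes under 16-byte alignment; the 15 extra bytes are padding, one source of internal fragmentation.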

//CentOS
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
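#include <errno.h>
#include <string.h>
int main(void)
{
	void *pbrk0 = sbrk(0);                     // program break before any malloc
	void *p1 = malloc(1);
	void *pbrk1 = sbrk(0);                     // break after the first malloc
	void *p2 = NULL;
	for (int cnt = 0; cnt < 1000; cnt++) p2 = malloc(0);
	void *pbrk2 = sbrk(0);                     // break after 1000 more tiny mallocs
	void *p3 = malloc((size_t)10000000000ULL); // 10^10 bytes, expected to fail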
	printf(	"pbrk0 [%p]\n"
		"p1 [%p]\n"	
		"pbrk1 [%p]\n"
		"p2 [%p]\n"
		"pbrk2 [%p]\n"
		"p3 [%p]\n"
		"errno [%d]\n"
		"strerr [%s]\n",
		pbrk0,p1,pbrk1,p2,pbrk2,p3,errno,strerror(errno));
	return 0;
}

[root@localhost allo]# uname -a
Linux localhost.localdomain 4.18.0-240.el8.x86_64 #1 SMP Fri Sep 25 19:48:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost]# gcc a.c 
[root@localhost]# ./a.out 
pbrk0	[0xfda000]
p1		[0xfda2a0]
pbrk1	[0xffb000]
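p2		[0xfe1fa0] // 0xfe1fa0 - 0xfda2a0 = 0x7d00 = 32 * 1000: malloc(1) and each malloc(0) occupy a 32-byte minimum chunk
pbrk2	[0xffb000] // same as pbrk1; 0xffb000 - 0xfda000 = 0x21000 (132 KiB): glibc extends the heap in one big step at run time (by default with about 128 KiB of extra padding, M_TOP_PAD in mallopt(3)) instead of calling sbrk per request
p3		[(nil)]    // the kernel refuses a 10^10-byte (about 9.3 GiB) request, so malloc returns NULL
errno	[12]       // 12 = ENOMEM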
error	[Cannot allocate memory]

void free ( void *p )
- Returns the block pointed at by p to the pool of available memory;
- p must come from a previous call to malloc, calloc (pronounced "c-alloc" in the lecture) or realloc;

11.1.2 How to implement malloc and free ?

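Assumptions
- Memory is word-addressed, and (on x86) a word is 4 bytes;
- malloc and free arguments are measured in words (the code snippets in this post count sizes in bytes, with a 4-byte header);
Constraints
- Applications can issue arbitrary sequences of malloc and free requests;
- The argument to free must be a pointer to a previously allocated block;
- The allocator can't control the number or size of the requested blocks;
- Requests must be answered immediately (no buffering a batch of requests to answer at once, and no reordering);
- Blocks can only be allocated from free memory, and once a block is allocated the allocator may no longer touch it:
  - the allocator can't compact (smoosh) allocated blocks together to create larger free blocks;
- Blocks must be aligned to 8 bytes (x86) or 16 bytes (x86-64);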
Performance Goal
Allocators trade off running time (speed) against space, so we define the following speed and memory-efficiency metrics;
- Throughput
  Number of completed requests per unit time, e.g. 5000 malloc calls + 5000 free calls in 10 sec give a throughput of 1000 ops/sec;
- Peak Memory Utilization ( $U_k$ )
  $U_k=\frac{\max_{i \le k} P_i}{H_k}$
  【Payload】malloc(p) results in a block with a payload of p bytes;
  【Aggregate payload】 $P_k$, the sum of the currently allocated payloads after requests $R_1, R_2, \dots, R_k$ have completed;
  【Current heap size】 $H_k$, assumed to be monotonically nondecreasing;
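To make the two metrics concrete, here is a minimal sketch that computes them from a made-up trace. The arrays `P` (aggregate payload after each request) and `H` (heap size after each request), as well as both function names, are hypothetical inputs rather than part of any allocator API.

```c
#include <assert.h>
#include <stddef.h>

/* Peak memory utilization U_k = max_{i<=k} P[i] / H[k].
   P[i]: aggregate payload (bytes) after request i; H[k]: heap size after request k. */
static double peak_utilization(const size_t *P, const size_t *H, size_t k)
{
    size_t peak = 0;
    for (size_t i = 0; i <= k; i++)
        if (P[i] > peak)
            peak = P[i];
    return H[k] ? (double)peak / (double)H[k] : 0.0;
}

/* Throughput: completed requests per second. */
static double throughput(size_t requests, double seconds)
{
    return seconds > 0 ? (double)requests / seconds : 0.0;
}
```

With the trace P = {16, 48, 32, 64} and H = {64, 64, 128, 128}, U_1 = 48/64 = 0.75 while U_2 = 48/128 = 0.375: growing the heap lowers utilization until the payload catches up.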
Issues an allocator implementation has to face (answered in the implicit free list section)
- How do we know how much memory to free, given just a pointer?
- How do we keep track of the free blocks?
- What do we do with the extra space when a request is smaller than the free block it is placed in?
- How do we choose a block when many free blocks fit?
- How do we reinsert a freed block?

11.1.3 Fragmentation

Internal Fragmentation
- Occurs when a payload is smaller than its block size;
- Caused by padding for alignment, overhead for maintaining heap data structures (e.g. headers), and explicit policy decisions (e.g. not splitting blocks into tiny chunks);
- Depends only on the pattern of previous requests, so it is easy to measure;
External Fragmentation
- Occurs when there is enough aggregate heap memory, but no single free block is large enough to satisfy a request;
- Depends on the pattern of future requests, so it is difficult to measure;
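The two definitions can be turned into simple checks. In this minimal sketch (`internal_frag` and `externally_fragmented` are hypothetical names), internal fragmentation is measured per block, while external fragmentation is detected by comparing the total free space with the largest single free block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Internal fragmentation of one block: bytes that are not payload
   (header, alignment padding, unsplit leftover space). */
static size_t internal_frag(size_t block_size, size_t payload)
{
    assert(payload <= block_size);
    return block_size - payload;
}

/* External fragmentation: the free blocks together hold at least
   `request` bytes, yet no single free block is large enough. */
static bool externally_fragmented(const size_t *free_sizes, size_t n, size_t request)
{
    size_t total = 0, largest = 0;
    for (size_t i = 0; i < n; i++) {
        total += free_sizes[i];
        if (free_sizes[i] > largest)
            largest = free_sizes[i];
    }
    return total >= request && largest < request;
}
```

For free blocks of 16, 16 and 32 bytes, a 48-byte request fails even though 64 bytes are free in total.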

11.2 Implicit free lists

Knowing How Much to Free
- Standard method
  Keep the size of a block in the word at the beginning of the block (just before the payload); this word is called the header field or header;
Keeping Track of Free Blocks
- 【1】Implicit list
  Traverse all free blocks by traversing all blocks in the heap and skipping the allocated ones;
  There is no separate list of free blocks; the headers tell which blocks are free;
  Time is linear in the total number of blocks;
- 【2】Explicit list
  Use some of the words inside each free block to build a singly or doubly linked list;
  Time is linear in the number of free blocks, which is somewhat faster;
- 【3】Segregated free list
  Different free lists for different size classes;
- 【4】Blocks sorted by size
  Use a balanced tree (e.g. a red-black tree) with pointers stored inside each free block and the block length used as the key;

11.2.1 Implicit List

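Each block needs its size and allocation status; the standard trick:
- Blocks are aligned, so the low-order bits of the size are always 0; use the lowest bit as the allocated/free flag;
- This bit must be masked out when reading a block's size;
- Example (figure omitted)
  The payload must start at a 2-word (double-word) aligned address, so block sizes are multiples of 2 words, and each header sits in the word just before its payload. A final zero-size allocated block marks the end of the heap (this removes a special case when coalescing free blocks);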
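The header trick can be sketched with small helpers in the spirit of the PACK / GET_SIZE / GET_ALLOC macros in the CSAPP textbook. This is a minimal sketch assuming 4-byte header words, sizes in bytes that are multiples of 8, and a hypothetical word array as the heap; the alignment padding word at the start of a real heap is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t word_t;  /* one 4-byte header word */

/* Header = block size in bytes (a multiple of 8) | allocated bit. */
static word_t pack(word_t size, word_t alloc) { return size | alloc; }
static word_t get_size(word_t hdr)  { return hdr & ~(word_t)0x7; }  /* mask out low 3 bits */
static word_t get_alloc(word_t hdr) { return hdr & 0x1; }

/* Walk an implicit list from heap[0] until the zero-size epilogue header,
   counting free blocks. Sizes are in bytes, so advance by size / 4 words. */
static int count_free(const word_t *heap)
{
    int n = 0;
    for (const word_t *p = heap; get_size(*p) != 0; p += get_size(*p) / 4)
        if (!get_alloc(*p))
            n++;
    return n;
}
```

The loop stops at the epilogue because its size is 0, which is why that terminating block is convenient.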
Finding a Free Block
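- 【First Fit】Search the list from the beginning and choose the first free block that fits; it can leave "splinters" (small leftover free blocks) at the beginning of the list; it takes time linear in the total number of blocks;
```
// heap_start / heap_end: hypothetical globals marking the first header and the end of the heap
// each header is a 4-byte word: block size in bytes | allocated bit
unsigned *search_free_block(unsigned len)
{
	unsigned *p = heap_start;
	while (p < heap_end) {
		unsigned alloc = *p & 1;     // allocated?
		unsigned tsize = *p & -2;    // total block size (bytes)
		unsigned psize = tsize - 4;  // max_payload = tsize - header_size (bytes)
		if (!alloc && psize >= len)  // free and large enough: first fit
			return p;
		p = p + tsize / 4;           // next block (word addressed)
	}
	return NULL;                     // no free block fits
}
```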
- 【Next Fit】Like first fit, but start searching where the previous search finished; often faster than first fit (it avoids re-scanning unhelpful blocks), but fragmentation may be worse;
- 【Best Fit】Search the list and choose the free block closest in size to the request; less fragmentation, but usually slower than first fit; segregated free lists with many (in the limit, infinitely many) size classes are one way to approximate best fit;
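For comparison with the first-fit code above, here is a minimal best-fit sketch over the same header layout (4-byte header holding the size in bytes plus the allocated bit, list terminated by a zero-size header). `best_fit` and `BSIZE` are hypothetical names, and the heap is assumed to be a plain word array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;   /* 4-byte header: block size in bytes | allocated bit */

#define BSIZE(hdr) ((hdr) & ~(word_t)1)   /* same as *p & -2 in the text */

/* Best fit: scan the whole implicit list and return the header of the
   smallest free block whose payload (size minus the 4-byte header) can
   hold len bytes, or NULL if none fits. */
static word_t *best_fit(word_t *heap, word_t len)
{
    word_t *best = NULL;
    for (word_t *p = heap; BSIZE(*p) != 0; p += BSIZE(*p) / 4) {
        int is_free = (*p & 1) == 0;
        if (is_free && BSIZE(*p) - 4 >= len &&
            (best == NULL || BSIZE(*p) < BSIZE(*best)))
            best = p;
    }
    return best;
}
```

Unlike first fit, this has to visit every block before it can answer, which is where its extra running time comes from.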
Allocating in Free Block
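- If the requested size is smaller than the free block, should we split the block, or just return the whole block?

// p points to the header of a free block; len = header + payload size in bytes
void addblock(unsigned *p, unsigned len)
{
	// round len up to a multiple of 8 bytes (2 words) to keep blocks double-word aligned
	unsigned newsize = (len + 7) & ~7u;
	unsigned oldsize = *p & -2;   // mask out status bit
	*p = newsize | 1;             // set length and allocated status of new block
	if (newsize < oldsize)        // split: set length of remaining free block
		*(p + newsize / 4) = oldsize - newsize;
}
// mm_malloc: hypothetical name, to avoid clashing with the C library's malloc
void *mm_malloc(unsigned len)
{
	unsigned *p = search_free_block(len);
	if (p == NULL)
		return NULL;              // a real allocator would extend the heap (sbrk) here
	addblock(p, len + 4);         // header size = 4 bytes
	return p + 1;                 // payload starts one word (4 bytes) after the header
}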

Freeing a Block
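- Only clear the "allocated" flag? That leads to "false fragmentation": enough contiguous free memory exists, but it is split into separate free blocks that can't satisfy a larger request!
- Adjacent free blocks need to be coalesced;
- Every decent allocator maintains the invariant that no two free blocks are ever adjacent;

void free_block(unsigned *p)          // p points to the block header
{
	*p = *p & -2;                     // clear allocated flag
	unsigned *next = p + (*p) / 4;    // find next block header (size is in bytes)
	if ((*next & 1) == 0)             // if not allocated
		*p = *p + *next;              // add to this block
}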

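How to coalesce with the previous block?
- Remember the previous block while traversing the list: very inefficient!
- Bidirectional coalescing
  - Boundary tags [ Donald Knuth, 1973 ];
    Replicate the header at the end of free blocks (a.k.a. the "footer"). Then, in free(p), the word just before the freed block's header is the previous block's "footer" (a copy of its "header"), which tells whether to coalesce with the previous block; the downside is extra space overhead;

    4 cases:
    - Case 1: previous and next blocks both allocated, so no coalescing;
    - Case 2: previous allocated, next free, so merge with the next block;
    - Case 3: previous free, next allocated, so merge with the previous block;
    - Case 4: previous and next both free, so merge all three blocks;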
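The four cases can be sketched in C. This is a minimal sketch assuming a hypothetical word-array heap in which every block carries both a header and a footer (the simplest form of boundary tags), framed by an allocated 8-byte prologue and a zero-size allocated epilogue; `set_block` and `free_and_coalesce` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t word_t;
#define SIZE(p)   (*(p) & ~(word_t)0x7)           /* size in bytes from a header/footer */
#define ALLOC(p)  (*(p) & 0x1)
#define NEXT(h)   ((h) + SIZE(h) / 4)              /* header of the successor */
#define PREV(h)   ((h) - SIZE((h) - 1) / 4)        /* header of the predecessor, via its footer */

/* Write matching header and footer (boundary tag) for a block. */
static void set_block(word_t *h, word_t size, word_t alloc)
{
    h[0] = size | alloc;
    h[size / 4 - 1] = size | alloc;
}

/* Free the block whose header is h and coalesce with free neighbours.
   Returns the header of the resulting free block. */
static word_t *free_and_coalesce(word_t *h)
{
    word_t size = SIZE(h);
    int prev_alloc = ALLOC(h - 1);                 /* predecessor's footer */
    int next_alloc = ALLOC(NEXT(h));

    if (prev_alloc && next_alloc) {                /* case 1 */
        set_block(h, size, 0);
    } else if (prev_alloc && !next_alloc) {        /* case 2: merge successor */
        set_block(h, size + SIZE(NEXT(h)), 0);
    } else if (!prev_alloc && next_alloc) {        /* case 3: merge predecessor */
        word_t *p = PREV(h);
        set_block(p, SIZE(p) + size, 0);
        h = p;
    } else {                                       /* case 4: merge both */
        word_t *p = PREV(h);
        set_block(p, SIZE(p) + size + SIZE(NEXT(h)), 0);
        h = p;
    }
    return h;
}
```

Each case runs in constant time, which is the whole point of the footer: no traversal is needed to find the predecessor.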

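Optimization: allocated blocks have no footer. How, then, can we tell whether the previous block is allocated?
If the previous block is allocated, it never takes part in coalescing, so it doesn't need a footer!
But without a footer we can't read its allocated status, so what can we do?
Alignment guarantees that the low n bits of every header are 0!
So whenever malloc (or free) changes a block's status, it records that status in a spare low bit of the next block's header (a "previous block allocated" bit);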
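The "previous block allocated" bit described above can be sketched as follows. The bit assignment (bit 0 for this block, bit 1 for the previous block) is an illustrative convention, and `pack3`, `hdr_prev_alloc` and `set_prev_alloc` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t word_t;

/* Illustrative layout for a header whose size is a multiple of 8:
   bit 0 = this block is allocated, bit 1 = the previous block is allocated. */
#define ALLOC_BIT       0x1u
#define PREV_ALLOC_BIT  0x2u

static word_t pack3(word_t size, int alloc, int prev_alloc)
{
    return size | (alloc ? ALLOC_BIT : 0u) | (prev_alloc ? PREV_ALLOC_BIT : 0u);
}

static word_t hdr_size(word_t hdr)       { return hdr & ~(word_t)0x7; }
static int    hdr_alloc(word_t hdr)      { return (hdr & ALLOC_BIT) != 0; }
static int    hdr_prev_alloc(word_t hdr) { return (hdr & PREV_ALLOC_BIT) != 0; }

/* Called on the successor's header whenever a block's status changes. */
static void set_prev_alloc(word_t *next_hdr, int prev_alloc)
{
    if (prev_alloc)
        *next_hdr |= PREV_ALLOC_BIT;
    else
        *next_hdr &= ~PREV_ALLOC_BIT;
}
```

In free(p), the block first checks hdr_prev_alloc on its own header; only when the bit is 0 does it read the word before its header, and in that case the predecessor is free and therefore does have a footer.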

11.3 Explicit free lists

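An implicit free list is a singly linked list built implicitly from the block lengths; its nodes are also contiguous in virtual memory;
An explicit free list is a doubly linked list built from explicit pointers stored in the free blocks; its nodes can be in any order in memory;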

11.3.1 Allocating

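Only 4 pointers need to be updated (when the block is split, the remaining free part takes its place in the list);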
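The four pointer updates can be sketched with a simplified node type. In a real allocator the prev/next pointers live inside the payload of each free block; the separate `fblock` struct and `replace_in_list` function here are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-block node (a real allocator stores these fields
   inside the free block itself). */
typedef struct fblock {
    size_t size;
    struct fblock *prev, *next;
} fblock;

/* After splitting `old`, the remainder `rem` takes old's place in the
   doubly linked free list: four pointer updates. */
static void replace_in_list(fblock **head, fblock *old, fblock *rem)
{
    rem->prev = old->prev;                   /* 1 */
    rem->next = old->next;                   /* 2 */
    if (old->prev) old->prev->next = rem;    /* 3 */
    else           *head = rem;              /*   (old was the list head) */
    if (old->next) old->next->prev = rem;    /* 4 */
}
```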

11.3.2 Freeing

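Key question: where in the free list should a newly freed block be inserted?

LIFO (last-in-first-out) Policy
- Insert the newly freed block at the beginning of the free list;
- Simple to implement, O(1) time;
- Studies suggest fragmentation is worse than with the address-ordered policy;
Address-ordered Policy
- Insert so that address of prev block < address of newly freed block < address of next block;
- Requires a search, which can be sped up with balanced trees and similar structures;
- Less fragmentation;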
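The two insertion policies can be sketched side by side on a minimal doubly linked list; `fnode`, `insert_lifo` and `insert_addr_ordered` are hypothetical names, and the address-ordered version uses a plain linear search rather than a balanced tree.

```c
#include <assert.h>
#include <stddef.h>

typedef struct fnode {
    struct fnode *prev, *next;
} fnode;

/* LIFO: push the freed block at the head of the list, O(1). */
static void insert_lifo(fnode **head, fnode *b)
{
    b->prev = NULL;
    b->next = *head;
    if (*head) (*head)->prev = b;
    *head = b;
}

/* Address-ordered: keep prev < b < next by address, O(n) linear search.
   Comparing addresses is meaningful because all blocks live in one heap. */
static void insert_addr_ordered(fnode **head, fnode *b)
{
    fnode *prev = NULL, *cur = *head;
    while (cur && cur < b) {
        prev = cur;
        cur = cur->next;
    }
    b->prev = prev;
    b->next = cur;
    if (cur)  cur->prev = b;
    if (prev) prev->next = b;
    else      *head = b;
}
```

The LIFO list ends up ordered by freeing time, while the address-ordered list always mirrors the blocks' layout in memory regardless of the order they were freed.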

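Many programmers fall into premature optimization, thinking from the very start about odd corner cases and how to optimize them. When tackling a problem as complex as an allocator, go from easy to hard: build a basic working version first, then find the bottlenecks and optimize them one at a time;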

For clarity, predecessor and successor denote blocks that are adjacent in virtual address (VA) order, while prev and next denote neighbors in the free list;

LIFO Case 1: no coalescing with either neighbor;

LIFO Case 2: coalesce with the predecessor;

LIFO Case 3: coalesce with the successor;

LIFO Case 4: coalesce with both the predecessor and the successor;

11.3.3 Explicit List Summary

vs. Implicit List
- Faster: searches only the free list instead of the whole heap (a big win when most of the memory is full);
- More complicated allocate and free, since blocks must be spliced into and out of the list;
- Extra space for the linked-list pointers;

11.4 Segregated free lists

Each size class has its own free list；

To allocate a block of size n:
- Search the appropriate free list for a block of size m ≥ n (if no block is found, try the next larger class);
- If a block is found, split it and place the remainder on the appropriate list (optional; e.g. when the class size exactly equals the requested size there is nothing to split);
- If no block is found:
  - Request additional heap memory from the OS ( sbrk() );
  - Allocate a block of n bytes from this new memory;
  - Place the remainder as a single free block in the largest size class;
To free a block:
- Coalesce (as before) and place the result on the appropriate list;
Advantages of seglist allocators
- Higher throughput (fast):
  locating the free list of the right size class (e.g. by binary search) takes log time for power-of-two size classes;
- Better memory utilization (space-efficient):
  a first-fit search of a seglist approximates a best-fit search of the entire heap;
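The size-class lookup can be sketched as follows. The class boundaries (16, 32, 64, ... bytes, with an open-ended last class) and the names `NUM_CLASSES`, `size_class` and `find_class` are illustrative assumptions; the loop runs about log2(size/16) times, which is the logarithmic cost mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES 10   /* illustrative: <=16, <=32, ..., <=4096, >4096 bytes */

/* Map a size to its power-of-two size class: class 0 holds sizes <= 16,
   class i holds (16 * 2^(i-1), 16 * 2^i], and the last class is open-ended. */
static int size_class(size_t size)
{
    int c = 0;
    size_t limit = 16;
    while (c < NUM_CLASSES - 1 && size > limit) {
        limit <<= 1;
        c++;
    }
    return c;
}

/* heads[i] is the (hypothetical) free list of class i, NULL if empty.
   Returns the first non-empty class that may serve n bytes, or -1.
   Inside that list, a first-fit scan must still check each block's size. */
static int find_class(void *const heads[NUM_CLASSES], size_t n)
{
    for (int c = size_class(n); c < NUM_CLASSES; c++)
        if (heads[c] != NULL)
            return c;
    return -1;   /* no block found: the allocator would ask the OS (sbrk) for more heap */
}
```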

sbrk is a system call, far more expensive than an ordinary function call. Requesting one very large chunk per sbrk call amortizes that cost over many allocations, but memory utilization drops: yet another space-time trade-off;

The seglist array itself is stored at the start of the heap (see the malloc Lab);
Further reading:
“Dynamic Storage Allocation: A Survey and Critical Review”

11.5 Implicit Memory Management: Garbage Collection

Automatic reclamation of heap-allocated storage；
Common in many dynamic languages；
Python, Ruby, Java, Perl, ML, Lisp, Mathematica；
Variants ("conservative" garbage collectors) exist for C and C++;
However, such collectors can't always determine whether a block is really garbage, so they may not reclaim all of it;

void foo(){
	int *p = malloc(128); // garbage once foo returns: nothing points to this block any more
	return;
}

11.5.1 Garbage Collection

Which memory can be freed?
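- Scan memory to identify pointers and the blocks they point to; blocks that nothing points to are considered garbage;
Assumptions about pointers
- The memory manager must be able to tell which values are pointers;
- All pointers are assumed to point to the start of a block (a counter-example is a pointer to an element inside an array);
- Pointers can't be hidden (e.g. by casting a pointer to an integer and later back again);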
Classical GC Algorithms
- Mark-and-sweep collection (McCarthy, 1960)
- Reference counting (Collins, 1960)
- Copying collection (Minsky, 1963)
- Generational Collectors (Lieberman and Hewitt, 1983)
  - Collection based on lifetimes；
  - Most allocations become garbage very soon；
  - So focus reclamation work on zones of memory recently allocated；

More information: Richard Jones and Rafael Lins, "Garbage Collection: Algorithms for Automatic Dynamic Memory Management", John Wiley & Sons, 1996.

Mark-and-sweep collection
- Each block is a node in a graph;
- Each pointer is an edge in the graph;
- Locations not in the heap that contain pointers into the heap are called root nodes (e.g. registers, locations on the stack, global variables);
- A node is reachable if there is a path from some root to that node;
  Non-reachable nodes are garbage;
- Implemented on top of the malloc / free package
  - Allocate using malloc until you "run out of space";
  - When out of space, use an extra mark bit in the header of each block;
  - Sub-phase 1, mark: start from all roots and set the mark bit on each reachable block (depth-first traversal);
  - Sub-phase 2, sweep: scan all blocks and free the blocks that are not marked;
- A Simple Implementation

void mark(ptr p)                             // p: a word that may point to a payload
{
   if (!is_ptr(p)) return;                   // do nothing if not a pointer to a block
   if (markBitSet(p)) return;                // check if already marked
   setMarkBit(p);                            // set the mark bit
   for (int i = 0; i < length(p); i++)       // call mark() on every word in the block
      mark(p[i]);
}
void sweep(ptr p, ptr end)
{
   while (p < end) {                         // scan the whole heap
      if (markBitSet(p)) clearMarkBit(p);    // reachable block: clear mark for the next GC
      else if (allocateBitSet(p)) free(p);   // allocated but unreachable: garbage
      p += length(p);                        // point p to the next block
   }
}
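The pseudocode above relies on helpers (is_ptr, markBitSet, length, ...) that a real collector implements on top of its heap layout. To see the two phases actually run, here is a self-contained toy version in which blocks are array entries and pointers are indices; this representation and all names in it are assumptions for illustration, not how a conservative C collector stores blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 6
#define NIL (-1)

/* Toy block: allocated bit, mark bit, and up to two outgoing "pointers". */
typedef struct {
    bool allocated, marked;
    int ptr[2];                  /* indices of blocks this block points to, or NIL */
} block;

static block heap[NBLOCKS];

/* Mark phase: depth-first traversal from one root. */
static void mark(int b)
{
    if (b == NIL || !heap[b].allocated || heap[b].marked)
        return;                  /* not a live block, or already marked */
    heap[b].marked = true;
    for (int i = 0; i < 2; i++)
        mark(heap[b].ptr[i]);
}

/* Sweep phase: free unmarked allocated blocks; returns how many were freed. */
static int sweep(void)
{
    int freed = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        if (heap[b].marked) {
            heap[b].marked = false;      /* reachable: clear for the next GC */
        } else if (heap[b].allocated) {
            heap[b].allocated = false;   /* unreachable: garbage */
            freed++;
        }
    }
    return freed;
}
```

Note that the cycle test below (0 → 1 → 2 → 0) terminates because mark() returns as soon as it meets a block that is already marked.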

How to find the beginning of the block
- In C, pointers can point to the middle of a block;
- Use a balanced binary tree to keep track of all allocated blocks (the key is the block's start address);
- The balanced-tree pointers can be stored in each block's header (two additional words);
- Search the tree: if a value falls between the beginning and the end of some allocated block, that block is treated as reachable;
- This is conservative: the value may just be an integer that happens to look like a pointer, so a block that is actually unreachable may be kept alive;
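The lookup step can be sketched as follows. As a simplification, a sorted array searched with binary search stands in for the balanced tree (same O(log n) lookup, but slower insertion); `range` and `find_block` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocated block: [start, start + size). */
typedef struct { uintptr_t start; size_t size; } range;

/* Conservative lookup: return the index of the allocated block whose
   address range contains v, or -1. blocks[] must be sorted by start
   and non-overlapping. */
static int find_block(const range *blocks, int n, uintptr_t v)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (v < blocks[mid].start)
            hi = mid - 1;
        else if (v >= blocks[mid].start + blocks[mid].size)
            lo = mid + 1;
        else
            return mid;          /* v points at or into this block */
    }
    return -1;
}
```

Any word whose value lands inside a block's range keeps that block alive, whether it is a real interior pointer or just an integer with an unlucky value.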

11.6 Memory-related perils and pitfalls

Errors involving memory are the worst kind of bugs to track down, because the cause and the symptom are distant in both space and time: you only find out about such an error when you later try to reference the affected data;

Dereferencing bad pointers
Reading uninitialized memory
Overwriting memory
Referencing nonexistent variables
Freeing blocks multiple times
Referencing freed blocks
Failing to free blocks

C pointer declarations: read them according to operator precedence ([] and () bind tighter than *), and you won't get the type wrong;

| declaration | type |
|---|---|
| `int *p` | p is a pointer to int |
| `int *p[13]` | p is an array[13] of pointer to int |
| `int *(p[13])` | p is an array[13] of pointer to int |
| `int (*p)[13]` | p is a pointer to an array[13] of int |
| `int (*(*f())[13])()` | f is a function returning ptr to an array[13] of pointers to functions returning int |
| `int (*(*x[3])())[5]` | x is an array[3] of pointers to functions returning pointers to array[5] of ints |
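One way to check a reading of these declarations is to let the compiler confirm it. This minimal sketch declares the `f` from the table (with `(void)` parameter lists instead of the old-style `()`), and spells the same type with typedefs; `answer`, `fn_table`, `fn_ptr`, `fn_array13` and `g` are hypothetical names.

```c
#include <assert.h>

static int answer(void) { return 42; }

/* fn_table: an array[13] of pointers to functions returning int */
static int (*fn_table[13])(void);

/* f: a function returning a pointer to an array[13] of pointers to
   functions returning int */
static int (*(*f(void))[13])(void)
{
    return &fn_table;
}

/* The same type spelled with typedefs, which is usually easier to read. */
typedef int (*fn_ptr)(void);        /* pointer to function returning int */
typedef fn_ptr fn_array13[13];      /* array[13] of such pointers */

static fn_array13 *g(void)          /* same return type as f, so this compiles cleanly */
{
    return f();
}
```

`(*f())[0]` selects the first function pointer in the array that f points to, and `(*f())[0]()` calls it.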