CS107-Lecture 7-Note

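It has been a month since I last watched CS107, which is rather embarrassing. Recently I spent about a week skimming Linux Kernel Development, compiled a custom Linux kernel, and reviewed my earlier course notes. Combining all of this with CS107, I feel I have digested more. For example, the book covers (1) inline as a characteristic of kernel programming; (2) the use of struct and list in kernel data structures; (3) caveats when using malloc and realloc. All of these are essentially part of the advanced memory management material in CS107. The book focuses on the policies and mechanisms of the Linux kernel, while CS107 teaches the programming paradigms used when writing Linux kernel code.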

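The first part of Lecture 7 covers the implementation of a string stack; the middle part covers several library functions and idioms related to memory allocation, such as strdup and const char *; the last part covers how blocks of data are classified and stored in memory, such as the heap segment and the stack segment.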


Full Implementation of a String Stack

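Lecture 6 introduced the generic stack; this lecture uses it as a string stack. Unlike int, double, long, and similar types, strings involve more complicated use of pointers and addresses: the example below stores pointers to dynamically allocated memory, and that memory has to be freed by hand.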

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
                    //reserved field
}Stack;

void StackNew(Stack *s, int elemSize);
void StackDispose(Stack *s);
void StackPush(Stack *s, void *elemAddr);
void StackPop(Stack *s, void *elemAddr);

The function declarations are essentially the same as before. Next come the operations on the string stack:

/* program 1. Operate String Stack */

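#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *friends[] = {"Al", "Bob", "Carl"};
    Stack StringStack;
    StackNew(&StringStack, sizeof(char *));
    for (int i = 0; i < 3; i++) {
        char *copy = strdup(friends[i]); // copy is a heap-allocated duplicate of friends[i]; explained in detail later
        StackPush(&StringStack, &copy); // pass the address of copy; what gets pushed is the address stored in copy, which points to the heap duplicate of friends[i], e.g. "Al"
    }

    char *name;
    for (int i = 0; i < 3; i++) {
        StackPop(&StringStack, &name);
        printf("pop up variable: %s\n", name);
        free(name); // freeing name releases the heap copies of "Carl", "Bob", and "Al" in turn, returning them to the heap
    }

    StackDispose(&StringStack); // releases the memory owned by the stack itself
    return 0;
}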

Question: without the StackPop and free calls in the for loop, would it be fine to call StackDispose directly?

Jerry: the stack shouldn’t be obligated to be empty, or the client shouldn’t be forced to pop everything off the stack before they call dispose. StackDispose should be able to say, “Ok, I seem to have retained ownership of some elements that were pressed onto me.”

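So StackDispose() should have a basic capability: determine whether the stack's s->elems still holds any elements. If those elements are plain values such as int, double, long, or char, nothing extra needs to be cleaned up; if they are pointers to dynamically allocated memory blocks, those blocks must be freed first, and only then the storage of StringStack itself. The hard part is understanding freefn():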

[Figure: freeing the three pointers stored in the stack that elems points to.]

Jerry: when we store things, we actually have to potentially set up the stack, or set up the stacks to potentially delete elements for us.

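So Jerry starts from StackNew(): at initialization, StringStack is told how to clean up its elements by being given a custom freefn() that releases each element stored in elems (which we know to be accessed as char**).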

/* program 2. modified StringStack */

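typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
    void (*freefn)(void *); // new field: a function pointer used to free one element
} Stack;

void StackNew(Stack *s, int elemSize, void (*freefn)(void *)) { /* body omitted */ }

void StackDispose(Stack *s) {
    /* 1. Check whether the elements in the block elems points to own anything
     *    more complex (that is, whether elems holds second-level pointers).
     * 2. If they do, free it.
     * 3. We know the elements here are char*, so freefn dereferences each slot
     *    as a char* and frees the heap block that pointer refers to.
     */
    if (s->freefn != NULL) {
        for (int i = 0; i < s->logLength; i++) { // unused slots need no cleanup, so iterate up to logLength
            s->freefn((char *)s->elems + i * s->elemSize);
        }
    }
    free(s->elems); // free the block that holds the stack's elements
}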


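// Function written for this Stack: frees one element, which arrives as a char**
void StringFree(void *elem) {
    free(*(char **)elem); // the outer * is the key step: it yields the char* stored in the slot, and free releases the heap string it points to
}

Stack StringStack;

// For plain value types, pass NULL instead of StringFree; nothing needs to be freed per element
StackNew(&StringStack, sizeof(char *), StringFree);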
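To see the freefn mechanism end to end, here is a minimal sketch of the whole stack. It is not Jerry's exact code: the initial capacity of 4, the doubling strategy, the CopyString helper (standing in for strdup), the freedCount counter, and DisposeDemo are all my own assumptions, added only to make the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;
    int elemSize;
    int logLength;
    int allocLength;
    void (*freefn)(void *);   /* NULL when elements own no heap memory */
} Stack;

void StackNew(Stack *s, int elemSize, void (*freefn)(void *)) {
    assert(elemSize > 0);
    s->elemSize = elemSize;
    s->logLength = 0;
    s->allocLength = 4;       /* assumed initial capacity */
    s->freefn = freefn;
    s->elems = malloc((size_t)s->allocLength * elemSize);
    assert(s->elems != NULL);
}

void StackDispose(Stack *s) {
    if (s->freefn != NULL) {
        /* Only the first logLength slots hold live elements. */
        for (int i = 0; i < s->logLength; i++)
            s->freefn((char *)s->elems + i * s->elemSize);
    }
    free(s->elems);
    s->elems = NULL;
    s->logLength = 0;
}

void StackPush(Stack *s, const void *elemAddr) {
    if (s->logLength == s->allocLength) {
        s->allocLength *= 2;  /* assumed growth strategy: doubling */
        s->elems = realloc(s->elems, (size_t)s->allocLength * s->elemSize);
        assert(s->elems != NULL);
    }
    memcpy((char *)s->elems + s->logLength * s->elemSize, elemAddr, s->elemSize);
    s->logLength++;
}

void StackPop(Stack *s, void *elemAddr) {
    assert(s->logLength > 0);
    s->logLength--;
    memcpy(elemAddr, (char *)s->elems + s->logLength * s->elemSize, s->elemSize);
}

/* Hypothetical counter, present only so the demo can observe the frees. */
static int freedCount = 0;

void StringFree(void *elem) {
    free(*(char **)elem);     /* elem is really a char**; free the string it refers to */
    freedCount++;
}

/* Hypothetical stand-in for strdup: returns a heap copy of src. */
static char *CopyString(const char *src) {
    char *dst = malloc(strlen(src) + 1);
    assert(dst != NULL);
    strcpy(dst, src);
    return dst;
}

/* Pushes three heap strings, pops one, and lets StackDispose free the rest.
 * Returns how many strings StackDispose freed, or -1 if the pop was wrong. */
int DisposeDemo(void) {
    const char *friends[] = {"Al", "Bob", "Carl"};
    Stack s;
    StackNew(&s, sizeof(char *), StringFree);
    for (int i = 0; i < 3; i++) {
        char *copy = CopyString(friends[i]);
        StackPush(&s, &copy);
    }
    char *name;
    StackPop(&s, &name);      /* ownership of "Carl" moves to the caller */
    int ok = strcmp(name, "Carl") == 0;
    free(name);
    freedCount = 0;
    StackDispose(&s);         /* frees "Al" and "Bob" through StringFree */
    return ok ? freedCount : -1;
}
```

Once an element is popped, the caller owns it and must free it; whatever is still on the stack at dispose time is cleaned up by the stack through freefn.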

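Before digging into freefn(), the article "Understanding the malloc and free functions in C" summarizes several points about malloc and free that help with the cleanup problem here. Once heap memory obtained from malloc is freed, the memory manager can reclaim and reuse it, but the value of *elem itself does not change. It is now a dangling pointer (nobody knows what the memory at that address will be reallocated for), so it should also be reset to NULL.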
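The free-then-reset habit can be sketched as a tiny helper. FreeAndClear and DanglingDemo are hypothetical names of my own, not something from the lecture or the cited article.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees *p and resets it so no dangling pointer remains. */
void FreeAndClear(char **p) {
    free(*p);     /* free(NULL) is a harmless no-op */
    *p = NULL;
}

/* Returns 1 if the pointer is NULL after being released twice, -1 on malloc failure. */
int DanglingDemo(void) {
    char *name = malloc(4);
    if (name == NULL) return -1;
    strcpy(name, "Bob");
    FreeAndClear(&name);
    FreeAndClear(&name);  /* safe: the second call only sees NULL */
    return name == NULL;
}
```

Without the reset, the second release would be a double free, which is undefined behavior.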


Functions

1. rotate()

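rotate() is a C++ standard library function that Jerry emulates in C. During the copy, the source and destination regions may overlap, so memmove() has to be used there instead of memcpy():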

Jerry: The source regions are actually overlapping, or they are potentially overlapping. The implementation of memcpy is brute force. It carries things four bytes at a time (this refers to his 4-byte copy example), and then at the end does whatever mod tricks (handling the remainder with modulo arithmetic, I assume) it needs to copy off an odd number of bytes, but it assumes they’re not overlapping. When they’re overlapping, that brute force approach might not work.

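On memcpy() and memmove(), the article "The difference between memmove and memcpy, and handling memory overlap" describes when to use each function and how they are implemented. Jerry added some views of his own: before copying we can check the target address and the source address ourselves and decide whether to copy front to back or back to front; if we do not want to check, we use memmove(). In his view memmove() should only be used where it is really needed, because it is less efficient than memcpy(). rotate() is one such place: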
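The "check the addresses, then choose the copy direction" idea can be sketched as follows. This is a byte-at-a-time illustration under my own assumptions, not how any particular libc implements memmove; CopyOverlapSafe and OverlapDemo are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n bytes from src to dst, choosing the direction so that
 * overlapping regions are handled correctly. */
void *CopyOverlapSafe(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts first: copy front to back. */
        for (size_t i = 0; i < n; i++) d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts later: copy back to front so source
         * bytes are read before they are overwritten. */
        for (size_t i = n; i > 0; i--) d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Shifts "abcd" two positions to the right inside the same buffer.
 * Returns 1 if the buffer ends up as "ababcd". */
int OverlapDemo(void) {
    char buf[] = "abcdef";
    CopyOverlapSafe(buf + 2, buf, 4);   /* dst overlaps the tail of src */
    return strcmp(buf, "ababcd") == 0;
}
```

A naive front-to-back loop on the same input would overwrite 'c' and 'd' before reading them, which is exactly the failure Jerry describes.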

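void rotate(void *front, void *middle, void *end) {
    // Standard C does not define arithmetic on void*, so cast to char* before
    // subtracting to get the actual number of bytes between the two addresses
    int frontSize = (char *)middle - (char *)front;
    int backSize = (char *)end - (char *)middle;
    char buffer[frontSize]; // temporary buffer for the front part
    memcpy(buffer, front, frontSize);
    memmove(front, middle, backSize); // source and destination may overlap here, so memmove is required; the other two copies never overlap and can use memcpy
    memcpy((char *)end - frontSize, buffer, frontSize);
}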
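A small usage sketch of rotate() on an int array. The empty-front guard and the RotateDemo function are my additions; the guard exists because a zero-length variable-length array is not allowed in C.

```c
#include <string.h>

void rotate(void *front, void *middle, void *end) {
    size_t frontSize = (size_t)((char *)middle - (char *)front);
    size_t backSize = (size_t)((char *)end - (char *)middle);
    if (frontSize == 0) return;        /* nothing to rotate; also avoids a zero-length VLA */
    char buffer[frontSize];            /* holds the front part while the back slides down */
    memcpy(buffer, front, frontSize);
    memmove(front, middle, backSize);  /* regions may overlap */
    memcpy((char *)end - frontSize, buffer, frontSize);
}

/* Rotates {1,2,3,4,5,6} around index 2. Returns 1 if the result is {3,4,5,6,1,2}. */
int RotateDemo(void) {
    int a[] = {1, 2, 3, 4, 5, 6};
    const int expected[] = {3, 4, 5, 6, 1, 2};
    rotate(a, a + 2, a + 6);
    return memcmp(a, expected, sizeof a) == 0;
}
```

The element type never appears inside rotate(); the caller expresses positions purely as addresses, which is what makes the function generic.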

2. qsort()

void qsort(void *base, int size, int elemSize, int (*cmpfn)(void *, void *)) {
    /* covered in Lecture 8 */
}

RAM Memory Management

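In my third year of university I slept through Professor Wen's class, and the academic debt from the material I missed has to be repaid sooner or later. So I am writing down every sentence of Jerry's ten-minute overview of RAM memory management at the end of the lecture…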

Here’s RAM, and since we’re dealing with an architecture where longs and pointers are four bytes, that means that pointers can distinguish between two to the thirty-second different addresses. That means the lowest address in memory is zero, which is that null that you’re starting to fear a little bit, and then the highest address is two to the thirty-second minus one.

stack segment

Whenever you call functions, and the function call forces the allocation of lots of local variables, the memory for those local variables is drawn from a subset of all RAM called the stack. I’m gonna draw that up here. I drew it a little bit bigger than I need to, but here it is, the stack segment. The stack segment is what this thing is called. It doesn’t necessarily use all of the stack, but for reasons that will become clear, and there’s even a little bit of intuition, I think, as to why it might be called a stack: when you call main you get all of its local variables, and they’re alive and they’re active. When main calls a helper function, it’s not like main’s variables go away. They’re just temporarily disabled, and you don’t have access to … at least not via the normal variable names, right. So main calls helper, which calls helper helper, which calls helper helper helper, and you have all of these variables that are allocated. But only the ones in the bottommost function are actually alive and accessible via their variable names. When helper helper helper returns, you return back to where helper helper has its local variables, and you can access those. So basically, when a function calls another function, the first function’s variables are suspended until whatever happens in response to the function call actually ends. And it may itself call several helper functions, and just go through a big code tree of function calls before it actually returns a value, or not. What happens is that, initially, that much space is set aside from the stack segment to just hold main’s variables, whatever main’s local variables are. And when main calls something, this threshold is lowered to there to make sure that not only is there space for main’s variables set aside, but also for the helper function’s variables, okay.

And it (the threshold) goes down and up, down and up. Every time it goes down it’s because some function was called, and every time it goes up it’s because some function returned, okay. And the same argument can be made for methods in C++. It’s called a stack because the most recently called function is the one that is invited to return before any other ones, unless it calls some other function, okay. That’s why it’s called a stack.

heap segment

Heap in this world (CS107) doesn’t mean a priority queue like back in data structures. It really means a blob of arbitrary bytes which is completely managed by the hardware, by the assembly code which actually happens to be down here (the memory block just above address 0). This right there, this boundary, and that address, and that address is admitted to software (heap). Software that implements what we call the heap manager. And the heap manager is software, it’s code. The implementation of malloc, realloc and free … and they basically manage this memory right here. … But I wanna … memory heap … that it is one big linear array of bytes. And so, rather than drawing it as a tall rectangle, I’m gonna draw it as a very wide rectangle.

The heap manager:

Entirely software managed with very little exception, okay, and I say exception because the operating system and what’s called the loader has to admit to the implementation what the boundaries of the heap segment are, but everything else is really framed in terms of this raw memory allocator, okay. As far as realloc is concerned, if I pass this address to realloc, and I ask it to become bigger, it’ll have to do a reallocation request, and put it somewhere else, probably right there, the way we’ve been talking about it. What happens is that there really is a little bit of a data structure that overlays the entire heap segment, okay. It is manually managed using lots of void * business.

The free list:

The data structure that’s more or less used by the heap manager overlays a linked list of what are called free nodes, okay. And it always keeps the address of the very first free node, and because you’re not usually using this as a client, the heap manager uses it as a variably sized node that right in the first eight or four bytes keeps track of how big that node is. So, it might have subdivided that, and to the left of that line might have the size of that node, and to the right of that line might actually have a pointer to that right there.
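The free-node layout Jerry describes (a size followed by a pointer to the next free node) can be sketched like this. The first-fit search policy, the FreeNode name, and the demo are my own assumptions; the lecture excerpt does not say how the heap manager picks a node, and a real heap manager would place these headers inside the free blocks themselves instead of in separate variables.

```c
#include <stddef.h>

/* Hypothetical free-list node: the size comes first, then the link. */
typedef struct FreeNode {
    size_t size;              /* bytes available in this free block */
    struct FreeNode *next;    /* next free block, or NULL at the end */
} FreeNode;

/* First-fit search (an assumed policy): returns the first node that is
 * large enough for the request, or NULL if none is. */
FreeNode *FindFirstFit(FreeNode *head, size_t request) {
    for (FreeNode *cur = head; cur != NULL; cur = cur->next) {
        if (cur->size >= request) return cur;
    }
    return NULL;
}

/* Builds the list 16 -> 8 -> 64 and checks three lookups. Returns 1 on success. */
int FreeListDemo(void) {
    FreeNode c = {64, NULL};
    FreeNode b = {8, &c};
    FreeNode a = {16, &b};
    return FindFirstFit(&a, 12) == &a
        && FindFirstFit(&a, 32) == &c
        && FindFirstFit(&a, 100) == NULL;
}
```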


References

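[1]. Heap and stack
This article compares the heap and the stack in terms of size, usage, fragmentation, growth direction, and so on, which helps in understanding the free operations above. In program 1, the string literals that friends points to are constants stored in the read-only data area; strdup() calls malloc() to allocate space on the heap that holds a copy of each string; StringStack itself is a variable on the stack, and its elems field points to a heap block holding the char* pointers to those copies, so the element address passed to freefn is a second-level pointer (char**).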

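At the end, the author gives an example explaining where different variables live in memory. The remark there that "a function call also involves a series of operations on the stack to save context and pass arguments" is something Jerry touched on as well. As for where the heap memory goes after the copies of friends are freed, a dedicated memory manager reclaims it. If I understand correctly, Chapter 12 (Memory Management) of Linux Kernel Development describes in detail how the kernel divides memory into pages and zones to manage it; that management may be what they mean by reclaiming and reallocating.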

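[2]. Character arrays and character pointers
After reading this article I understand character arrays and character pointers more thoroughly. The comments dispute two points in the article:
(1) Is str a constant pointer or a pointer to a constant?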

char str[20] = {'h','e','l','l','o',' ','w','o','r','l','d'};
str++; // does not compile: an array name is not a modifiable variable
char *pstr;

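I cannot tell the two terms apart either, so I checked Section 5.3, Pointers and Arrays, of The C Programming Language (Second Edition); it does not describe an array name in either of those terms. But I understand what the comment is getting at: a pointer is a variable, and an array name is not a variable; constructions like pstr=str and pstr++ are legal; constructions like str=pstr and str++ are illegal.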
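(2) If a character array is declared but not initialized, or memory is merely malloc'ed for it, then printf("%s\n", str); supposedly prints garbage because there is no '\0'. I did not understand this, because I ran the code myself and the output was not garbled.
[Screenshot: program output]

Even the author's example, char str[20];, ran fine for me. Strange. That said, printing an uninitialized array with %s is undefined behavior in C, so clean output on one run most likely just means the bytes happened to be zero.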
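The cases that are guaranteed to be terminated can be sketched as follows. Both function names are hypothetical. A partially initialized array has its remaining elements set to zero by the language, and a malloc'ed buffer is safe to print once a '\0' has been written into it explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* The initializer fills 11 of the 20 elements; C zero-initializes the
 * remaining 9, so the array is a properly terminated string. */
size_t InitializedLength(void) {
    char str[20] = {'h','e','l','l','o',' ','w','o','r','l','d'};
    return strlen(str);
}

/* malloc does not clear memory, so write the terminator before treating
 * the buffer as a string. Returns the resulting length (0). */
size_t EmptyHeapStringLength(void) {
    char *p = malloc(20);
    if (p == NULL) return 0;
    p[0] = '\0';              /* explicit terminator: the buffer is now "" */
    size_t n = strlen(p);
    free(p);
    return n;
}
```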
