[Repost] Coroutines in C (this article may well be the founding text for coroutines in FreeRTOS!)


tp://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

by Simon Tatham; translated by jimieweile

Introduction

Structuring a large program is always a difficult job. One of the particular problems that often comes up is this: if you have a piece of code producing data, and another piece of code consuming it, which should be the caller and which should be the callee?

Here is a very simple piece of run-length decompression code, and an equally simple piece of parser code:

    /* Decompression code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = getchar();
            c = getchar();
            while (len--)
                emit(c);
        } else
            emit(c);
    }
    emit(EOF);

    /* Parser code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (isalpha(c)) {
            do {
                add_to_token(c);
                c = getchar();
            } while (isalpha(c));
            got_token(WORD);
        }
        add_to_token(c);
        got_token(PUNCT);
    }

Each of these code fragments is very simple, and easy to read and understand. One produces a character at a time by calling emit(); the other consumes a character at a time by calling getchar(). If only the calls to emit() and the calls to getchar() could be made to feed data to each other, it would be simple to connect the two fragments together so that the output from the decompressor went straight to the parser.

In many modern operating systems, you could do this using pipes between two processes or two threads. emit() in the decompressor writes to a pipe, and getchar() in the parser reads from the other end of the same pipe. Simple and robust, but also heavyweight and not portable. Typically you don't want to have to divide your program into threads for a task this simple.

In this article I offer a creative solution to this sort of structure problem.

Rewriting

The conventional answer is to rewrite one of the ends of the communication channel so that it's a function that can be called. Here's an example of what that might mean for each of the example fragments.

int decompressor(void) {
    static int repchar;
    static int replen;
    int c;
    if (replen > 0) {
        replen--;
        return repchar;
    }
    c = getchar();
    if (c == EOF)
        return EOF;
    if (c == 0xFF) {
        replen = getchar();
        repchar = getchar();
        replen--;
        return repchar;
    } else
        return c;
}

void parser(int c) {
    static enum {
        START, IN_WORD
    } state;
    switch (state) {
        case IN_WORD:
        if (isalpha(c)) {
            add_to_token(c);
            return;
        }
        got_token(WORD);
        state = START;
        /* fall through */

        case START:
        add_to_token(c);
        if (isalpha(c))
            state = IN_WORD;
        else
            got_token(PUNCT);
        break;
    }
}

Of course you don't have to rewrite both of them; just one will do. If you rewrite the decompressor in the form shown, so that it returns one character every time it's called, then the original parser code can replace calls to getchar() with calls to decompressor(), and the program will be happy. Conversely, if you rewrite the parser in the form shown, so that it is called once for every input character, then the original decompression code can call parser() instead of emit() with no problems. You would only want to rewrite both functions as callees if you were a glutton for punishment.
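As a rough sketch of the second arrangement, the callee-style parser can be driven one character at a time from an ordinary loop. The article never defines add_to_token() or got_token(); the versions here, together with feed(), token_log and the WORD/PUNCT values, are assumptions made up purely for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { WORD, PUNCT };            /* token types; the values are assumed */

char token[64];                  /* characters of the token being built */
size_t token_len;
char token_log[256];             /* hypothetical record of finished tokens */

void add_to_token(int c) {
    if (token_len + 1 < sizeof token)
        token[token_len++] = (char)c;
}

void got_token(int type) {
    size_t used = strlen(token_log);
    token[token_len] = '\0';
    snprintf(token_log + used, sizeof token_log - used, "%s:%s ",
             type == WORD ? "WORD" : "PUNCT", token);
    token_len = 0;
}

/* Callee-style parser, in the rewritten form shown above. */
void parser(int c) {
    static enum { START, IN_WORD } state;
    switch (state) {
        case IN_WORD:
        if (isalpha(c)) {
            add_to_token(c);
            return;
        }
        got_token(WORD);
        state = START;
        /* fall through */

        case START:
        add_to_token(c);
        if (isalpha(c))
            state = IN_WORD;
        else
            got_token(PUNCT);
        break;
    }
}

/* Caller: stands in for the decompressor loop that would call emit(). */
void feed(const char *s) {
    while (*s)
        parser((unsigned char)*s++);
}
```

Calling feed("ab,cd.") should leave "WORD:ab PUNCT:, WORD:cd PUNCT:. " in token_log. The producer keeps its natural loop structure, while the parser pays for it by being turned into a state machine.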

And that's the point, really. Both these rewritten functions are thoroughly ugly compared to their originals. Both of the processes taking place here are easier to read when written as a caller, not as a callee. Try to deduce the grammar recognised by the parser, or the compressed data format understood by the decompressor, just by reading the code, and you will find that both the originals are clear and both the rewritten forms are less clear. It would be much nicer if we didn't have to turn either piece of code inside out.

Knuth's coroutines

In The Art of Computer Programming, Donald Knuth presents a solution to this sort of problem. His answer is to throw away the stack concept completely. Stop thinking of one process as the caller and the other as the callee, and start thinking of them as cooperating equals.

In practical terms: replace the traditional "call" primitive with a slightly different one. The new "call" will save the return value somewhere other than on the stack, and will then jump to a location specified in another saved return value. So each time the decompressor emits another character, it saves its program counter and jumps to the last known location within the parser - and each time the parser needs another character, it saves its own program counter and jumps to the location saved by the decompressor. Control shuttles back and forth between the two routines exactly as often as necessary.

This is very nice in theory, but in practice you can only do it in assembly language, because no commonly used high level language supports the coroutine call primitive. Languages like C depend utterly on their stack-based structure, so whenever control passes from any function to any other, one must be the caller and the other must be the callee. So if you want to write portable code, this technique is at least as impractical as the Unix pipe solution.

Stack-based coroutines

So what we would really like is the ability to mimic Knuth's coroutine call primitive in C. We must accept that in reality, at the C level, one function will be caller and the other will be callee. In the caller, we have no problem; we code the original algorithm, pretty much exactly as written, and whenever it has (or needs) a character it calls the other function.

The callee has all the problems. For our callee, we want a function which has a "return and continue" operation: return from the function, and next time it is called, resume control from just after the return statement. For example, we would like to be able to write a function that says

int function(void) {
    int i;
    for (i = 0; i < 10; i++)
        return i;   /* won't work, but wouldn't it be nice */
}

and have ten successive calls to the function return the numbers 0 through 9.


How can we implement this? Well, we can transfer control to an arbitrary point in the function using a goto statement. So if we use a state variable, we could do this:

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
    }
    LABEL0: /* start of function */
    for (i = 0; i < 10; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
        LABEL1:; /* resume control straight after the return */
    }
}

This method works. We have a set of labels at the points where we might need to resume control: one at the start, and one just after each return statement. We have a state variable, preserved between calls to the function, which tells us which label we need to resume control at next. Before any return, we update the state variable to point at the right label; after any call, we do a switch on the value of the variable to find out where to jump to.

========================================
Translator's note: what follows is a refinement of this method that uses a construct called Duff's device, itself a remarkable piece of code. It has little to do with the concept of coroutines, though, and my energy is limited, so I will stop here.
I looked up this article because FreeRTOS has a coroutine concept; next I need to go back and check whether FreeRTOS coroutines are the same idea.
======================================
I have just read the FreeRTOS source and found that its coroutine code also uses Duff's device, so this article may well be the founding text for coroutines in FreeRTOS!
2010/10/8: I therefore plan to carry on and translate the rest.
==================================================================

It's still ugly, though. The worst part of it is that the set of labels must be maintained manually, and must be consistent between the function body and the initial switch statement. Every time we add a new return statement, we must invent a new label name and add it to the list in the switch; every time we remove a return statement, we must remove its corresponding label. We've just increased our maintenance workload by a factor of two.
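To make the maintenance cost concrete, here is a small sketch of the goto technique with a second return point added inside the loop. The function name two_step, the values it yields, and the trailing return -1 (used once the loop is exhausted) are all assumptions for illustration. Note that the new return needs both a new label and a matching entry in the switch.

```c
/* Yields 0, 100, 1, 101, 2, 102, then -1 on every later call. */
int two_step(void) {
    static int i, state = 0;
    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
        case 2: goto LABEL2;   /* extra entry for the second return */
    }
    LABEL0: /* start of function */
    for (i = 0; i < 3; i++) {
        state = 1;             /* resume at LABEL1 next time */
        return i;
        LABEL1:;
        state = 2;             /* resume at LABEL2 next time */
        return i + 100;
        LABEL2:;               /* extra label for the second return */
    }
    return -1;                 /* assumed sentinel: nothing left to yield */
}
```

Forgetting either the case 2 line or the LABEL2 line is easy to do, and forgetting only the former still compiles, which is exactly the bookkeeping burden described above.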

Duff's device

The famous "Duff's device" in C makes use of the fact that a case statement is still legal within a sub-block of its matching switch statement. Tom Duff used this for an optimised output loop:

    switch (count % 8) {
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while ((count -= 8) > 0);
    }

Translator's study notes (for reference only):
1. The switch (count % 8) statement runs only once.
2. The row of case labels is likewise used only once: when the switch first runs, control enters at exactly one of them.
3. The aim is to copy count bytes; on the first pass the switch/case performs the "remainder" number of copies, so that the total comes out right.
4. The optimisation lies in while ((count -= 8) > 0): count is tested roughly one eighth as often, which improves efficiency.
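The loop as quoted writes every item to the same location *to, which suits an output register. As a minimal sketch of the same device used for an ordinary memory copy, the version below increments both pointers. The function name duff_copy, the increment on to, and the guard for count <= 0 are assumptions added here; they are not part of the loop in the article.

```c
/* Copies count bytes from `from` to `to` using Duff's device.
   The switch jumps into the middle of the unrolled loop body to handle
   the count % 8 leftover bytes, then the do-while copies 8 per pass. */
void duff_copy(char *to, const char *from, int count) {
    if (count <= 0)            /* added guard: the device assumes count > 0 */
        return;
    switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while ((count -= 8) > 0);
    }
}
```

With count equal to 10, for example, the switch enters at case 2 and copies two bytes, and one further full pass copies the remaining eight.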

We can put it to a slightly different use in the coroutine trick. Instead of using a switch statement to decide which goto statement to execute, we can use the switch statement to perform the jump itself:
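Translator's study notes (for reference only):
1. The function below resembles Duff's device in structure, but the setting differs in one important way: the switch/case is executed many times, once per call.
2. Which case label control jumps to is governed by state, so each call to the function resumes at the point where the previous call returned.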

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
            case 1:; /* resume control straight after the return */
        }
    }
}

Now this is looking promising. All we have to do now is construct a few well chosen macros, and we can hide the gory details in something plausible-looking:


#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }
int function(void) {
    static int i;
    crBegin;
    for (i = 0; i < 10; i++)
        crReturn(1, i);
    crFinish;
}

(note the use of do ... while(0) to ensure that crReturn does not need braces around it when it comes directly between if and else )

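A short sketch of the if/else situation the note refers to. The function name parity and its behaviour are made up for illustration; the macros are the ones defined above. Because crReturn expands to a single do ... while (0) statement, the semicolon after it does not detach the else.

```c
#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }

/* Returns 0 for an even argument and 1 for an odd one, resuming
   inside whichever branch it last returned from. */
int parity(int n) {
    crBegin;
    for (;;) {
        if (n % 2 == 0)
            crReturn(1, 0);    /* no braces needed before the else */
        else
            crReturn(2, 1);
    }
    crFinish;
    return -1;                 /* not reached; keeps compilers quiet */
}
```

Had crReturn been written as a bare { ... } block, the semicolon after crReturn(1, 0) would form an empty statement between the if body and the else, and the code would fail to compile.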

This is almost exactly what we wanted. We can use crReturn to return from the function in such a way that control at the next call resumes just after the return. Of course we must obey some ground rules (surround the function body with crBegin and crFinish; declare all local variables static if they need to be preserved across a crReturn; never put a crReturn within an explicit switch statement); but those do not limit us very much.

The only snag remaining is the first parameter to crReturn. Just as when we invented a new label in the previous section we had to avoid it colliding with existing label names, now we must ensure all our state parameters to crReturn are different. The consequences will be fairly benign - the compiler will catch it and not let it do horrible things at run time - but we still need to avoid doing it.

Even this can be solved. ANSI C provides the special macro name __LINE__, which expands to the current source line number. So we can rewrite crReturn as

#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)

and then we no longer have to worry about those state parameters at all, provided we obey a fourth ground rule (never put two crReturn statements on the same line).

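Putting the __LINE__ form together, a minimal sketch might look like this. The function name countdown and the final return -1 (reached once the coroutine has run off its end) are assumptions for illustration. The two crReturn statements sit on different source lines, so they get distinct case labels without any hand-written state numbers.

```c
#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)
#define crFinish }

/* Yields 3, 2, 1, 0 on successive calls, then -1 from then on. */
int countdown(void) {
    static int i;              /* static: must survive across returns */
    crBegin;
    for (i = 3; i > 0; i--)
        crReturn(i);
    crReturn(0);
    crFinish;
    return -1;                 /* assumed sentinel after the last yield */
}
```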

Evaluation

So now we have this monstrosity, let's rewrite our original code fragments using it.
(Translator's note: using Duff's device is what tidies up the form of the code.)

int decompressor(void) {
    static int c, len;
    crBegin;
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = getchar();
            c = getchar();
            while (len--)
                crReturn(c);
        } else
            crReturn(c);
    }
    crReturn(EOF);
    crFinish;
}

void parser(int c) {
    crBegin;
    while (1) {
        /* first char already in c */
        if (c == EOF)
            break;
        if (isalpha(c)) {
            do {
                add_to_token(c);
                crReturn( );
            } while (isalpha(c));
            got_token(WORD);
        }
        add_to_token(c);
        got_token(PUNCT);
        crReturn( );
    }
    crFinish;
}

We have rewritten both decompressor and parser as callees, with no need at all for the massive restructuring we had to do last time we did this. The structure of each function exactly mirrors the structure of its original form. A reader can deduce the grammar recognised by the parser, or the compressed data format used by the decompressor, far more easily than by reading the obscure state-machine code. The control flow is intuitive once you have wrapped your mind around the new format: when the decompressor has a character, it passes it back to the caller with crReturn and waits to be called again when another character is required. When the parser needs another character, it returns using crReturn, and waits to be called again with the new character in the parameter c.

There has been one small structural alteration to the code: parser() now has its getchar() (well, the corresponding crReturn ) at the end of the loop instead of the start, because the first character is already in c when the function is entered. We could accept this small change in structure, or if we really felt strongly about it we could specify that parser() required an "initialisation" call before you could start feeding it characters.

As before, of course, we don't have to rewrite both routines using the coroutine macros. One will suffice; the other can be its caller.
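As a sketch of the "one will suffice" arrangement, the coroutine-style decompressor can be called from a plain loop. To keep the example self-contained it reads from a fixed buffer rather than standard input: next_byte(), the input array and decompress_all() are hypothetical stand-ins, and the return EOF after crFinish is an addition so the function still returns a value if called again after the end.

```c
#include <stdio.h>

#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)
#define crFinish }

/* Hypothetical input source standing in for getchar(). */
const unsigned char input[] = { 'a', 0xFF, 3, 'b', 'c' };
size_t input_pos = 0;

int next_byte(void) {
    return input_pos < sizeof input ? input[input_pos++] : EOF;
}

/* Coroutine-style decompressor: same shape as the original loop. */
int decompressor(void) {
    static int c, len;
    crBegin;
    while (1) {
        c = next_byte();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = next_byte();
            c = next_byte();
            while (len--)
                crReturn(c);
        } else
            crReturn(c);
    }
    crReturn(EOF);
    crFinish;
    return EOF;                /* added: result for calls after the end */
}

/* Caller: pulls characters until EOF, as the original parser would. */
size_t decompress_all(char *out, size_t cap) {
    size_t n = 0;
    int c;
    while (n + 1 < cap && (c = decompressor()) != EOF)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

With the buffer shown, decompress_all() should produce the five characters "abbbc": the literal a, the run of three b characters encoded by 0xFF 3 b, and the literal c.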

We have achieved what we set out to achieve: a portable ANSI C means of passing data between a producer and a consumer without the need to rewrite one as an explicit state machine. We have done this by combining the C preprocessor with a little-used feature of the switch statement to create an implicit state machine.
(Translator's note: the translation stops here, because continuing would exceed the size limit for a single post, and the rest is a discussion of coding style that is not closely tied to the main topic.)

Coding Standards

Of course, this trick violates every coding standard in the book. Try doing this in your company's code and you will probably be subject to a stern telling off if not disciplinary action! You have embedded unmatched braces in macros, used case within sub-blocks, and as for the crReturn macro with its terrifyingly disruptive contents . . . It's a wonder you haven't been fired on the spot for such irresponsible coding practice. You should be ashamed of yourself.

I would claim that the coding standards are at fault here. The examples I've shown in this article are not very long, not very complicated, and still just about comprehensible when rewritten as state machines. But as the functions get longer, the degree of rewriting required becomes greater and the loss of clarity becomes much, much worse.

Consider. A function built of small blocks of the form

    case STATE1:
    /* perform some activity */
    if (condition) state = STATE2; else state = STATE3;

is not very different, to a reader, from a function built of small blocks of the form

    LABEL1:
    /* perform some activity */
    if (condition) goto LABEL2; else goto LABEL3;

One is caller and the other is callee, true, but the visual structure of the functions is the same, and the insights they provide into their underlying algorithms are exactly as small as each other. The same people who would fire you for using my coroutine macros would fire you just as loudly for building a function out of small blocks connected by goto statements! And this time they would be right, because laying out a function like that obscures the structure of the algorithm horribly.

Coding standards aim for clarity. By hiding vital things like switch, return and case statements inside "obfuscating" macros, the coding standards would claim you have obscured the syntactic structure of the program, and violated the requirement for clarity. But you have done so in the cause of revealing the algorithmic structure of the program, which is far more likely to be what the reader wants to know!

Any coding standard which insists on syntactic clarity at the expense of algorithmic clarity should be rewritten. If your employer fires you for using this trick, tell them that repeatedly as the security staff drag you out of the building.

Refinements and Code

In a serious application, this toy coroutine implementation is unlikely to be useful, because it relies on static variables and so it fails to be re-entrant or multi-threadable. Ideally, in a real application, you would want to be able to call the same function in several different contexts, and at each call in a given context, have control resume just after the last return in the same context.

This is easily enough done. We arrange an extra function parameter, which is a pointer to a context structure; we declare all our local state, and our coroutine state variable, as elements of that structure.

It's a little bit ugly, because suddenly you have to use ctx->i as a loop counter where you would previously just have used i ; virtually all your serious variables become elements of the coroutine context structure. But it removes the problems with re-entrancy, and still hasn't impacted the structure of the routine.
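A minimal sketch of the context-structure idea, under stated assumptions: the structure counter_ctx, the macro names crBeginCtx/crReturnCtx/crFinishCtx and the function counter are invented for this illustration and are not the ccr macros from the author's header file. The resume point and the loop counter both live in the caller-supplied structure, so two contexts can be advanced independently.

```c
/* Hypothetical per-instance context: resume point plus "local" state. */
struct counter_ctx {
    int state;                 /* 0 means "not started yet" */
    int i;                     /* loop counter, kept across returns */
    int limit;                 /* how many values to yield */
};

#define crBeginCtx(ctx) switch ((ctx)->state) { case 0:
#define crReturnCtx(ctx, x) do { (ctx)->state = __LINE__; return x; \
                                 case __LINE__:; } while (0)
#define crFinishCtx }

/* Yields 0 .. limit-1 for the given context, then -1 from then on. */
int counter(struct counter_ctx *ctx) {
    crBeginCtx(ctx);
    for (ctx->i = 0; ctx->i < ctx->limit; ctx->i++)
        crReturnCtx(ctx, ctx->i);
    for (;;)
        crReturnCtx(ctx, -1);  /* park here once exhausted */
    crFinishCtx;
    return -1;                 /* not reached; keeps compilers quiet */
}
```

A caller would zero-initialise a struct counter_ctx, set limit, and pass its address on every call; interleaving calls on two such structures does not disturb either sequence, which is the re-entrancy the static version lacks.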

(Of course, if C only had Pascal's with statement, we could arrange for the macros to make this layer of indirection truly transparent as well. A pity. Still, at least C++ users can manage this by having their coroutine be a class member, and keeping all its local variables in the class so that the scoping is implicit.)

Included here is a C header file that implements this coroutine trick as a set of pre-defined macros. There are two sets of macros defined in the file, prefixed scr and ccr. The scr macros are the simple form of the technique, for when you can get away with using static variables; the ccr macros provide the advanced re-entrant form. Full documentation is given in a comment in the header file itself.

Note that Visual C++ version 6 doesn't like this coroutine trick, because its default debug state (Program Database for Edit and Continue) does something strange to the __LINE__ macro. To compile a coroutine-using program with VC++ 6, you must turn off Edit and Continue. (In the project settings, go to the "C/C++" tab, category "General", setting "Debug info". Select any option other than "Program Database for Edit and Continue".)

(The header file is MIT-licensed, so you can use it in anything you like without restriction. If you do find something the MIT licence doesn't permit you to do, mail me, and I'll probably give you explicit permission to do it anyway.)

Follow this link for coroutine.h.

Thanks for reading. Share and enjoy!