Why Not Translate Perl to C?

People often have the idea that automatically translating Perl to C and then compiling the C will make their Perl programs run faster, because "C is much faster than Perl." This article explains why this strategy is unlikely to work.

Short Summary

Your Perl program is being run by the Perl interpreter. You want a C program that does the same thing that your Perl program does. A C program to do what your Perl program does would have to do most of the same things that the Perl interpreter does when it runs your Perl program. There is no reason to think that the C program could do those things faster than the Perl interpreter does them, because the Perl interpreter itself is written in very fast C.

Some detailed case studies follow.

Built-In Functions

Suppose your program needs to split a line into fields, and uses the Perl split function to do so. You want to compile this to C so it will be faster.

This is obviously not going to work, because the split function is already implemented in C. If you have the Perl source code, you can see the implementation of split in the file pp.c; it is in the function named pp_split. When your Perl program uses split, Perl calls this pp_split function to do the splitting. pp_split is written in C, and it has already been compiled to native machine code.

Now, suppose you want to translate your Perl program to C. How will you translate your split call? The only thing you can do is translate it to a call to the C pp_split function, or some other equivalent function that splits. There is no reason to believe that any C implementation of split will be faster than the pp_split that Perl already has. Years of work have gone into making pp_split as fast as possible.
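To make the point concrete, here is a minimal sketch of the kind of C routine a translator would have to call for something like split /:/, $line. The name split_fields and its interface are hypothetical, and it handles only a single-character separator: no regular expressions, no LIMIT argument, and no removal of trailing empty fields, all of which pp_split must handle. Even this stripped-down version still scans every character and allocates storage for every field, which is the same work pp_split already does in compiled C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split LINE on the single character SEP.
 * Stores up to MAX_FIELDS newly allocated field strings in FIELDS
 * and returns how many it stored; the caller frees each field.
 * Unlike Perl's split, this keeps trailing empty fields, knows
 * nothing about regular expressions, and simply stops after
 * MAX_FIELDS fields. */
size_t split_fields(const char *line, char sep,
                    char **fields, size_t max_fields)
{
    size_t n = 0;
    const char *start = line;

    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        char *field = malloc(len + 1);       /* +1 for the trailing NUL */
        if (field == NULL)
            abort();                          /* out of memory */
        memcpy(field, start, len);
        field[len] = '\0';
        fields[n++] = field;
        if (end == NULL)
            break;                            /* last field reached */
        start = end + 1;                      /* skip the separator */
    }
    return n;
}
```

Calling a routine like this from translated code gains nothing over the interpreter: the same loop simply lives in a different file.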

You can make the same argument for all of Perl's other built-in functions, such as join, printf, rand and readdir.

So much for built-in functions.

Data Structures

Why is Perl slow to begin with? One major reason is that its data structures are extremely flexible, and this flexibility imposes a speed penalty.

Let's look in detail at an important example: strings. Consider this Perl code:

        $x = 'foo';     
        $y = 'bar';
        $x .= $y;

That is, we want to append $y to the end of $x. In C, this is extremely tricky. You would start by doing something like this:

        char *x = "foo";
        char *y = "bar";

Now you have a problem. You would like to insert bar at the end of the buffer pointed to by x. But you can't, because there is not enough room: x only points to enough space for four characters, and you need space for seven. (C strings always have an extra NUL character on the end.) Worse, in this example x points at a string literal, which C does not allow you to modify at all. To append y to x, you must allocate a new buffer, and then arrange for x to point to the new buffer:

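        /* needs <stdlib.h> and <string.h> */
        char *tmp = malloc(strlen(x) + strlen(y) + 1);  /* +1 for the NUL */
        if (tmp == NULL)
            abort();                                     /* out of memory */
        strcpy(tmp, x);
        strcat(tmp, y);
        x = tmp;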

This works fine if x is the only pointer to that particular buffer. But if some other part of the program also had a pointer to the buffer, this code does not work. Why not? Here's the picture of what we did:

[Figure: BEFORE. x and z both point to the same buffer, which holds "foo".]

Here x and z are two variables that both contain pointers to the same buffer. We want to append bar to the end of the string. But the C code we used above doesn't quite work, because we allocated a new region of memory to hold the result, and then pointed x to it:

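[Figure: AFTER x = tmp. x points to a new buffer holding "foobar"; z still points to the old buffer holding "foo".]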

It's tempting to think that we should just point z to the new buffer also, but in practice this is impossible. The function that is doing the appending cannot know whether there is such a z, or where it may be. There might be 100 variables like z all pointing to the old buffer, and there is no good way to keep track of them so that they can all be changed when the buffer moves.
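The following small C sketch reproduces the BEFORE and AFTER situation. The helper name append_copy and the demo function show_stale_alias are hypothetical; append_copy performs the same malloc/strcpy/strcat sequence shown above and returns the new buffer. After x is redirected, z still sees the old "foo", and nothing inside append_copy could have found z to fix it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated buffer holding S
 * followed by T. Only the caller's own pointer can be updated. */
char *append_copy(const char *s, const char *t)
{
    char *tmp = malloc(strlen(s) + strlen(t) + 1);  /* +1 for the NUL */
    if (tmp == NULL)
        abort();                                     /* out of memory */
    strcpy(tmp, s);
    strcat(tmp, t);
    return tmp;
}

/* Two pointers share one buffer; appending through x leaves z stale. */
void show_stale_alias(void)
{
    char buf[] = "foo";                /* 4 bytes: 'f', 'o', 'o', NUL */
    char *x = buf;
    char *z = buf;                     /* z shares x's buffer */

    x = append_copy(x, "bar");         /* x now points to a new buffer */
    printf("x = %s, z = %s\n", x, z);  /* prints: x = foobar, z = foo */
    free(x);
}
```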

Perl does support a transparent string append operation. Let's see how this works. In Perl, a variable like $x does not point directly at the buffer. Instead, it points at a structure called an SV ("Scalar Value"). The SV holds the pointer to the buffer, and also some other things that I do not show:

[Figure: BEFORE $x .= $y. $x and $z both point to the same SV, which points to the buffer holding "foo".]

When you ask Perl to append bar to $x, it follows the pointers and finds that there is not enough space in the buffer. So, just as in C, it allocates a new buffer and stores the result in the new buffer. Then it fixes the pointer in the SV to point to the new buffer, and it throws away the old buffer.

Now $x and $z have both changed. If there were any other variables sharing the SV, their values would have changed also. This technique is called "double indirection," and it is how Perl can support operations like .=. A similar principle applies for arrays; this is how Perl can support the push function.
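Here is a minimal C sketch of the double-indirection idea. The names struct scalar, scalar_new and scalar_append are hypothetical, and the structure is far simpler than Perl's real SV, which also carries flags, numeric values, a reference count and more. The point is only that variables hold a pointer to the structure, not to the buffer, so when scalar_append moves the buffer, every variable sharing the structure sees the new value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for Perl's SV. */
struct scalar {
    char  *buf;   /* character data, NUL-terminated */
    size_t len;   /* bytes in use, not counting the NUL */
    size_t cap;   /* bytes allocated for buf */
};

struct scalar *scalar_new(const char *s)
{
    struct scalar *sv = malloc(sizeof *sv);
    if (sv == NULL)
        abort();
    sv->len = strlen(s);
    sv->cap = sv->len + 1;
    sv->buf = malloc(sv->cap);
    if (sv->buf == NULL)
        abort();
    memcpy(sv->buf, s, sv->cap);        /* copies the NUL too */
    return sv;
}

/* Append T to SV, moving the buffer if it is too small.
 * Only sv->buf changes; pointers to the structure stay valid. */
void scalar_append(struct scalar *sv, const char *t)
{
    size_t tlen = strlen(t);

    assert(sv != NULL);
    if (sv->len + tlen + 1 > sv->cap) {
        size_t newcap = 2 * (sv->len + tlen + 1);  /* leave room to grow */
        char *newbuf = malloc(newcap);
        if (newbuf == NULL)
            abort();
        memcpy(newbuf, sv->buf, sv->len);
        free(sv->buf);                  /* throw away the old buffer */
        sv->buf = newbuf;               /* fix the pointer in the structure */
        sv->cap = newcap;
    }
    memcpy(sv->buf + sv->len, t, tlen + 1);  /* includes the NUL */
    sv->len += tlen;
}
```

If x and z both hold the same struct scalar pointer, an append through x is visible through z, which is the behavior described above for $x and $z.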

The flexibility comes at a price: whenever you want to use the value of $x, Perl must follow two pointers to get the value: the first to find the SV structure, and the second to get to the buffer with the character data. In C, you follow just one pointer, so every use of a string in Perl costs at least twice as many pointer dereferences as it does in C.
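A tiny C sketch of where the extra work comes from. The struct boxed_string and both function names are hypothetical; the struct stands in for an SV reduced to a single buffer pointer. It shows the extra dereference, not a measured timing: how much the second load costs in practice depends on caching and on the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boxed string: the variable points at the box,
 * and the box points at the characters. */
struct boxed_string {
    char *buf;
};

/* Plain C: one pointer to follow (s -> characters). */
char first_char_plain(const char *s)
{
    return s[0];
}

/* Double indirection: follow the variable's pointer to the box,
 * then the box's pointer to the characters. */
char first_char_boxed(const struct boxed_string *box)
{
    assert(box != NULL);
    return box->buf[0];
}
```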

If you want to compile Perl to C, you have a big problem. You would like to support operations like .= and push, but C does not support these very well. There are only three solutions:

1. Don't support .=
   This is a bad solution, because after you disallow all the Perl operations like .= and push, what you have left is not very much like Perl; it is much more like C, and then you might as well just write the program in C in the first place.
2. Do something extremely clever
   Cleverness is in short supply this month. :)
3. Use a double-indirection technique in the compiled C code
   This works, but the resulting C code will be slow, because you will have to traverse twice as many pointers each time you want to look up the value of a variable. But that is why Perl is slow! Perl is already doing the double-indirection lookup in C, and the code to do this has already been compiled to native machine code.
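As an illustration of the third option, here is a sketch of what translated C code for push @a, $n and $a[$i] might rely on. The names struct int_array, int_array_push and int_array_get are hypothetical, and the array holds plain C ints rather than SVs, so it is already much simpler than a real Perl array. Every access goes through the handle, which lets the storage move when the array grows, and every access pays for the extra pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array handle: variables point at the handle,
 * and the handle points at the element storage. */
struct int_array {
    int   *elts;
    size_t fill;   /* elements in use */
    size_t max;    /* elements allocated */
};

/* What push @a, $n might compile to under this scheme. */
void int_array_push(struct int_array *av, int n)
{
    assert(av != NULL);
    if (av->fill == av->max) {
        size_t newmax = av->max ? 2 * av->max : 4;
        int *p = realloc(av->elts, newmax * sizeof *p);  /* may move the data */
        if (p == NULL)
            abort();
        av->elts = p;      /* only the handle needs updating */
        av->max = newmax;
    }
    av->elts[av->fill++] = n;
}

/* What $a[$i] might compile to: two pointer hops per element access. */
int int_array_get(const struct int_array *av, size_t i)
{
    assert(av != NULL && i < av->fill);
    return av->elts[i];
}
```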

So again, it's not clear that you are going to get any benefit from translating Perl to C. The slowness of Perl comes from the flexibility of the data structures. The code to manipulate these structures is already written in C. If you translate a Perl program to C, you have the choice of throwing away the flexibility of the data structure, in which case you are now writing C programs with C structures, or keeping the flexibility with the same speed penalty. You probably cannot speed up the data structures, because if anyone knew how to make the structures faster and still keep them flexible, they would already have made those changes in the C code for Perl itself.

Possible Future Work

Larry Wall, Apocalypse 2

Damian Conway, Exegesis 2

perl6-internals mailing list archive

It should now be clear that although it might not be hard to translate Perl to C, programs probably will not be faster as a result.

However, it's possible that a sufficiently clever person could make a Perl-to-C translator that produced faster C code. The programmer would need to give hints to the translator to say how the variables were being used. For example, suppose you have an array @a. With such an array, Perl is ready for anything. You might do $a[1000000] = 'hello'; or $a[500] .= 'foo'; or $a[500] /= 17;. This flexibility is expensive. But suppose you know that this array will only hold integers and there will never be more than 1,000 integers. You might tell the translator that, and then instead of producing C code to manage a slow Perl array, the translator can produce

        int a[1000];

and use a fast C array of machine integers.
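For comparison, here is a sketch of what a translator might emit for a loop like my $sum = 0; $sum += $_ for @a; once it has been told that @a holds at most 1,000 integers. The function name sum_hinted and the macro A_MAX are hypothetical, and the element count n is assumed to be tracked separately by the translator. There is no SV or handle to follow: the loop walks a contiguous block of machine integers, which C compilers optimize well.

```c
#include <assert.h>
#include <stddef.h>

#define A_MAX 1000   /* from the programmer's hint: never more than 1,000 elements */

/* Hypothetical translator output for: my $sum = 0; $sum += $_ for @a;
 * N is the number of elements actually in use (at most A_MAX). */
long sum_hinted(const int a[A_MAX], size_t n)
{
    long sum = 0;

    assert(n <= A_MAX);
    for (size_t i = 0; i < n; i++)
        sum += a[i];          /* one load per element, no SV lookup */
    return sum;
}
```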

To do this, you have to be very clever, and you have to think of a way of explaining to the translator that @a will never be bigger than 1,000 elements and will only contain integers, or a way for the translator to guess that just from looking at the Perl program.

People are planning these features for Perl 6 right now. For example, Larry Wall, the author of Perl, plans that you will be able to declare a Perl array as

        my int @a is dim(1000);

Then a Perl-to-C translator (or Perl itself) might be able to use a fast C array of machine integers rather than a slow Perl array of SVs. If you are interested, you may want to join the perl6-internals mailing list.