CompilationUnit

cicuinie0996

Original link: https://my.oschina.net/rinehart/blog/176362

Dalvik, as usual, insists on being different.

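In a compiler that translates Java source code to JVM bytecode (such as javac or ECJ), a "compilation unit" (CompilationUnit) is a single Java source file. The JIT in the Dalvik VM also has a struct named "CompilationUnit", and it must not be confused with the Java source-level compilation unit: here it refers to a "trace".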

http://hllvm.group.iteye.com/group/topic/17798

The all-knowing Wikipedia has this to say:

http://en.wikipedia.org/wiki/Single_Compilation_Unit

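Wikipedia's notion of a compilation unit is also a source-level one (a unit of source code handed to the compiler), so it is much closer to the Java/JVM definition than to Dalvik's.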

In the Dalvik source code, it is defined like this:

typedef struct CompilationUnit {
    int numInsts;
    int numBlocks;
    GrowableList blockList;
    const Method *method;
#ifdef ARCH_IA32
    int exceptionBlockId;               // the block corresponding to exception handling
#endif
    const JitTraceDescription *traceDesc;
    LIR *firstLIRInsn;
    LIR *lastLIRInsn;
    LIR *literalList;                   // Constants
    LIR *classPointerList;              // Relocatable
    int numClassPointers;
    LIR *chainCellOffsetLIR;
    GrowableList pcReconstructionList;
    int headerSize;                     // bytes before the first code ptr
    int dataOffset;                     // starting offset of literal pool
    int totalSize;                      // header + code size
    AssemblerStatus assemblerStatus;    // Success or fix and retry
    int assemblerRetries;               // How many times tried to fix assembly
    unsigned char *codeBuffer;
    void *baseAddr;
    bool printMe;
    bool allSingleStep;
    bool hasClassLiterals;              // Contains class ptrs used as literals
    bool hasLoop;                       // Contains a loop
    bool hasInvoke;                     // Contains an invoke instruction
    bool heapMemOp;                     // Mark mem ops for self verification
    bool usesLinkRegister;              // For self-verification only
    int profileCodeSize;                // Size of the profile prefix in bytes
    int numChainingCells[kChainingCellGap];
    LIR *firstChainingLIR[kChainingCellGap];
    LIR *chainingCellBottom;
    struct RegisterPool *regPool;
    int optRound;                       // round number to tell an LIR's age
    jmp_buf *bailPtr;
    JitInstructionSetType instructionSet;
    /* Number of total regs used in the whole cUnit after SSA transformation */
    int numSSARegs;
    /* Map SSA reg i to the Dalvik[15..0]/Sub[31..16] pair. */
    GrowableList *ssaToDalvikMap;

    /* The following are new data structures to support SSA representations */
    /* Map original Dalvik reg i to the SSA[15..0]/Sub[31..16] pair */
    int *dalvikToSSAMap;                // length == method->registersSize
    BitVector *isConstantV;             // length == numSSAReg
    int *constantValues;                // length == numSSAReg

    /* Data structure for loop analysis and optimizations */
    struct LoopAnalysis *loopAnalysis;

    /* Map SSA names to location */
    RegLocation *regLocation;
    int sequenceNumber;

    /*
     * Set to the Dalvik PC of the switch instruction if it has more than
     * MAX_CHAINED_SWITCH_CASES cases.
     */
    const u2 *switchOverflowPad;

    JitMode jitMode;
    int numReachableBlocks;
    int numDalvikRegisters;             // method->registersSize + inlined
    BasicBlock *entryBlock;
    BasicBlock *exitBlock;
    BasicBlock *puntBlock;              // punting to interp for exceptions
    BasicBlock *backChainBlock;         // for loop-trace
    BasicBlock *curBlock;
    BasicBlock *nextCodegenBlock;       // for extended trace codegen
    GrowableList dfsOrder;
    GrowableList domPostOrderTraversal;
    BitVector *tryBlockAddr;
    BitVector **defBlockMatrix;         // numDalvikRegister x numBlocks
    BitVector *tempBlockV;
    BitVector *tempDalvikRegisterV;
    BitVector *tempSSARegisterV;        // numSSARegs
    bool printSSANames;
    void *blockLabelList;
    bool quitLoopMode;                  // cold path/complex bytecode
} CompilationUnit;

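It has to be said that this data structure is really large. Is it actually maintainable like this?

In a program, though, a structure's complexity is usually proportional to its importance.

The CU is a struct that runs through the whole JIT. Almost all the information belonging to a trace is gathered into the CU, which is why it is so complex and bloated.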

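If you want to understand how a piece of software works, you have to master its main data structures and how they are organized, because the organization of the data determines how the software must process it. For example, a structure that holds many pointers usually signals dynamically allocated data such as linked lists or growable lists. Dynamic structures are used because the size of the object being described is not known in advance. In Dalvik, for instance, the length of a trace is not fixed; you cannot assume it is 64K or 128K. Static layouts reduce complexity but cannot handle irregular objects; dynamic layouts add flexibility but inevitably add complexity.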
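To make the static-versus-dynamic trade-off concrete, here is a minimal sketch of a growable list that doubles its capacity on demand. It is not Dalvik's GrowableList implementation; the type SketchGrowableList and the functions sketchInitList, sketchInsertList and sketchFreeList are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a growable list; not Dalvik's code. */
typedef struct SketchGrowableList {
    size_t numAllocated;   /* capacity of elemList */
    size_t numUsed;        /* number of valid elements */
    intptr_t *elemList;    /* heap array that is reallocated on demand */
} SketchGrowableList;

/* Initialize with a small capacity; returns 0 on success, -1 on failure. */
static int sketchInitList(SketchGrowableList *list, size_t initLength)
{
    assert(list != NULL);
    if (initLength == 0) initLength = 1;
    list->elemList = malloc(initLength * sizeof(intptr_t));
    if (list->elemList == NULL) return -1;
    list->numAllocated = initLength;
    list->numUsed = 0;
    return 0;
}

/* Append one element, doubling the capacity when the array is full. */
static int sketchInsertList(SketchGrowableList *list, intptr_t elem)
{
    assert(list != NULL);
    if (list->numUsed == list->numAllocated) {
        size_t newLength = list->numAllocated * 2;
        intptr_t *newArray = realloc(list->elemList,
                                     newLength * sizeof(intptr_t));
        if (newArray == NULL) return -1;
        list->elemList = newArray;
        list->numAllocated = newLength;
    }
    list->elemList[list->numUsed++] = elem;
    return 0;
}

/* Release the backing array and reset the bookkeeping fields. */
static void sketchFreeList(SketchGrowableList *list)
{
    assert(list != NULL);
    free(list->elemList);
    list->elemList = NULL;
    list->numAllocated = 0;
    list->numUsed = 0;
}
```

The caller never has to guess an upper bound for the trace size. The price is the extra bookkeeping (capacity, used count, reallocation failure paths) that a fixed-size array would not need.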

Near the top of the struct you can see "const Method *method;". This is a pointer, so every CU must correspond to exactly one method, while a single method may contain several CUs.
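The one-method-to-many-CUs relationship can be sketched as follows. The types SketchMethod and SketchTraceUnit, the startOffset field and the helper sketchCountUnitsForMethod are all hypothetical and only illustrate the direction of the pointer; they are not taken from the Dalvik source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down method descriptor. */
typedef struct SketchMethod {
    const char *name;
    int registersSize;
} SketchMethod;

/* Hypothetical, stripped-down compilation unit (one trace). */
typedef struct SketchTraceUnit {
    const SketchMethod *method;  /* each unit points at exactly one method */
    unsigned startOffset;        /* assumed: where the trace starts in it */
    int numInsts;
} SketchTraceUnit;

/* Count how many units were cut out of the given method. */
static int sketchCountUnitsForMethod(const SketchTraceUnit *units,
                                     size_t numUnits,
                                     const SketchMethod *method)
{
    int count = 0;
    assert(units != NULL && method != NULL);
    for (size_t i = 0; i < numUnits; i++) {
        if (units[i].method == method) {
            count++;
        }
    }
    return count;
}
```

Because the pointer lives in the unit rather than in the method, a method needs no knowledge of how many traces were compiled from it.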

Also important are the LIR pointers that mark the entry and exit of the whole CU:

    LIR *firstLIRInsn;
    LIR *lastLIRInsn;
    LIR *literalList;                   // Constants
    LIR *classPointerList;              // Relocatable

When a CU is compiled down to LIR, the LIR instructions are stored as a linked list.
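A minimal sketch of how a first/last pointer pair can maintain such a list is shown below. It assumes a doubly linked list, and the types SketchLIR and SketchUnit and the functions sketchAppendLIR and sketchCountLIR are hypothetical names for this illustration, not Dalvik's own helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level IR node; assumed to be doubly linked. */
typedef struct SketchLIR {
    int opcode;
    struct SketchLIR *prev;
    struct SketchLIR *next;
} SketchLIR;

/* Only the two list-anchor fields of the unit are modelled here. */
typedef struct SketchUnit {
    SketchLIR *firstLIRInsn;
    SketchLIR *lastLIRInsn;
} SketchUnit;

/* Append one LIR node at the tail in O(1) using lastLIRInsn. */
static void sketchAppendLIR(SketchUnit *cUnit, SketchLIR *lir)
{
    assert(cUnit != NULL && lir != NULL);
    lir->next = NULL;
    if (cUnit->firstLIRInsn == NULL) {
        /* Empty list: the new node is both head and tail. */
        assert(cUnit->lastLIRInsn == NULL);
        lir->prev = NULL;
        cUnit->firstLIRInsn = lir;
        cUnit->lastLIRInsn = lir;
    } else {
        lir->prev = cUnit->lastLIRInsn;
        cUnit->lastLIRInsn->next = lir;
        cUnit->lastLIRInsn = lir;
    }
}

/* Walk the list from the head and count the nodes. */
static int sketchCountLIR(const SketchUnit *cUnit)
{
    int count = 0;
    assert(cUnit != NULL);
    for (const SketchLIR *lir = cUnit->firstLIRInsn; lir != NULL;
         lir = lir->next) {
        count++;
    }
    return count;
}
```

Keeping both ends in the unit means code generation can append in constant time while later passes still iterate from the first instruction.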

Besides that, a CU also contains many basic blocks (BBs), including the entry and exit blocks of the whole CU:

    BasicBlock *entryBlock;
    BasicBlock *exitBlock;
    BasicBlock *puntBlock;              // punting to interp for exceptions
    BasicBlock *backChainBlock;         // for loop-trace
    BasicBlock *curBlock;
    BasicBlock *nextCodegenBlock;       // for extended trace codegen

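As mentioned before, a BB is a straight-line piece of code with a single entry and a single exit, and it naturally contains MIR as well. So in the CU the whole family is gathered together: BB, MIR and LIR.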
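The containment described here (a unit reaches blocks, and each block holds its own MIR list) can be sketched like this. SketchMIR, SketchBasicBlock, sketchAppendMIR and sketchCountMIRs are hypothetical names, and following only a single fallThrough successor is a simplifying assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mid-level IR node, singly linked inside its block. */
typedef struct SketchMIR {
    int dalvikOpcode;
    struct SketchMIR *next;
} SketchMIR;

/* Hypothetical basic block: one entry, one exit, a list of MIRs. */
typedef struct SketchBasicBlock {
    int id;
    SketchMIR *firstMIRInsn;
    SketchMIR *lastMIRInsn;
    struct SketchBasicBlock *fallThrough;  /* assumed single successor */
} SketchBasicBlock;

/* Append one MIR to the end of a block's instruction list. */
static void sketchAppendMIR(SketchBasicBlock *bb, SketchMIR *mir)
{
    assert(bb != NULL && mir != NULL);
    mir->next = NULL;
    if (bb->firstMIRInsn == NULL) {
        bb->firstMIRInsn = mir;
        bb->lastMIRInsn = mir;
    } else {
        bb->lastMIRInsn->next = mir;
        bb->lastMIRInsn = mir;
    }
}

/* Follow fallThrough links from a starting block and count every MIR. */
static int sketchCountMIRs(const SketchBasicBlock *startBlock)
{
    int count = 0;
    for (const SketchBasicBlock *bb = startBlock; bb != NULL;
         bb = bb->fallThrough) {
        for (const SketchMIR *mir = bb->firstMIRInsn; mir != NULL;
             mir = mir->next) {
            count++;
        }
    }
    return count;
}
```

A real control-flow graph also has taken branches and would need a visited set to handle loops; the sketch only shows how the block level and the instruction level nest.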

Reposted from: https://my.oschina.net/rinehart/blog/176362