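In the article "Analysis of How the Android Runtime ART Loads Classes and Methods", we followed class and method loading starting from the member function start of the AndroidRuntime class. This article starts from the same function to analyze how a class method is executed: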
//frameworks/base/core/jni/AndroidRuntime.cpp
void AndroidRuntime::start(const char* className, const Vector<String8>& options, bool zygote)
{
    ...
    /*
     * Start VM.  This thread becomes the main thread of the VM, and will
     * not return until the VM exits.
     */
    char* slashClassName = toSlashClassName(className);
    jclass startClass = env->FindClass(slashClassName);
    if (startClass == NULL) {
        ALOGE("JavaVM unable to locate class '%s'\n", slashClassName);
        /* keep going */
    } else {
        jmethodID startMeth = env->GetStaticMethodID(startClass, "main",
            "([Ljava/lang/String;)V");
        if (startMeth == NULL) {
            ALOGE("JavaVM unable to find main() in '%s'\n", className);
            /* keep going */
        } else {
            env->CallStaticVoidMethod(startClass, startMeth, strArray);

#if 0
            if (env->ExceptionCheck())
                threadExitUncaughtException(env);
#endif
        }
    }
    ...
}
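Once the class method to be called (here, the static main method of the class named by className) has been found, it can be executed through the JNI interface CallStaticVoidMethod.
As shown in the article "Analysis of How the Android Runtime ART Loads Classes and Methods", the JNI interface CallStaticVoidMethod is implemented by the member function CallStaticVoidMethod of the JNI class: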
// art/runtime/jni_internal.cc
static void CallStaticVoidMethod(JNIEnv* env, jclass, jmethodID mid, ...) {
  va_list ap;
  va_start(ap, mid);
  CHECK_NON_NULL_ARGUMENT_RETURN_VOID(mid);
  ScopedObjectAccess soa(env);
  InvokeWithVarArgs(soa, nullptr, mid, ap);
  va_end(ap);
}
JNI::CallStaticVoidMethod in turn calls the global function InvokeWithVarArgs to invoke the method specified by the parameter mid:
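To make the forwarding pattern concrete, here is a small self-contained C++ sketch. CallWithVarArgs and SumFromVaList are hypothetical stand-ins (not ART functions) for CallStaticVoidMethod and InvokeWithVarArgs: the variadic entry point only opens the va_list and hands it to a worker that actually reads the arguments.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>

// Hypothetical worker, playing the role of InvokeWithVarArgs: it consumes
// `count` int arguments from the va_list it is given and sums them so the
// forwarding can be observed.
int64_t SumFromVaList(int count, va_list ap) {
  int64_t sum = 0;
  for (int i = 0; i < count; ++i) {
    sum += va_arg(ap, int);
  }
  return sum;
}

// Hypothetical variadic entry point, playing the role of CallStaticVoidMethod:
// va_start -> forward the va_list -> va_end.
int64_t CallWithVarArgs(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int64_t result = SumFromVaList(count, ap);
  va_end(ap);
  return result;
}
```

Splitting the work this way lets one worker serve every variadic entry point; JNI follows the same idea with its Call*Method and Call*MethodV pairs.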
// ~/android-6.0.1_r62/art/runtime/reflection.cc
JValue InvokeWithVarArgs(const ScopedObjectAccessAlreadyRunnable& soa, jobject obj, jmethodID mid,
                         va_list args)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  // We want to make sure that the stack is not within a small distance from the
  // protected region in case we are calling into a leaf function whose stack
  // check has been elided.
  if (UNLIKELY(__builtin_frame_address(0) < soa.Self()->GetStackEnd())) {
    ThrowStackOverflowError(soa.Self());
    return JValue();
  }

  ArtMethod* method = soa.DecodeMethod(mid);
  bool is_string_init = method->GetDeclaringClass()->IsStringClass() && method->IsConstructor();
  if (is_string_init) {
    // Replace calls to String.<init> with equivalent StringFactory call.
    method = soa.DecodeMethod(WellKnownClasses::StringInitToStringFactoryMethodID(mid));
  }
  mirror::Object* receiver = method->IsStatic() ? nullptr : soa.Decode<mirror::Object*>(obj);
  uint32_t shorty_len = 0;
  const char* shorty = method->GetInterfaceMethodIfProxy(sizeof(void*))->GetShorty(&shorty_len);
  JValue result;
  ArgArray arg_array(shorty, shorty_len);
  arg_array.BuildArgArrayFromVarArgs(soa, receiver, args);
  InvokeWithArgArray(soa, method, &arg_array, &result, shorty);
  if (is_string_init) {
    // For string init, remap original receiver to StringFactory result.
    UpdateReference(soa.Self(), obj, result.GetL());
  }
  return result;
}
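InvokeWithVarArgs packs the call arguments into an array (an ArgArray built according to the method's shorty, whose first character is the return type) and then calls another function, InvokeWithArgArray, to invoke the method specified by mid. The parameter mid is really an ArtMethod pointer, so soa.DecodeMethod can convert it back into an ArtMethod* and obtain information about the called method. Calls to String.<init> are a special case: they are redirected to the equivalent StringFactory method, and the result is then written back to the original receiver reference.
InvokeWithArgArray is implemented as follows: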
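The packing step can be sketched as follows. SimpleArgArray and Build are hypothetical, simplified models of ART's ArgArray, written under the assumptions stated in the comments: each argument takes one 32-bit slot, long and double take two (low half first), a float arrives promoted to double through "..." and is stored back as 32-bit float bits, and references are modeled as plain 32-bit handles instead of decoded jobjects.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of ArgArray: a list of 32-bit slots.
class SimpleArgArray {
 public:
  const std::vector<uint32_t>& slots() const { return slots_; }
  uint32_t NumBytes() const { return static_cast<uint32_t>(slots_.size() * 4); }

  // shorty[0] is the return type, so parameters start at shorty[1].
  // receiver == 0 stands for "static method, no this" in this sketch.
  void BuildFromVarArgs(const char* shorty, uint32_t receiver, va_list ap) {
    if (receiver != 0) Append(receiver);
    for (const char* p = shorty + 1; *p != '\0'; ++p) {
      switch (*p) {
        case 'J':
          AppendWide(static_cast<uint64_t>(va_arg(ap, int64_t)));
          break;
        case 'D':
          AppendWide(DoubleBits(va_arg(ap, double)));
          break;
        case 'F': {
          // float is promoted to double when passed through "...".
          float f = static_cast<float>(va_arg(ap, double));
          uint32_t bits;
          std::memcpy(&bits, &f, sizeof(bits));
          Append(bits);
          break;
        }
        case 'L':  // assumption: references modeled as 32-bit handles
          Append(va_arg(ap, uint32_t));
          break;
        default:  // Z, B, C, S, I: promoted to int, one slot each
          Append(static_cast<uint32_t>(va_arg(ap, int)));
          break;
      }
    }
  }

 private:
  static uint64_t DoubleBits(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));
    return bits;
  }
  void Append(uint32_t v) { slots_.push_back(v); }
  // Wide values: low 32 bits first, then high 32 bits.
  void AppendWide(uint64_t v) {
    Append(static_cast<uint32_t>(v));
    Append(static_cast<uint32_t>(v >> 32));
  }
  std::vector<uint32_t> slots_;
};

// Hypothetical variadic driver, used like a JNI Call*Method.
SimpleArgArray Build(const char* shorty, uint32_t receiver, ...) {
  va_list ap;
  va_start(ap, receiver);
  SimpleArgArray array;
  array.BuildFromVarArgs(shorty, receiver, ap);
  va_end(ap);
  return array;
}
```

For example, Build("VIJF", 0u, 7, int64_t{0x100000002}, 1.5f) yields four slots: 7, the low and high halves of the long (2 and 1), and the bit pattern of 1.5f. The byte count of such an array is what InvokeWithArgArray later passes on as args_size.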
// ~/android-6.0.1_r62/art/runtime/reflection.cc
static void InvokeWithArgArray(const ScopedObjectAccessAlreadyRunnable& soa,
                               ArtMethod* method, ArgArray* arg_array, JValue* result,
                               const char* shorty)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  uint32_t* args = arg_array->GetArray();
  if (UNLIKELY(soa.Env()->check_jni)) {
    CheckMethodArguments(soa.Vm(), method->GetInterfaceMethodIfProxy(sizeof(void*)), args);
  }
  method->Invoke(soa.Self(), args, arg_array->GetNumBytes(), result, shorty);
}
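InvokeWithArgArray invokes the method specified by the parameter method through ArtMethod's member function Invoke (after running CheckMethodArguments when CheckJNI is enabled).
ArtMethod::Invoke is implemented as follows:
// art/runtime/art_method.cc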
void ArtMethod::Invoke(Thread* self, uint32_t* args, uint32_t args_size, JValue* result,
                       const char* shorty) {
  ...
  // Push a transition back into managed code onto the linked list in thread.
  ManagedStack fragment;
  self->PushManagedStackFragment(&fragment);

  Runtime* runtime = Runtime::Current();
  // Call the invoke stub, passing everything as arguments.
  // If the runtime is not yet started or it is required by the debugger, then perform the
  // Invocation by the interpreter.
  if (UNLIKELY(!runtime->IsStarted() || Dbg::IsForcedInterpreterNeededForCalling(self, this))) {
    if (IsStatic()) {
      art::interpreter::EnterInterpreterFromInvoke(self, this, nullptr, args, result);
    } else {
      mirror::Object* receiver =
          reinterpret_cast<StackReference<mirror::Object>*>(&args[0])->AsMirrorPtr();
      art::interpreter::EnterInterpreterFromInvoke(self, this, receiver, args + 1, result);
    }
  } else {
    DCHECK_EQ(runtime->GetClassLinker()->GetImagePointerSize(), sizeof(void*));

    constexpr bool kLogInvocationStartAndReturn = false;
    bool have_quick_code = GetEntryPointFromQuickCompiledCode() != nullptr;
    if (LIKELY(have_quick_code)) {
      if (kLogInvocationStartAndReturn) {
        LOG(INFO) << StringPrintf(
            "Invoking '%s' quick code=%p static=%d", PrettyMethod(this).c_str(),
            GetEntryPointFromQuickCompiledCode(), static_cast<int>(IsStatic() ? 1 : 0));
      }

      // Ensure that we won't be accidentally calling quick compiled code when -Xint.
      if (kIsDebugBuild && runtime->GetInstrumentation()->IsForcedInterpretOnly()) {
        DCHECK(!runtime->UseJit());
        CHECK(IsEntrypointInterpreter())
            << "Don't call compiled code when -Xint " << PrettyMethod(this);
      }

#if defined(__LP64__) || defined(__arm__) || defined(__i386__)
      if (!IsStatic()) {
        (*art_quick_invoke_stub)(this, args, args_size, self, result, shorty);
      } else {
        (*art_quick_invoke_static_stub)(this, args, args_size, self, result, shorty);
      }
#else
      (*art_quick_invoke_stub)(this, args, args_size, self, result, shorty);
#endif
      if (UNLIKELY(self->GetException() == Thread::GetDeoptimizationException())) {
        // Unusual case where we were running generated code and an
        // exception was thrown to force the activations to be removed from the
        // stack. Continue execution in the interpreter.
        self->ClearException();
        ShadowFrame* shadow_frame =
            self->PopStackedShadowFrame(StackedShadowFrameType::kDeoptimizationShadowFrame);
        result->SetJ(self->PopDeoptimizationReturnValue().GetJ());
        self->SetTopOfStack(nullptr);
        self->SetTopOfShadowStack(shadow_frame);
        interpreter::EnterInterpreterFromDeoptimize(self, shadow_frame, result);
      }
      if (kLogInvocationStartAndReturn) {
        LOG(INFO) << StringPrintf("Returned '%s' quick code=%p", PrettyMethod(this).c_str(),
                                  GetEntryPointFromQuickCompiledCode());
      }
    } else {
      LOG(INFO) << "Not invoking '" << PrettyMethod(this) << "' code=null";
      if (result != nullptr) {
        result->SetJ(0);
      }
    }
  }

  // Pop transition.
  self->PopManagedStackFragment(fragment);
}
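ArtMethod::Invoke works as follows:
1. It constructs a call-stack fragment of type ManagedStack and pushes it onto a linked list kept in the current thread object, marking the transition from native code back into managed code. These fragments are used whenever the thread's stack is walked, for example during garbage collection.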
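2. If the ART runtime has not been started yet, or the debugger requires the call to go through the interpreter (Dbg::IsForcedInterpreterNeededForCalling), the method is executed by the interpreter via interpreter::EnterInterpreterFromInvoke. For an instance method, the receiver is taken from args[0] and the remaining arguments start at args + 1. Otherwise, execution continues with step 3.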
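3. As seen earlier in LinkCode, whether a class method is executed by the interpreter or directly as native machine code, an entry point can normally be obtained through ArtMethod's member function GetEntryPointFromQuickCompiledCode (for an interpreted method it leads into the interpreter). If the entry point is nevertheless null, nothing is invoked: a log message is printed and the result is set to 0. Otherwise the entry point is not called directly but indirectly through a stub, because some special registers must be set up first. On 64-bit builds (__LP64__), 32-bit ARM and i386, instance methods go through art_quick_invoke_stub and static methods through art_quick_invoke_static_stub; on other architectures art_quick_invoke_stub is used for both. Since we consider arm64 here (a 64-bit build), either art_quick_invoke_stub or art_quick_invoke_static_stub is called.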
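4. If, after the stub returns, the pending exception is the special deoptimization exception returned by Thread::GetDeoptimizationException() (a sentinel value of -1 rather than a real exception object), the activations of the compiled code have been removed from the stack to force deoptimization. The exception is cleared, the deoptimization shadow frame and return value are popped, and execution continues in the interpreter through interpreter::EnterInterpreterFromDeoptimize. Every class method executed by the interpreter needs a call-stack frame of type ShadowFrame, and these frames are likewise used during garbage collection.
5. Finally, the ManagedStack fragment pushed in step 1 is popped again.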
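The branch structure described in the steps above can be condensed into a small decision function. InvokeContext, InvokePath and ChooseInvokePath are hypothetical names invented for illustration; the sketch models only the arm64 (__LP64__) case and leaves out the deoptimization path of step 4.

```cpp
#include <cassert>

// Hypothetical flags standing in for the runtime queries made by ArtMethod::Invoke.
struct InvokeContext {
  bool runtime_started;              // runtime->IsStarted()
  bool debugger_forces_interpreter;  // Dbg::IsForcedInterpreterNeededForCalling(...)
  bool has_quick_code;               // GetEntryPointFromQuickCompiledCode() != nullptr
  bool is_static;                    // IsStatic()
};

enum class InvokePath {
  kInterpreter,      // interpreter::EnterInterpreterFromInvoke
  kQuickStub,        // art_quick_invoke_stub
  kQuickStaticStub,  // art_quick_invoke_static_stub
  kNoCode,           // logs "Not invoking" and sets the result to 0
};

// Mirrors the if/else structure of ArtMethod::Invoke on a 64-bit build.
InvokePath ChooseInvokePath(const InvokeContext& c) {
  if (!c.runtime_started || c.debugger_forces_interpreter) {
    return InvokePath::kInterpreter;
  }
  if (!c.has_quick_code) {
    return InvokePath::kNoCode;
  }
  return c.is_static ? InvokePath::kQuickStaticStub : InvokePath::kQuickStub;
}
```

Note that the interpreter check comes first, so even a method that has quick code is interpreted while the runtime is still starting up.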
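Below we focus on step 3 and assume that the target CPU architecture is ARM64, so the stubs used in step 3 are art_quick_invoke_stub and art_quick_invoke_static_stub. The frame-building macro INVOKE_STUB_CREATE_FRAME and art_quick_invoke_stub are implemented as follows: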
// art/runtime/arch/arm64/quick_entrypoints_arm64.S
.macro INVOKE_STUB_CREATE_FRAME

SAVE_SIZE=15*8   // x4, x5, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, SP, LR, FP saved.
SAVE_SIZE_AND_METHOD=SAVE_SIZE+8


    mov x9, sp                             // Save stack pointer.
    .cfi_register sp,x9

    add x10, x2, # SAVE_SIZE_AND_METHOD    // calculate size of frame.
    sub x10, sp, x10                       // Calculate SP position - saves + ArtMethod* + args
    and x10, x10, # ~0xf                   // Enforce 16 byte stack alignment.
    mov sp, x10                            // Set new SP.

    sub x10, x9, #SAVE_SIZE                // Calculate new FP (later). Done here as we must move SP
    .cfi_def_cfa_register x10              // before this.
    .cfi_adjust_cfa_offset SAVE_SIZE

    str x28, [x10, #112]
    .cfi_rel_offset x28, 112

    stp x26, x27, [x10, #96]
    .cfi_rel_offset x26, 96
    .cfi_rel_offset x27, 104

    stp x24, x25, [x10, #80]
    .cfi_rel_offset x24, 80
    .cfi_rel_offset x25, 88

    stp x22, x23, [x10, #64]
    .cfi_rel_offset x22, 64
    .cfi_rel_offset x23, 72

    stp x20, x21, [x10, #48]
    .cfi_rel_offset x20, 48
    .cfi_rel_offset x21, 56

    stp x9, x19, [x10, #32]                // Save old stack pointer and x19.
    .cfi_rel_offset sp, 32
    .cfi_rel_offset x19, 40

    stp x4, x5, [x10, #16]                 // Save result and shorty addresses.
    .cfi_rel_offset x4, 16
    .cfi_rel_offset x5, 24

    stp xFP, xLR, [x10]                    // Store LR & FP.
    .cfi_rel_offset x29, 0
    .cfi_rel_offset x30, 8

    mov xFP, x10                           // Use xFP now, as it's callee-saved.
    .cfi_def_cfa_register x29
    mov xSELF, x3                          // Move thread pointer into SELF register.

    // Copy arguments into stack frame.
    // Use simple copy routine for now.
    // 4 bytes per slot.
    // X1 - source address
    // W2 - args length
    // X9 - destination address.
    // W10 - temporary
    add x9, sp, #8                         // Destination address is bottom of stack + null.

    // Use \@ to differentiate between macro invocations.
.LcopyParams\@:
    cmp w2, #0
    beq .LendCopyParams\@
    sub w2, w2, #4                         // Need 65536 bytes of range.
    ldr w10, [x1, x2]
    str w10, [x9, x2]

    b .LcopyParams\@

.LendCopyParams\@:

    // Store null into ArtMethod* at bottom of frame.
    str xzr, [sp]
.endm
...
/*
 *  extern"C" void art_quick_invoke_stub(ArtMethod *method,   x0
 *                                       uint32_t  *args,     x1
 *                                       uint32_t argsize,    w2
 *                                       Thread *self,        x3
 *                                       JValue *result,      x4
 *                                       char   *shorty);     x5
 *  +----------------------+
 *  |                      |
 *  |  C/C++ frame         |
 *  |       LR''           |
 *  |       FP''           | <- SP'
 *  +----------------------+
 *  +----------------------+
 *  |        x28           | <- TODO: Remove callee-saves.
 *  |         :            |
 *  |        x19           |
 *  |        SP'           |
 *  |        X5            |
 *  |        X4            |        Saved registers
 *  |        LR'           |
 *  |        FP'           | <- FP
 *  +----------------------+
 *  | uint32_t out[n-1]    |
 *  |    :      :          |        Outs
 *  | uint32_t out[0]      |
 *  | ArtMethod*           | <- SP  value=null
 *  +----------------------+
 *
 * Outgoing registers:
 *  x0    - Method*
 *  x1-x7 - integer parameters.
 *  d0-d7 - Floating point parameters.
 *  xSELF = self
 *  SP = & of ArtMethod*
 *  x1 = "this" pointer.
 *
 */
ENTRY art_quick_invoke_stub
    // Spill registers as per AACPS64 calling convention.
    INVOKE_STUB_CREATE_FRAME

    // Fill registers x/w1 to x/w7 and s/d0 to s/d7 with parameters.
    // Parse the passed shorty to determine which register to load.
    // Load addresses for routines that load WXSD registers.
    adr x11, .LstoreW2
    adr x12, .LstoreX2
    adr x13, .LstoreS0
    adr x14, .LstoreD0

    // Initialize routine offsets to 0 for integers and floats.
    // x8 for integers, x15 for floating point.
    mov x8, #0
    mov x15, #0

    add x10, x5, #1                 // Load shorty address, plus one to skip return value.
    ldr w1, [x9],#4                 // Load "this" parameter, and increment arg pointer.

    // Loop to fill registers.
.LfillRegisters:
    ldrb w17, [x10], #1             // Load next character in signature, and increment.
    cbz w17, .LcallFunction         // Exit at end of signature. Shorty 0 terminated.

    cmp w17, #'F'                   // is this a float?
    bne .LisDouble

    cmp x15, # 8*12                 // Skip this load if all registers full.
    beq .Ladvance4

    add x17, x13, x15               // Calculate subroutine to jump to.
    br x17

.LisDouble:
    cmp w17, #'D'                   // is this a double?
    bne .LisLong

    cmp x15, # 8*12                 // Skip this load if all registers full.
    beq .Ladvance8

    add x17, x14, x15               // Calculate subroutine to jump to.
    br x17

.LisLong:
    cmp w17, #'J'                   // is this a long?
    bne .LisOther

    cmp x8, # 6*12                  // Skip this load if all registers full.
    beq .Ladvance8

    add x17, x12, x8                // Calculate subroutine to jump to.
    br x17

.LisOther:                          // Everything else takes one vReg.
    cmp x8, # 6*12                  // Skip this load if all registers full.
    beq .Ladvance4

    add x17, x11, x8                // Calculate subroutine to jump to.
    br x17

.Ladvance4:
    add x9, x9, #4
    b .LfillRegisters

.Ladvance8:
    add x9, x9, #8
    b .LfillRegisters

// Macro for loading a parameter into a register.
//  counter - the register with offset into these tables
//  size - the size of the register - 4 or 8 bytes.
//  register - the name of the register to be loaded.
.macro LOADREG counter size register return
    ldr \register , [x9], #\size
    add \counter, \counter, 12
    b \return
.endm

// Store ints.
.LstoreW2:
    LOADREG x8 4 w2 .LfillRegisters
    LOADREG x8 4 w3 .LfillRegisters
    LOADREG x8 4 w4 .LfillRegisters
    LOADREG x8 4 w5 .LfillRegisters
    LOADREG x8 4 w6 .LfillRegisters
    LOADREG x8 4 w7 .LfillRegisters

// Store longs.
.LstoreX2:
    LOADREG x8 8 x2 .LfillRegisters
    LOADREG x8 8 x3 .LfillRegisters
    LOADREG x8 8 x4 .LfillRegisters
    LOADREG x8 8 x5 .LfillRegisters
    LOADREG x8 8 x6 .LfillRegisters
    LOADREG x8 8 x7 .LfillRegisters

// Store singles.
.LstoreS0:
    LOADREG x15 4 s0 .LfillRegisters
    LOADREG x15 4 s1 .LfillRegisters
    LOADREG x15 4 s2 .LfillRegisters
    LOADREG x15 4 s3 .LfillRegisters
    LOADREG x15 4 s4 .LfillRegisters
    LOADREG x15 4 s5 .LfillRegisters
    LOADREG x15 4 s6 .LfillRegisters
    LOADREG x15 4 s7 .LfillRegisters

// Store doubles.
.LstoreD0:
    LOADREG x15 8 d0 .LfillRegisters
    LOADREG x15 8 d1 .LfillRegisters
    LOADREG x15 8 d2 .LfillRegisters
    LOADREG x15 8 d3 .LfillRegisters
    LOADREG x15 8 d4 .LfillRegisters
    LOADREG x15 8 d5 .LfillRegisters
    LOADREG x15 8 d6 .LfillRegisters
    LOADREG x15 8 d7 .LfillRegisters


.LcallFunction:

    INVOKE_STUB_CALL_AND_RETURN

END art_quick_invoke_stub
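The comment in front of art_quick_invoke_stub lists the values of registers x0-x5 when the function is called, and the layout of the frame it builds. x0 points to the method being called (an ArtMethod), x1 points to the argument array, w2 holds the size of that array in bytes, and x3 points to the current thread. x4 points to the JValue that receives the call result, and x5 points to the method's shorty, whose first character describes the result type. Both are passed in registers rather than on the stack; the stub saves them in its frame (the X4 and X5 slots in the diagram) so that they are still available after the call returns.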
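The comment only says that x4 and x5 carry the result address and the shorty. The body of INVOKE_STUB_CALL_AND_RETURN is not shown in this article, so the following is only a sketch under the assumption that, after the call returns, the stub reloads both from its frame and uses shorty[0] to decide whether d0, s0 or x0 holds the return value. SimpleJValue and StoreResult are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified JValue: one 64-bit cell viewed through typed members.
union SimpleJValue {
  int64_t j;  // longs and, in this sketch, every other integer/reference result
  int32_t i;
  float f;
  double d;
};

// Hypothetical model of the stub's epilogue: shorty[0] (saved from x5) selects
// which return register is written into the JValue (saved from x4).
SimpleJValue StoreResult(const char* shorty, uint64_t x0, float s0, double d0) {
  SimpleJValue result{};  // zero-initialized
  switch (shorty[0]) {
    case 'V':  // void: nothing to store
      break;
    case 'D':
      result.d = d0;
      break;
    case 'F':
      result.f = s0;
      break;
    default:   // Z, B, C, S, I, J, L: take the whole of x0
      result.j = static_cast<int64_t>(x0);
      break;
  }
  return result;
}
```

Keeping a single 64-bit JValue cell lets the caller (ArtMethod::Invoke) stay agnostic of the return type; only the stub needs to inspect the shorty.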
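Whether a class method is executed by the interpreter or directly as native machine code, it is entered with a special calling convention. Register xSELF (x18) holds the address of the Thread object describing the calling thread, so that the machine code can locate thread-related information through it while it runs, for example the function jump tables (entry points) described earlier. Register x0 holds the address of the ArtMethod object describing the called method. In the 32-bit ARM implementation, register r4 is also initialized as a countdown, and when it reaches 0 the thread checks whether it has been asked to suspend; the arm64 INVOKE_STUB_CREATE_FRAME macro shown above does not set up such a counter.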
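All arguments passed to the called method are stored on the call stack, so before jumping to the entry point the stub reserves enough stack space and copies the arguments into it with a simple loop that moves 4 bytes per iteration (.LcopyParams; no memcpy call is involved). In addition, the arguments are loaded into registers according to the shorty. In art_quick_invoke_stub, w1 receives "this"; int, reference and other narrow arguments go into w2-w7 and long arguments into x2-x7 (at most six in total, since both share the counter x8); float and double arguments go into s0-s7 and d0-d7 (at most eight in total, sharing the counter x15). Arguments that do not fit in registers are only available in the stack copy. This way, a method with few arguments can read them directly from registers.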
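The register assignment performed by the .LfillRegisters loop can be modeled in C++ as follows. AssignRegisters is a hypothetical helper that covers only the non-static art_quick_invoke_stub listed above; it mirrors the two counters x8 (general-purpose registers) and x15 (floating-point registers) and reports "stack" when a register bank is already full.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of .LfillRegisters in art_quick_invoke_stub.
// Returns, for "this" followed by each parameter, the register it is loaded into.
std::vector<std::string> AssignRegisters(const std::string& shorty) {
  std::vector<std::string> out = {"w1"};  // "this" is always loaded into w1
  int gpr = 0;  // like x8 / 12: index into w2..w7 or x2..x7 (6 entries)
  int fpr = 0;  // like x15 / 12: index into s0..s7 or d0..d7 (8 entries)
  for (std::size_t i = 1; i < shorty.size(); ++i) {  // shorty[0] is the return type
    char c = shorty[i];
    if (c == 'F' || c == 'D') {
      if (fpr == 8) {  // all FP registers full: stays in the stack copy only
        out.push_back("stack");
        continue;
      }
      out.push_back(std::string(c == 'F' ? "s" : "d") + std::to_string(fpr++));
    } else {
      if (gpr == 6) {  // all integer registers full
        out.push_back("stack");
        continue;
      }
      out.push_back(std::string(c == 'J' ? "x" : "w") + std::to_string(2 + gpr++));
    }
  }
  return out;
}
```

For the shorty "VIJFD" this gives w1 (this), w2, x3, s0 and d1: an int and a long consume consecutive general-purpose slots, and a float and a double consume consecutive floating-point slots, exactly as the shared counters in the assembly imply.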
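Note that the arguments passed to the called method are not stored starting at the first slot at the top of the stack (the address in SP; one slot is one word, i.e. 8 bytes) but at the second slot, i.e. SP + 8. The first slot is reserved for the address of the ArtMethod object describing the calling method (the caller). Because art_quick_invoke_stub is used to enter the ART runtime from outside, there is no calling managed method, so this slot is set to NULL.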
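Once the call frame is ready, the stub (in the INVOKE_STUB_CALL_AND_RETURN macro, whose body is not shown here) loads the value at offset ART_METHOD_QUICK_CODE_OFFSET_64 from the ArtMethod object describing the called method, uses that value as the method's entry point, and finally jumps to it with a blr instruction. The offset is defined as follows: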
// ~/android-6.0.1_r62/art/runtime/asm_support.h
#define ART_METHOD_QUICK_CODE_OFFSET_64 48
ADD_TEST_EQ(ART_METHOD_QUICK_CODE_OFFSET_64,
            art::ArtMethod::EntryPointFromQuickCompiledCodeOffset(8).Int32Value())
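Loading an entry point from a fixed offset and branching to it can be imitated in C++. FakeArtMethod, MakeMethod, CallThroughEntryPoint and DoubleIt are hypothetical; FakeArtMethod exists only to place a function pointer at offset 48. The real ArtMethod has different fields, and in ART the offset is checked by ADD_TEST_EQ rather than by a static_assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical quick-code signature used only for this sketch.
using QuickEntry = int64_t (*)(int64_t);

// Hypothetical stand-in for ArtMethod: 48 bytes of other fields, then the
// entry point, so its offset equals ART_METHOD_QUICK_CODE_OFFSET_64.
struct FakeArtMethod {
  unsigned char other_fields[48];
  QuickEntry entry_point_from_quick_compiled_code;
};

constexpr std::size_t kQuickCodeOffset64 = 48;  // ART_METHOD_QUICK_CODE_OFFSET_64
static_assert(offsetof(FakeArtMethod, entry_point_from_quick_compiled_code) ==
                  kQuickCodeOffset64,
              "entry point must sit at the modeled offset");

FakeArtMethod MakeMethod(QuickEntry entry) {
  FakeArtMethod method{};
  method.entry_point_from_quick_compiled_code = entry;
  return method;
}

// Counterpart of loading [x0, #48] and branching there with blr: read the
// pointer stored 48 bytes into the method object and call through it.
int64_t CallThroughEntryPoint(const FakeArtMethod& method, int64_t arg) {
  QuickEntry entry;
  std::memcpy(&entry, reinterpret_cast<const unsigned char*>(&method) + kQuickCodeOffset64,
              sizeof(entry));
  return entry(arg);
}

int64_t DoubleIt(int64_t x) { return 2 * x; }
```

Because the stub only knows a numeric offset, the assembly and the C++ class layout must agree; that is why ART pairs the #define with a compile-time check against EntryPointFromQuickCompiledCodeOffset.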
Some notes on the INVOKE_STUB_CREATE_FRAME macro:
INVOKE_STUB_CREATE_FRAME is a macro that spills registers according to the AAPCS64 calling convention and builds the stub's stack frame.
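1. It defines the size of the data to be saved: SAVE_SIZE = 15*8 = 120 bytes (15 registers of 8 bytes each) and SAVE_SIZE_AND_METHOD = SAVE_SIZE + 8 = 128 bytes (the save area plus the ArtMethod* slot).
2. It saves the stack pointer by copying SP into x9.
Note: assembler directives are not executed by the CPU; they only guide the assembler and linker. For example, the directives starting with ".cfi" emit call frame information (CFI), which debuggers and stack unwinders use to walk this stack frame.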
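3. It computes the frame size by adding SAVE_SIZE_AND_METHOD to the argument size in x2 and storing the result in x10.
It computes the new SP position (saves + ArtMethod* + args) by subtracting x10 from SP and storing the result in x10.
It enforces 16-byte stack alignment by clearing the low four bits of x10.
It copies x10 into SP, i.e. sets the new stack pointer.
It computes the new FP as x9 (the old SP) minus SAVE_SIZE and stores it in x10.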
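4. It stores the registers on the stack, at offsets from the new FP held in x10 (not from SP):
x28 is stored at x10+112, x27 at x10+104, x26 at x10+96, ..., x19 at x10+40, x9 (the old SP) at x10+32, x5 (the shorty address) at x10+24, x4 (the result address) at x10+16, x30 (LR) at x10+8 and x29 (FP) at x10+0. It then copies x10, the base of this save area, into xFP (x29), which serves as the frame pointer from then on, and copies the thread pointer in x3 into xSELF (x18).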
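5. It copies the arguments into the stack frame, 4 bytes per slot: x1 holds the address of args, w2 the length of args in bytes, x9 the destination address, and w10 is a temporary.
x9 is set to SP + 8.
In the loop, if w2 is 0, the code jumps to .LendCopyParams\@, which stores xzr (0) into the ArtMethod* slot at [SP].
Otherwise w2 is decremented by 4, the word at address x1 + x2 (an offset into args) is loaded into w10, and w10 is stored at address x9 + x2. The loop repeats until w2 reaches 0, so the arguments are copied from the last slot down to the first.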
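Steps 3 and 5 can be replayed in a small C++ sketch. ComputeStubFrame, CopyParams, OutSlot and MethodSlot are hypothetical helpers: the first reproduces the SP/FP arithmetic for a given incoming SP, and the second copies the argument words backward into a byte buffer that stands for the memory starting at the new SP, then writes the null ArtMethod* slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Constants copied from INVOKE_STUB_CREATE_FRAME.
constexpr uint64_t kSaveSize = 15 * 8;                  // SAVE_SIZE = 120
constexpr uint64_t kSaveSizeAndMethod = kSaveSize + 8;  // SAVE_SIZE_AND_METHOD = 128

struct StubFrame {
  uint64_t new_sp;  // bottom of the frame: the ArtMethod* slot
  uint64_t new_fp;  // base of the 120-byte register save area
};

// Step 3: address arithmetic for an incoming SP and an argument size (w2) in bytes.
StubFrame ComputeStubFrame(uint64_t old_sp, uint32_t args_size) {
  uint64_t x10 = args_size + kSaveSizeAndMethod;  // add x10, x2, #SAVE_SIZE_AND_METHOD
  x10 = old_sp - x10;                             // sub x10, sp, x10
  x10 &= ~uint64_t{0xf};                          // and x10, x10, #~0xf
  return StubFrame{x10, old_sp - kSaveSize};      // mov sp, x10 / sub x10, x9, #SAVE_SIZE
}

// Step 5: .LcopyParams. Returns the bytes from the new SP upward: an 8-byte
// ArtMethod* slot followed by the copied 4-byte argument slots.
std::vector<uint8_t> CopyParams(const std::vector<uint32_t>& args) {
  uint32_t w2 = static_cast<uint32_t>(args.size() * 4);  // args length in bytes
  std::vector<uint8_t> frame(8 + w2, 0xAA);               // 0xAA marks unwritten stack
  const uint8_t* x1 = reinterpret_cast<const uint8_t*>(args.data());
  uint8_t* x9 = frame.data() + 8;                         // add x9, sp, #8
  while (w2 != 0) {                                       // cmp w2, #0 / beq .LendCopyParams
    w2 -= 4;                                              // sub w2, w2, #4
    uint32_t w10;
    std::memcpy(&w10, x1 + w2, 4);                        // ldr w10, [x1, x2]
    std::memcpy(x9 + w2, &w10, 4);                        // str w10, [x9, x2]
  }
  std::memset(frame.data(), 0, 8);                        // str xzr, [sp]
  return frame;
}

// Reads argument slot i (out[i]) from the buffer produced by CopyParams.
uint32_t OutSlot(const std::vector<uint8_t>& frame, std::size_t i) {
  uint32_t value;
  std::memcpy(&value, frame.data() + 8 + 4 * i, 4);
  return value;
}

// Reads the 8-byte ArtMethod* slot at the bottom of the frame.
uint64_t MethodSlot(const std::vector<uint8_t>& frame) {
  uint64_t value;
  std::memcpy(&value, frame.data(), 8);
  return value;
}
```

For example, with an incoming SP of 0x7fff1000 and 12 bytes of arguments, the new SP is 0x7fff0f70 and the new FP is 0x7fff0f88. The 24 bytes between them hold the null ArtMethod* slot, the 12 argument bytes and 4 bytes of alignment padding.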
The resulting frame looks like this:
 *  +----------------------+
 *  |                      |
 *  |  C/C++ frame         |
 *  |       LR''           |
 *  |       FP''           | <- SP'
 *  +----------------------+
 *  +----------------------+
 *  |        x28           | <- TODO: Remove callee-saves.
 *  |         :            |
 *  |        x19           |
 *  |        SP'           |
 *  |        X5            |
 *  |        X4            |        Saved registers
 *  |        LR'           |
 *  |        FP'           | <- FP
 *  +----------------------+
 *  | uint32_t out[n-1]    |
 *  |    :      :          |        Outs
 *  | uint32_t out[0]      |
 *  | ArtMethod*           | <- SP  value=null
 *  +----------------------+
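The save area in the diagram can also be described as a C++ struct, with static_asserts checking that the offsets match the str/stp instructions of INVOKE_STUB_CREATE_FRAME. InvokeStubSaveArea is a hypothetical type used only to document the layout; the stub itself never declares such a struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ view of the 120-byte save area that INVOKE_STUB_CREATE_FRAME
// fills starting at the new FP (x10); offsets follow the str/stp instructions.
struct InvokeStubSaveArea {
  uint64_t fp;        // [x10, #0]    xFP (x29) of the caller
  uint64_t lr;        // [x10, #8]    xLR (x30)
  uint64_t result;    // [x10, #16]   x4: JValue* result
  uint64_t shorty;    // [x10, #24]   x5: shorty
  uint64_t old_sp;    // [x10, #32]   x9: the caller's SP
  uint64_t x19;       // [x10, #40]
  uint64_t x20, x21;  // [x10, #48], [x10, #56]
  uint64_t x22, x23;  // [x10, #64], [x10, #72]
  uint64_t x24, x25;  // [x10, #80], [x10, #88]
  uint64_t x26, x27;  // [x10, #96], [x10, #104]
  uint64_t x28;       // [x10, #112]
};

static_assert(sizeof(InvokeStubSaveArea) == 15 * 8, "matches SAVE_SIZE");
static_assert(offsetof(InvokeStubSaveArea, result) == 16, "stp x4, x5, [x10, #16]");
static_assert(offsetof(InvokeStubSaveArea, old_sp) == 32, "stp x9, x19, [x10, #32]");
static_assert(offsetof(InvokeStubSaveArea, x28) == 112, "str x28, [x10, #112]");
```

Since the new FP is the old SP minus SAVE_SIZE, this 120-byte area ends exactly where the caller's frame begins, which is why the diagram shows SP' directly above the saved x28.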