Know Your Stack - Scala to Java to Bytecode to Assembly

Many times in my career so far I could have made better decisions, solved problems much faster or even prevented problems if I had a deeper understanding of the technology stack I was using.

To begin getting a deeper understanding of the Scala stack, we will trace a simple program through its lifetime of becoming Java, then JVM bytecode and then the final step before it's fed to the CPU monsters: assembly.

Starting with a high-level overview like this is a good foundation on which to explore each step in more detail. Because I'm nice I'll give you lots of links at the bottom of this post so you can do just that.

All code relevant to this post is up on my github.

Scala

Our initial block of code is a Scala object with a main method. When the application is started this main method will be run and, as you can see below, it's going to print something stupid to the console and finish executing.
package kns

object KnowYourStack {

  // Entry point: prints one message to the console and returns.
  def main(args: Array[String]): Unit = {
    val message = "Fat stacks, yo!"
    println(message)
  }
}

    

Compiled to Java

When Scala code is compiled with sbt it's output directly as bytecode; there is no intermediate Java source. However, we can use one of the Java decompilers to show us what the Java code would have looked like.

First I'll run sbt to compile the Scala. This turns our KnowYourStack.scala into two .class files: KnowYourStack.class and KnowYourStack$.class.


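I'll now use jd-gui to decompile them back into Java. Here are the results. jd-gui loses the $-suffixed class names in a few places, so going by the bytecode further down, Predef..MODULE$ stands for Predef$.MODULE$ and new (); stands for new KnowYourStack$();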
package kns;

import scala.Predef.;

public final class KnowYourStack$
{
  public static final  MODULE$;

  static
  {
    new ();
  }

  public void main(String[] args)
  {
    String message = "Fat stacks, yo!";
    Predef..MODULE$.println(message);
  }

  private KnowYourStack$()
  {
    MODULE$ = this;
  }
}
    

package kns;

import scala.reflect.ScalaSignature;

@ScalaSignature(bytes="\006\001\025:Q!\001\002\t\002\025\tQb\0238pof{WO]*uC\016\\'\"A\002\002\007-t7o\001\001\021\005\0319Q\"\001\002\007\013!\021\001\022A\005\003\033-swn^-pkJ\034F/Y2l'\t9!\002\005\002\f\0355\tABC\001\016\003\025\0318-\0317b\023\tyAB\001\004B]f\024VM\032\005\006#\035!\tAE\001\007y%t\027\016\036 \025\003\025AQ\001F\004\005\002U\tA!\\1j]R\021a#\007\t\003\027]I!\001\007\007\003\tUs\027\016\036\005\0065M\001\raG\001\005CJ<7\017E\002\f9yI!!\b\007\003\013\005\023(/Y=\021\005}\021cBA\006!\023\t\tC\"\001\004Qe\026$WMZ\005\003G\021\022aa\025;sS:<'BA\021\r\001")
public final class KnowYourStack
{
  public static void main(String[] paramArrayOfString)
  {
    KnowYourStack..MODULE$.main(paramArrayOfString);
  }
}
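It looks like Scala creates two classes for an object. KnowYourStack$ is the singleton itself: its private constructor stores the one instance in the static MODULE$ field. KnowYourStack holds only a static main that forwards to that instance, which gives the JVM the static entry point it needs to start the application. The ScalaSignature annotation is unrelated to the entry point; it carries Scala-specific type information that plain bytecode can't express, for the Scala compiler and reflection to read back.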
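
If you'd rather not take the decompiler's word for it, a bit of reflection can confirm the shape. This is a rough sketch rather than anything from the original project: Greeter and SingletonShape are made-up names, and it assumes the object is compiled at the top level of a file (or nested in another object), which is where scalac emits the static MODULE$ field.

```scala
import java.lang.reflect.Modifier

// Hypothetical object standing in for KnowYourStack.
object Greeter {
  def greet(): String = "Fat stacks, yo!"
}

// Hypothetical helper that inspects the class scalac generated for Greeter.
object SingletonShape {
  // Name of the class holding the singleton instance; scalac appends '$'.
  def moduleClassName: String = Greeter.getClass.getName

  // True when the static MODULE$ field holds the very instance that the
  // Scala name Greeter refers to.
  def moduleFieldIsSingleton: Boolean = {
    val field = Greeter.getClass.getField("MODULE$")
    Modifier.isStatic(field.getModifiers) && (field.get(null) eq Greeter)
  }

  def main(args: Array[String]): Unit = {
    println(moduleClassName)
    println(moduleFieldIsSingleton)
  }
}
```

Run as a normal application, this should print a class name ending in Greeter$ followed by true.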

Next Stop: Bytecode

A JVM takes bytecode and, at least on HotSpot, starts out interpreting it. Code that turns out to be hot, meaning it runs frequently, is then JIT-compiled to machine code at runtime. The bytecode of our two generated .class files is below using javap.

➜ javap -c -p KnowYourStack\$ 

Compiled from "KnowYourStack.scala"

public final class kns.KnowYourStack$ {
  public static final kns.KnowYourStack$ MODULE$;

  public static {};
    Code:
       0: new           #2                  // class kns/KnowYourStack$
       3: invokespecial #12                 // Method "<init>":()V
       6: return        

  public void main(java.lang.String[]);
    Code:
       0: ldc           #16                 // String Fat stacks, yo!
       2: astore_2      
       3: getstatic     #21                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       6: aload_2       
       7: invokevirtual #25                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
      10: return        

  private kns.KnowYourStack$();
    Code:
       0: aload_0       
       1: invokespecial #31                 // Method java/lang/Object."<init>":()V
       4: aload_0       
       5: putstatic     #33                 // Field MODULE$:Lkns/KnowYourStack$;
       8: return        
}


➜ javap -c -p KnowYourStack  

Compiled from "KnowYourStack.scala"

public final class kns.KnowYourStack {
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #16                 // Field kns/KnowYourStack$.MODULE$:Lkns/KnowYourStack$;
       3: aload_0       
       4: invokevirtual #18                 // Method kns/KnowYourStack$.main:([Ljava/lang/String;)V
       7: return        
}


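Interesting.... where did our string go? There is no declaration to find in this listing: the literal lives in the constant pool of KnowYourStack$, and ldc #16 pushes it onto the stack from there. It is then stored in local variable slot 2 (astore_2) and loaded again (aload_2) for the println call. The getstatic #16 in KnowYourStack is a different thing altogether; each class has its own constant pool, and entry 16 there is the MODULE$ field. Running javap with -v prints the constant pool as well. A lot of these instructions make a lot of sense when you've read/watched the links I've posted at the bottom of this post.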
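
One way to convince yourself that the string comes from a pool rather than a declaration is to compare references. The sketch below is my own illustration (ConstantPoolDemo is a made-up name); it relies on the JVM rule that identical string literals resolve to one shared, interned String object.

```scala
// Hypothetical demo object, not part of the original project.
object ConstantPoolDemo {
  // Two separate occurrences of the same literal in the source.
  def first: String = "Fat stacks, yo!"
  def second: String = "Fat stacks, yo!"

  // Built at runtime from characters, so it is a fresh object on the heap.
  def built: String = new String(first.toCharArray)

  def main(args: Array[String]): Unit = {
    // eq compares references, not contents.
    println(first eq second)
    println(first eq built)
    println(first eq built.intern())
  }
}
```

This should print true, false, true: both literals resolve to the same pooled object, the runtime copy does not, and intern() hands back the pooled one.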

Final Destination: Assembly

I've got 2 ravenous Intel cores inside my i5 Ivy Bridge; both of whom have an insatiable appetite for x86_64 soup. Let's see what we can knock up from our base of bytecode special ingredients.

To get assembly out of the JVM you need to pass two flags. We do that like this in sbt:

name := "know your stack"

scalaVersion := "2.10.2"

fork := true

javaOptions += "-XX:+UnlockDiagnosticVMOptions"

javaOptions += "-XX:+PrintAssembly"

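Note also that the fork setting is set to true. This is because javaOptions only take effect when sbt forks a new JVM for the run; without it the program runs inside sbt's own JVM and the flags are ignored. You'll also need to listen to this smart guy. (PrintAssembly depends on the hsdis disassembler plugin, which you can see being loaded in the output further down.)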
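
If you're not sure the flags reached the forked JVM, you can ask the JVM itself. A minimal sketch, assuming a standard JVM with the java.lang.management API available; JvmFlags and hasFlag are names I've invented for this example.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper, not part of the original project.
object JvmFlags {
  // Arguments the running JVM was started with (the -XX and -D style options).
  def inputArguments: List[String] = {
    val it = ManagementFactory.getRuntimeMXBean.getInputArguments.iterator()
    var acc = List.empty[String]
    while (it.hasNext) acc = it.next() :: acc
    acc.reverse
  }

  // True when the exact flag string is among the given arguments.
  def hasFlag(args: List[String], flag: String): Boolean = args.contains(flag)

  def main(args: Array[String]): Unit = {
    val jvmArgs = inputArguments
    List("-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintAssembly").foreach { flag =>
      println(flag + " present: " + hasFlag(jvmArgs, flag))
    }
  }
}
```

With fork set to true I'd expect both lines to report true; with fork left off, the flags from javaOptions should be missing.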

We're now ready to cook.

After running sbt run my terminal erupts with lines of text that are hard to make out. Luckily I piped it into a file, which on later inspection happens to be full of Intel instructions. Here are a couple of snippets:

Start of the application
Running kns.KnowYourStack  
OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output 
Loaded disassembler from /usr/lib/jvm/java-7-openjdk/jre/lib/amd64/hsdis-amd64.so 
Decoding compiled method 0x00007f3419060250: 
Code: 
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Constants]
  # {method} 'readLine' '([BII)I' in 'java/util/jar/Manifest$FastInputStream'
  # this:     rsi:rsi   = 'java/util/jar/Manifest$FastInputStream'
  # parm0:    rdx:rdx   = '[B'
  # parm1:    rcx       = int
  # parm2:    r8        = int
  #           [sp+0x60]  (sp of caller)
  0x00007f34190603a0: mov    0x8(%rsi),%r10d
  0x00007f34190603a4: shl    $0x3,%r10
  0x00007f34190603a8: cmp    %r10,%rax
  0x00007f34190603ab: jne    0x00007f3419037960  ;   {runtime_call}
  0x00007f34190603b1: xchg   %ax,%ax
  0x00007f34190603b4: nopl   0x0(%rax,%rax,1)
  0x00007f34190603bc: xchg   %ax,%ax
[Verified Entry Point]
  0x00007f34190603c0: mov    %eax,-0x14000(%rsp) 
  0x00007f34190603c7: push   %rbp 
  0x00007f34190603c8: sub    $0x50,%rsp         ;*synchronization entry 
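
The # {method} header line is the quickest way to tell which method a chunk of assembly belongs to. Here is a small sketch that pulls those headers out of the captured file. It assumes the header format stays exactly as in the snippet above and that the first program argument is the path of the file the output was piped into; CompiledMethods and parse are hypothetical names.

```scala
import scala.io.Source

// Hypothetical helper, not part of the original project.
object CompiledMethods {
  // Matches header lines such as:
  //   # {method} 'readLine' '([BII)I' in 'java/util/jar/Manifest$FastInputStream'
  private val MethodLine = """# \{method\} '([^']+)' '([^']+)' in '([^']+)'""".r

  // Returns (owner class, method name, JVM descriptor) for a header line.
  def parse(line: String): Option[(String, String, String)] =
    MethodLine.findFirstMatchIn(line).map(m => (m.group(3), m.group(1), m.group(2)))

  def main(args: Array[String]): Unit = {
    // args(0): path of the captured PrintAssembly output (an assumption).
    val source = Source.fromFile(args(0))
    try {
      source.getLines().foreach { line =>
        parse(line).foreach { case (owner, name, desc) =>
          println(owner + "." + name + desc)
        }
      }
    } finally source.close()
  }
}
```

Pointed at the captured output, this should list one line per compiled method, which makes it much easier to jump to the interesting parts.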

Method Disassembly with Annotations
Decoding compiled method 0x00007f3419062f50:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'indexOf' '([CII[CIII)I' in 'java/lang/String'
  # parm0:    rsi:rsi   = '[C'
  # parm1:    rdx       = int
  # parm2:    rcx       = int
  # parm3:    r8:r8     = '[C'
  # parm4:    r9        = int
  # parm5:    rdi       = int
  # parm6:    [sp+0x50]   = int  (sp of caller)
  0x00007f34190630a0: mov    %eax,-0x14000(%rsp)
  0x00007f34190630a7: push   %rbp
  0x00007f34190630a8: sub    $0x40,%rsp         ;*synchronization entry
                                                ; - java.lang.String::indexOf@-1 (line 1718)
  0x00007f34190630ac: mov    %rsi,0x18(%rsp)
  0x00007f34190630b1: mov    %edx,0x10(%rsp)
  0x00007f34190630b5: mov    %r9d,(%rsp)
  0x00007f34190630b9: mov    %ecx,0x8(%rsp)
  0x00007f34190630bd: mov    0x50(%rsp),%ebp
  0x00007f34190630c1: cmp    %ecx,%ebp
  0x00007f34190630c3: jge    0x00007f3419063481  ;*if_icmplt
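
Notice that both snippets are JDK methods (readLine and indexOf) rather than our own main. As far as I understand HotSpot, that's expected: main runs once, so it most likely never gets hot enough to be compiled. To see assembly for your own code you'd need something that runs many times. A minimal sketch, with HotLoop and sumTo as made-up names and the iteration counts picked arbitrarily (compile thresholds vary by JVM and settings):

```scala
// Hypothetical example, not part of the original project.
object HotLoop {
  // Small method meant to be called often enough for the JIT to notice it.
  def sumTo(n: Int): Long = {
    var total = 0L
    var i = 1
    while (i <= n) {
      total += i
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    var checksum = 0L
    var round = 0
    // Call sumTo repeatedly so it becomes hot.
    while (round < 20000) {
      checksum += sumTo(1000)
      round += 1
    }
    // Print the result so the work can't simply be discarded.
    println(checksum)
  }
}
```

Run with the same javaOptions as above, I'd expect sumTo (or the loop in main it gets inlined into) to turn up among the decoded methods, though I haven't checked this against the exact JVM used in this post.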

You can find the full output on my github. Just don't ask me to explain what all of those registers are used for.... yet.