Many times in my career so far I could have made better decisions, solved problems much faster or even prevented problems if I had a deeper understanding of the technology stack I was using.
To begin getting a deeper understanding of the Scala stack, we will trace a simple program through its lifetime of becoming Java, then JVM bytecode and then the final step before it's fed to the CPU monsters: assembly.
Starting with a high-level overview like this is a good foundation on which to explore each step in more detail. Because I'm nice I'll give you lots of links at the bottom of this post so you can do just that.
All code relevant to this post is up on my GitHub.
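It looks like Scala compiles an object into two classes as a way to enforce the singleton pattern, which is what objects are: an instance class (KnowYourStack$) that keeps its single instance in a static MODULE$ field, and a static class (KnowYourStack) whose static main forwards to that instance. The static class is also the entry point to the application. The ScalaSignature annotation on it is not about the entry point, though: it stores Scala-specific type information that plain bytecode can't express, for the Scala compiler and reflection to read.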
Scala
Our initial block of code is a Scala object with a main method. When the application is started, this main method will be run and, as you can see below, it's going to print something stupid to the console and finish executing.
package kns

object KnowYourStack {
  def main(args: Array[String]): Unit = {
    val message = "Fat stacks, yo!"
    println(message)
  }
}
Compiled to Java
When Scala code is compiled with sbt, it's output directly as bytecode; no Java source is produced along the way. However, we can use one of the Java decompilers to show us what the Java code would have looked like.
First I'll run sbt to compile the Scala. This turns our KnowYourStack.scala into two .class files: KnowYourStack.class and KnowYourStack$.class.
I'll now use jd-gui to decompile them back into Java. Here are the results:
package kns;

import scala.Predef.;

public final class KnowYourStack$
{
  public static final MODULE$;

  static
  {
    new ();
  }

  public void main(String[] args)
  {
    String message = "Fat stacks, yo!";
    Predef..MODULE$.println(message);
  }

  private KnowYourStack$()
  {
    MODULE$ = this;
  }
}

package kns;
import scala.reflect.ScalaSignature;
@ScalaSignature(bytes="\006\001\025:Q!\001\002\t\002\025\tQb\0238pof{WO]*uC\016\\'\"A\002\002\007-t7o\001\001\021\005\0319Q\"\001\002\007\013!\021\001\022A\005\003\033-swn^-pkJ\034F/Y2l'\t9!\002\005\002\f\0355\tABC\001\016\003\025\0318-\0317b\023\tyAB\001\004B]f\024VM\032\005\006#\035!\tAE\001\007y%t\027\016\036 \025\003\025AQ\001F\004\005\002U\tA!\\1j]R\021a#\007\t\003\027]I!\001\007\007\003\tUs\027\016\036\005\0065M\001\raG\001\005CJ<7\017E\002\f9yI!!\b\007\003\013\005\023(/Y=\021\005}\021cBA\006!\023\t\tC\"\001\004Qe\026$WMZ\005\003G\021\022aa\025;sS:<'BA\021\r\001")
public final class KnowYourStack
{
  public static void main(String[] paramArrayOfString)
  {
    KnowYourStack..MODULE$.main(paramArrayOfString);
  }
}
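To check the singleton wiring from the Scala side, here is a minimal sketch that reads the static MODULE$ field through Java reflection. Greeter and SingletonPeek are hypothetical names used only for this illustration; the assumption is that MODULE$ is a public static field on the object's runtime class, as in the decompiled output above.

```scala
// Hypothetical stand-in for KnowYourStack, used only for this illustration.
object Greeter {
  def message: String = "Fat stacks, yo!"
}

object SingletonPeek {
  // The runtime class of an object is its instance class (its name ends in '$').
  def moduleClass: Class[_] = Greeter.getClass

  // MODULE$ is static, so the receiver passed to get() is null.
  def moduleInstance: AnyRef = moduleClass.getField("MODULE$").get(null)

  def main(args: Array[String]): Unit = {
    println(moduleClass.getName)
    // Reference equality: is the value in MODULE$ the very same object as Greeter?
    println(moduleInstance eq Greeter)
  }
}
```

If the reference comparison holds, the value stored in MODULE$ really is the one and only instance that Scala code sees as Greeter.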
Next Stop: Bytecode
A JVM takes bytecode and either interprets it or JITs it to machine code at runtime. HotSpot starts out interpreting and only compiles code once it becomes hot, that is, once it is being executed frequently. The bytecode of our two generated .class files is below, as printed by javap.
➜ javap -c -p KnowYourStack\$
Compiled from "KnowYourStack.scala"
public final class kns.KnowYourStack$ {
  public static final kns.KnowYourStack$ MODULE$;

  public static {};
    Code:
       0: new           #2                  // class kns/KnowYourStack$
       3: invokespecial #12                 // Method "<init>":()V
       6: return

  public void main(java.lang.String[]);
    Code:
       0: ldc           #16                 // String Fat stacks, yo!
       2: astore_2
       3: getstatic     #21                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       6: aload_2
       7: invokevirtual #25                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
      10: return

  private kns.KnowYourStack$();
    Code:
       0: aload_0
       1: invokespecial #31                 // Method java/lang/Object."<init>":()V
       4: aload_0
       5: putstatic     #33                 // Field MODULE$:Lkns/KnowYourStack$;
       8: return
}

➜ javap -c -p KnowYourStack
Compiled from "KnowYourStack.scala"
public final class kns.KnowYourStack {
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #16                 // Field kns/KnowYourStack$.MODULE$:Lkns/KnowYourStack$;
       3: aload_0
       4: invokevirtual #18                 // Method kns/KnowYourStack$.main:([Ljava/lang/String;)V
       7: return
}
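Interesting... where did our string go? We can see it being loaded as constant 16 (ldc #16) in KnowYourStack$, but where is the declaration? It lives in that class file's constant pool, which javap only prints when you add the -v flag. Note that the getstatic #16 in KnowYourStack is unrelated: every class file has its own constant pool, and in that one entry 16 is the MODULE$ field. A lot of these instructions make a lot of sense when you've read/watched the links I've posted at the bottom of this post.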
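One way to see where a string literal ends up is to read a class file's constant pool directly. The following is a rough sketch, not a full class file parser: ConstantPoolPeek, Holder and utf8Constants are hypothetical names, Holder stands in for KnowYourStack, and the code assumes the class file can be loaded as a resource from the classpath. It walks the constant pool and collects the Utf8 entries, which is where the characters of string literals are stored.

```scala
import java.io.DataInputStream

object ConstantPoolPeek {
  // Hypothetical stand-in for KnowYourStack: the literal below should land
  // in the constant pool of this object's class file.
  object Holder {
    def message: String = "Fat stacks, yo!"
  }

  /** Collects every Utf8 entry from the constant pool of `cls`'s class file. */
  def utf8Constants(cls: Class[_]): List[String] = {
    // Assumption: the .class file is reachable as a classpath resource.
    val resource = "/" + cls.getName.replace('.', '/') + ".class"
    val in = new DataInputStream(cls.getResourceAsStream(resource))
    def skip(n: Int): Unit = in.readFully(new Array[Byte](n))
    try {
      require(in.readInt() == 0xCAFEBABE, "not a class file")
      skip(4) // minor and major version
      val count = in.readUnsignedShort()
      val found = List.newBuilder[String]
      var i = 1 // constant pool indices start at 1
      while (i < count) {
        in.readUnsignedByte() match {
          case 1 => found += in.readUTF() // Utf8: u2 length + modified UTF-8
          case 7 | 8 | 16 | 19 | 20 => skip(2)
          case 15 => skip(3)
          case 3 | 4 | 9 | 10 | 11 | 12 | 17 | 18 => skip(4)
          case 5 | 6 => skip(8); i += 1 // long and double occupy two slots
          case tag => sys.error("unexpected constant pool tag " + tag)
        }
        i += 1
      }
      found.result()
    } finally in.close()
  }

  def main(args: Array[String]): Unit =
    utf8Constants(Holder.getClass).foreach(println)
}
```

The String entry that ldc refers to is just an index pointing at one of these Utf8 entries, so the literal's text should appear in the printed list alongside class, method and field names.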
Final Destination: Assembly
I've got 2 ravenous Intel CPU cores inside my i5 Ivy Bridge, both of which have an insatiable appetite for x86_64 soup. Let's see what we can knock up from our base of bytecode special ingredients.
To get assembly out of the JVM you need to pass two flags. We do that like this in sbt:
name := "know your stack"

scalaVersion := "2.10.2"

fork := true

javaOptions += "-XX:+UnlockDiagnosticVMOptions"

javaOptions += "-XX:+PrintAssembly"
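Note also that the fork setting is set to true. This is because sbt has to fork a separate JVM for the javaOptions to take effect. PrintAssembly also needs the hsdis disassembler library (you can see it being loaded in the output below), so you'll also need to listen to this smart guy.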
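If you're unsure whether the forked JVM actually received the flags, one option is to ask the running JVM for its input arguments. This is a small sketch using the standard java.lang.management API; JvmFlags, inputArguments and hasFlag are hypothetical names made up for this example.

```scala
import java.lang.management.ManagementFactory

object JvmFlags {
  // The arguments the current JVM was started with (not the program arguments).
  def inputArguments: List[String] = {
    val args = ManagementFactory.getRuntimeMXBean.getInputArguments
    (0 until args.size).map(i => args.get(i)).toList
  }

  // Hypothetical helper: true if the exact flag is among the given arguments.
  def hasFlag(flag: String, args: List[String]): Boolean = args.contains(flag)

  def main(args: Array[String]): Unit = {
    val jvmArgs = inputArguments
    println("UnlockDiagnosticVMOptions: " + hasFlag("-XX:+UnlockDiagnosticVMOptions", jvmArgs))
    println("PrintAssembly: " + hasFlag("-XX:+PrintAssembly", jvmArgs))
  }
}
```

Run through sbt with fork enabled, both lines should report true; without forking, the javaOptions are not applied to the JVM running your code.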
We're now ready to cook.
After running sbt run, my terminal erupts with lines of text that are hard to make out. Luckily I piped it into a file, which on later inspection happens to be full of Intel instructions. Here are a couple of snippets:
Start of the application
Running kns.KnowYourStack
OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
Loaded disassembler from /usr/lib/jvm/java-7-openjdk/jre/lib/amd64/hsdis-amd64.so
Decoding compiled method 0x00007f3419060250:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Constants]
  # {method} 'readLine' '([BII)I' in 'java/util/jar/Manifest$FastInputStream'
  # this:     rsi:rsi   = 'java/util/jar/Manifest$FastInputStream'
  # parm0:    rdx:rdx   = '[B'
  # parm1:    rcx       = int
  # parm2:    r8        = int
  #           [sp+0x60]  (sp of caller)
  0x00007f34190603a0: mov    0x8(%rsi),%r10d
  0x00007f34190603a4: shl    $0x3,%r10
  0x00007f34190603a8: cmp    %r10,%rax
  0x00007f34190603ab: jne    0x00007f3419037960  ;   {runtime_call}
  0x00007f34190603b1: xchg   %ax,%ax
  0x00007f34190603b4: nopl   0x0(%rax,%rax,1)
  0x00007f34190603bc: xchg   %ax,%ax
[Verified Entry Point]
  0x00007f34190603c0: mov    %eax,-0x14000(%rsp)
  0x00007f34190603c7: push   %rbp
  0x00007f34190603c8: sub    $0x50,%rsp         ;*synchronization entry
Method Decompilation with Annotations
Decoding compiled method 0x00007f3419062f50:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'indexOf' '([CII[CIII)I' in 'java/lang/String'
  # parm0:    rsi:rsi   = '[C'
  # parm1:    rdx       = int
  # parm2:    rcx       = int
  # parm3:    r8:r8     = '[C'
  # parm4:    r9        = int
  # parm5:    rdi       = int
  # parm6:    [sp+0x50]   = int  (sp of caller)
  0x00007f34190630a0: mov    %eax,-0x14000(%rsp)
  0x00007f34190630a7: push   %rbp
  0x00007f34190630a8: sub    $0x40,%rsp         ;*synchronization entry
                                                ; - java.lang.String::indexOf@-1 (line 1718)
  0x00007f34190630ac: mov    %rsi,0x18(%rsp)
  0x00007f34190630b1: mov    %edx,0x10(%rsp)
  0x00007f34190630b5: mov    %r9d,(%rsp)
  0x00007f34190630b9: mov    %ecx,0x8(%rsp)
  0x00007f34190630bd: mov    0x50(%rsp),%ebp
  0x00007f34190630c1: cmp    %ecx,%ebp
  0x00007f34190630c3: jge    0x00007f3419063481  ;*if_icmplt
You can find the full output on my GitHub. Just don't ask me to explain what all of those registers are used for... yet.
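One thing worth noticing: both snippets are JDK methods (readLine and indexOf), not our own main. PrintAssembly only prints code the JIT has actually compiled, and a main that runs once presumably never gets hot enough to leave the interpreter. As a sketch of how you might get your own code into the output, here is a hypothetical HotLoop object (the names mix and run are made up for this example) that calls a tiny method often enough that the JIT is likely, though not guaranteed, to compile it.

```scala
object HotLoop {
  // Hypothetical helper: small and called very often, so a likely JIT candidate.
  def mix(acc: Long, i: Int): Long = acc * 31 + i

  // Calls mix `iterations` times and returns the accumulated value.
  def run(iterations: Int): Long = {
    var acc = 0L
    var i = 0
    while (i < iterations) {
      acc = mix(acc, i)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit =
    // The iteration count is an arbitrary choice, not a documented JIT threshold.
    println(run(1000000))
}
```

With the same two JVM flags enabled, searching the output for 'mix' or 'run' should show whether they were compiled; the JIT may also inline mix into run, in which case only run appears.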