Alex Kalinovsky, "Covert Java", Chapter 3 "Obfuscating Classes": Translation (Part 3)

Latest recommended article published 2019-04-23 10:16:09

sery

Article tags: java exception string jvm methods class

Transformations Performed by Obfuscators

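No standards exist for obfuscation, so the level of protection varies with the quality of the obfuscator. The following sections present transformations commonly found in obfuscators. We use ChatServer's sendMessage method to illustrate how each transformation affects the decompiled code. Listing 3.1 shows the original source code of sendMessage.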

Listing 3.1 Original Source Code of sendMessage
public void sendMessage(String host, String message) throws Exception {
    if (host == null || host.trim().length() == 0)
        throw new Exception("Please specify host name");

    System.out.println("Sending message to host " + host + ": " + message);
    String url = "//" + host + ":" + this.registryPort + "/chatserver";
    ChatServerRemote remoteServer = (ChatServerRemote)Naming.lookup(url);

    MessageInfo messageInfo = new MessageInfo(this.hostName, this.userName);
    remoteServer.receiveMessage(message, messageInfo);
    System.out.println("Message sent to host " + host);
}

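Stripping Out Debug Information

Java bytecode can contain information inserted by the compiler to help debug the running code. The information javac inserts can include some or all of the following: line numbers, variable names, and source filenames. Debug information is not needed to run the class, but debuggers use it to associate the bytecode with the source code. Decompilers use the same information to reconstruct the source code more faithfully; with full debug information in the class file, the decompiled code is almost identical to the original source. When the debug information is stripped out, the stored names are lost and decompilers have to generate their own. In our example, after stripping, the sendMessage parameters would appear under generated names such as s and s1 (as in Listing 3.2) instead of host and message.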
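To see debug information at work without a decompiler, the sketch below asks the JVM which source filename and line number were recorded for the current method. It assumes the class is compiled by javac with its default settings, which keep only source filename and line number information (-g:source,lines); local variable names, the part decompilers need to recover host and message, are emitted only with -g or -g:vars. Compiling with javac -g:none, or running the class through an obfuscator that strips debug attributes, should make StackTraceElement report a null filename and a negative line number. The class name DebugInfoProbe is hypothetical.

```java
// Hypothetical demo class: reports which debug attributes survived compilation.
class DebugInfoProbe {

    /** Describes the source filename and line number the JVM can see for this call site. */
    static String describe() {
        // Element 0 of a fresh Throwable's stack trace is the method that created it (this one).
        StackTraceElement here = new Throwable().getStackTrace()[0];
        String file = here.getFileName();   // from the SourceFile attribute; null if stripped
        int line = here.getLineNumber();    // from the LineNumberTable attribute; negative if stripped
        return "method=" + here.getMethodName()
                + ", file=" + (file != null ? file : "<stripped>")
                + ", line=" + (line >= 0 ? String.valueOf(line) : "<stripped>");
    }

    public static void main(String[] args) {
        // With default javac flags this prints something like
        // "method=describe, file=DebugInfoProbe.java, line=..."; with -g:none the
        // file and line fields should read "<stripped>".
        System.out.println(describe());
    }
}
```

Method names survive either way, because the JVM needs them to link calls; only the debug attributes disappear, which is why name mangling is a separate transformation.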

Name Mangling

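Developers use meaningful names for packages, classes, and methods. Our sample chat application's server is called ChatServer, and the method that sends a message to another user is called sendMessage. Good names are crucial for development and maintenance, but they mean nothing to the JVM. The Java Runtime Environment (JRE) doesn't care whether sendMessage is called goShopping or abcdefg; it still invokes and executes it. By renaming meaningful, human-readable names to meaningless, machine-generated ones, obfuscators make the decompiled code much harder to understand. What used to be ChatServer.sendMessage becomes d.a, and when many classes and methods end up with the same short names, the decompiled code is extremely hard to follow. A good obfuscator takes advantage of polymorphism (more precisely, method overloading) to make matters worse: three methods with different names and signatures that do different tasks in the original code can be renamed to the same name in the obfuscated code. Because their signatures (the parameter types) still differ, this does not violate the Java language specification, but it adds confusion to the decompiled code. Listing 3.2 shows the decompiled sendMessage after an obfuscation pass that stripped the debug information and performed name mangling.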

Listing 3.2 Decompiled sendMessage After Name Mangling
public void a(String s, String s1)
    throws Exception
{
    if(s == null || s.trim().length() == 0)
    {
      throw new Exception("Please specify host name");
    } else
    {
      System.out.println(String.valueOf(String.valueOf((
        new StringBuffer("Sending message to host ")
        ).append(s).append(": ").append(s1))));
      String s2 = String.valueOf(String.valueOf((
        new StringBuffer("//")).append(s).append(":")
        .append(b).append("/chatserver")));
      b b1 = (b)Naming.lookup(s2);
      MessageInfo messageinfo = new MessageInfo(e, f);
      b1.receiveMessage(s1, messageinfo);
      System.out.println("Message sent to host ".concat(
         String.valueOf(String.valueOf(s))));
      return;
    }
}
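The overloading trick described in the name-mangling section can be reproduced in plain Java source. In the hypothetical class below, three unrelated operations (which might once have been called, say, isBlank, buildUrl and average) all share the name a; the compiler tells them apart by their parameter types, so the code is legal, but the call sites no longer say what each method does. The class name d and the "original" method names are illustrative assumptions, not taken from the book's sample application.

```java
// Hypothetical obfuscated class: three unrelated methods renamed to the same name "a".
// They remain legal Java because their parameter lists differ (overloading).
class d {

    // Originally something like isBlank(String): checks for missing input.
    static boolean a(String s) {
        return s == null || s.trim().length() == 0;
    }

    // Originally something like buildUrl(String, int): builds an RMI lookup URL.
    static String a(String s, int i) {
        return "//" + s + ":" + i + "/chatserver";
    }

    // Originally something like average(double, double).
    static double a(double d1, double d2) {
        return (d1 + d2) / 2;
    }

    public static void main(String[] args) {
        System.out.println(a("  "));              // true
        System.out.println(a("localhost", 1099)); // //localhost:1099/chatserver
        System.out.println(a(3.0, 5.0));          // 4.0
    }
}
```

A reader of the decompiled code has to resolve each call to a by its argument types before knowing which of the three bodies runs.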

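Encoding Java Strings

Java strings are stored as plain text inside the bytecode. Most well-written applications contain trace statements that produce execution logs for debugging and auditing. Even if class and method names are changed, the strings that methods write to a log file or the console can betray what the methods do. In our example, ChatServer.sendMessage outputs a trace message using the following: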

System.out.println("Sending message to host " + host + ": " + message);

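Even if ChatServer.sendMessage is renamed to d.a, when you see a trace like this one in the decompiled method body, it is clear what the method does. However, if the string is encoded in the bytecode, the decompiled version of the class looks like this: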

System.out.println(String.valueOf(String.valueOf((new
    StringBuffer(a("A\025wV6|\0279_:a\003xU:2\004v\0227}\003m\022"))
    ).append(s).append(a("(P")).append(s1))));

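If you look closely at the encoded string, you can see that it is first passed to the a() method, which decodes it and returns the readable string to System.out.println(). String encoding is a powerful feature that a commercial-strength obfuscator should provide.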
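The text does not say how the obfuscator's a() method actually decodes its strings, so the sketch below is only an assumption: a simple XOR cipher with a hypothetical five-character key. At obfuscation time the tool would call encode() on each literal and write the result into the class file; at run time the class calls a() to get the readable string back. Because XOR with the same key is its own inverse, one routine serves both directions.

```java
// Hypothetical string encoder/decoder in the style of an obfuscator's a() method.
// The XOR key below is an illustrative assumption, not the one behind Listing 3.3.
class StringCipher {

    private static final char[] KEY = {0x12, 0x45, 0x07, 0x33, 0x5A};

    // XOR each character with a repeating key; applying it twice restores the input.
    private static String xor(String in) {
        char[] out = in.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] ^= KEY[i % KEY.length];
        }
        return new String(out);
    }

    // Run by the obfuscator at build time; the result replaces the literal in the bytecode.
    static String encode(String plain) {
        return xor(plain);
    }

    // Called at run time wherever an encoded literal used to be.
    static String a(String encoded) {
        return xor(encoded);
    }

    public static void main(String[] args) {
        String hidden = encode("Sending message to host ");
        System.out.println(hidden);      // unreadable mix of letters, symbols and control characters
        System.out.println(a(hidden));   // Sending message to host
    }
}
```

A cipher this simple only defeats a casual search of the constant pool; anyone who finds a() can run it on every encoded literal, which is one reason stronger obfuscators vary the key per class or per string.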

Changing Control Flow

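The transformations presented so far make reverse engineering of the obfuscated code harder, but they do not change the fundamental structure of the Java code. They also do nothing to protect the algorithms and program control flow, which are usually the most important part of the innovation. The decompiled version of ChatServer.sendMessage shown earlier is still fairly understandable: the code first checks for valid input and throws an exception on error, and then it looks up the remote server object and invokes a method on it.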

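The best obfuscators can transform the execution flow of the bytecode by inserting bogus conditional and goto statements. This can slow down execution somewhat, but it might be a small price to pay for the increased protection of intellectual property (IP). Listing 3.3 shows what sendMessage has become after all the transformations discussed so far have been applied.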

Listing 3.3 Decompiled sendMessage After All Transformations

public void a(String s, String s1)
    throws Exception
{
    boolean flag = MessageInfo.c;
    s;
    if(flag) goto _L2; else goto _L1
_L1:
    JVM INSTR ifnull 29;
      goto _L3 _L4
_L3:
    s.trim();
_L2:
    if(flag) goto _L6; else goto _L5
_L5:
    length();
    JVM INSTR ifne 42;
      goto _L4 _L7
_L4:
    throw new Exception(a("\002)qUe7egDs1,rM6:*g@6<$yQ"));
_L7:
    System.out.println(String.valueOf(String.valueOf((
        new StringBuffer(a("\001 zP\177<\"4Ys!6uSsr1{\024~=6´\024"))
        ).append(s).append(a("he")).append(s1))));
    String.valueOf(String.valueOf(
        (new StringBuffer(a("}j"))).append(s).append(":")
        .append(b).append(a("}&|Ub! fBs "))));
_L6:
    String s2;
    s2;
    covertjava.chat.b b1 = (covertjava.chat.b)Naming.lookup(s2);
    MessageInfo messageinfo = new MessageInfo(e, f);
    b1.receiveMessage(s1, messageinfo);
    System.out.println(a("\037 gGw5 4Gs<14@yr-{Gbr").concat(String.valueOf(
        String.valueOf(s))));
    if(flag)
        b.c = !b.c;
    return;
}

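As you can see, the decompiled code is now a complete, but powerful, mess. sendMessage is a fairly small method with little conditional logic; if control flow obfuscation were applied to a more complex method with for loops, if statements, and local variables, the obfuscation would be even more effective.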
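Java source has no goto, so the sketch below cannot reproduce Listing 3.3 literally; it shows the same idea with two source-level techniques instead. The first is an opaque predicate, a condition whose value the obfuscator knows but a decompiler cannot easily see. The predicate used here, x * (x + 1) being even, holds for every int, because one of two consecutive integers is always even and wrapping int arithmetic keeps the lowest bit of the true product. The second is control-flow flattening, where the original sequence of steps is scattered across a switch driven by a state variable. The class and method names are hypothetical; the logic mirrors the input check at the top of sendMessage.

```java
// Hypothetical illustration of control-flow obfuscation at the Java source level.
class FlowDemo {

    // Original, readable check (mirrors the start of sendMessage).
    static boolean isValidHost(String host) {
        return !(host == null || host.trim().length() == 0);
    }

    // Same behaviour after flattening plus an opaque predicate.
    static boolean a(String s) {
        int x = s == null ? 7 : s.length();  // any int value works for the predicate
        int state = 0;
        boolean result = false;
        while (true) {
            switch (state) {
                case 0:
                    // Opaque predicate: x * (x + 1) is always even, so this always goes to state 2.
                    state = ((x * (x + 1)) & 1) == 0 ? 2 : 1;
                    break;
                case 1:
                    // Dead branch: never reached, but a decompiler cannot easily prove that.
                    result = !result;
                    state = 4;
                    break;
                case 2:
                    state = (s == null) ? 4 : 3;
                    break;
                case 3:
                    result = s.trim().length() != 0;
                    state = 4;
                    break;
                default:
                    return result;
            }
        }
    }

    public static void main(String[] args) {
        for (String host : new String[] {null, "", "   ", "localhost"}) {
            System.out.println(host + " -> " + isValidHost(host) + " / " + a(host));
        }
    }
}
```

Both methods return the same result for every input, but the flattened version hides the order of the checks behind the state variable, and the dead branch in state 1 adds a path that a reader must rule out by reasoning about the predicate.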

Inserting Corrupt Code

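Inserting corrupt code is a somewhat dubious technique that some obfuscators use to prevent obfuscated classes from being decompiled. It relies on the Java Runtime's loose interpretation of the Java bytecode specification: the JRE does not strictly enforce all the rules of bytecode format verification, which allows obfuscators to introduce incorrect bytecode into class files. The introduced code does not prevent the original code from executing, but an attempt to decompile the class file fails, or at best produces confusing source code full of JVM INSTR keywords. Listing 3.3 shows how a decompiler might handle corrupt code. The risk of this method is that the corrupted code might not run on a JVM that adheres more closely to the specification. Even if that is not an issue with the majority of JVMs today, it might become a problem later.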

Eliminating Unused Code (Shrinking)

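As an added benefit, most obfuscators remove unused code, which reduces the size of the application. For example, if a class called A has a method called m() that is never called by any class, the code for m() is stripped out of A's bytecode. This feature is especially useful for code that is downloaded via the Internet or installed in unsecured environments.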
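Shrinkers typically decide what is unused through a reachability analysis: starting from the entry points that must be kept (main(), or methods listed in the tool's configuration), they follow every call and delete whatever was never reached. The sketch below does this over a hand-written call graph; a real tool would extract the graph from the bytecode and would also have to account for reflection and virtual dispatch. The graph, entry point and method names are assumptions that only echo the A.m() example above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch of shrinking: keep methods reachable from the entry points, drop the rest.
class Shrinker {

    /** Returns every method reachable from the given entry points (breadth-first search). */
    static Set<String> reachable(Map<String, List<String>> callGraph, List<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>(entryPoints);
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.poll();
            for (String callee : callGraph.getOrDefault(method, List.of())) {
                if (seen.add(callee)) {      // add() returns false if already visited
                    work.add(callee);
                }
            }
        }
        return seen;
    }

    /** Methods a shrinker would strip out: declared (a key in the graph) but never reached. */
    static Set<String> unused(Map<String, List<String>> callGraph, List<String> entryPoints) {
        Set<String> dead = new TreeSet<>(callGraph.keySet());
        dead.removeAll(reachable(callGraph, entryPoints));
        return dead;
    }

    public static void main(String[] args) {
        // Hypothetical call graph: every declared method appears as a key.
        Map<String, List<String>> graph = Map.of(
                "Main.main", List.of("A.run"),
                "A.run", List.of("A.helper"),
                "A.helper", List.of(),
                "A.m", List.of("A.helper"));  // m() calls helper(), but nobody calls m()
        System.out.println(unused(graph, List.of("Main.main")));  // [A.m]
    }
}
```

Note that A.helper survives even though the dead A.m also calls it: a method is kept as soon as any reachable caller needs it.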

Optimizing Bytecode

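Another added benefit touted by obfuscators is potential code optimization. Vendors claim that declaring nonfinal methods as final where possible, and performing minor code improvements, can help speed up execution. It is hard to assess the real performance gains, and most vendors do not publish the metrics. What is worth noting is that JIT compilers become more powerful with every new release. Therefore, optimizations such as method finalization and dead code elimination are most likely performed by the JIT compiler anyway.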
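Method finalization can be decided with a whole-program check: a method can be marked final if no subclass overrides it. The sketch below runs that check over a hand-built class hierarchy, matching methods by name only for brevity (a real tool would compare full descriptors, skip private and static methods, and leave alone classes that outside code may subclass). HotSpot's JIT performs a similar class hierarchy analysis at run time when deciding whether a virtual call can be inlined, which is one reason the benefit is doubtful. The Server superclass and all method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: find methods that no subclass overrides, i.e. candidates for "final".
class FinalizationCandidates {

    /**
     * @param superclassOf maps each class to its direct superclass (absent for roots)
     * @param declared     maps each class to the names of the methods it declares
     * @return "Class.method" entries that are never overridden below their declaring class
     */
    static Set<String> candidates(Map<String, String> superclassOf,
                                  Map<String, List<String>> declared) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : declared.entrySet()) {
            String owner = entry.getKey();
            for (String method : entry.getValue()) {
                if (!overriddenBelow(owner, method, superclassOf, declared)) {
                    result.add(owner + "." + method);
                }
            }
        }
        return result;
    }

    // True if any class that has `owner` as an ancestor declares a method with the same name.
    private static boolean overriddenBelow(String owner, String method,
                                           Map<String, String> superclassOf,
                                           Map<String, List<String>> declared) {
        for (Map.Entry<String, List<String>> other : declared.entrySet()) {
            if (other.getValue().contains(method) && isSubclass(other.getKey(), owner, superclassOf)) {
                return true;
            }
        }
        return false;
    }

    // Walks up the superclass chain from `cls` (excluding cls itself) looking for `ancestor`.
    private static boolean isSubclass(String cls, String ancestor, Map<String, String> superclassOf) {
        for (String c = superclassOf.get(cls); c != null; c = superclassOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> parents = Map.of("ChatServer", "Server");
        Map<String, List<String>> methods = Map.of(
                "Server", List.of("start", "log"),
                "ChatServer", List.of("start", "sendMessage"));
        // Server.start is overridden by ChatServer.start, so it is not a candidate.
        System.out.println(candidates(parents, methods));
        // [ChatServer.sendMessage, ChatServer.start, Server.log]
    }
}
```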

Original English Text

Transformations Performed by Obfuscators

No standards exist for obfuscation, so the level of protection varies based on the quality of the obfuscator. The following sections present some of the features commonly found in obfuscators. We will use ChatServer's sendMessage method to illustrate how each transformation affects the decompiled code. The original source code for sendMessage is shown in Listing 3.1.

Listing 3.1 Original Source Code of sendMessage

public void sendMessage(String host, String message) throws Exception {
    if (host == null || host.trim().length() == 0)
        throw new Exception("Please specify host name");

    System.out.println("Sending message to host " + host + ": " + message);
    String url = "//" + host + ":" + this.registryPort + "/chatserver";
    ChatServerRemote remoteServer = (ChatServerRemote)Naming.lookup(url);

    MessageInfo messageInfo = new MessageInfo(this.hostName, this.userName);
    remoteServer.receiveMessage(message, messageInfo);
    System.out.println("Message sent to host " + host);
}

Stripping Out Debug Information

Java bytecode can contain information inserted by the compiler that helps debug the running code. The information inserted by javac can contain some or all of the following: line numbers, variable names, and source filenames. Debug information is not needed to run the class but is used by debuggers to associate the bytecode with the source code. Decompilers use this information to better reconstruct the source code. With full debug information in the class file, the decompiled code is almost identical to the original source code. When the debug information is stripped out, the names that were stored are lost, so decompilers have to generate their own names. In our case, after the stripping, sendMessage parameter names would appear as s1 and s2 instead of host and message.

Name Mangling

Developers use meaningful names for packages, classes, and methods. Our sample chat application's server implementation is called ChatServer and the method that sends a message to another user is called sendMessage. Good names are crucial for development and maintenance, but they mean nothing to the JVM. Java Runtime (JRE) doesn't care whether sendMessage is called goShopping or abcdefg; it still invokes it and executes it. By renaming the meaningful human-readable names to meaningless machine-generated ones, obfuscators make the task of understanding the decompiled code much harder. What used to be ChatServer.sendMessage becomes d.a; when many classes and methods exist with the same names, the decompiled code is extremely hard to follow. A good obfuscator takes advantage of polymorphism to make matters worse. Three methods with different names and signatures doing different tasks in the original code can be renamed to the same common name in the obfuscated code. Because their signatures are different, it does not violate the Java language specification but adds confusion to the decompiled code. Listing 3.2 shows an example of a decompiled sendMessage after obfuscation that stripped the debugging information and performed name mangling.

Listing 3.2 Decompiled sendMessage After Name Mangling

public void a(String s, String s1)
    throws Exception
{
    if(s == null || s.trim().length() == 0)
    {
      throw new Exception("Please specify host name");
    } else
    {
      System.out.println(String.valueOf(String.valueOf((
        new StringBuffer("Sending message to host ")
        ).append(s).append(": ").append(s1))));
      String s2 = String.valueOf(String.valueOf((
        new StringBuffer("//")).append(s).append(":")
        .append(b).append("/chatserver")));
      b b1 = (b)Naming.lookup(s2);
      MessageInfo messageinfo = new MessageInfo(e, f);
      b1.receiveMessage(s1, messageinfo);
      System.out.println("Message sent to host ".concat(
         String.valueOf(String.valueOf(s))));
      return;
    }
}

Encoding Java Strings

Java strings are stored as plain text inside the bytecode. Most of the well-written applications have traces inside the code that produce execution logs for debugging and audit trace. Even if class and method names are changed, the strings written by methods to a log file or console can betray the method purpose. In our case, ChatServer.sendMessage outputs a trace message using the following:

System.out.println("Sending message to host " + host + ": " + message);

Even if ChatServer.sendMessage is renamed to d.a, when you see a trace like this one in the decompiled message body, it is clear what the method does. However, if the string is encoded in bytecode, the decompiled version of the class looks like this:

System.out.println(String.valueOf(String.valueOf((new
    StringBuffer(a("A\025wV6|\0279_:a\003xU:2\004v\0227}\003m\022"))
    ).append(s).append(a("(P")).append(s1))));

If you look closely at the encoded string, it is first passed to the a() method, which decodes it and returns the readable string to the System.out.println() method. String encoding is a powerful feature that should be provided by a commercial-strength obfuscator.

Changing Control Flow

The transformations presented earlier make reverse engineering of the obfuscated code harder, but they do not change the fundamental structure of the Java code. They also do nothing to protect the algorithms and program control flow, which is usually the most important part of the innovation. The decompiled version of ChatServer.sendMessage shown earlier is still fairly understandable. You can see that the code first checks for valid input and throws an exception upon error. Then it looks up the remote server object and invokes a method on it.

The best obfuscators are capable of transforming the execution flow of bytecode by inserting bogus conditional and goto statements. This can slow down the execution somewhat, but it might be a small price to pay for the increased protection of the IP. Listing 3.3 shows what sendMessage has become after all the transformations discussed earlier have been applied.

Listing 3.3 Decompiled sendMessage After All Transformations

public void a(String s, String s1)
    throws Exception
{
    boolean flag = MessageInfo.c;
    s;
    if(flag) goto _L2; else goto _L1
_L1:
    JVM INSTR ifnull 29;
      goto _L3 _L4
_L3:
    s.trim();
_L2:
    if(flag) goto _L6; else goto _L5
_L5:
    length();
    JVM INSTR ifne 42;
      goto _L4 _L7
_L4:
    throw new Exception(a("\002)qUe7egDs1,rM6:*g@6<$yQ"));
_L7:
    System.out.println(String.valueOf(String.valueOf((
        new StringBuffer(a("\001 zP\177<\"4Ys!6uSsr1{\024~=6´\024"))
        ).append(s).append(a("he")).append(s1))));
    String.valueOf(String.valueOf(
        (new StringBuffer(a("}j"))).append(s).append(":")
        .append(b).append(a("}&|Ub! fBs "))));
_L6:
    String s2;
    s2;
    covertjava.chat.b b1 = (covertjava.chat.b)Naming.lookup(s2);
    MessageInfo messageinfo = new MessageInfo(e, f);
    b1.receiveMessage(s1, messageinfo);
    System.out.println(a("\037 gGw5 4Gs<14@yr-{Gbr").concat(String.valueOf(
        String.valueOf(s))));
    if(flag)
        b.c = !b.c;
    return;
}

Now that's a total, but powerful, mess! sendMessage is a fairly small method with little conditional logic. If control flow obfuscation was applied to a more complex method with for loops, if statements, and local variables, the obfuscation would be even more effective.

Inserting Corrupt Code

Inserting corrupt code is a somewhat dubious technique used by some obfuscators to prevent obfuscated classes from decompiling. The technique is based on a loose interpretation of the Java bytecode specification by the Java Runtime. JRE does not strictly enforce all the rules of bytecode format verification, and that allows obfuscators to introduce incorrect bytecode into the class files. The introduced code does not prevent the original code from executing, but an attempt to decompile the class file results in a failure, or at best in confusing source code full of JVM INSTR keywords. Listing 3.3 shows how a decompiler might handle corrupt code. The risk of using this method is that the corrupted code might not run on a version of JVM that more closely adheres to the specification. Even if it is not an issue with the majority of JVMs today, it might become a problem later.

Eliminating Unused Code (Shrinking)

As an added benefit, most obfuscators remove unused code, which results in application size reduction. For example, if a class called A has a method called m() that is never called by any class, the code for m() is stripped out of A's bytecode. This feature is especially useful for code that is downloaded via the Internet or installed in unsecured environments.

Optimizing Bytecode

Another added benefit touted by obfuscators is potential code optimization. The vendors claim that declaring nonfinal methods as final where possible and performing minor code improvements can help speed up execution. It is hard to assess the real performance gains, and most vendors do not publish the metrics. What is worth noting here is that, with every new release, JIT compilers are becoming more powerful. Therefore, features such as method finalization and dead code elimination are most likely performed by it anyway.