Inside Code Virtualizer

Written by Scherzo   


This article aims to explain how Code Virtualizer works. During the last month, I spent all my free time analysing the Code Virtualizer Demo 1.0.1.0 unpacked by softworm. I have now finished my analysis, and I can say that this is the best software I have ever seen: not best in terms of protection, but in terms of organization. It was the most pleasing software I have ever analysed.

Three important things to note. First, the description and explanation of the code disassembled by OllyDbg follow the code execution order. Second, most of what I say applies only to version 1.0.1.0 of Code Virtualizer; for comments on newer versions, see "Hopes for the Future and Acknowledgments". Third, I will not treat the 64-bit case.

This article is divided into two parts. First, I talk about how the Virtual Machine is generated and why Oreans says that each Virtual Machine has its own characteristics. Second, I use those concepts to explain how the Virtual Opcodes are generated, how they are executed and why they emulate the original code of an application.

Enjoy this article; I hope you learn something from reading it.

Contents

1 Introduction
1.1 About Code Virtualizer
1.2 About this article
2 The Virtual Machine - Light VM
2.1 The Virtual Machine itself
2.2 Generating the Virtual Machine
3 The Virtual Opcodes
3.1 Disassembling and "Assembling" again
3.2 Generating and Writing the Virtual Opcodes
3.3 Completing the analysis: why does this really work?
4 Hopes for the Future and Acknowledgments
4.1 Why write this article?
4.2 The general attack approach
4.3 Acknowledgments

DISCLAIMER

ALERT: THIS ARTICLE MUST BE USED ONLY FOR SCIENTIFIC/STUDY PURPOSES. THE AUTHOR OF THIS ARTICLE IS NOT RESPONSIBLE FOR ANY USE OF THE KNOWLEDGE DESCRIBED HERE FOR ILLEGAL PURPOSES. YOU ARE ONLY ALLOWED TO READ THIS ARTICLE IF YOU AGREE WITH THIS DISCLAIMER.

1 Introduction

1.1 About Code Virtualizer

Code Virtualizer is a powerful code obfuscation system that helps developers protect their sensitive code areas against Reverse Engineering. Code Virtualizer has been designed to enact high security for your sensitive code while requiring minimal system resources.

Code Virtualizer will convert your original code into Virtual Opcodes that will be only understood by an internal Virtual Machine. Those Virtual Opcodes and the Virtual Machine itself are different for every protected application, avoiding a general attack over Code Virtualizer.

Code Virtualizer can protect your code in any x32 and x64 native PE files, like executable files (EXEs), system services, DLLs, OCXs, ActiveX controls, screen savers and device drivers[1].

1.2 About this article

First of all, I need to apologize. You will probably see a lot of mistakes because of my English, but I hope you will understand me.

This article aims to explain how Code Virtualizer works. During the last month, I spent all my free time analysing the Code Virtualizer Demo 1.0.1.0 unpacked by softworm[2]. I have now finished my analysis, and I can say that this is the best software I have ever seen: not best in terms of protection, but in terms of organization. It was the most pleasing software I have ever analysed.

Three important things to note. First, the description and explanation of the code disassembled by OllyDbg[3] follow the code execution order. Second, most of what I say applies only to version 1.0.1.0 of Code Virtualizer; for comments on newer versions, see "Hopes for the Future and Acknowledgments". Third, I will not treat the 64-bit case.

This article is divided into two parts. First, I talk about how the Virtual Machine is generated and why Oreans[4] says that each Virtual Machine has its own characteristics. Second, I use those concepts to explain how the Virtual Opcodes are generated, how they are executed and why they emulate the original code of an application.

Enjoy this article; I hope you learn something from reading it.

2 The Virtual Machine - Light VM

2.1 The Virtual Machine itself

You may have noticed that I called this Virtual Machine "Light VM". Actually, the name is not mine: the Oreans developers chose it, probably by comparison with the Themida Virtual Machine.

Basically, each Virtual Machine has 150 handlers and a main handler. By handler, I mean a kind of function that deals with the Virtual Opcodes. In general they are small (one to six lines of assembly code), and it is really important to understand each one.

Next I will show the first structure, which I have called Handler_Information, and an example of it (figure 1):

  • WORD id // a number that represents the handler

  • DWORD start // the address of the start of this handler in the Code Virtualizer file

  • DWORD end // the address of the end of this handler in the Code Virtualizer file

  • DWORD address // the address of the start of the handler in the protected file

  • WORD order // random number (from 0Eh to A4h) that will indicate the place of the handler in the protected file



Figure 1: Handler_Information structure example
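To make the layout concrete, here is a minimal Python sketch of the Handler_Information record. It assumes that the five fields are stored in the listed order, little-endian and without padding, which this article does not confirm. The id, start and end values are the ones quoted for handler 0000h; the address and order values are invented for illustration.

```python
import struct
from collections import namedtuple

# Assumed layout: WORD, DWORD, DWORD, DWORD, WORD, little-endian, no padding.
HANDLER_INFO = struct.Struct("<HIIIH")

HandlerInformation = namedtuple("HandlerInformation", "id start end address order")


def parse_handler_information(raw: bytes) -> HandlerInformation:
    """Decode one Handler_Information record from raw bytes."""
    return HandlerInformation._make(HANDLER_INFO.unpack(raw))


# id/start/end come from the article; address and order are hypothetical.
raw = HANDLER_INFO.pack(0x0000, 0x006035F0, 0x006035F8, 0x00407300, 0x0E)
info = parse_handler_information(raw)
handler_size = info.end - info.start  # bytes between the start and end addresses
print(f"handler {info.id:04X}h: {handler_size} bytes, order {info.order:02X}h")
```

Under these assumptions one record takes 16 bytes, and the distance between start and end gives the amount of handler code to copy.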


This structure is the main one used to generate the VM. I will not show each of the 150 handlers here. It is tedious, but if you want to study Code Virtualizer in depth you must read and understand them one by one. I will show just the handler from the figure above (figure 1; id = 0000h; start = 006035F0h; end = 006035F8h) and the main handler (figure 3):



Figure 2: Handler 0000h




Figure 3: Main Handler


The main handler has one particularity: the DWORD 11111111h appears three times. These three values differ from one protected application to another. The first DWORD is the address of the seventh line of the main handler in the protected file. The second one is the "image base" of the Virtual Machine. The last DWORD is the total number of handlers in that VM.

2.2 Generating the Virtual Machine

Here I give the arguments that I will later use to show that the feature described by the phrase "Those Virtual Opcodes and the Virtual Machine itself are different for every protected application, avoiding a general attack over Code Virtualizer." is not a very important one.

The first step performed by Code Virtualizer is to write the main handler. Next, the other 150 handlers are written following the Handler_Information.order sequence from 1Eh to A4h. As Handler_Information.order is randomly generated, the result is a different sequence of handlers for every protected application (for an example, see [5]).
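The effect of the random order field can be sketched in a few lines of Python. This is only an illustration: the number of handlers, the pool of order values and the use of Python's random module are all assumptions, and the real generator inside Code Virtualizer is not described in this article.

```python
import random


def layout_handlers(handler_ids, order_values, seed=None):
    """Give every handler a distinct random order value and return
    (ids sorted by that value, the id -> order mapping)."""
    rng = random.Random(seed)
    orders = rng.sample(list(order_values), len(handler_ids))
    assignment = dict(zip(handler_ids, orders))
    write_sequence = sorted(handler_ids, key=assignment.__getitem__)
    return write_sequence, assignment


ids = list(range(8))            # eight hypothetical handler ids
order_pool = range(0x20, 0x40)  # hypothetical pool of order values
sequence_a, orders_a = layout_handlers(ids, order_pool, seed=1)
sequence_b, orders_b = layout_handlers(ids, order_pool, seed=2)
print(sequence_a)
print(sequence_b)
```

Each run with a different seed is one "protected application": the same handlers are present, only their position changes.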

Now I am going to explain how the handler 0000h (see figure 2) is written. The same process occurs for every handler.

The next step is shown by the code below:



Figure 4: LODS special case


This piece of code looks for LODS instructions. The check does not apply to the handlers 0154h and 0156h. But why these checks? Well, a LODS instruction in a handler represents the reading of 1, 2 or 4 Virtual Opcodes, and to increase security the Oreans developers insert random code after the LODS instruction. To do that, they use another structure that I have called Special_Handler:

  • WORD Handler_Information.id // see Handler_Information structure

  • BYTE instruction3 // number that says what kind of instruction will be written as the third instruction

  • BYTE instruction2 // number that says what kind of instruction will be written as the second instruction

  • BYTE instruction1 // number that says what kind of instruction will be written as the first instruction

  • BYTE instruction4 // number that says what kind of instruction will be written as the fourth instruction

  • DWORD Random1 // random number that will be part of the instruction 2

  • DWORD Random2 // random number that will be part of the instruction 3


Table 1: Table of possible random instructions. Each of these instructions can be written in DWORD, WORD or BYTE format using the respective registers ax, bx, al, bl.

value   instruction1   instruction2      instruction3      instruction4
0       sub eax,ebx    sub eax,Random1   sub eax,Random2   sub ebx,eax
1       add eax,ebx    add eax,Random1   add eax,Random2   add ebx,eax
2       xor eax,ebx    xor eax,Random1   xor eax,Random2   xor ebx,eax

Figure 5: Example of Special_Handler structure
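The interplay between the Special_Handler fields and Table 1 can be sketched in Python. This is a sketch under assumptions: it covers only the DWORD case, it assumes the four instructions run in the order instruction1 to instruction4 right after LODS, and the field values used at the end are invented. The decode function mirrors what a handler would do; the encode function applies the inverse operations in reverse order, which is what the protector has to do to produce a matching Virtual Opcode.

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFF  # DWORD case only

# Table 1 row numbers: 0 = SUB, 1 = ADD, 2 = XOR
OPS = {
    0: lambda a, b: (a - b) & MASK,
    1: lambda a, b: (a + b) & MASK,
    2: lambda a, b: a ^ b,
}
INVERSE = {0: 1, 1: 0, 2: 2}  # SUB <-> ADD, XOR undoes itself


@dataclass
class SpecialHandler:
    instruction1: int  # op eax, ebx
    instruction2: int  # op eax, Random1
    instruction3: int  # op eax, Random2
    instruction4: int  # op ebx, eax
    random1: int
    random2: int


def decode(raw, key, sh):
    """Handler side: transform the value read by LODS, then update the key."""
    eax = OPS[sh.instruction1](raw, key)
    eax = OPS[sh.instruction2](eax, sh.random1)
    eax = OPS[sh.instruction3](eax, sh.random2)
    key = OPS[sh.instruction4](key, eax)
    return eax, key


def encode(value, key, sh):
    """Protector side: undo the three eax operations in reverse order."""
    eax = OPS[INVERSE[sh.instruction3]](value, sh.random2)
    eax = OPS[INVERSE[sh.instruction2]](eax, sh.random1)
    eax = OPS[INVERSE[sh.instruction1]](eax, key)
    return eax, OPS[sh.instruction4](key, value)


sh = SpecialHandler(0, 2, 1, 1, 0x12345678, 0x0BADF00D)  # hypothetical values
raw, key_after_encode = encode(0x18, 0x00407500, sh)
value, key_after_decode = decode(raw, 0x00407500, sh)
print(hex(raw), hex(value))
```

Because the key (EBX) is updated with the decoded value, both sides stay in step as long as they process the same sequence of values.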


So before those operations, the handler 0000h (figure 2) will be like this:



Figure 6: Handler 0000h before addition of 4 instructions


The next step is another security feature. Some kinds of instructions are mutated by the function Oreansf1.F4, exported by the Oreansf1.dll module. This means that the code of each handler is obfuscated. Moreover, this mutation engine is strictly tied to the Virtual Machine Obfuscation option: in fact, that option only changes the complexity of the mutated opcodes. This is rather strange, because in a general attack on Code Virtualizer it makes no difference whether the complexity of the VM is low or highest (for more comments, see "Hopes for the Future and Acknowledgments").



Figure 7: Handler 0000h with mutated opcodes


Before that, a JUMP to the main handler is written so the next handler will be called.

The next security feature is quite fun to see: all the 150 handlers are mixed randomly!!! For example, a piece of the handler 0161h is followed by a piece of the handler 0001h and the handler 0069h, etc...

So in the end there will be a completely obfuscated, unique code that is difficult to analyse. Really? I do not think so :).

3 The Virtual Opcodes

3.1 Disassembling and "Assembling" again

I know that things are still obscure. You probably still have no idea how those handlers work, but I promise it will become clear in section 3.3.

The figure below shows a macro that has not been virtualized yet. The code that will be virtualized starts at 0040106Eh and ends at 0040107Dh.



Figure 8: Macro not virtualized


Next a PUSH 0040108Dh and RET will be added to the original code so the program can continue its execution normally.

After that, the exported function Oreansf1.F1 disassembles the original code, as you can see below. This was a real surprise to me: I expected Code Virtualizer to treat the code through the bytes of the original code, not through strings. It uses Delphi functions to handle strings; I think this is not the fastest way, but it is surely the easier one.



Figure 9: Code disassembled


Now the function OreansX2dllR.F1, exported by OreansX2dllR.dll, does the main and most complex work: it assembles the assembly code into a Code Virtualizer syntax and generates the most important structure, which I have called OreansX2.

OreansX2 structure:

  • DWORD instruction // type of instruction following the Code Virtualizer syntax

  • DWORD sufix // suffix for the instruction

  • DWORD data1 // data for the instruction

  • DWORD data2 // data for the instruction

  • WORD unknown // unknown use


Table 2: Table of possible instructions for OreansX2 structure

OreansX2.instruction   instruction
00                     LOAD
01                     STORE
02                     MOVE
03                     IFJMP
04                     EXTRN
05                     UNDEF
06                     IMULC
07                     ADC
08                     ADD
09                     AND
0A                     CMP
0B                     OR
0C                     SUB
0D                     TEST
0E                     XOR
0F                     MOVZX
10                     MOVZX_W
11                     LEA
12                     INC
13                     RCL
14                     RCR
15                     ROL
16                     ROR
17                     SAL
18                     SAR
19                     SHL
1A                     SHR
1B                     DEC
1C                     NOP
1D                     MOVSX
1E                     MOVSX_W
1F                     CLC
20                     CLD
21                     CLI
22                     CMC
23                     STC
24                     STD
25                     STI
26                     HLT
27                     BT
28                     BTC
29                     BTR
2A                     BTS
2B                     SBB
2C                     MUL
2D                     IMUL
2E                     DIV
2F                     IDIV
30                     BSWAP
31                     NEG
32                     NOT
33                     RET

Table 3: Table of possible suffixes

OreansX2.sufix   suffix
00               (none)
01               ADDR
02               %sADDR, %d
03               %sADDR, %.8x%h
04               BYTE PTR %s[ADDR]
05               WORD PTR %s[ADDR]
06               DWORD PTR %s[ADDR]
07               QWORD PTR %s[ADDR]
08               %sBYTE PTR [%.8x%h]
09               %sWORD PTR [%.8x%h]
0A               %sDWORD PTR [%.8x%h]
0B               %sQWORD PTR [%.8x%h]
0C               ADDR, BYTE PTR %s[%.8x%h]
0D               ADDR, WORD PTR %s[%.8x%h]
0E               ADDR, DWORD PTR %s[%.8x%h]
0F               ADDR, QWORD PTR %s[%.8x%h]
10               %s%d
11               %s%.8x%h
12               reserved
13               reserved
14               reserved
15               reserved
16               reserved
17               reserved
18               BYTE
19               WORD
1A               DWORD
1B               QWORD
1C               reserved
1D               reserved
1E               FLAGS
1F               %s[ADDR]
20               %sBYTE %d
21               %sWORD %d
22               %sDWORD %d
23               %sQWORD %d

As you can see, the syntax is quite logical. It uses XOR, ADD, etc. for well-known instructions and obvious names like MOVE, STORE, LOAD for "special" instructions; the suffixes use a single variable ADDR and well-known formats like DWORD PTR [ADDR].

I still do not completely understand how those instructions are generated from the disassembled original code, but I think this is not a problem if you run some tests to see the pattern. Next I show one assembly instruction followed by the equivalent block of Code Virtualizer instructions with their respective OreansX2 structures (see the file [5] for more examples).



Figure 10: Example of Code Virtualizer syntax


You may have noticed that the first parameter of the first OreansX2 structure above is 80000002h. The 02 means MOVE, as you can see in Table 2, while the 80 means that this instruction has a relative address. That is, the address F0000028h is relative to the image base of the Virtual Machine.
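Splitting the OreansX2.instruction field can be sketched as follows. The sketch assumes that the relative-address marker occupies the top byte of the DWORD and the Table 2 number the low byte, which is what the value 80000002h suggests but is not otherwise confirmed here. The name table is only an excerpt of Table 2.

```python
# Excerpt of Table 2 (number -> mnemonic)
INSTRUCTION_NAMES = {0x00: "LOAD", 0x01: "STORE", 0x02: "MOVE", 0x03: "IFJMP"}


def split_instruction(field: int):
    """Split OreansX2.instruction into (is_relative, instruction number)."""
    is_relative = (field >> 24) == 0x80  # assumption: marker is the top byte
    number = field & 0xFF                # assumption: number is the low byte
    return is_relative, number


is_relative, number = split_instruction(0x80000002)
name = INSTRUCTION_NAMES.get(number, "unknown")
print(name, "relative" if is_relative else "absolute")
```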

3.2 Generating and Writing the Virtual Opcodes

Given a vector of OreansX2 structures, a sequence of operations is performed to produce the next structure, which I have called Pre_Handler. The size of this structure is 28h bytes.

  • DWORD counter // counter that is incremented by 0Eh for each Pre_Handler structure

  • DWORD real_opcode_mark // this DWORD is the address of the original opcode in an allocated memory. This is only applicable to the first Code Virtualizer instruction of the block of instructions that represent the original opcode

  • DWORD unknown1 // unknown use

  • DWORD counter_0E // this is Pre_Handler.counter plus 0Eh (unknown use)

  • BOOL is_special // True if the original opcode is any kind of call, jump, conditional jump and others. In this case, a special structure will be generated for those instructions

  • BYTE instruction // Same as OreansX2.instruction

  • DWORD sufix // Same as OreansX2.sufix

  • DWORD data1 // Same as OreansX2.data1

  • DWORD data2 // Same as OreansX2.data2

  • WORD unknown2 // Same as OreansX2.unknown

  • 7 bytes unknown

  • BOOL is_relative_address // TRUE if the instruction has a relative address



Figure 11: Example of Pre_handler structure
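As a consistency check on the field list, the following Python sketch lays the Pre_Handler fields out in the listed order. It assumes an unpadded little-endian record in which each BOOL takes one byte; with those assumptions the total comes out to exactly 28h bytes, matching the size given above. The sample values are invented.

```python
import struct
from collections import namedtuple

# Assumed layout: 4 DWORDs, BOOL (1 byte), BYTE, 3 DWORDs, WORD,
# 7 unknown bytes, BOOL (1 byte); little-endian, no padding.
PRE_HANDLER = struct.Struct("<IIII?BIIIH7s?")

PreHandler = namedtuple(
    "PreHandler",
    "counter real_opcode_mark unknown1 counter_0E is_special instruction "
    "sufix data1 data2 unknown2 unknown_bytes is_relative_address",
)

# Hypothetical record: a MOVE (02) with a relative address.
raw = PRE_HANDLER.pack(
    0x00, 0x00A10000, 0, 0x0E, False, 0x02,
    0x03, 0xF0000028, 0, 0, b"\x00" * 7, True,
)
record = PreHandler._make(PRE_HANDLER.unpack(raw))
print(hex(PRE_HANDLER.size), record.instruction, record.is_relative_address)
```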


Now the main structure, the one directly related to Virtual Opcode generation, can be studied. I have called this structure Handler.

  • WORD handler // this is the main parameter: it determines which handler must be called. It is equivalent to Handler_Information.id

  • DWORD Pre_Handler_addr // address in memory of the corresponding Pre_Handler structure that generated this Handler structure

  • DWORD memory_opcode // memory address where the Virtual Opcode represented by this structure will be written

  • BYTE type_of_handler // 0 if the handler does not read Virtual Opcodes through a LODS instruction. 1, 2, 4, 8 if the handler reads 1, 2, 4, 8 Virtual Opcodes

  • BYTE unknown2 // unknown use

  • DWORD data1 // data for the Code Virtualizer instruction (for example, for LOAD 18h, data1 will be 18h)

  • DWORD data2 // data for the case of a 64-bit Code Virtualizer instruction

  • DWORD file_opcode // address in the protected file of where the Virtual Opcode represented by this structure will be written



Figure 12: Example of Handler structure


Each Handler structure can generate 1, 2 or 4 Virtual Opcodes, so it is essential to understand how the vector of Handler structures is generated.

This is not so complicated, but if I covered each case here this article would become too long. So I will just describe how it works; for more details see [6].

Basically, each vector of Handler structures starts with the handler 015Bh and ends with the handlers 0161h and 015Ch. The handlers 015Bh and 015Ch do not actually exist. They are there just to tell Code Virtualizer that special code must be inserted to handle the moments when the execution of Virtual Opcodes starts and when it finishes. This special code will be shown shortly.

Between those handlers, each Pre_Handler structure is treated like this: if Pre_Handler.is_special is TRUE, the handler 0161h is added to the corresponding Handler structure. After that, a different sequence of Handler structures is generated for each of the cases MOVE, LOAD, STORE, SHL, ADD, SUB, IFJMP, RET, UNDEF and a default case (for the other Code Virtualizer instructions). You can see more details about those sequences in [6].

Having understood how the vector of Handler structures is generated, you can finally understand the brilliant part of Code Virtualizer: how the Virtual Opcodes are built.

The first thing to discuss is what happens when Code Virtualizer finds the handlers 015Bh and 015Ch. There is pre-built virtualized code (meaning that the Code Virtualizer instructions and the other structures are not involved) that is responsible for initializing and uninitializing the Virtual Machine, for example saving or restoring the registers and flags before the protected application executes its Virtual Opcodes.

So now I am going to talk about the generation of Virtual Opcodes from the Handler structure. The first thing that Code Virtualizer does is quite surprising. Using a random number generator, it decides whether to execute a specific CALL. This CALL is responsible for generating "fake" Virtual Opcodes. That is, those Virtual Opcodes are executed but change nothing in the program (like a sequence of NOPs), so they serve to obfuscate the real Virtual Opcodes. Besides, there are five different sequences of "fake" Virtual Opcodes, which makes the analysis of the program even more difficult.

What is more, the Virtual Opcode Obfuscation option (low, normal, high, highest) is strictly related (I mean only related) to these "fake" Virtual Opcodes. Depending on that option, the chance that the random number generator allows the specific CALL to be executed recursively more than once is increased or decreased. So, for example, in the middle of the emulation of one instruction there can be a lot of "fake" Virtual Opcodes. They can increase the size of the Virtual Opcodes by a factor of 3!!!
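The recursive insertion of "fake" Virtual Opcodes can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the contents of the five fake sequences, the chance value standing in for the obfuscation level, the depth limit, and the tagging of entries as REAL or FAKE (in a protected file nothing marks them).

```python
import random

# Five placeholder fake sequences of different lengths (contents are invented).
FAKE_SEQUENCES = [[("FAKE", n)] * (n + 1) for n in range(5)]


def emit_fakes(out, rng, chance, depth=0, max_depth=8):
    """With probability `chance`, append one fake sequence and recurse."""
    if depth < max_depth and rng.random() < chance:
        out.extend(rng.choice(FAKE_SEQUENCES))
        emit_fakes(out, rng, chance, depth + 1, max_depth)


def build_stream(real_opcodes, chance, seed=0):
    """Interleave possible fake sequences before every real opcode."""
    rng = random.Random(seed)
    out = []
    for op in real_opcodes:
        emit_fakes(out, rng, chance)
        out.append(("REAL", op))
    return out


real = [0x10, 0x2D, 0x62]
stream = build_stream(real, chance=0.5)
recovered = [op for kind, op in stream if kind == "REAL"]
print(len(real), len(stream))
```

A higher chance value plays the role of a higher obfuscation level: the real opcodes are unchanged, but the stream around them grows.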

Apart from the "fake" Virtual Opcodes, you might expect the Virtual Opcodes to be identical if you protect an application twice and compare the results. What makes them different is a global variable in Code Virtualizer that I have called key.

So if the handler 0010h must be called, then given Handler_Information.order and the Special_Handler structure (see sections 2.1 and 2.2 for the explanation of these structures), the inverses of the operations described in Table 1 (that is, ADD, SUB, XOR) are executed to produce the correct Virtual Opcode. I think things are a little confusing at this point, so let's clear them up.

3.3 Completing the analysis: why does this really work?

The aim of this section is to explain step by step the initialization of the Virtual Machine and the execution of the Virtual Opcodes. To do that, I will use a file that I prepared, which has neither fragmented handlers nor the mutation engine[7].

When the protected application reaches a macro, the code is redirected to a PUSH/JMP sequence in a section created by Code Virtualizer.



Figure 13: PUSH/JMP example


The value pushed is the address of the first Virtual Opcode and the jump is to the main handler.



Figure 14: Virtual Opcodes




Figure 15: Main Handler


The code starting at 004072D8h is always executed before every handler. It is responsible for calling the handler specified by the Virtual Opcode. The key is initialized with the address of the first Virtual Opcode and is stored in the EBX register. The ESI register holds the address of the Virtual Opcode currently being read, and the EDI register holds the image base of the Virtual Machine. The stack is used to store values and the EAX register is used for operations like XOR, ADD, etc.

So when the code reaches the address 004072D8h, the registers are like this:



Figure 16: Registers in the Main Handler


Now the byte 62h is read and, after some operations with the key (the random operations explained in section 2.2; see figure 15), when the code reaches the address 004072E4h the registers are like this:



Figure 17: Jumping to handler 2Dh


As you can see, the key was changed and the ESI register was updated. Now the instruction jmp dword ptr ds:[edi+eax*4] makes sense: EDI holds the image base of the Virtual Machine, and the EAX value (obtained from the Virtual Opcode after the operations with the key) selects an entry in a table of pointers to handlers:



Figure 18: Piece of table of pointers to handlers
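The address computation behind jmp dword ptr ds:[edi+eax*4] can be reproduced in a few lines of Python. This is a sketch only: the image base, the table contents and the pointer value are hypothetical, and the index 2Dh is taken from the caption of figure 17.

```python
import struct


def handler_address(memory: bytes, memory_base: int, edi: int, eax: int) -> int:
    """Return the DWORD stored at [edi + eax*4], the target of the indirect JMP."""
    offset = edi + eax * 4 - memory_base
    return struct.unpack_from("<I", memory, offset)[0]


VM_BASE = 0x00407000              # hypothetical image base of the VM (EDI)
vm_memory = bytearray(0x400)      # hypothetical snapshot of the VM area
struct.pack_into("<I", vm_memory, 0x2D * 4, 0x00407ABC)  # invented pointer

target = handler_address(bytes(vm_memory), VM_BASE, VM_BASE, 0x2D)
print(hex(target))
```

In other words, the decoded value in EAX is nothing more than an index: multiplying by 4 selects one DWORD pointer relative to the image base.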


By now you know how every handler is called, and it is possible to explain why the Virtual Opcodes are unique for every protected application: because of the key. The key is changed many times and it is address dependent. As the Virtual Opcodes depend on the key (see section 3.2 for the explanation) and the size of the Virtual Machine is not constant, the Virtual Opcodes are unique.

The first two instructions of the Main Handler (PUSHAD and PUSHFD) push the registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI and the flags onto the stack. Afterwards, the pre-built Virtual Opcodes that I mentioned are responsible for popping those registers into the first 38 bytes of the Virtual Machine. From then on, an instruction like XOR ECX, ECX changes the value at the address 00407014h. At the end of the execution of the virtualized code, the registers are restored to their correct places, allowing the application to continue its execution.



Figure 19: Virtual Machine registers


Now it is your turn. I will not comment on every executed line. I have given you the basics and I hope things are clearer now. So trace the example program [7] and understand how the other handlers are executed.

4 Hopes for the Future and Acknowledgments

4.1 Why write this article?

As I said in the disclaimer, the main purpose of this article is to pass on to you the knowledge that I have acquired. Besides that, I should say that I originally intended to write a tool instead of this article. I even started it, but as I am not a programmer I realized that, with the amount of free time I will have, I would not be able to finish it.

So what I hope is that someone becomes interested in writing this tool (just e-mail me). I can help and even provide the source code of what I have written so far. But be aware that this is not easy work.

An important thing to say is that this is a very condensed article. There are a lot of details that I omitted (no time, and too tired now to write them down) and other details that I did not notice. If you have any questions, if you see something wrong in my article or if you want to improve it, just e-mail me.

And my main hope is to see a similar article about the Themida Virtual Machine. Let me say that this should not be too difficult now, after this article, mainly because Themida uses the same DLLs as Code Virtualizer and because the Oreans developers themselves told us that Code Virtualizer is a slightly simpler version of the Themida Virtual Machine (remember the Light VM).

And a word for Oreans: this is really a great tool to protect sensitive code areas but, as you said, it is not 100% safe (nothing is 100% safe). I do not think there is a similar tool on the market as good as this one. Keep up the good work improving this software!

4.2 The general attack approach

So here I will present my ideas about a tool to deal with Code Virtualizer and how to handle new versions. The tool is divided into three parts:

  • Preprocessing

    • Get all information about the file using the PE header

    • Look for Virtual Machines

      • Fill the VM class with information about the Virtual Machine

      • Identify every handler

    • Look for Macros

      • Get the total size of the Virtual Opcodes

      • Find jumps to the macro. This is the place where the original code was

  • Analysis

    • Find the "fake" Virtual Opcodes and eliminate them

    • Retrieve the Code Virtualizer instructions

    • Analyse them and retrieve the original code

  • Postprocessing

    • Save the original code in the correct place

    • Correct the PE header

    • Save the file

The two most difficult things are to find and identify each handler in the Virtual Machine and to retrieve the original code from the block of Code Virtualizer instructions.

For the first problem, you have two options: study the mutation engine and reverse engineer it (very difficult); or, as the mutation engine does not mutate all the opcodes, identify each handler by its non-mutated instructions, which I noticed is possible in almost 100% of the cases.
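The second option amounts to signature matching with wildcards. A minimal Python sketch of that idea follows; the signature used here (LODSB, two arbitrary bytes, POPFD) is invented purely for illustration and does not correspond to any real Code Virtualizer handler.

```python
def find_signature(code: bytes, pattern):
    """Return every offset where `pattern` matches; None matches any byte."""
    hits = []
    for i in range(len(code) - len(pattern) + 1):
        if all(p is None or code[i + j] == p for j, p in enumerate(pattern)):
            hits.append(i)
    return hits


# Hypothetical signature: LODSB (ACh), two mutated bytes, POPFD (9Dh).
pattern = [0xAC, None, None, 0x9D]
code = bytes([0x90, 0xAC, 0x11, 0x22, 0x9D, 0x90])
hits = find_signature(code, pattern)
print(hits)
```

The wildcard positions stand for the bytes that the mutation engine may change, while the fixed positions are the instructions it leaves untouched.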

For the second problem, you also have two options: study how the Code Virtualizer instructions are generated from the disassembled original code and reverse engineer that process (difficult); or run some tests with different kinds of instructions and look for the pattern. By the way, a hint: one very recognizable handler is always used for every original instruction, the STORE FLAGS. This makes it easier to find the number of original instructions.

This tool must support different versions of Code Virtualizer. As its structure does not change, you only need to adapt a few things, for example new handlers, modified handlers, and so on.

A fun example: commands like ADD, XOR, SHL, etc. have in general three handlers; one for the byte operation, one for the word and one for the dword. But when I first saw the three handlers for the SHL instruction I saw something very strange:



Figure 20: Code Virtualizer bug


But only in version 1.2.0.0 did we see: "[!] Fixed Virtualization of "SHL reg16, imm""[8]. Interesting, isn’t it?

4.3 Acknowledgments

I must say a big thanks to people who helped me directly and indirectly to write this article. So here you are:

  • Melvill, Portuogral, forgetoz and Spec0p (CRKTeam): people really important to me. They introduced me to Reverse Engineering and helped me a lot. This article is especially dedicated to them.

  • softworm: well... what can I say? Without his really good job, this article would not exist.

  • Ricardo Narvaja and CrackSLatinoS: really good tutorials

  • The Reverse Engineering Community (the ones where I am active): CrkPortugal, ARTeam, Unpack.cn, Tuts4you, EXETOOLS

References

[1] Code Virtualizer Help File - Code Virtualizer Help.chm

[2] http://www.unpack.cn/viewthread.php?tid=5802&fpage=1&highlight=code%2Bvirtualizer

[3] OllyDbg v1.10 by Oleh Yuschuk - http://www.ollydbg.de/

[4] http://www.oreans.com/

[5] ../Annex/Example of Code Virtualizer instructions.rtf - this file is included in the file Inside Code Virtializer.rar

[6] ../Annex/Analysis of Code Virtualizer instructions - this folder is included in the file Inside Code Virtializer.rar

[7] ../Annex/handler.exe - this file is included in the file Inside Code Virtializer.rar

[8] http://www.oreans.com/CodeVirtualizerWhatsNew.php
