C# to IL 1: Introduction to Microsoft's IL (MSIL)

weixin_34301307

Original link: http://www.cnblogs.com/revoid/p/6685877.html

The code that we write in a programming language like C#, ASP+ or any other .NET-compatible
language is finally converted to Intermediate Language (IL).
Thus, code written in the COBOL programming language can be modified in C# and
subsequently used in ASP+. Therefore, the best way to deepen our comprehension of
the .NET technologies is by understanding IL.
Once you are conversant with IL, you will have no difficulty in understanding the .NET
technologies, since all .NET languages finally compile to it. IL was invented first and it is
programming language neutral. It was then followed by other programming languages like C#,
Visual Basic.NET, ASP.NET, etc.
We shall raise the curtain on IL with a very small program. Also, we will commence
with the assumption that you are familiar with at least one .NET programming language.

We have written a very small non-working IL program in the il subdirectory and named it a.il.
How do we assemble it into an executable program? There is no need to fret over this
problem. Microsoft has provided a program called ilasm whose sole task is to create an
executable file from an IL file.
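
The listing of a.il did not survive in this copy of the article. Judging from the explanation further on (a .method directive creating a function vijay that returns void, with no entry point), a minimal sketch of it could be the following; the exact original text is an assumption.

```il
// a.il - reconstructed sketch, not the original listing.
// It declares one empty method and nothing else, so ilasm finds no entry point.
.method void vijay()
{
}
```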

Before you run this command, make sure that your path variable includes the bin subdirectory
of the framework SDK. If not, give the command as
　　set path=c:\progra~1\microsoft.net\frameworksdk\bin;%PATH%
Now we use the command as follows:
　　c:\il>ilasm /nologo /quiet a.il

On doing so, the following error is generated:
Source file is ANSI
Error: No entry point declared for executable
***** FAILURE *****

In future, we shall not display the first and the last lines of the output generated by ilasm. We
shall also remove the blank lines between non-blank lines.
In IL, we are permitted to commence a line with or without a dot '.'. Anything that begins with a
dot is a directive to the assembler, asking it to perform some function, such as creating a
function or class etc. Anything that does not start with a '.' is an actual assembler instruction.
The significance of .method is that a function or method called vijay is created and this function
returns void i.e. it does not return any value. The function has been named vijay arbitrarily for
want of any other superior nomenclature.
The assembler was obviously not impressed with this program and thus brandished the
message 'no entry point'. This error message is generated because the IL file can contain
numerous functions, and the assembler has no way of distinguishing as to which of them is to
be executed first.
In IL, the first function to be executed is called the entrypoint function. In C#, the function is
Main. The syntax for a function is the name followed by the familiar pair of round () brackets.
The start point and the end point of the function's code are signified by the curly braces {}.
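
The modified listing is also missing from this copy. Assuming the same function vijay as before, a sketch with the directive added would be:

```il
// a.il - sketch: the same method, now marked as the entry point.
.method void vijay()
{
    .entrypoint
}
```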

Now no error is generated. The directive entrypoint signifies that the program execution has to
begin from this function. In this case, we have to use this directive notwithstanding the fact
that, this program has only one function. On giving the dir command at the DOS prompt, we
see three files created. a.exe is an executable file which can now be executed to see the output
of the program

Our luck seems to run out when we try to execute the above program because the above runtime error is generated. One probable reason for this could be the poor formation of the
function. Every function should have the instruction 'end of function' incorporated in it. We
obviously overlooked this fact in our haste.

The 'end of function' instruction is called ret. All well formed functions have to end with this
instruction.
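
As a sketch (the original listing is not present in this copy), the function with its closing instruction might read:

```il
// a.il - sketch: ret added as the last instruction of the method.
.method void vijay()
{
    .entrypoint
    ret
}
```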

On executing the function, we get the same error again. Where could we have faltered this time?
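
The corrected listing is missing from this copy. A sketch consistent with the surrounding description, using the assembly name mukhi that the text mentions, might be:

```il
// a.il - sketch of the smallest program that assembles and runs.
.assembly mukhi {}          // names the assembly (the deployment unit)
.method void vijay()
{
    .entrypoint             // execution starts here
    ret                     // end of function
}
```

Recent versions of ilasm may warn that a method declared outside a class is being made static; that is an assumption about the tool version, not something the article states.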

The blunder was that we forgot to use the mandatory directive called assembly followed by a
name. We have incorporated it in the code above, and have used the name mukhi followed by a
pair of empty curly braces {}. The assembly directive is used to give a name to the program. It is
also called a deployment unit.
The code above is the smallest program that can be assembled without any errors, though it
does not perform anything useful when executed. It does not have any function called Main. It
only has a function called vijay with the entrypoint directive. The program now assembles and
runs with no errors at all.
The concept of assembly is extremely crucial in the .NET world and should be thoroughly
understood. We will explore this directive in the latter half of the chapter.
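
The program that provokes the next failure is not shown in this copy. Going by the explanation given for it, it presumably looked something like the sketch below, which ilasm is expected to reject; the body of vijay1 is an assumption.

```il
// a.il - sketch that is meant to FAIL: two methods both claim the entry point.
.assembly mukhi {}
.method void vijay()
{
    .entrypoint
    ret
}
.method void vijay1()
{
    .entrypoint             // second entry point: not allowed
    ret
}
```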

The cause for the above failure message is that the above program has two functions, vijay and
vijay1, with each containing the .entrypoint directive. As mentioned earlier, this directive
specifies as to which function is to be executed first.
Thus, in functionality, it is akin to the Main function in C#. When C# code gets converted into
IL code, the code contained in the function Main gets converted into a function in IL and
contains the directive .entrypoint. For example, if the first function to be executed in a COBOL
program is called abc, the code generated in IL inserts the .entrypoint directive in this function.
In conventional programming languages, the function to be executed first has to have a specific
name, e.g. Main, but in IL, only the .entrypoint directive is required. Therefore, since a program
can have only one starting point, only one function in the IL code is allowed to contain the
.entrypoint directive.
It is pertinent to note that no error message number or explanation is generated, making it
difficult to debug this error.

The .entrypoint directive need not be positioned as the first or last directive in the function. It
has to merely be present in the body of the function, to herald its status as the first function to
be executed. Directives are not assembly instructions and can even be placed after the ret
instruction. To remind you, ret signifies the end of the function code.
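
A small sketch of this point, assuming the same function vijay and assembly name mukhi used earlier (the original listing is missing here):

```il
// a.il - sketch: the directive is placed after ret and is still honoured,
// because directives are not part of the instruction stream.
.assembly mukhi {}
.method void vijay()
{
    ret
    .entrypoint
}
```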

We may have a function written in C#, ASP+ or COBOL, but the mechanism for executing this
function in IL is the same. It is as follows:
We have to use the assembler instruction call. The call instruction is to be followed by the
following details in the given sequence:

• return type of the function (void).
• the namespace (System).
• the class (Console).
• the function name (WriteLine()).
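
Put together, the call might be written as in the sketch below (the original listing is missing here). The [mscorlib] prefix and the .assembly extern block are additions that the text only explains later; they are copied from the disassembly shown further down in this article so that the sketch has a better chance of assembling with a current ilasm.

```il
// a.il - sketch: calling System.Console::WriteLine with no parameters.
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.method void vijay()
{
    .entrypoint
    // return type, then [library]Namespace.Class::Function(parameter types)
    call void [mscorlib]System.Console::WriteLine()
    ret
}
```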
The function gets called but prints nothing except an empty line. So, we pass a parameter to the
WriteLine function.

The above code has a glaring omission. When a function is called in IL, in addition to its return
type, the data types of the parameters being passed to the function also have to be
specified. We have already stated that the WriteLine function expects a parameter of the class
named System.String, but since no string is passed to the function, it generates a runtime
error.
Thus, there is a significant difference between IL and other programming languages when it
comes to calling a function. In IL, when we call a function, we have to specify everything we
know about the function, including its return type and the data types of its parameters. This
full signature identifies exactly one function, so that the call can be checked against it
when the code is loaded and run.
We shall now see how to pass parameters to a function.

The assembler instruction ldstr places a string on the stack. The name ldstr is an abbreviated
version of the text "load a string on the stack". A stack is an area of memory that facilitates
passing of parameters to a function. All functions receive their parameters from the stack.
Thus, instructions like ldstr are indispensable.
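
A sketch of the complete call follows; the original listing is missing from this copy. The string "hi" is a placeholder borrowed from the disassembly later in the article. The text names the parameter type as class System.String, whereas the sketch uses the keyword string that current ildasm output shows. The .assembly extern block is likewise copied from that disassembly.

```il
// a.il - sketch: push a string on the stack, then call WriteLine(string).
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.method void vijay()
{
    .entrypoint
    ldstr "hi"              // places a reference to the string on the stack
    call void [mscorlib]System.Console::WriteLine(string)   // takes its parameter from the stack
    ret
}
```

If this assembles as intended, running a.exe should display the string.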

We have added some attributes to the method vijay. We shall explain them one by one below.
public: This is called an accessibility attribute as it decides as to who all can access a method.
Public means that this method is accessible to every other part of the program.
hidebysig: A class can be derived from another class. The attribute hidebysig states that the
function hides any function in a parent class that has the same name and signature, rather
than every function with the same name. In this example, if a function vijay with the same
signature is present in the base class, it is hidden by this one. The attribute is meant for
compilers and tools; the runtime itself ignores it.
static: Methods can either be static or non-static. A static method belongs to a class and not to
an instance. Thus, as we have only a single class, we cannot have more than one copy of a
static function. There are no restrictions on where a static method can be created. The function
with the entrypoint directive must be static. Static functions must have a body or source code
associated with them and they are referenced using the type name and not the instance name.
il managed: Due to its complex nature, we shall postpone the explanation of this attribute
(current versions of ildasm print it as cil managed).
When the time is appropriate, its functionality will be clearly explained.
The abovementioned attributes do not modify the output of the function. In a short while, it will
become apparent to you as to why we have provided the explanation of these attributes.
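
The listing that these attributes were added to is missing from this copy. A sketch of what the method header might look like with them in place is given below; it spells the last attribute cil managed, as the disassembly later in this article does, and the string "hi" is a placeholder.

```il
// a.il - sketch: the method vijay with accessibility and other attributes.
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.method public hidebysig static void vijay() cil managed
{
    .entrypoint
    ldstr "hi"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```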
Whenever we write a program in the C# programming language, we first specify the keyword
class, followed by the name of the class and then, we enclose the source code within a pair of
curly braces {}. This is demonstrated in a.cs

Let us now introduce the IL directive called class.
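
As a sketch (the original listing is missing here), the method from before can be wrapped in a class. The class name zzz is taken from later in the article; the rest is an assumption.

```il
// a.il - sketch: the method now lives inside a class called zzz.
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.class zzz
{
    .method public hidebysig static void vijay() cil managed
    {
        .entrypoint
        ldstr "hi"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}
```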

Notice the change in assembler output : Class 1 Methods: 1;

The directive .class is followed by the name of the class. It is optional in IL. Let us enhance the
functionality of the class by adding a few class attributes.
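
A sketch of the class header with the three attributes that are described next might look as follows; only the .class line differs from a plain class declaration, and the method body is a placeholder.

```il
// a.il - sketch: class attributes private, auto and ansi on the class zzz.
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.class private auto ansi zzz
{
    .method public hidebysig static void vijay() cil managed
    {
        .entrypoint
        ldstr "hi"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}
```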

We have added three attributes to our class directive:
• private: This signifies that the class is not visible outside the assembly in which it
is defined.
• auto: This means that the layout of the class in memory will be decided by the runtime,
and not by our program.
• ansi: The source code is generally divided into two main categories:
- Managed Code
- Unmanaged Code
Code written in languages like C is called unmanaged code or untrustworthy code. We need an
attribute that handles interoperability between unmanaged code and managed code. For
example, this attribute can be put to use when we want to transfer strings between managed
and unmanaged code

If we cross the bounds of managed code and vault into the realm of unmanaged code, a string,
which is an array of 2-byte Unicode characters, will be converted into an ANSI string, which is
an array of 1-byte ANSI characters and vice versa. The modifier ansi is used for smooth
transition between managed and unmanaged code

The class zzz has been derived from the class System.Object. In the .NET world, in order to
maintain type consistency, all types are ultimately derived from System.Object. Thus, all
objects have a common base class of Object. In IL, classes are derived from other classes in the
same manner as in programming languages like C++, C# and Java.

You are bound to wonder as to why we have written such an ungainly program. You need to
exercise a little patience before the mist clears and it all starts to make sense. We shall explain
the newly introduced functions and attributes one by one:
.ctor: We have introduced a new function called .ctor which calls the WriteLine function to
display hell1, but it does not get called. .ctor refers to the constructor.
rtspecialname: This attribute signifies to the runtime that the name of the function is special
and it is to be treated in a special manner.
specialname: This attribute alerts the compilers and tools that the function is special. The
runtime may choose to ignore this attribute.
instance: A normal function is called an instance function. Such a function is associated with
an object, unlike a static method, which is associated with a class.
The reason for choosing the specified name for the function will become apparent in due course.
ldarg.0: This is an assembler instruction which loads the ZEROth argument on the execution
stack; in an instance function this is the this pointer. We shall explain ldarg.0 in detail subsequently.
mscorlib: In the program above, the function .ctor is being called from the base class
System.Object. The name of the function is normally prefixed with the name of the library that contains
the code. This library name is placed within square brackets. In this case, it is optional because
mscorlib.dll is the default library and it contains most of the classes that .NET requires.
.maxstack: This directive specifies the maximum number of elements that can be present on
the evaluation stack when a method is being executed.
.module: All IL files must be part and parcel of a logical entity called a module. The file is added
to a module using the .module directive. The name of the module may be stated as aa.exe, but
the name of the executable file remains the same as before, i.e. a.exe.
.subsystem: This directive is used to specify the operating system on which the executable will
run. This is another way of specifying the kind of executable the assembly is representing.
Some of the numeric values and their corresponding Operating Systems are as follows:
2 - A Windows GUI Subsystem.
3 - A Windows Character (console) Subsystem.
5 - An older operating system called OS/2.
.corflags: This directive is used to specify flags stored in the runtime header of the file. A value of
1 indicates that it is an executable created from il and a value of 4 signifies a library.
.assembly: We very briefly touched upon a directive called .assembly a couple of pages earlier.
Let's delve a little deeper now.
Whatever we create is part of an entity called a manifest. The .assembly directive marks the
beginning of a manifest. In the hierarchy, the module is the next smaller entity to a manifest.
The .assembly directive specifies the assembly to which this module belongs. A module can only
contain a single .assembly directive.
The presence of this directive is mandatory for exe files but is optional for modules in a .dll.
This is because this directive is needed to create an assembly for us. It is a basic requirement of
the .NET world. An assembly directive contains other directives.
.hash: Hashing is a common technique used in the computer world and there are a large
number of hashing methods or algorithms used. This directive is used for hashing.
.ver: The .ver directive consists of 4 numbers separated by colons. They represent the
following information in the order given below:
• major version number
• minor version number
• build
• revision number
extern: If there is a requirement to refer to other assemblies, the extern directive is used. The
code of the core .NET classes is in mscorlib.dll. Besides this dll, when our program needs to
refer to code from a large number of other dlls, the extern directive comes into play.
originator: This is the last directive that we shall explore before we move on to explain the
essence and significance of the above example. This directive discloses the identity of the
creator of the dll. It contains eight bytes derived from the public key of the owner of the dll;
it is a hash value. In the disassembly shown later in this article, the corresponding eight
bytes appear under .publickeytoken.
Let us revise what we have done so far, step by step via a different approach:
(a) We started with the smallest C# program that we could write. This program was called a.cs
and contained the following code:

(b) Then we ran the C# compiler using the following command:

Therefore, the exe file called a.exe got created.
(c) On the executable, we ran a program called ildasm, provided by Microsoft:

This created a text file a.txt with the following contents:

When you read the above file, you will realize that all of it has been explained earlier. We
started out with a simple C# program and then compiled it into an executable file. Under
normal circumstances, it would have got converted into machine language or the assembler of
the computer/microprocessor that the program is running on. Once the executable is created,
we disassemble it using ildasm. The disassembled output is saved in a new file a.txt. This file
could be named as a.il and we could have then reversed gear by running ilasm on it to create
the executable again.

//  Microsoft (R) .NET Framework IL Disassembler.  Version 4.6.1055.0

// Metadata version: v4.0.30319
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                         // .z\V.4..
  .ver 4:0:0:0
}
.assembly a
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::.ctor(int32) = ( 01 00 08 00 00 00 00 00 )
  .custom instance void [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::.ctor() = ( 01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78   // ....T..WrapNonEx
                                                                                                             63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 )       // ceptionThrows.

  // --- The following custom attribute is added automatically, do not uncomment -------
  //  .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::.ctor(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = ( 01 00 07 01 00 00 00 00 )

  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module a.exe
// MVID: {D65B3A6D-7D07-4C89-AB25-0B869EAF338C}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00000001    //  ILONLY
// Image base: 0x02780000


// =============== CLASS MEMBERS DECLARATION ===================

.class private auto ansi beforefieldinit zzz
       extends [mscorlib]System.Object
{
  .method public hidebysig static void  Main() cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "hi"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method zzz::Main

  .method public hidebysig specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  ret
  } // end of method zzz::.ctor

} // end of class zzz


// =============================================================

// *********** DISASSEMBLY COMPLETE ***********************
// WARNING: Created Win32 resource file a.res
Let us take a look at the smallest VB.NET program. We have named it as one.vb and its source
code is as follows:

After writing the above code, we run the Visual Basic .NET compiler, vbc, as:

This produces the file one.exe.
Next we execute ildasm as follows:

This produces the following file a.txt:

You would be amazed to see that the outputs produced by two different compilers are almost
identical. We have shown you this example to demonstrate that, irrespective of the language
you use, ultimately, the source code will get converted to IL code. Whether we use VB.NET or
C#, the same WriteLine function gets called.
Thus, the differences between programming languages have now become a superficial issue. The
endless debate over which language is superior has finally been put to rest. Thus, IL has
created a situation where programmers are free to use the programming language of their
choice.
Let us now demystify the code given above.
Every VB.NET program needs to be included in a module. We've called it modmain. All
modules in Visual Basic have to end with the keyword End, hence we see End Module. This is
where the syntax of VB differs from that of C#, which does not understand modules.
In VB.NET, functions are known as sub-routines. We need a sub-routine to mark the starting
point of program execution. This sub-routine is called Main.
The VB.NET code not only refers to mscorlib.dll, but also uses the file
Microsoft.VisualBasic.
A class called _vbProject is created in IL, as the class name is not mandatory in VB.
The function called _main is the starting sub-routine to be called as it has the entrypoint
directive. Its name is preceded by a leading underscore. These names are chosen by the VB
compiler that generates the IL code.
This function is passed an array of strings as a parameter. It has a custom directive that deals
with the concept of metadata.
Next, we have the full prototype of the function, ending with an optional series of bytes. These
bytes are part of the metadata specifications.
The module modmain gets converted into a class having the same name. This class also has the
same directive .custom as before and a function called Main. The function uses a directive
called .locals to create a variable on the stack that can only be used within the method. This
variable exists only for the duration of the execution of the method and dies when the method
stops running.

Fields are also stored in memory but, it takes a longer time to allocate memory for them. The
word init signifies that on creation, these variables should be initialized to their default values.
The default values depend upon the type of the variable. Numbers are always initialized to the
value ZERO. The word init is followed by the data type of the variable and finally by its name.
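
To illustrate .locals init on its own, here is a minimal sketch. It is not the VB-generated code discussed above; the variable name V_0, the class zzz and the assembly name mukhi are placeholders chosen for the example.

```il
// locals.il - sketch: a local int32 created with .locals init starts out as zero.
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
    .method public hidebysig static void Main() cil managed
    {
        .entrypoint
        .maxstack 1
        .locals init (int32 V_0)    // init, then the data type, then the name
        ldloc.0                     // push the value of local variable 0
        call void [mscorlib]System.Console::WriteLine(int32)
        ret
    }
}
```

Because the local is never assigned, the value passed to WriteLine is the default that init gave it.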

Reposted from: https://www.cnblogs.com/revoid/p/6685877.html
