iczelion pe tut1

pe 专栏收录该内容
13 篇文章 0 订阅


Tutorial 1: Overview of PE file format

This is the complete rewrite of the old PE tutorial no1 which I considered the worst tutorial I have ever written. So I decided to replace it with this new one.

PE stands for Portable Executable. It's the native file format of Win32. Its specification is derived somewhat from the Unix Coff (common object file format). The meaning of "portable executable" is that the file format is universal across win32 platform: the PE loader of every win32 platform recognizes and uses this file format even when Windows is running on CPU platforms other than Intel. It doesn't mean your PE executables would be able to port to other CPU platforms without change. Every win32 executable (except VxDs and 16-bit Dlls) uses PE file format. Even NT's kernel mode drivers use PE file format. Thus studying the PE file format gives you valuable insights into the structure of Windows.

Let's jump into the general outline of PE file format without further ado.

DOS MZ header
DOS stub
PE header
Section table
Section 1
Section 2
Section ...
Section n

The above picture is the general layout of a PE file. All PE files (even 32-bit DLLs) must start with a simple DOS MZ header. We usually aren't interested in this structure much. It's provided in the case when the program is run from DOS, so DOS can recognize it as a valid executable and can thus run the DOS stub which is stored next to the MZ header. The DOS stub is actually a valid EXE that is executed in case the operating system doesn't know about PE file format. It can simply display a string like "This program requires Windows" or it can be a full-blown DOS program depending on the intent of the programmer. We are also not very interested in DOS stub: it's usually provided by the assembler/compiler. In most case, it simply uses int 21h, service 9 to print a string saying "This program cannot run in DOS mode".

After the DOS stub comes the PE header. The PE header is a general term for the PE-related structure named IMAGE_NT_HEADERS. This structure contains many essential fields that are used by the PE loader. We will be quite familiar with it as you know more about PE file format. In the case the program is executed in the operating system that knows about PE file format, the PE loader can find the starting offset of the PE header from the DOS MZ header. Thus it can skip the DOS stub and go directly to the PE header which is the real file header.

The real content of the PE file is divided into blocks called sections. A section is nothing more than a block of data with common attributes such as code/data, read/write etc. You can think of a PE file as a logical disk. The PE header is the boot sector and the sections are files in the disk. The files can have different attributes such as read-only, system, hidden, archive and so on. I want to make it clear from this point onwards that the grouping of data into a section is done on the common attribute basis: not on logical basis. It doesn't matter how the code/data are used , if the data/code in the PE file have the same attribute, they can be lumped together in a section. You should not think of a section as "data", "code" or some other logical concepts: sections can contain both code and data provided that they have the same attribute. If you have a block of data that you want to be read-only, you can put that data in the section that is marked as read-only. When the PE loader maps the sections into memory, it examines the attributes of the sections and gives the memory block occupied by the sections the indicated attributes.

If we view the PE file format as a logical disk, the PE header as the boot sector and the sections as files, we still don't have enough information to find out where the files reside on the disk, ie. we haven't discussed the directory equivalent of the PE file format. Immediately following the PE header is the section table which is an array of structures. Each structure contains the information about each section in the PE file such as its attribute, the file offset, virtual offset. If there are 5 sections in the PE file, there will be exactly 5 members in this structure array. We can then view the section table as the root directory of the logical disk. Each member of the array is equvalent to the each directory entry in the root directory.

That's all about the physical layout of the PE file format. I'll summarize the major steps in loading a PE file into memory below:

  1. When the PE file is run, the PE loader examines the DOS MZ header for the offset of the PE header. If found, it skips to the PE header.
  2. The PE loader checks if the PE header is valid. If so, it goes to the end of the PE header.
  3. Immediately following the PE header is the section table. The PE header reads information about the sections and maps those sections into memory using file mapping. It also gives each section the attributes as specified in the section table.
  4. After the PE file is mapped into memory, the PE loader concerns itself with the logical parts of the PE file, such as the import table.

The above steps are oversimplification and are based on my own observation. There may be some inaccuracies but it should give you the clear picture of the process.

You should download LUEVELSMEYER's description about PE file format. It's very detailed and you should keep it as a reference.

[Iczelion's Win32 Assembly Homepage]

  • 0
  • 0
  • 0
  • 一键三连
  • 扫一扫,分享海报

<p> <span style="font-size:14px;color:#337FE5;">【为什么学爬虫?】</span> </p> <p> <span style="font-size:14px;">       1、爬虫入手容易,但是深入较难,如何写出高效率的爬虫,如何写出灵活性高可扩展的爬虫都是一项技术活。另外在爬虫过程中,经常容易遇到被反爬虫,比如字体反爬、IP识别、验证码等,如何层层攻克难点拿到想要的数据,这门课程,你都能学到!</span> </p> <p> <span style="font-size:14px;">       2、如果是作为一个其他行业的开发者,比如app开发,web开发,学习爬虫能让你加强对技术的认知,能够开发出更加安全的软件和网站</span> </p> <p> <br /> </p> <span style="font-size:14px;color:#337FE5;">【课程设计】</span> <p class="ql-long-10663260"> <span> </span> </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 一个完整的爬虫程序,无论大小,总体来说可以分成三个步骤,分别是: </p> <ol> <li class="" style="font-size:11pt;color:#494949;"> 网络请求:模拟浏览器的行为从网上抓取数据。 </li> <li class="" style="font-size:11pt;color:#494949;"> 数据解析:将请求下来的数据进行过滤,提取我们想要的数据。 </li> <li class="" style="font-size:11pt;color:#494949;"> 数据存储:将提取到的数据存储到硬盘或者内存中。比如用mysql数据库或者redis等。 </li> </ol> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 那么本课程也是按照这几个步骤循序渐进的进行讲解,带领学生完整的掌握每个步骤的技术。另外,因为爬虫的多样性,在爬取的过程中可能会发生被反爬、效率低下等。因此我们又增加了两个章节用来提高爬虫程序的灵活性,分别是: </p> <ol> <li class="" style="font-size:11pt;color:#494949;"> 爬虫进阶:包括IP代理,多线程爬虫,图形验证码识别、JS加密解密、动态网页爬虫、字体反爬识别等。 </li> <li class="" style="font-size:11pt;color:#494949;"> Scrapy和分布式爬虫:Scrapy框架、Scrapy-redis组件、分布式爬虫等。 </li> </ol> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 通过爬虫进阶的知识点我们能应付大量的反爬网站,而Scrapy框架作为一个专业的爬虫框架,使用他可以快速提高我们编写爬虫程序的效率和速度。另外如果一台机器不能满足你的需求,我们可以用分布式爬虫让多台机器帮助你快速爬取数据。 </p> <p style="font-size:11pt;color:#494949;">   </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 从基础爬虫到商业化应用爬虫,本套课程满足您的所有需求! </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> <br /> </p> <p> <br /> </p> <p> <span style="font-size:14px;background-color:#FFFFFF;color:#337FE5;">【课程服务】</span> </p> <p> <span style="font-size:14px;">专属付费社群+定期答疑</span> </p> <p> <br /> </p> <p class="ql-long-24357476"> <span style="font-size:16px;"><br /> </span> </p> <p> <br /> </p> <p class="ql-long-24357476"> <span style="font-size:16px;"></span> </p>
©️2020 CSDN 皮肤主题: 大白 设计师:CSDN官方博客 返回首页
钱包余额 0