Injecting techniques

最新推荐文章于 2022-08-11 16:46:40 发布

trents

最新推荐文章于 2022-08-11 16:46:40 发布

阅读量1.5k

点赞数

文章标签： windows application function hook dll system

图片显示有问题，

原文链接：http://www.360doc.com/content/09/0122/14/98883_2383062.shtml

Injecting techniques

Registry
In order to inject a DLL into processes that link with USER32.DLL, you simply can add the DLL name to the value of the following registry key:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs

Its value contains a single DLL name or group of DLLs separated either by comma or spaces. According to MSDN documentation [7], all DLLs specified by the value of that key are loaded by each Windows-based application running within the current logon session. It is interesting that the actual loading of these DLLs occurs as a part of USER32's initialization. USER32 reads the value of mentioned registry key and calls LoadLibrary() for these DLLs in its DllMain code. However this trick applies only to applications that use USER32.DLL. Another restriction is that this built-in mechanism is supported only by NT and 2K operating systems. Although it is a harmless way to inject a DLL into a Windows processes there are few shortcomings:
- In order to activate/deactivate the injection process you have to reboot Windows.
- The DLL you want to inject will be mapped only into these processes that use USER32.DLL, thus you cannot expect to get your hook injected into console applications, since they usually don't import functions from USER32.DLL.
- On the other hand you don't have any control over the injection process. It means that it is implanted into every single GUI application, regardless you want it or not. It is a redundant overhead especially if you intend to hook few applications only. For more details see [2] "Injecting a DLL Using the Registry"
System-wide Windows Hooks
Certainly a very popular technique for injecting DLL into a targeted process relies on provided by Windows Hooks. As pointed out in MSDN a hook is a trap in the system message-handling mechanism. An application can install a custom filter function to monitor the message traffic in the system and process certain types of messages before they reach the target window procedure.
A hook is normally implemented in a DLL in order to meet the basic requirement for system-wide hooks. The basic concept of that sort of hooks is that the hook callback procedure is executed in the address spaces of each hooked up process in the system. To install a hook you call SetWindowsHookEx() with the appropriate parameters. Once the application installs a system-wide hook, the operating system maps the DLL into the address space in each of its client processes. Therefore global variables within the DLL will be "per-process" and cannot be shared among the processes that have loaded the hook DLL. All variables that contain shared data must be placed in a shared data section. The diagram bellow shows an example of a hook registered by Hook Server and injected into the address spaces named "Application one" and "Application two".

Figure 1

A system-wide hook is registered just ones when SetWindowsHookEx() is executed. If no error occurs a handle to the hook is returned. The returned value is required at the end of the custom hook function when a call to CallNextHookEx() has to be made. After a successful call to SetWindowsHookEx() , the operating system injects the DLL automatically (but not necessary immediately) into all processes that meet the requirements for this particular hook filter. Let's have a closer look at the following dummy WH_GETMESSAGE filter function:

Collapse
```
//---------------------------------------------------------------------------
    
    // GetMsgProc
    
    //
    
    // Filter function for the WH_GETMESSAGE - it's just a dummy function
    
    //---------------------------------------------------------------------------
    
    LRESULT CALLBACK GetMsgProc(
    int code,       // hook code
    
    WPARAM wParam,  // removal option
    
    LPARAM lParam   // message
    
    )
    {
    // We must pass the all messages on to CallNextHookEx.
    
    return ::CallNextHookEx(sg_hGetMsgHook, code, wParam, lParam);
    }
    
```
A system-wide hook is loaded by multiple processes that don't share the same address space.

For instance hook handle sg_hGetMsgHook, that is obtained by SetWindowsHookEx() and is used as parameter in CallNextHookEx() must be used virtually in all address spaces. It means that its value must be shared among hooked processes as well as the Hook Server application. In order to make this variable "visible" to all processes we should store it in the shared data section.

The following is an example of employing #pragma data_seg(). Here I would like to mention that the data within the shared section must be initialized, otherwise the variables will be assigned to the default data segment and #pragma data_seg() will have no effect.

Collapse
```
//---------------------------------------------------------------------------
    
    // Shared by all processes variables
    
    //---------------------------------------------------------------------------
    
    #pragma data_seg(".HKT")
    HHOOK sg_hGetMsgHook       = NULL;
    BOOL  sg_bHookInstalled    = FALSE;
    // We get this from the application who calls SetWindowsHookEx()'s wrapper
    
    HWND  sg_hwndServer        = NULL;
    #pragma data_seg()
```
You should add a SECTIONS statement to the DLL's DEF file as well
Collapse
```
SECTIONS
    .HKT   Read Write Shared
```
or use
Collapse
```
#pragma comment(linker, "/section:.HKT, rws")
```
Once a hook DLL is loaded into the address space of the targeted process, there is no way to unload it unless the Hook Server calls UnhookWindowsHookEx() or the hooked application shuts down. When the Hook Server calls UnhookWindowsHookEx() the operating system loops through an internal list with all processes which have been forced to load the hook DLL. The operating system decrements the DLL's lock count and when it becomes 0, the DLL is automatically unmapped from the process's address space.

Here are some of the advantages of this approach:
- This mechanism is supported by NT/2K and 9x Windows family and hopefully will be maintained by future Windows versions as well.
- Unlike the registry mechanism of injecting DLLs this method allows DLL to be unloaded when Hook Server decides that DLL is no longer needed and makes a call to UnhookWindowsHookEx()
Although I consider Windows Hooks as very handy injection technique, it comes with its own disadvantages:
- Windows Hooks can degrade significantly the entire performance of the system, because they increase the amount of processing the system must perform for each message.
- It requires lot of efforts to debug system-wide Windows Hooks. However if you use more than one instance of VC++ running in the same time, it would simplify the debugging process for more complex scenarios.
- Last but not least, this kind of hooks affect the processing of the whole system and under certain circumstances (say a bug) you must reboot your machine in order to recover it.
Injecting DLL by using CreateRemoteThread() API function
Well, this is my favorite one. Unfortunately it is supported only by NT and Windows 2K operating systems. It is bizarre, that you are allowed to call (link with) this API on Win 9x as well, but it just returns NULL without doing anything.
Injecting DLLs by remote threads is Jeffrey Ritcher's idea and is well documented in his article [9] "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB".

The basic concept is quite simple, but very elegant. Any process can load a DLL dynamically using LoadLibrary() API. The issue is how do we force an external process to call LoadLibrary() on our behalf, if we don't have any access to process's threads? Well, there is a function, called CreateRemoteThread() that addresses creating a remote thread. Here comes the trick - have a look at the signature of thread function, whose pointer is passed as parameter (i.e. LPTHREAD_START_ROUTINE) to the CreateRemoteThread():

Collapse
```
DWORD WINAPI ThreadProc(LPVOID lpParameter);
```
And here is the prototype of LoadLibrary API
Collapse
```
HMODULE WINAPI LoadLibrary(LPCTSTR lpFileName);
```
Yes, they do have "identical" pattern. They use the same calling convention WINAPI, they both accept one parameter and the size of returned value is the same. This match gives us a hint that we can use LoadLibrary() as thread function, which will be executed after the remote thread has been created. Let's have a look at the following sample code:
Collapse
```
hThread = ::CreateRemoteThread(
    hProcessForHooking,
    NULL,
    0,
    pfnLoadLibrary,
    "C:\\HookTool.dll",
    0,
    NULL);
    
```
By using GetProcAddress() API we get the address of the LoadLibrary() API. The dodgy thing here is that Kernel32.DLL is mapped always to the same address space of each process, thus the address of LoadLibrary() function has the same value in address space of any running process. This ensures that we pass a valid pointer (i.e. pfnLoadLibrary) as parameter of CreateRemoteThread().
As parameter of the thread function we use the full path name of the DLL, casting it to LPVOID. When the remote thread is resumed, it passes the name of the DLL to the ThreadFunction (i.e. LoadLibrary). That's the whole trick with regard to using remote threads for injection purposes.

There is an important thing we should consider, if implanting through CreateRemoteThread() API. Every time before the injector application operate on the virtual memory of the targeted process and makes a call to CreateRemoteThread(), it first opens the process using OpenProcess() API and passes PROCESS_ALL_ACCESS flag as parameter. This flag is used when we want to get maximum access rights to this process. In this scenario OpenProcess() will return NULL for some of the processes with low ID number. This error (although we use a valid process ID) is caused by not running under security context that has enough permissions. If you think for a moment about it, you will realize that it makes perfect sense. All those restricted processes are part of the operating system and a normal application shouldn't be allowed to operate on them. What would happen if some application has a bug and accidentally attempts to terminate an operating system's process? To prevent the operating system from that kind of eventual crashes, it is required that a given application must have sufficient privileges to execute APIs that might alter operating system behavior. To get access to the system resources (e.g. smss.exe, winlogon.exe, services.exe, etc) through OpenProcess() invocation, you must be granted the debug privilege. This ability is extremely powerful and offers a way to access the system resources, that are normally restricted. Adjusting the process privileges is a trivial task and can be described with the following logical operations:
- Open the process token with permissions needed to adjust privileges
- Given a privilege's name "SeDebugPrivilege", we should locate its local LUID mapping. The privileges are specified by name and can be found in Platform SDK file winnt.h
- Adjust the token in order to enable the "SeDebugPrivilege" privilege by calling AdjustTokenPrivileges() API
- Close obtained by OpenProcessToken() process token handle
For more details about changing privileges see [10] "Using privilege".
Implanting through BHO add-ins
Sometimes you will need to inject a custom code inside Internet Explorer only. Fortunately Microsoft provides an easy and well documented way for this purpose - Browser Helper Objects. A BHO is implemented as COM DLL and once it is properly registered, each time when IE is launched it loads all COM components that have implemented IObjectWithSite interface.
MS Office add-ins
Similarly, to the BHOs, if you need to implant in MS Office applications code of your own, you can take the advantage of provided standard mechanism by implementing MS Office add-ins. There are many available samples that show how to implement this kind of add-ins.

Interception mechanisms

Injecting a DLL into the address space of an externalprocess is a key element of a spying system. It provides an excellentopportunity to have a control over process's thread activities. However it isnot sufficient to have the DLL injected if you want to intercept API functioncalls within the process.

This part of the articleintends to make a brief review of several available real-world hooking aspects.It focuses on the basic outline for each one of them, exposing their advantagesand disadvantages.

In terms of the level where thehook is applied, there are two mechanisms for API spying - Kernel level andUser level spying. To get better understanding of these two levels you must beaware of the relationship between the Win32 subsystem API and the Native API.Following figure demonstrates where the different hooks are set and illustratesthe module relationships and their dependencies on Windows 2K:

Figure 2

The major implementation difference between them is thatinterceptor engine for kernel-level hooking is wrapped up as a kernel-modedriver, whereas user-level hooking usually employs user-mode DLL.

NT Kernel level hooking
There are several methods for achieving hooking of NT system services in kernel mode. The most popular interception mechanism was originally demonstrated by Mark Russinovich and Bryce Cogswell in their article [3] "Windows NT System-Call Hooking". Their basic idea is to inject an interception mechanism for monitoring NT system calls just bellow the user mode. This technique is very powerful and provides an extremely flexible method for hooking the point that all user-mode threads pass through before they are serviced by the OS kernel.
You can find an excellent design and implementation in "Undocumented Windows 2000 Secrets" as well. In his great book Sven Schreiber explains how to build a kernel-level hooking framework from scratch [5].

Another comprehensive analysis and brilliant implementation has been provided by Prasad Dabak in his book "Undocumented Windows NT" [17].

However, all these hooking strategies, remain out of the scope of this article.
Win32 User level hooking
1. Windows subclassing.
  This method is suitable for situations where the application's behavior might be changed by new implementation of the window procedure. To accomplish this task you simply call SetWindowLongPtr() with GWLP_WNDPROC parameter and pass the pointer to your own window procedure. Once you have the new subclass procedure set up, every time when Windows dispatches a message to a specified window, it looks for the address of the window's procedure associated with the particular window and calls your procedure instead of the original one.
  The drawback of this mechanism is that subclassing is available only within the boundaries of a specific process. In other words an application should not subclass a window class created by another process.
  
  Usually this approach is applicable when you hook an application through add-in (i.e. DLL / In-Proc COM component) and you can obtain the handle to the window whose procedure you would like to replace.
  
  For example, some time ago I wrote a simple add-in for IE (Browser Helper Object) that replaces the original pop-up menu provided by IE using subclassing.
2. Proxy DLL (Trojan DLL)
  An easy way for hacking API is just to replace a DLL with one that has the same name and exports all the symbols of the original one. This technique can be effortlessly implemented using function forwarders. A function forwarder basically is an entry in the DLL's export section that delegates a function call to another DLL's function.
  You can accomplish this task by simply using #pragma comment:
  
  Collapse
```
#pragma comment(linker, "/export:DoSomething=DllImpl.ActuallyDoSomething")
```
  However, if you decide to employ this method, you should take the responsibility of providing compatibilities with newer versions of the original library. For more details see [13a] section "Export forwarding" and [2] "Function Forwarders".
3. Code overwriting
  There are several methods that are based on code overwriting. One of them changes the address of the function used by CALL instruction. This method is difficult, and error prone. The basic idea beneath is to track down all CALL instructions in the memory and replace the addresses of the original function with user supplied one.
  Another method of code overwriting requires a more complicated implementation. Briefly, the concept of this approach is to locate the address of the original API function and to change first few bytes of this function with a JMP instruction that redirects the call to the custom supplied API function. This method is extremely tricky and involves a sequence of restoring and hooking operations for each individual call. It's important to point out that if the function is in unhooked mode and another call is made during that stage, the system won't be able to capture that second call.
  The major problem is that it contradicts with the rules of a multithreaded environment.
  
  However, there is a smart solution that solves some of the issues and provides a sophisticated way for achieving most of the goals of an API interceptor. In case you are interested you might peek at [12] Detours implementation.
4. Spying by a debugger
  An alternative to hooking API functions is to place a debugging breakpoint into the target function. However there are several drawbacks for this method. The major issue with this approach is that debugging exceptions suspend all application threads. It requires also a debugger process that will handle this exception. Another problem is caused by the fact that when the debugger terminates, the debugger is automatically shut down by Windows.
5. Spying by altering of the Import Address Table
  This technique was originally published by Matt Pietrek and than elaborated by Jeffrey Ritcher ([2] "API Hooking by Manipulating a Module's Import Section") and John Robbins ([4] "Hooking Imported Functions"). It is very robust, simple and quite easy to implement. It also meets most of the requirements of a hooking framework that targets Windows NT/2K and 9x operating systems. The concept of this technique relies on the elegant structure of the Portable Executable (PE) Windows file format. To understand how this method works, you should be familiar with some of the basics behind PE file format, which is an extension of Common Object File Format (COFF). Matt Pietrek reveals the PE format in details in his wonderful articles - [6] "Peering Inside the PE.", and [13a/b] "An In-Depth Look into the Win32 PE file format". I will give you a brief overview of the PE specification, just enough to get the idea of hooking by manipulation of the Import Address Table.
  In general an PE binary file is organized, so that it has all code and data sections in a layout that conform to the virtual memory representation of an executable. PE file format is composed of several logical sections. Each of them maintains specific type of data and addresses particular needs of the OS loader.
  
  The section .idata, I would like to focus your attention on, contains information about Import Address Table. This part of the PE structure is particularly very crucial for building a spy program based on altering IAT.
  Each executable that conforms with PE format has layout roughly described by the figure below.
  
  Figure 3
  
  The program loader is responsible for loading an application along with all its linked DLLs into the memory. Since the address where each DLL is loaded into, cannot be known in advance, the loader is not able to determine the actual address of each imported function. The loader must perform some extra work to ensure that the program will call successfully each imported function. But going through each executable image in the memory and fixing up the addresses of all imported functions one by one would take unreasonable amount of processing time and cause huge performance degradation. So, how does the loader resolves this challenge? The key point is that each call to an imported function must be dispatched to the same address, where the function code resides into the memory. Each call to an imported function is in fact an indirect call, routed through IAT by an indirect JMP instruction. The benefit of this design is that the loader doesn't have to search through the whole image of the file. The solution appears to be quite simple - it just fixes-up the addresses of all imports inside the IAT. Here is an example of a snapshot PE File structure of a simple Win32 Application, taken with the help of the [8] PEView utility. As you can see TestApp import table contains two imported by GDI32.DLL function - TextOutA() and GetStockObject().
  
  Figure 4
  
  Actually the hooking process of an imported function is not that complex as it looks at first sight. In a nutshell an interception system that uses IAT patching has to discover the location that holds the address of imported function and replace it with the address of an user supplied function by overwriting it. An important requirement is that the newly provided function must have exactly the same signature as the original one. Here are the logical steps of a replacing cycle:
  - Locate the import section from the IAT of each loaded by the process DLL module as well as the process itself
  - Find the IMAGE_IMPORT_DESCRIPTOR chunk of the DLL that exports that function. Practically speaking, usually we search this entry by the name of the DLL
  - Locate the IMAGE_THUNK_DATA which holds the original address of the imported function
  - Replace the function address with the user supplied one
  By changing the address of the imported function inside the IAT, we ensure that all calls to the hooked function will be re-routed to the function interceptor.
  Replacing the pointer inside the IAT is that .idata section doesn't necessarily have to be a writable section. This requires that we must ensure that .idata section can be modified. This task can be accomplished by using VirtualProtect() API.
  
  Another issue that deserves attention is related to the GetProcAddress() API behavior on Windows 9x system. When an application calls this API outside the debugger it returns a pointer to the function. However if you call this function within from the debugger it actually returns different address than it would when the call is made outside the debugger. It is caused by the fact that that inside the debugger each call to GetProcAddress() returns a wrapper to the real pointer. Returned by GetProcAddress() value points to PUSH instruction followed by the actual address. This means that on Windows 9x when we loop through the thunks, we must check whether the address of examined function is a PUSH instruction (0x68 on x86 platforms) and accordingly get the proper value of the address function.
  
  Windows 9x doesn't implement copy-on-write, thus operating system attempts to keep away the debuggers from stepping into functions above the 2-GB frontier. That is the reason why GetProcAddress() returns a debug thunk instead of the actual address. John Robbins discusses this problem in [4] "Hooking Imported Functions".

Figuring out when to inject the hook DLL

That section reveals some challenges that are facedby developers when the selected injection mechanism is not part of theoperating system's functionality. For example, performing the injection is not your concern when youuse built-in Windows Hooks in order to implant a DLL. It is an OS's responsibilityto force each of those running processes that meet the requirements forthis particular hook, to load the DLL [18]. In fact Windows keeps track of allnewly launched processes and forces them to load the hook DLL. Managinginjection through registry is quite similar to Windows Hooks. The biggest advantage ofall those "built-in" methods is that they come as part of the OS.

Unlike the discussed above implanting techniques, to injectby CreateRemoteThread()requires maintenance of all currently running processes. If the injecting is made not ontime, this can cause the Hook System to miss some of the calls it claims as intercepted.It is crucial that the Hook Server application implements a smart mechanism forreceiving notifications each time when a new process starts or shuts down.One of the suggested methods in this case, is to intercept CreateProcess() API family functions and monitor alltheir invocations. Thus when an user supplied function is called, it can callthe original CreateProcess() with dwCreationFlags OR-ed with CREATE_SUSPENDED flag. This means that the primarythread of the targeted application will be in suspended state, and the HookServer will have the opportunity to inject the DLL by hand-coded machineinstructions and resume the application using ResumeThread() API. For moredetails you might refer to [2] "Injecting Code with CreateProcess()".

The second methodof detecting process execution, is based on implementing a simple device driver.It offers the greatest flexibility and deserves even more attention. WindowsNT/2K provides a special function PsSetCreateProcessNotifyRoutine() exported byNTOSKRNL. This function allows adding a callback function, that is calledwhenever a process is created or deleted. For more details see [11] and [15]from the reference section.

Enumerating processes and modules

Sometimes we would prefer to use injecting of the DLL byCreateRemoteThread() API, especially whenthe system runs under NT/2K. In this case when the Hook Server is started it mustenumerate all active processes and inject the DLL into their address spaces.Windows 9x and Windows 2K provide a built-in implementation (i.e. implemented byKernel32.dll) of Tool Help Library. On the other hand Windows NT uses for thesame purpose PSAPI library. We need a way to allow the Hook Server to run andthen to detect dynamically which process "helper" is available. Thus the systemcan determine which the supported library is, and accordingly to use theappropriate APIs.

I will present an object-orientedarchitecture that implements a simple framework for retrieving processes andmodules under NT/2K and 9x [16]. The design of my classes allows extending theframework according to your specific needs. The implementation itself is prettystraightforward.

CTaskManager implements the system's processor. Itis responsible for creating an instance of a specific library handler (i.e.CPsapiHandler or CToolhelpHandler) that is able to employ the correctprocess information provider library (i.e. PSAPI or ToolHelp32 respectively).CTaskManager is in charge of creating andmarinating a container object that keeps a list with all currently activeprocesses. After instantiating of the CTaskManager object the application calls Populate() method. It forces enumerating of allprocesses and DLL libraries and storing them into a hierarchy kept by CTaskManager's member m_pProcesses.

Following UMLdiagram shows the class relationships of this subsystem:

Figure 5

It is important to highlight the fact that NT'sKernel32.dll doesn't implement any of the ToolHelp32 functions. Therefore wemust link them explicitly, using runtime dynamic linking. If we use staticlinking the code will fail to load on NT, regardless whether or not theapplication has attempted to execute any of those functions. For more detailssee my article "Singleinterface for enumerating processes and modules under NT and Win9x/2K.".

Requirements of the Hook Tool System

Now that I've made a brief introduction to the various concepts of thehooking process it's time to determine the basic requirements and explore the design of aparticular hooking system. These are some of the issues addressed by the Hook Tool System:

Provide a user-level hooking system for spying any Win32 API functions imported by name
Provide the abilities to inject hook driver into all running processes by Windows hooks as well as CreateRemoteThread() API. The framework should offer an ability to set this up by an INI file
Employ an interception mechanism based on the altering Import Address Table
Present an object-oriented reusable and extensible layered architecture
Offer an efficient and scalable mechanism for hooking API functions
Meet performance requirements
Provide a reliable communication mechanism for transferring data between the driver and the server
Implement custom supplied versions of TextOutA/W() and ExitProcess() API functions
Log events to a file
The system is implemented for x86 machines running Windows 9x, Me, NT or Windows 2K operating system

Design and implementation

This part of the article discusses the key components of theframework and how do they interact each other. This outfit is capable to captureany kind of WINAPI imported by namefunctions.

Before I outline thesystem's design, I would like to focus your attention on several methods for injectingand hooking.

First and foremost, it is necessary to selectan implanting method that will meet the requirements for injecting the DLLdriver into all processes. So I designed an abstract approach with twoinjecting techniques, each of them applied accordingly to the settings inthe INI file and the type of the operating system (i.e. NT/2K or 9x). They are- System-wide Windows Hooks and CreateRemoteThread()method. The sample framework offers the ability to inject the DLL onNT/2K by Windows Hooks as well as to implant by CreateRemoteThread() means. This can be determinedby an option in the INI file that holds all settings of the system.

Another crucial moment is thechoice of the hooking mechanism. Not surprisingly, I decided to apply altering IAT asan extremely robust method for Win32 API spying.

Toachieve desired goals I designed a simple framework composed of the followingcomponents and files:

TestApp.exe - a simple Win32 test application that just outputs a text using TextOut() API. The purpose of this app is to show how it gets hooked up.
HookSrv.exe - control program
HookTool .DLL - spy library implemented as Win32 DLL
HookTool.ini - a configuration file
NTProcDrv.sys - a tiny Windows NT/2K kernel-mode driver for monitoring process creation and termination. This component is optional and addresses the problem with detection of process execution under NT based systems only.

HookSrv is a simple control program. Its main role isto load the HookTool.DLL and then to activate the spying engine. After loadingthe DLL, the Hook Server calls InstallHook()function and passes a handle to a hidden windows where the DLL should post allmessages to.

HookTool.DLL is the hook driver andthe heart of presented spying system. It implements the actual interceptorand provides three user supplied functions TextOutA/W() and ExitProcess() functions.

Although the article emphasizes on Windows internals and thereis no need for it to be object-oriented, I decided to encapsulate relatedactivities in reusable C++ classes. This approach provides more flexibility andenables the system to be extended. It also benefits developers with the abilityto use individual classes outside this project.

Following UML class diagram illustrates the relationshipsbetween set of classes used in HookTool.DLL's implementation.

Figure 6

In this section of the article I would like to draw yourattention to the class design of the HookTool.DLL. Assigning responsibilities tothe classes is an important part of the development process. Each of thepresented classes wraps up a specific functionality and represents a particularlogical entity.

CModuleScope is the main doorway of the system. Itis implemented using "Singleton" pattern and works in a thread-safe manner. Itsconstructor accepts 3 pointers to the data declared in the shared segment, thatwill be used by all processes. By this means the values of those system-widevariables can be maintained very easily inside the class, keeping the rule forencapsulation.

When an application loads the HookToollibrary, the DLL creates one instance of CModuleScope on receiving DLL_PROCESS_ATTACH notification. This step just initializes the only instance of CModuleScope. An important piece of the CModuleScopeobject construction is the creation of an appropriate injector object.The decision which injector to use will be made after parsing theHookTool.ini file and determining the value of UseWindowsHookparameter under [Scope] section. In case that the system is runningunder Windows 9x, the value of this parameter won't be examined by thesystem, because Window 9x doesn't support injecting by remote threads.

After instantiating of the main processor object, a call to ManageModuleEnlistment() method will be made.Here is a simplified version of its implementation:

Collapse

// Called on DLL_PROCESS_ATTACH DLL notification

BOOL CModuleScope::ManageModuleEnlistment()
{
BOOL bResult = FALSE;
// Check if it is the hook server

if (FALSE == *m_pbHookInstalled)
{
// Set the flag, thus we will know that the server has been installed

*m_pbHookInstalled = TRUE;
// and return success error code

bResult = TRUE;
}
// and any other process should be examined whether it should be

// hooked up by the DLL

else
{
bResult = m_pInjector->IsProcessForHooking(m_szProcessName);
if (bResult)
InitializeHookManagement();
}
return bResult;
}

The implementation of the method ManageModuleEnlistment() is straightforward and examines whether the call has been made by the Hook Server, inspecting the value m_pbHookInstalled points to. If an invocation has been initiated by the Hook Server, it just sets up indirectly the flag sg_bHookInstalled to TRUE. It tells that the Hook Server has been started.

The next action taken by the Hook Server is to activate the engine through a single call to InstallHook() DLL exported function. Actually its call is delegated to a method of CModuleScope - InstallHookMethod(). The main purpose of this function is to force targeted for hooking processes to load or unload the HookTool.DLL.

Collapse

 // Activate/Deactivate hooking

engine BOOL	CModuleScope::InstallHookMethod(BOOL bActivate, HWND hWndServer)
{
BOOL bResult;
if (bActivate)
{
*m_phwndServer = hWndServer;
bResult = m_pInjector->InjectModuleIntoAllProcesses();
}
else
{
m_pInjector->EjectModuleFromAllProcesses();
*m_phwndServer = NULL;
bResult = TRUE;
}
return bResult;
}

HookTool.DLL provides two mechanisms for self injecting into theaddress space of an external process - one that uses Windows Hooks andanother that employs injecting of DLL by CreateRemoteThread() API. The architecture of the system defines an abstract class CInjector that exposes pure virtual functions for injecting and ejecting DLL. The classes CWinHookInjector and CRemThreadInjector inherit from the same base - CInjector class. However they provide different realization of the pure virtual methods InjectModuleIntoAllProcesses() and EjectModuleFromAllProcesses(), defined in CInjector interface.

CWinHookInjector class implements Windows Hooks injecting mechanism. It installs a filter function by the following call

Collapse

// Inject the DLL into all running processes

BOOL CWinHookInjector::InjectModuleIntoAllProcesses()
{
*sm_pHook = ::SetWindowsHookEx(
WH_GETMESSAGE,
(HOOKPROC)(GetMsgProc),
ModuleFromAddress(GetMsgProc),
0
);
return (NULL != *sm_pHook);
}

As you can see it makes a request to the system for registering WH_GETMESSAGE hook. The server executes this method only once. The last parameter of SetWindowsHookEx() is 0, because GetMsgProc()is designed to operate as a system-wide hook. The callback functionwill be invoked by the system each time when a window is about toprocess a particular message. It is interesting that we have to providea nearly dummy implementation of the GetMsgProc()callback, since we don't intend to monitor the message processing. Wesupply this implementation only in order to get free injectionmechanism provided by the operating system.

After making the call to SetWindowsHookEx(), OS checks whether the DLL (i.e. HookTool.DLL) that exports GetMsgProc()has been already mapped in all GUI processes. If the DLL hasn't beenloaded yet, Windows forces those GUI processes to map it. Aninteresting fact is, that a system-wide hook DLL should not return FALSE in its DllMain(). That's because the operating system validates DllMain()'s return value and keeps trying to load this DLL until its DllMain() finally returns TRUE.

A quite different approach is demonstrated by the CRemThreadInjector class. Here the implementation is based on injecting the DLL using remote threads. CRemThreadInjectorextends the maintenance of the Windows processes by providing means forreceiving notifications of process creation and termination. It holdsan instance of CNtInjectorThread class that observes the process execution. CNtInjectorThread object takes care for getting notifications from the kernel-mode driver. Thus each time when a process is created a call to CNtInjectorThread ::OnCreateProcess() is issued, accordingly when the process exits CNtInjectorThread ::OnTerminateProcess()is automatically called. Unlike the Windows Hooks, the method thatrelies on remote thread, requires manual injection each time when a newprocess is created. Monitoring process activities will provide us witha simple technique for alerting when a new process starts.

CNtDriverController class implements a wrapper around APIfunctions for administering services and drivers. It is designed tohandle the loading and unloading of the kernel-mode driverNTProcDrv.sys. Its implementation will be discussed later.

After a successful injection of HookTool.DLL into a particular process, a call to ManageModuleEnlistment() method is issued inside the DllMain(). Recall the method's implementation that I described earlier. It examines the shared variable sg_bHookInstalled through the CModuleScope 's member m_pbHookInstalled. Since the server's initialization had already set the value of sg_bHookInstalled to TRUE,the system checks whether this application must be hooked up and if so,it actually activates the spy engine for this particular process.

Turning the hacking engine on, takes place in the CModuleScope::InitializeHookManagement()'s implementation. The idea of this method is to install hooks for some vital functions as LoadLibrary() API family as well as GetProcAddress().By this means we can monitor loading of DLLs after the initializationprocess. Each time when a new DLL is about to be mapped it is necessaryto fix-up its import table, thus we ensure that the system won't missany call to the captured function.

At the end of the InitializeHookManagement() method we provide initializations for the function we actually want to spy on.

Since the sample code demonstrates capturing of more than one usersupplied functions, we must provide a single implementation for eachindividual hooked function. This means that using this approach youcannot just change the addresses inside IAT of the different importedfunctions to point to a single "generic" interception function. Thespying function needs to know which function this call comes to. It isalso crucial that the signature of the interception routine must beexactly the same as the original WINAPI function prototype, otherwise the stack will be corrupted. For example CModuleScope implements three static methods MyTextOutA(),MyTextOutW() and MyExitProcess().Once the HookTool.DLL is loaded into the address space of a process andthe spying engine is activated, each time when a call to the original TextOutA() is issued, CModuleScope:: MyTextOutA() gets called instead.

Proposed design of the spying engine itself is quite efficient andoffers great flexibility. However, it is suitable mostly for scenarioswhere the set of functions for interception is known in advance andtheir number is limited.

If you want to add new hooks to the system you simply declare and implement the interception function as I did with MyTextOutA/W() and MyExitProcess(). Then you have to register it in the way shown by InitializeHookManagement() implementation.

Intercepting and tracing process execution is a very useful mechanismfor implementing systems that require manipulations of externalprocesses. Notifying interested parties upon starting of a newprocesses is a classic problem of developing process monitoring systemsand system-wide hooks. The Win32 API provides a set of great libraries(PSAPI and ToolHelp [16]) that allow you to enumerate processescurrently running in the system. Although these APIs are extremelypowerful they don't permit you to get notifications when a new processstarts or ends up. Luckily, NT/2K provides a set of APIs, documented inWindows DDK documentation as "Process Structure Routines" exported byNTOSKRNL. One of these APIs PsSetCreateProcessNotifyRoutine()offers the ability to register system-wide callback function which iscalled by OS each time when a new process starts, exits or has beenterminated. The mentioned API can be employed as a simple way to fortracking down processes simply by implementing a NT kernel-mode driverand a user mode Win32 control application. The role of the driver is todetect process execution and notify the control program about theseevents. The implementation of the Windows process's observer NTProcDrvprovides a minimal set of functionalities required for processmonitoring under NT based systems. For more details see articles [11]and [15]. The code of the driver can be located in the NTProcDrv.cfile. Since the user mode implementation installs and uninstalls thedriver dynamically the currently logged-on user must have administratorprivileges. Otherwise you won't be able to install the driver and itwill disturb the process of monitoring. A way around is to manuallyinstall the driver as an administrator or run HookSrv.exe using offeredby Windows 2K "Run as different user" option.

Last butnot least, the provided tools can be administered by simply changing the settings of an INI file (i.e. HookTool.ini). This file determines whether to use Windows hooks (for 9x and NT/2K) or CreateRemoteThread()(only under NT/2K) for injecting. It also offers a way to specify whichprocess must be hooked up and which shouldn't be intercepted. If youwould like to monitor the process there is an option (Enabled) undersection [Trace] that allows to log system activities. This optionallows you to report rich error information using the methods exposedby CLogFile class. In fact ClogFile provides thread-safe implementationand you don't have to take care about synchronization issues related toaccessing shared system resources (i.e. the log file). For more detailssee CLogFile and content of HookTool.ini file.

Sample code

The project compiles with VC6++ SP4 and requires Platform SDK. In aproduction Windows NT environment you need to provide PSAPI.DLL inorder to use provided CTaskManager implementation.

Before you run thesample code make sure that all the settings in HookTool.ini file havebeen set according to your specific needs.

For those that will like the lower-level stuff and are interested infurther development of the kernel-mode driver NTProcDrv code, they mustinstall Windows DDK.

Out of the scope

For the sake of simplicity these are some of the subjects I intentionally left out of the scope of this article:

Monitoring Native API calls
A driver for monitoring process execution on Windows 9x systems.
UNICODE support, although you can still hook UNICODE imported APIs

Conclusion

This article by far doesn't provide a complete guide for theunlimited API hooking subject and without any doubt it misses somedetails. However I tried to fit in this few pages just enough importantinformation that might help those who are interested in user mode Win32API spying.

References

[1] "Windows 95 System Programming Secrets", Matt Pietrek
[2] "Programming Application forMS Windows" , Jeffrey Richter
[3] "Windows NT System-CallHooking", Mark Russinovich and Bryce Cogswell, Dr.Dobb's Journal January 1997
[4] "Debugging applications" , John Robbins
[5] "Undocumented Windows 2000Secrets" , Sven Schreiber
[6] "PeeringInside the PE: A Tour of the Win32 Portable Executable File Format"by Matt Pietrek, March 1994
[7] MSDN Knowledge base Q197571
[8]PEview Version 0.67, Wayne J. Radburn
[9] "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB" MSJ May 1994
[10] "Programming WindowsSecurity", Keith Brown
[11] "Detecting Windows NT/2Kprocess execution" Ivo Ivanov, 2002
[12] "Detours" Galen Hunt and Doug Brubacher
[13a] "An In-DepthLook into the Win32 PE file format", part 1, Matt Pietrek, MSJ February 2002
[13b] "An In-DepthLook into the Win32 PE file format", part 2, Matt Pietrek, MSJ March 2002
[14] "Inside MS Windows 2000 ThirdEdition" , David Solomon and Mark Russinovich
[15] "Nerditorium",James Finnegan, MSJ January 1999
[16] "Single interface forenumerating processes and modules under NT and Win9x/2K.", Ivo Ivanov, 2001
[17] "Undocumented Windows NT", Prasad Dabak, Sandeep Phadke and Milind Borate
[18] Platform SDK: Windows User Interface, Hooks

Injecting techniques

Registry
In order to inject a DLL into processes that link with USER32.DLL, you simply can add the DLL name to the value of the following registry key:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs

Its value contains a single DLL name or group of DLLs separated either by comma or spaces. According to MSDN documentation [7], all DLLs specified by the value of that key are loaded by each Windows-based application running within the current logon session. It is interesting that the actual loading of these DLLs occurs as a part of USER32's initialization. USER32 reads the value of mentioned registry key and calls LoadLibrary() for these DLLs in its DllMain code. However this trick applies only to applications that use USER32.DLL. Another restriction is that this built-in mechanism is supported only by NT and 2K operating systems. Although it is a harmless way to inject a DLL into a Windows processes there are few shortcomings:
- In order to activate/deactivate the injection process you have to reboot Windows.
- The DLL you want to inject will be mapped only into these processes that use USER32.DLL, thus you cannot expect to get your hook injected into console applications, since they usually don't import functions from USER32.DLL.
- On the other hand you don't have any control over the injection process. It means that it is implanted into every single GUI application, regardless you want it or not. It is a redundant overhead especially if you intend to hook few applications only. For more details see [2] "Injecting a DLL Using the Registry"
System-wide Windows Hooks
Certainly a very popular technique for injecting DLL into a targeted process relies on provided by Windows Hooks. As pointed out in MSDN a hook is a trap in the system message-handling mechanism. An application can install a custom filter function to monitor the message traffic in the system and process certain types of messages before they reach the target window procedure.
A hook is normally implemented in a DLL in order to meet the basic requirement for system-wide hooks. The basic concept of that sort of hooks is that the hook callback procedure is executed in the address spaces of each hooked up process in the system. To install a hook you call SetWindowsHookEx() with the appropriate parameters. Once the application installs a system-wide hook, the operating system maps the DLL into the address space in each of its client processes. Therefore global variables within the DLL will be "per-process" and cannot be shared among the processes that have loaded the hook DLL. All variables that contain shared data must be placed in a shared data section. The diagram bellow shows an example of a hook registered by Hook Server and injected into the address spaces named "Application one" and "Application two".

Figure 1

A system-wide hook is registered just ones when SetWindowsHookEx() is executed. If no error occurs a handle to the hook is returned. The returned value is required at the end of the custom hook function when a call to CallNextHookEx() has to be made. After a successful call to SetWindowsHookEx() , the operating system injects the DLL automatically (but not necessary immediately) into all processes that meet the requirements for this particular hook filter. Let's have a closer look at the following dummy WH_GETMESSAGE filter function:

Collapse
```
//---------------------------------------------------------------------------
    
    // GetMsgProc
    
    //
    
    // Filter function for the WH_GETMESSAGE - it's just a dummy function
    
    //---------------------------------------------------------------------------
    
    LRESULT CALLBACK GetMsgProc(
    int code,       // hook code
    
    WPARAM wParam,  // removal option
    
    LPARAM lParam   // message
    
    )
    {
    // We must pass the all messages on to CallNextHookEx.
    
    return ::CallNextHookEx(sg_hGetMsgHook, code, wParam, lParam);
    }
    
```
A system-wide hook is loaded by multiple processes that don't share the same address space.

For instance hook handle sg_hGetMsgHook, that is obtained by SetWindowsHookEx() and is used as parameter in CallNextHookEx() must be used virtually in all address spaces. It means that its value must be shared among hooked processes as well as the Hook Server application. In order to make this variable "visible" to all processes we should store it in the shared data section.

The following is an example of employing #pragma data_seg(). Here I would like to mention that the data within the shared section must be initialized, otherwise the variables will be assigned to the default data segment and #pragma data_seg() will have no effect.

Collapse
```
//---------------------------------------------------------------------------
    
    // Shared by all processes variables
    
    //---------------------------------------------------------------------------
    
    #pragma data_seg(".HKT")
    HHOOK sg_hGetMsgHook       = NULL;
    BOOL  sg_bHookInstalled    = FALSE;
    // We get this from the application who calls SetWindowsHookEx()'s wrapper
    
    HWND  sg_hwndServer        = NULL;
    #pragma data_seg()
```
You should add a SECTIONS statement to the DLL's DEF file as well
Collapse
```
SECTIONS
    .HKT   Read Write Shared
```
or use
Collapse
```
#pragma comment(linker, "/section:.HKT, rws")
```
Once a hook DLL is loaded into the address space of the targeted process, there is no way to unload it unless the Hook Server calls UnhookWindowsHookEx() or the hooked application shuts down. When the Hook Server calls UnhookWindowsHookEx() the operating system loops through an internal list with all processes which have been forced to load the hook DLL. The operating system decrements the DLL's lock count and when it becomes 0, the DLL is automatically unmapped from the process's address space.

Here are some of the advantages of this approach:
- This mechanism is supported by NT/2K and 9x Windows family and hopefully will be maintained by future Windows versions as well.
- Unlike the registry mechanism of injecting DLLs this method allows DLL to be unloaded when Hook Server decides that DLL is no longer needed and makes a call to UnhookWindowsHookEx()
Although I consider Windows Hooks as very handy injection technique, it comes with its own disadvantages:
- Windows Hooks can degrade significantly the entire performance of the system, because they increase the amount of processing the system must perform for each message.
- It requires lot of efforts to debug system-wide Windows Hooks. However if you use more than one instance of VC++ running in the same time, it would simplify the debugging process for more complex scenarios.
- Last but not least, this kind of hooks affect the processing of the whole system and under certain circumstances (say a bug) you must reboot your machine in order to recover it.
Injecting DLL by using CreateRemoteThread() API function
Well, this is my favorite one. Unfortunately it is supported only by NT and Windows 2K operating systems. It is bizarre, that you are allowed to call (link with) this API on Win 9x as well, but it just returns NULL without doing anything.
Injecting DLLs by remote threads is Jeffrey Ritcher's idea and is well documented in his article [9] "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB".

The basic concept is quite simple, but very elegant. Any process can load a DLL dynamically using LoadLibrary() API. The issue is how do we force an external process to call LoadLibrary() on our behalf, if we don't have any access to process's threads? Well, there is a function, called CreateRemoteThread() that addresses creating a remote thread. Here comes the trick - have a look at the signature of thread function, whose pointer is passed as parameter (i.e. LPTHREAD_START_ROUTINE) to the CreateRemoteThread():

Collapse
```
DWORD WINAPI ThreadProc(LPVOID lpParameter);
```
And here is the prototype of LoadLibrary API
Collapse
```
HMODULE WINAPI LoadLibrary(LPCTSTR lpFileName);
```
Yes, they do have "identical" pattern. They use the same calling convention WINAPI, they both accept one parameter and the size of returned value is the same. This match gives us a hint that we can use LoadLibrary() as thread function, which will be executed after the remote thread has been created. Let's have a look at the following sample code:
Collapse
```
hThread = ::CreateRemoteThread(
    hProcessForHooking,
    NULL,
    0,
    pfnLoadLibrary,
    "C:\\HookTool.dll",
    0,
    NULL);
    
```
By using GetProcAddress() API we get the address of the LoadLibrary() API. The dodgy thing here is that Kernel32.DLL is mapped always to the same address space of each process, thus the address of LoadLibrary() function has the same value in address space of any running process. This ensures that we pass a valid pointer (i.e. pfnLoadLibrary) as parameter of CreateRemoteThread().
As parameter of the thread function we use the full path name of the DLL, casting it to LPVOID. When the remote thread is resumed, it passes the name of the DLL to the ThreadFunction (i.e. LoadLibrary). That's the whole trick with regard to using remote threads for injection purposes.

There is an important thing we should consider, if implanting through CreateRemoteThread() API. Every time before the injector application operate on the virtual memory of the targeted process and makes a call to CreateRemoteThread(), it first opens the process using OpenProcess() API and passes PROCESS_ALL_ACCESS flag as parameter. This flag is used when we want to get maximum access rights to this process. In this scenario OpenProcess() will return NULL for some of the processes with low ID number. This error (although we use a valid process ID) is caused by not running under security context that has enough permissions. If you think for a moment about it, you will realize that it makes perfect sense. All those restricted processes are part of the operating system and a normal application shouldn't be allowed to operate on them. What would happen if some application has a bug and accidentally attempts to terminate an operating system's process? To prevent the operating system from that kind of eventual crashes, it is required that a given application must have sufficient privileges to execute APIs that might alter operating system behavior. To get access to the system resources (e.g. smss.exe, winlogon.exe, services.exe, etc) through OpenProcess() invocation, you must be granted the debug privilege. This ability is extremely powerful and offers a way to access the system resources, that are normally restricted. Adjusting the process privileges is a trivial task and can be described with the following logical operations:
- Open the process token with permissions needed to adjust privileges
- Given a privilege's name "SeDebugPrivilege", we should locate its local LUID mapping. The privileges are specified by name and can be found in Platform SDK file winnt.h
- Adjust the token in order to enable the "SeDebugPrivilege" privilege by calling AdjustTokenPrivileges() API
- Close obtained by OpenProcessToken() process token handle
For more details about changing privileges see [10] "Using privilege".
Implanting through BHO add-ins
Sometimes you will need to inject a custom code inside Internet Explorer only. Fortunately Microsoft provides an easy and well documented way for this purpose - Browser Helper Objects. A BHO is implemented as COM DLL and once it is properly registered, each time when IE is launched it loads all COM components that have implemented IObjectWithSite interface.
MS Office add-ins
Similarly, to the BHOs, if you need to implant in MS Office applications code of your own, you can take the advantage of provided standard mechanism by implementing MS Office add-ins. There are many available samples that show how to implement this kind of add-ins.

Interception mechanisms

Figure 2

The major implementation difference between them is thatinterceptor engine for kernel-level hooking is wrapped up as a kernel-modedriver, whereas user-level hooking usually employs user-mode DLL.

NT Kernel level hooking
There are several methods for achieving hooking of NT system services in kernel mode. The most popular interception mechanism was originally demonstrated by Mark Russinovich and Bryce Cogswell in their article [3] "Windows NT System-Call Hooking". Their basic idea is to inject an interception mechanism for monitoring NT system calls just bellow the user mode. This technique is very powerful and provides an extremely flexible method for hooking the point that all user-mode threads pass through before they are serviced by the OS kernel.
You can find an excellent design and implementation in "Undocumented Windows 2000 Secrets" as well. In his great book Sven Schreiber explains how to build a kernel-level hooking framework from scratch [5].

Another comprehensive analysis and brilliant implementation has been provided by Prasad Dabak in his book "Undocumented Windows NT" [17].

However, all these hooking strategies, remain out of the scope of this article.
Win32 User level hooking
1. Windows subclassing.
  This method is suitable for situations where the application's behavior might be changed by new implementation of the window procedure. To accomplish this task you simply call SetWindowLongPtr() with GWLP_WNDPROC parameter and pass the pointer to your own window procedure. Once you have the new subclass procedure set up, every time when Windows dispatches a message to a specified window, it looks for the address of the window's procedure associated with the particular window and calls your procedure instead of the original one.
  The drawback of this mechanism is that subclassing is available only within the boundaries of a specific process. In other words an application should not subclass a window class created by another process.
  
  Usually this approach is applicable when you hook an application through add-in (i.e. DLL / In-Proc COM component) and you can obtain the handle to the window whose procedure you would like to replace.
  
  For example, some time ago I wrote a simple add-in for IE (Browser Helper Object) that replaces the original pop-up menu provided by IE using subclassing.
2. Proxy DLL (Trojan DLL)
  An easy way for hacking API is just to replace a DLL with one that has the same name and exports all the symbols of the original one. This technique can be effortlessly implemented using function forwarders. A function forwarder basically is an entry in the DLL's export section that delegates a function call to another DLL's function.
  You can accomplish this task by simply using #pragma comment:
  
  Collapse
```
#pragma comment(linker, "/export:DoSomething=DllImpl.ActuallyDoSomething")
```
  However, if you decide to employ this method, you should take the responsibility of providing compatibilities with newer versions of the original library. For more details see [13a] section "Export forwarding" and [2] "Function Forwarders".
3. Code overwriting
  There are several methods that are based on code overwriting. One of them changes the address of the function used by CALL instruction. This method is difficult, and error prone. The basic idea beneath is to track down all CALL instructions in the memory and replace the addresses of the original function with user supplied one.
  Another method of code overwriting requires a more complicated implementation. Briefly, the concept of this approach is to locate the address of the original API function and to change first few bytes of this function with a JMP instruction that redirects the call to the custom supplied API function. This method is extremely tricky and involves a sequence of restoring and hooking operations for each individual call. It's important to point out that if the function is in unhooked mode and another call is made during that stage, the system won't be able to capture that second call.
  The major problem is that it contradicts with the rules of a multithreaded environment.
  
  However, there is a smart solution that solves some of the issues and provides a sophisticated way for achieving most of the goals of an API interceptor. In case you are interested you might peek at [12] Detours implementation.
4. Spying by a debugger
  An alternative to hooking API functions is to place a debugging breakpoint into the target function. However there are several drawbacks for this method. The major issue with this approach is that debugging exceptions suspend all application threads. It requires also a debugger process that will handle this exception. Another problem is caused by the fact that when the debugger terminates, the debugger is automatically shut down by Windows.
5. Spying by altering of the Import Address Table
  This technique was originally published by Matt Pietrek and than elaborated by Jeffrey Ritcher ([2] "API Hooking by Manipulating a Module's Import Section") and John Robbins ([4] "Hooking Imported Functions"). It is very robust, simple and quite easy to implement. It also meets most of the requirements of a hooking framework that targets Windows NT/2K and 9x operating systems. The concept of this technique relies on the elegant structure of the Portable Executable (PE) Windows file format. To understand how this method works, you should be familiar with some of the basics behind PE file format, which is an extension of Common Object File Format (COFF). Matt Pietrek reveals the PE format in details in his wonderful articles - [6] "Peering Inside the PE.", and [13a/b] "An In-Depth Look into the Win32 PE file format". I will give you a brief overview of the PE specification, just enough to get the idea of hooking by manipulation of the Import Address Table.
  In general an PE binary file is organized, so that it has all code and data sections in a layout that conform to the virtual memory representation of an executable. PE file format is composed of several logical sections. Each of them maintains specific type of data and addresses particular needs of the OS loader.
  
  The section .idata, I would like to focus your attention on, contains information about Import Address Table. This part of the PE structure is particularly very crucial for building a spy program based on altering IAT.
  Each executable that conforms with PE format has layout roughly described by the figure below.
  
  Figure 3
  
  The program loader is responsible for loading an application along with all its linked DLLs into the memory. Since the address where each DLL is loaded into, cannot be known in advance, the loader is not able to determine the actual address of each imported function. The loader must perform some extra work to ensure that the program will call successfully each imported function. But going through each executable image in the memory and fixing up the addresses of all imported functions one by one would take unreasonable amount of processing time and cause huge performance degradation. So, how does the loader resolves this challenge? The key point is that each call to an imported function must be dispatched to the same address, where the function code resides into the memory. Each call to an imported function is in fact an indirect call, routed through IAT by an indirect JMP instruction. The benefit of this design is that the loader doesn't have to search through the whole image of the file. The solution appears to be quite simple - it just fixes-up the addresses of all imports inside the IAT. Here is an example of a snapshot PE File structure of a simple Win32 Application, taken with the help of the [8] PEView utility. As you can see TestApp import table contains two imported by GDI32.DLL function - TextOutA() and GetStockObject().
  
  Figure 4
  
  Actually the hooking process of an imported function is not that complex as it looks at first sight. In a nutshell an interception system that uses IAT patching has to discover the location that holds the address of imported function and replace it with the address of an user supplied function by overwriting it. An important requirement is that the newly provided function must have exactly the same signature as the original one. Here are the logical steps of a replacing cycle:
  - Locate the import section from the IAT of each loaded by the process DLL module as well as the process itself
  - Find the IMAGE_IMPORT_DESCRIPTOR chunk of the DLL that exports that function. Practically speaking, usually we search this entry by the name of the DLL
  - Locate the IMAGE_THUNK_DATA which holds the original address of the imported function
  - Replace the function address with the user supplied one
  By changing the address of the imported function inside the IAT, we ensure that all calls to the hooked function will be re-routed to the function interceptor.
  Replacing the pointer inside the IAT is that .idata section doesn't necessarily have to be a writable section. This requires that we must ensure that .idata section can be modified. This task can be accomplished by using VirtualProtect() API.
  
  Another issue that deserves attention is related to the GetProcAddress() API behavior on Windows 9x system. When an application calls this API outside the debugger it returns a pointer to the function. However if you call this function within from the debugger it actually returns different address than it would when the call is made outside the debugger. It is caused by the fact that that inside the debugger each call to GetProcAddress() returns a wrapper to the real pointer. Returned by GetProcAddress() value points to PUSH instruction followed by the actual address. This means that on Windows 9x when we loop through the thunks, we must check whether the address of examined function is a PUSH instruction (0x68 on x86 platforms) and accordingly get the proper value of the address function.
  
  Windows 9x doesn't implement copy-on-write, thus operating system attempts to keep away the debuggers from stepping into functions above the 2-GB frontier. That is the reason why GetProcAddress() returns a debug thunk instead of the actual address. John Robbins discusses this problem in [4] "Hooking Imported Functions".

Figuring out when to inject the hook DLL

Enumerating processes and modules

Following UMLdiagram shows the class relationships of this subsystem:

Figure 5

Requirements of the Hook Tool System

Provide a user-level hooking system for spying any Win32 API functions imported by name
Provide the abilities to inject hook driver into all running processes by Windows hooks as well as CreateRemoteThread() API. The framework should offer an ability to set this up by an INI file
Employ an interception mechanism based on the altering Import Address Table
Present an object-oriented reusable and extensible layered architecture
Offer an efficient and scalable mechanism for hooking API functions
Meet performance requirements
Provide a reliable communication mechanism for transferring data between the driver and the server
Implement custom supplied versions of TextOutA/W() and ExitProcess() API functions
Log events to a file
The system is implemented for x86 machines running Windows 9x, Me, NT or Windows 2K operating system

Design and implementation

This part of the article discusses the key components of theframework and how do they interact each other. This outfit is capable to captureany kind of WINAPI imported by namefunctions.

Before I outline thesystem's design, I would like to focus your attention on several methods for injectingand hooking.

Another crucial moment is thechoice of the hooking mechanism. Not surprisingly, I decided to apply altering IAT asan extremely robust method for Win32 API spying.

Toachieve desired goals I designed a simple framework composed of the followingcomponents and files:

TestApp.exe - a simple Win32 test application that just outputs a text using TextOut() API. The purpose of this app is to show how it gets hooked up.
HookSrv.exe - control program
HookTool .DLL - spy library implemented as Win32 DLL
HookTool.ini - a configuration file
NTProcDrv.sys - a tiny Windows NT/2K kernel-mode driver for monitoring process creation and termination. This component is optional and addresses the problem with detection of process execution under NT based systems only.

HookTool.DLL is the hook driver andthe heart of presented spying system. It implements the actual interceptorand provides three user supplied functions TextOutA/W() and ExitProcess() functions.

Following UML class diagram illustrates the relationshipsbetween set of classes used in HookTool.DLL's implementation.

Figure 6

After instantiating of the main processor object, a call to ManageModuleEnlistment() method will be made.Here is a simplified version of its implementation:

Collapse

// Called on DLL_PROCESS_ATTACH DLL notification

BOOL CModuleScope::ManageModuleEnlistment()
{
BOOL bResult = FALSE;
// Check if it is the hook server

if (FALSE == *m_pbHookInstalled)
{
// Set the flag, thus we will know that the server has been installed

*m_pbHookInstalled = TRUE;
// and return success error code

bResult = TRUE;
}
// and any other process should be examined whether it should be

// hooked up by the DLL

else
{
bResult = m_pInjector->IsProcessForHooking(m_szProcessName);
if (bResult)
InitializeHookManagement();
}
return bResult;
}

Collapse

 // Activate/Deactivate hooking

engine BOOL	CModuleScope::InstallHookMethod(BOOL bActivate, HWND hWndServer)
{
BOOL bResult;
if (bActivate)
{
*m_phwndServer = hWndServer;
bResult = m_pInjector->InjectModuleIntoAllProcesses();
}
else
{
m_pInjector->EjectModuleFromAllProcesses();
*m_phwndServer = NULL;
bResult = TRUE;
}
return bResult;
}

CWinHookInjector class implements Windows Hooks injecting mechanism. It installs a filter function by the following call

Collapse

// Inject the DLL into all running processes

BOOL CWinHookInjector::InjectModuleIntoAllProcesses()
{
*sm_pHook = ::SetWindowsHookEx(
WH_GETMESSAGE,
(HOOKPROC)(GetMsgProc),
ModuleFromAddress(GetMsgProc),
0
);
return (NULL != *sm_pHook);Injecting techniques

  
  Registry 
In order to inject
    a DLL into processes that link with USER32.DLL, you simply can add the DLL
    name to the value of the following registry key: 

    HKEY_LOCAL_MACHINE\Software\Microsoft\Windows
    NT\CurrentVersion\Windows\AppInit_DLLs 
    Its value contains a single DLL name or group of DLLs
    separated either by comma or spaces. According to MSDN documentation [7], all
    DLLs specified by the value of that key are loaded by each Windows-based
    application running within the current logon session. It is interesting that
    the actual loading of these DLLs occurs as a part of USER32's initialization.
    USER32 reads the value of mentioned registry key and calls LoadLibrary() for these DLLs in its DllMain code. However this trick applies only to
    applications that use USER32.DLL. Another restriction is that this built-in
    mechanism is supported only by NT and 2K operating systems. Although it is a
    harmless way to inject a DLL into a Windows processes there are few
    shortcomings: 

    
    
    
    In order to activate/deactivate the injection
        process you have to reboot Windows.
        
The DLL you want to inject will be mapped only into
        these processes that use USER32.DLL, thus you cannot expect to get your hook
        injected into console applications, since they usually don't import
        functions from USER32.DLL.
        
On the other hand you don't have any control over
        the injection process. It means that it is implanted into every single GUI
        application, regardless you want it or not. It is a redundant overhead
        especially if you intend to hook few applications only. For more details see
        [2] "Injecting a DLL Using the Registry" 
    

    
System-wide Windows Hooks 
Certainly a very popular technique for injecting DLL into
    a targeted process relies on provided by Windows Hooks. As pointed out in MSDN
    a hook is a trap in the system message-handling mechanism. An application can
    install a custom filter function to monitor the message traffic in the system
    and process certain types of messages before they reach the target window
    procedure.
    A hook is normally implemented in a DLL in
    order to meet the basic requirement for system-wide hooks. The basic concept
    of that sort of hooks is that the hook callback procedure is executed in the
    address spaces of each hooked up process in the system. To install a hook you
    call SetWindowsHookEx() with the appropriate
    parameters. Once the application installs a system-wide hook, the operating
    system maps the DLL into the address space in each of its client processes.
    Therefore global variables within the DLL will be "per-process" and cannot be
    shared among the processes that have loaded the hook DLL. All variables that
    contain shared data must be placed in a shared data section. The diagram
    bellow shows an example of a hook registered by Hook Server and injected into
    the address spaces named "Application one" and "Application two".

    
    Figure 1
     
    A system-wide hook is
    registered just ones when SetWindowsHookEx()
    is executed. If no error occurs a handle to the hook is returned. The
    returned value is required at the end of the custom hook function when a
    call to CallNextHookEx() has to be made.
    After a successful call to SetWindowsHookEx() , the operating system injects
    the DLL automatically (but not necessary immediately) into all processes that
    meet the requirements for this particular hook filter. Let's have a closer
    look at the following dummy WH_GETMESSAGE
    filter function:
    
    
    
     
     
     
      Collapse
    
    
    //---------------------------------------------------------------------------
    
    // GetMsgProc
    
    //
    
    // Filter function for the WH_GETMESSAGE - it's just a dummy function
    
    //---------------------------------------------------------------------------
    
    LRESULT CALLBACK GetMsgProc(
    int code,       // hook code
    
    WPARAM wParam,  // removal option
    
    LPARAM lParam   // message
    
    )
    {
    // We must pass the all messages on to CallNextHookEx.
    
    return ::CallNextHookEx(sg_hGetMsgHook, code, wParam, lParam);
    }
    

    A system-wide hook is loaded by multiple processes
    that don't share the same address space.
    

    For instance
    hook handle sg_hGetMsgHook, that is obtained
    by SetWindowsHookEx() and is used as
    parameter in CallNextHookEx() must be used
    virtually in all address spaces. It means that its value must be shared among
    hooked processes as well as the Hook Server application. In order to make this
    variable "visible" to all processes we should store it in the shared data
    section.
    

    The following is an example of employing #pragma data_seg(). Here I would like to mention
    that the data within the shared section must be initialized, otherwise the
    variables will be assigned to the default data segment and #pragma data_seg() will have no effect. 

    
    
    
     
     
     
      Collapse
    
    

    //---------------------------------------------------------------------------
    
    // Shared by all processes variables
    
    //---------------------------------------------------------------------------
    
    #pragma data_seg(".HKT")
    HHOOK sg_hGetMsgHook       = NULL;
    BOOL  sg_bHookInstalled    = FALSE;
    // We get this from the application who calls SetWindowsHookEx()'s wrapper
    
    HWND  sg_hwndServer        = NULL;
    #pragma data_seg()

    You should add a SECTIONS statement to the DLL's DEF file as well
    
    
    
     
     
     
      Collapse
    
    

    SECTIONS
    .HKT   Read Write Shared

    or use
    
    
    
     
     
     
      Collapse
    
    

    #pragma comment(linker, "/section:.HKT, rws")

    Once a hook DLL is loaded into the address space of
    the targeted process, there is no way to unload it unless the Hook Server
    calls UnhookWindowsHookEx() or the hooked
    application shuts down. When the Hook Server calls UnhookWindowsHookEx() the operating system loops
    through an internal list with all processes which have been forced to load the
    hook DLL. The operating system decrements the DLL's lock count and when it
    becomes 0, the DLL is automatically unmapped from the process's address
    space.

    Here are some of the advantages of this
    approach: 

    
    
    This mechanism is supported by NT/2K and 9x Windows
        family and hopefully will be maintained by future Windows versions as well.
        
Unlike the registry mechanism of injecting DLLs
        this method allows DLL to be unloaded when Hook Server decides that DLL is
        no longer needed and makes a call to UnhookWindowsHookEx() 

    Although I consider Windows Hooks as very handy
    injection technique, it comes with its own disadvantages: 

    
    
    Windows Hooks can degrade significantly the entire
        performance of the system, because they increase the amount of processing
        the system must perform for each message.
        
It requires lot of efforts to debug system-wide
        Windows Hooks. However if you use more than one instance of VC++ running in
        the same time, it would simplify the debugging process for more complex
        scenarios.
        
Last but not least, this kind of hooks affect the
        processing of the whole system and under certain circumstances (say a bug)
        you must reboot your machine in order to recover it.

        

    
Injecting DLL by using CreateRemoteThread() API function
Well,
    this is my favorite one. Unfortunately it is supported only by NT and
    Windows 2K operating systems. It is bizarre, that you are allowed to
    call (link with) this API on Win 9x as well, but it just returns NULL without doing anything.
          Injecting DLLs by
    remote threads is Jeffrey Ritcher's idea and is well documented in his article [9] "Load
    Your 32-bit DLL into Another Process's Address Space Using INJLIB".

    The basic concept is quite simple, but very elegant. Any
    process can load a DLL dynamically using LoadLibrary() API. The issue is how do we force an
    external process to call LoadLibrary() on
    our behalf, if we don't have any access to process's threads? Well, there is a
    function, called CreateRemoteThread() that
    addresses creating a remote thread. Here comes the trick - have a look at the
    signature of thread function, whose pointer is passed as parameter (i.e. LPTHREAD_START_ROUTINE) to the CreateRemoteThread():

    
    
    
     
     
     
      Collapse
    
    

    DWORD WINAPI ThreadProc(LPVOID lpParameter);

    And
    here is the prototype of LoadLibrary API
    
    
    
     
     
     
      Collapse
    
    

    HMODULE WINAPI LoadLibrary(LPCTSTR lpFileName);

    Yes,
    they do have "identical" pattern. They use the same calling convention WINAPI, they both accept one parameter and the
    size of returned value is the same. This match gives us a hint that we can use
    LoadLibrary() as thread function, which will
    be executed after the remote thread has been created. Let's have a look at the
    following sample code:
    
    
    
     
     
     
      Collapse
    
    

    hThread = ::CreateRemoteThread(
    hProcessForHooking,
    NULL,
    0,
    pfnLoadLibrary,
    "C:\\HookTool.dll",
    0,
    NULL);
    

    By using GetProcAddress() API we get
    the address of the LoadLibrary() API. The
    dodgy thing here is that Kernel32.DLL is mapped always to the same address
    space of each process, thus the address of LoadLibrary() function has the same value in
    address space of any running process. This ensures that we pass a valid
    pointer (i.e. pfnLoadLibrary) as parameter
    of CreateRemoteThread().
    As parameter of the thread function we use the full path
    name of the DLL, casting it to LPVOID. When
    the remote thread is resumed, it passes the name of the DLL to the
    ThreadFunction (i.e. LoadLibrary). That's
    the whole trick with regard to using remote threads for injection purposes.

    There is an important thing we should consider, if
    implanting through CreateRemoteThread() API.
    Every time before the injector application operate on the virtual memory of
    the targeted process and makes a call to CreateRemoteThread(), it first opens the process
    using OpenProcess() API and passes PROCESS_ALL_ACCESS flag as parameter. This flag is
    used when we want to get maximum access rights to this process. In this
    scenario OpenProcess() will return NULL for some of the processes with low ID number. This error
    (although we use a valid process ID) is caused by not running under security context
    that has enough permissions. If you think for a moment about it, you
    will realize that it makes perfect sense. All those restricted processes are
    part of the operating system and a normal application shouldn't be allowed
    to operate on them. What would happen if some application has a bug and accidentally
    attempts to terminate an operating system's process? To prevent the
    operating system from that kind of eventual crashes, it is required that
    a given application must have sufficient privileges to execute APIs that might
    alter operating system behavior. To get access to the system resources (e.g.
    smss.exe, winlogon.exe, services.exe, etc) through OpenProcess() invocation, you must be granted the
    debug privilege. This ability is extremely powerful and offers a way to access
    the system resources, that are normally restricted. Adjusting the process
    privileges is a trivial task and can be described with the following logical
    operations: 

    
    
    Open the process token with permissions needed to
        adjust privileges
        
Given a privilege's name "SeDebugPrivilege", we should locate its local
        LUID mapping. The privileges are specified by name and can be found in
        Platform SDK file winnt.h
        
Adjust the token in order to enable the "SeDebugPrivilege" privilege by calling AdjustTokenPrivileges() API
        
Close obtained by OpenProcessToken() process token handle
        

    For more details about changing privileges see [10] "Using
    privilege".

    
Implanting through BHO add-ins
Sometimes you will need to inject a custom code inside
    Internet Explorer only. Fortunately Microsoft provides an easy and well
    documented way for this purpose - Browser Helper Objects. A BHO is implemented
    as COM DLL and once it is properly registered, each time when IE is launched
    it loads all COM components that have implemented IObjectWithSite interface.

    
MS Office add-ins
Similarly,
    to the BHOs, if you need to implant in MS Office applications code of your
    own, you can take the advantage of provided standard mechanism by implementing
    MS Office add-ins. There are many available samples that show how to implement
    this kind of add-ins.

Interception mechanisms 

Injecting a DLL into the address space of an external
process is a key element of a spying system. It provides an excellent
opportunity to have a control over process's thread activities. However it is
not sufficient to have the DLL injected if you want to intercept API function
calls within the process. 

This part of the article
intends to make a brief review of several available real-world hooking aspects.
It focuses on the basic outline for each one of them, exposing their advantages
and disadvantages. 

In terms of the level where the
hook is applied, there are two mechanisms for API spying - Kernel level and
User level spying. To get better understanding of these two levels you must be
aware of the relationship between the Win32 subsystem API and the Native API.
Following figure demonstrates where the different hooks are set and illustrates
the module relationships and their dependencies on Windows 2K: 

Figure 2


The major implementation difference between them is that
interceptor engine for kernel-level hooking is wrapped up as a kernel-mode
driver, whereas user-level hooking usually employs user-mode DLL. 


  
  NT Kernel level
    hooking
There are several methods for achieving
    hooking of NT system services in kernel mode. The most popular interception
    mechanism was originally demonstrated by Mark Russinovich and Bryce Cogswell
    in their article [3] "Windows NT System-Call Hooking". Their basic idea is to
    inject an interception mechanism for monitoring NT system calls just bellow
    the user mode. This technique is very powerful and provides an extremely
    flexible method for hooking the point that all user-mode threads pass through
    before they are serviced by the OS kernel.
    You can
    find an excellent design and implementation in "Undocumented Windows 2000 Secrets"
    as well. In his great book Sven Schreiber explains how to build a
    kernel-level hooking framework from scratch [5].
    

    Another comprehensive analysis and brilliant
    implementation has been provided by Prasad Dabak in his book "Undocumented
    Windows NT" [17].

    However, all these hooking
    strategies, remain out of the scope of this article. 

    
Win32 User level hooking

    
    
    
    Windows subclassing.
This
        method is suitable for situations where the application's behavior might be
        changed by new implementation of the window procedure. To accomplish this
        task you simply call SetWindowLongPtr()
        with GWLP_WNDPROC parameter and pass the
        pointer to your own window procedure. Once you have the new subclass
        procedure set up, every time when Windows dispatches a message to a
        specified window, it looks for the address of the window's procedure
        associated with the particular window and calls your procedure instead of
        the original one.
        The drawback of this mechanism is
        that subclassing is available only within the boundaries of a specific
        process. In other words an application should not subclass a window class
        created by another process.

        Usually this approach is
        applicable when you hook an application through add-in (i.e. DLL / In-Proc
        COM component) and you can obtain the handle to the window whose procedure
        you would like to replace.

        For example, some time
        ago I wrote a simple add-in for IE (Browser Helper Object) that replaces the
        original pop-up menu provided by IE using subclassing.

        

        
Proxy DLL (Trojan DLL)
An
        easy way for hacking API is just to replace a DLL with one that has the same
        name and exports all the symbols of the original one. This technique can be
        effortlessly implemented using function forwarders. A function forwarder
        basically is an entry in the DLL's export section that delegates a function
        call to another DLL's function.
        You can accomplish
        this task by simply using #pragma comment:
        

        
      
      
       
       
       
        Collapse
      
      

        #pragma comment(linker, "/export:DoSomething=DllImpl.ActuallyDoSomething")

        However, if you decide to employ this method, you
        should take the responsibility of providing compatibilities with newer
        versions of the original library. For more details see [13a] section "Export
        forwarding" and [2] "Function Forwarders".

        

        
Code overwriting
There are
        several methods that are based on code overwriting. One of them changes the
        address of the function used by CALL instruction. This method is difficult,
        and error prone. The basic idea beneath is to track down all CALL
        instructions in the memory and replace the addresses of the original
        function with user supplied one.
        Another method of
        code overwriting requires a more complicated implementation. Briefly, the
        concept of this approach is to locate the address of the original API
        function and to change first few bytes of this function with a JMP
        instruction that redirects the call to the custom supplied API function.
        This method is extremely tricky and involves a sequence of restoring and
        hooking operations for each individual call. It's important to point out
        that if the function is in unhooked mode and another call is made during
        that stage, the system won't be able to capture that second call.
The major problem is that it contradicts with the rules
        of a multithreaded environment.

        However, there is a
        smart solution that solves some of the issues and provides a sophisticated
        way for achieving most of the goals of an API interceptor. In case you are
        interested you might peek at [12] Detours implementation.

        

        
Spying by a debugger
An
        alternative to hooking API functions is to place a debugging breakpoint into
        the target function. However there are several drawbacks for this method.
        The major issue with this approach is that debugging exceptions suspend all
        application threads. It requires also a debugger process that will handle
        this exception. Another problem is caused by the fact that when the debugger
        terminates, the debugger is automatically shut down by Windows.

        
Spying by altering of the Import Address Table
This technique was originally published by Matt Pietrek
        and than elaborated by Jeffrey Ritcher ([2] "API Hooking by Manipulating a
        Module's Import Section") and John Robbins ([4] "Hooking Imported
        Functions"). It is very robust, simple and quite easy to implement. It also
        meets most of the requirements of a hooking framework that targets Windows
        NT/2K and 9x operating systems. The concept of this technique relies on the
        elegant structure of the Portable Executable (PE) Windows file format. To
        understand how this method works, you should be familiar with some of the
        basics behind PE file format, which is an extension of Common Object File
        Format (COFF). Matt Pietrek reveals the PE format in details in his
        wonderful articles - [6] "Peering Inside the PE.", and [13a/b] "An In-Depth
        Look into the Win32 PE file format". I will give you a brief overview of the
        PE specification, just enough to get the idea of hooking by manipulation of
        the Import Address Table.
        In general an PE binary
        file is organized, so that it has all code and data sections in a layout
        that conform to the virtual memory representation of an executable. PE file
        format is composed of several logical sections. Each of them maintains
        specific type of data and addresses particular needs of the OS loader. 

        The section .idata, I
        would like to focus your attention on, contains information about Import
        Address Table. This part of the PE structure is particularly very crucial
        for building a spy program based on altering IAT.
Each executable that conforms with PE format has layout
        roughly described by the figure below.

        Figure 3

        

         The program loader is responsible for loading an application along with
        all its linked DLLs into the memory. Since the address where each DLL is
        loaded into, cannot be known in advance, the loader is not able
        to determine the actual address of each imported function. The loader must perform
        some extra work to ensure that the program will call
        successfully each imported function. But going through each executable image in the memory
        and fixing up the addresses of all imported functions one by one would
        take unreasonable amount of processing time and cause huge performance degradation.
        So, how does the loader resolves this challenge? The key point is that each
        call to an imported function must be dispatched to the same address,
        where the function code resides into the memory. Each call to an imported function
        is in fact an indirect call, routed through IAT by an indirect JMP instruction.
        The benefit of this design is that the loader doesn't have to search through
        the whole image of the file. The solution appears to be quite simple
        - it just fixes-up the addresses of all imports inside the IAT. Here is an
        example of a snapshot PE File structure of a simple Win32 Application, taken
        with the help of the [8] PEView utility. As you can see TestApp import table
        contains two imported by GDI32.DLL function - TextOutA() and GetStockObject().
        

        Figure 4

        

        Actually the hooking process of an imported function is not that complex as
        it looks at first sight. In a nutshell an interception system that uses
        IAT patching has to discover the location that holds the address of imported
        function and replace it with the address of an user supplied function by
        overwriting it. An important requirement is that the newly provided function
        must have exactly the same signature as the original one. Here are the logical
        steps of a replacing cycle:
        
      
      Locate the import section from the IAT of each
 loaded by the process DLL module as well as the process itself
 
Find the IMAGE_IMPORT_DESCRIPTOR chunk of the DLL that
 exports that function. Practically speaking, usually we search this entry
 by the name of the DLL
 
Locate the IMAGE_THUNK_DATA which holds the original
 address of the imported function
 
Replace the function address with the user
 supplied one 

        
By changing the address of the
        imported function inside the IAT, we ensure that all calls to the hooked
        function will be re-routed to the function interceptor.
           Replacing the pointer inside the IAT
        is that .idata section doesn't necessarily
        have to be a writable section. This requires that we must ensure that .idata section can be modified. This task can be
        accomplished by using VirtualProtect()
        API.

         Another issue that deserves attention is related
        to the GetProcAddress() API behavior on
        Windows 9x system. When an application calls this API outside the debugger
        it returns a pointer to the function. However if you call this function within
        from the debugger it actually returns different address than it would
        when the call is made outside the debugger. It is caused by the fact that that
        inside the debugger each call to GetProcAddress() returns a wrapper to the real
        pointer. Returned by GetProcAddress()
        value points to PUSH instruction followed by
        the actual address. This means that on Windows 9x when we loop through the
        thunks, we must check whether the address of examined function is a PUSH instruction (0x68 on x86 platforms) and
        accordingly get the proper value of the address function.

        Windows 9x doesn't implement copy-on-write, thus
        operating system attempts to keep away the debuggers from stepping into
        functions above the 2-GB frontier. That is the reason why GetProcAddress() returns a debug thunk instead
        of the actual address. John Robbins discusses this problem in [4] "Hooking
        Imported Functions".

        

    

Figuring out when to inject the hook DLL

 That section reveals some challenges that are faced
by developers when the selected injection mechanism is not part of the
operating system's functionality. For example, performing the injection is not your concern when you
use built-in Windows Hooks in order to implant a DLL. It is an OS's responsibility
to force each of those running processes that meet the requirements for
this particular hook, to load the DLL [18]. In fact Windows keeps track of all
newly launched processes and forces them to load the hook DLL. Managing
injection through registry is quite similar to Windows Hooks. The biggest advantage of
all those "built-in" methods is that they come as part of the OS.

Unlike the discussed above implanting techniques, to inject
by CreateRemoteThread()
requires maintenance of all currently running processes. If the injecting is made not on
time, this can cause the Hook System to miss some of the calls it claims as intercepted.
It is crucial that the Hook Server application implements a smart mechanism for
receiving notifications each time when a new process starts or shuts down.
One of the suggested methods in this case, is to intercept CreateProcess() API family functions and monitor all
their invocations. Thus when an user supplied function is called, it can call
the original CreateProcess() with dwCreationFlags OR-ed with CREATE_SUSPENDED flag. This means that the primary
thread of the targeted application will be in suspended state, and the Hook
Server will have the opportunity to inject the DLL by hand-coded machine
instructions and resume the application using ResumeThread() API. For more
details you might refer to [2] "Injecting Code with CreateProcess()".

The second method
of detecting process execution, is based on implementing a simple device driver.
It offers the greatest flexibility and deserves even more attention. Windows
NT/2K provides a special function PsSetCreateProcessNotifyRoutine() exported by
NTOSKRNL. This function allows adding a callback function, that is called
whenever a process is created or deleted. For more details see [11] and [15]
from the reference section.

Enumerating processes and modules

Sometimes we would prefer to use injecting of the DLL by
CreateRemoteThread()  API, especially when
the system runs under NT/2K. In this case when the Hook Server is started it must
enumerate all active processes and inject the DLL into their address spaces.
Windows 9x and Windows 2K provide a built-in implementation (i.e. implemented by
Kernel32.dll) of Tool Help Library. On the other hand Windows NT uses for the
same purpose PSAPI library. We need a way to allow the Hook Server to run and
then to detect dynamically which process "helper" is available. Thus the system
can determine which the supported library is, and accordingly to use the
appropriate APIs.

I will present an object-oriented
architecture that implements a simple framework for retrieving processes and
modules under NT/2K and 9x [16]. The design of my classes allows extending the
framework according to your specific needs. The implementation itself is pretty
straightforward.

CTaskManager implements the system's processor. It
is responsible for creating an instance of a specific library handler (i.e.
CPsapiHandler or CToolhelpHandler) that is able to employ the correct
process information provider library (i.e. PSAPI or ToolHelp32 respectively).
CTaskManager is in charge of creating and
marinating a container object that keeps a list with all currently active
processes. After instantiating of the CTaskManager object the application calls Populate() method. It forces enumerating of all
processes and DLL libraries and storing them into a hierarchy kept by CTaskManager's member m_pProcesses. 

Following UML
diagram shows the class relationships of this subsystem: 

Figure 5


It is important to highlight the fact that NT's
Kernel32.dll doesn't implement any of the ToolHelp32 functions. Therefore we
must link them explicitly, using runtime dynamic linking. If we use static
linking the code will fail to load on NT, regardless whether or not the
application has attempted to execute any of those functions. For more details
see my article "Single
interface for enumerating processes and modules under NT and Win9x/2K.". 

Requirements of the Hook Tool System


Now that I've made a brief introduction to the various concepts of the
hooking process it's time to determine the basic requirements and explore the design of a
particular hooking system. These are some of the issues addressed by the Hook Tool System:


  
  Provide a user-level hooking system for spying any
    Win32 API functions imported by name
    
Provide the abilities to inject hook driver into all
    running processes by Windows hooks as well as CreateRemoteThread() API. The framework should
    offer an ability to set this up by an INI file
    
Employ an interception mechanism based on the
    altering Import Address Table
    
Present an object-oriented reusable and extensible
    layered architecture
    
Offer an efficient and scalable mechanism for hooking
    API functions
    
Meet performance requirements
    
Provide a reliable communication mechanism for
    transferring data between the driver and the server
    
Implement custom supplied versions of TextOutA/W() and ExitProcess() API functions
    
Log events to a file
    
The system is implemented for x86 machines running
    Windows 9x, Me, NT or Windows 2K operating system 

Design and implementation

This part of the article discusses the key components of the
framework and how do they interact each other. This outfit is capable to capture
any kind of WINAPI imported by name
functions.

    Before I outline the
system's design, I would like to focus your attention on several methods for injecting
and hooking.

First and foremost, it is necessary to select
an implanting method that will meet the requirements for injecting the DLL
driver into all processes. So I designed an abstract approach with two
injecting techniques, each of them applied accordingly to the settings in
the INI file and the type of the operating system (i.e. NT/2K or 9x). They are
- System-wide Windows Hooks and CreateRemoteThread()
method. The sample framework offers the ability to inject the DLL on
NT/2K by Windows Hooks as well as to implant by CreateRemoteThread() means. This can be determined
by an option in the INI file that holds all settings of the system.

        Another crucial moment is the
choice of the hooking mechanism. Not surprisingly, I decided to apply altering IAT as
an extremely robust method for Win32 API spying.

To
achieve desired goals I designed a simple framework composed of the following
components and files:


  
  TestApp.exe - a simple Win32 test application that
    just outputs a text using TextOut() API. The purpose of this app is to show
    how it gets hooked up.
    
HookSrv.exe - control program
    
HookTool .DLL - spy library implemented as Win32 DLL
    
HookTool.ini - a configuration file
    
NTProcDrv.sys - a tiny Windows NT/2K kernel-mode
    driver for monitoring process creation and termination. This component is
    optional and addresses the problem with detection of process execution under
    NT based systems only. 

 HookSrv is a simple control program. Its main role is
to load the HookTool.DLL and then to activate the spying engine. After loading
the DLL, the Hook Server calls InstallHook()
function and passes a handle to a hidden windows where the DLL should post all
messages to.

 HookTool.DLL is the hook driver and
the heart of presented spying system. It implements the actual interceptor
and provides three user supplied functions TextOutA/W() and ExitProcess() functions.

Although the article emphasizes on Windows internals and there
is no need for it to be object-oriented, I decided to encapsulate related
activities in reusable C++ classes. This approach provides more flexibility and
enables the system to be extended. It also benefits developers with the ability
to use individual classes outside this project.

Following UML class diagram illustrates the relationships
between set of classes used in HookTool.DLL's implementation. 

Figure 6


In this section of the article I would like to draw your
attention to the class design of the HookTool.DLL. Assigning responsibilities to
the classes is an important part of the development process. Each of the
presented classes wraps up a specific functionality and represents a particular
logical entity.

CModuleScope is the main doorway of the system. It
is implemented using "Singleton" pattern and works in a thread-safe manner. Its
constructor accepts 3 pointers to the data declared in the shared segment, that
will be used by all processes. By this means the values of those system-wide
variables can be maintained very easily inside the class, keeping the rule for
encapsulation.

When an application loads the HookTool
library, the DLL creates one instance of CModuleScope on receiving DLL_PROCESS_ATTACH notification. This step just initializes the only instance of CModuleScope. An important piece of the CModuleScope
object construction is the creation of an appropriate injector object.
The decision which injector to use will be made after parsing the
HookTool.ini file and determining the value of UseWindowsHook
parameter under [Scope] section. In case that the system is running
under Windows 9x, the value of this parameter won't be examined by the
system, because Window 9x doesn't support injecting by remote threads.


After instantiating of the main processor object, a call to ManageModuleEnlistment() method will be made.
Here is a simplified version of its implementation: 


  
  
   
   
   
    Collapse
  
  

// Called on DLL_PROCESS_ATTACH DLL notification

BOOL CModuleScope::ManageModuleEnlistment()
{
BOOL bResult = FALSE;
// Check if it is the hook server

if (FALSE == *m_pbHookInstalled)
{
// Set the flag, thus we will know that the server has been installed

*m_pbHookInstalled = TRUE;
// and return success error code

bResult = TRUE;
}
// and any other process should be examined whether it should be

// hooked up by the DLL

else
{
bResult = m_pInjector->IsProcessForHooking(m_szProcessName);
if (bResult)
InitializeHookManagement();
}
return bResult;
}


The implementation of the method ManageModuleEnlistment() is straightforward and examines whether the call has been made by the Hook Server, inspecting the value m_pbHookInstalled points to. If an  invocation has been initiated by the Hook Server, it just sets up indirectly the flag sg_bHookInstalled to TRUE. It tells that the Hook Server has been started.


The next action taken by the Hook Server is to activate the engine through a single call to InstallHook() DLL exported function. Actually its call is delegated to a method of CModuleScope - InstallHookMethod(). The main purpose of this function is to force targeted for hooking processes to load or unload the HookTool.DLL.


  
  
   
   
   
    Collapse
  
  

 // Activate/Deactivate hooking

engine BOOL	CModuleScope::InstallHookMethod(BOOL bActivate, HWND hWndServer)
{
BOOL bResult;
if (bActivate)
{
*m_phwndServer = hWndServer;
bResult = m_pInjector->InjectModuleIntoAllProcesses();
}
else
{
m_pInjector->EjectModuleFromAllProcesses();
*m_phwndServer = NULL;
bResult = TRUE;
}
return bResult;
}

HookTool.DLL provides two mechanisms for self injecting into the
address space of an external process - one that uses Windows Hooks and
another that employs injecting of DLL by CreateRemoteThread() API. The architecture of the system defines an abstract class CInjector that exposes pure virtual functions for injecting and ejecting DLL. The classes CWinHookInjector and CRemThreadInjector  inherit from the same base -  CInjector class. However they provide different  realization of the pure virtual methods InjectModuleIntoAllProcesses() and EjectModuleFromAllProcesses(), defined in CInjector interface.


CWinHookInjector class implements Windows Hooks injecting mechanism. It installs a filter function by the following call


  
  
   
   
   
    Collapse
  
  

// Inject the DLL into all running processes

BOOL CWinHookInjector::InjectModuleIntoAllProcesses()
{
*sm_pHook = ::SetWindowsHookEx(
WH_GETMESSAGE,
(HOOKPROC)(GetMsgProc),
ModuleFromAddress(GetMsgProc),
0
);
return (NULL != *sm_pHook);
}


As you can see it makes a request to the system for registering WH_GETMESSAGE hook. The server executes this method only once. The last parameter of SetWindowsHookEx() is 0, because GetMsgProc()
is designed to operate as a system-wide hook. The callback function
will be invoked by the system each time when a window is about to
process a particular message. It is interesting that we have to provide
a nearly dummy implementation of the GetMsgProc()
callback, since we don't intend to monitor the message processing. We
supply this implementation only in order to get free injection
mechanism provided by the operating system.


After making the call to SetWindowsHookEx(), OS checks whether the DLL (i.e. HookTool.DLL) that exports GetMsgProc()
has been already mapped in all GUI processes. If the DLL hasn't been
loaded yet, Windows forces those GUI processes to map it. An
interesting fact is, that a system-wide hook DLL should not return FALSE in its DllMain(). That's because the operating system validates DllMain()'s return value and keeps trying to load this DLL until its DllMain() finally returns TRUE.


A quite different approach is demonstrated by the CRemThreadInjector class. Here the  implementation is based on injecting the DLL using remote threads. CRemThreadInjector
extends the maintenance of the Windows processes by providing means for
receiving notifications of process creation and termination. It holds
an instance of CNtInjectorThread class that observes the process execution. CNtInjectorThread object takes care for getting notifications from the kernel-mode driver. Thus each time when a process is created a call to CNtInjectorThread ::OnCreateProcess() is issued, accordingly when the process exits CNtInjectorThread ::OnTerminateProcess()
is automatically called. Unlike the Windows Hooks, the method that
relies on remote thread, requires manual injection each time when a new
process is created. Monitoring process activities will provide us with
a simple technique for alerting when a new process starts.


CNtDriverController class implements a wrapper around API
functions for administering services and drivers. It is designed to
handle the loading and unloading of the kernel-mode driver
NTProcDrv.sys. Its implementation will be discussed later.


After a successful injection of HookTool.DLL into a particular process, a call to ManageModuleEnlistment() method is issued inside the DllMain(). Recall the method's implementation that I described earlier. It  examines the shared variable sg_bHookInstalled through the CModuleScope 's member m_pbHookInstalled. Since the server's initialization had already set the value of sg_bHookInstalled to TRUE,
the system checks whether this application must be hooked up and if so,
it actually activates the spy engine for this particular process.


Turning the hacking engine on, takes place in the CModuleScope::InitializeHookManagement()'s implementation. The idea of this method is to install hooks for some vital functions as LoadLibrary() API family as well as GetProcAddress().
By this means we can monitor loading of DLLs after the initialization
process. Each time when a new DLL is about to be mapped it is necessary
to fix-up its import table, thus we ensure that the system won't miss
any call to the captured function.


At the end of the InitializeHookManagement() method we provide initializations for the function we actually want to spy on.


Since the sample code demonstrates capturing of more than one user
supplied functions, we must provide a single implementation for each
individual hooked function. This means that using this approach you
cannot just change the addresses inside IAT of the different imported
functions to point to a single "generic" interception function. The
spying function needs to know which function this call comes to. It is
also crucial that the signature of the interception routine must be
exactly the same as the original WINAPI function prototype, otherwise the stack will be corrupted. For example CModuleScope implements three static methods MyTextOutA(),MyTextOutW() and MyExitProcess().
Once the HookTool.DLL is loaded into the address space of a process and
the spying engine is activated, each time when a call to the original TextOutA() is issued, CModuleScope:: MyTextOutA() gets called instead.


Proposed design of the spying engine itself is quite efficient and
offers great flexibility. However, it is suitable mostly for scenarios
where the set of functions for interception is known in advance and
their number is limited.


If you want to add new hooks to the system you simply declare and implement the interception function as I did with MyTextOutA/W() and MyExitProcess(). Then you have to register it in the way shown by InitializeHookManagement() implementation.


Intercepting and tracing process execution is a very useful mechanism
for implementing systems that require manipulations of external
processes. Notifying interested parties upon starting of a new
processes is a classic problem of developing process monitoring systems
and system-wide hooks. The Win32 API provides a set of great libraries
(PSAPI and ToolHelp [16]) that allow you to enumerate processes
currently running in the system. Although these APIs are extremely
powerful they don't permit you to get notifications when a new process
starts or ends up. Luckily, NT/2K provides a set of APIs, documented in
Windows DDK documentation as "Process Structure Routines" exported by
NTOSKRNL. One of these APIs PsSetCreateProcessNotifyRoutine()
offers the ability to register system-wide callback function which is
called by OS each time when a new process starts, exits or has been
terminated. The mentioned API can be employed as a simple way to for
tracking down processes simply by implementing a NT kernel-mode driver
and a user mode Win32 control application. The role of the driver is to
detect process execution and notify the control program about these
events. The implementation of the Windows process's observer NTProcDrv
provides a minimal set of functionalities required for process
monitoring under NT based systems. For more details see articles [11]
and [15]. The code of the driver can be located in the NTProcDrv.c
file. Since the user mode implementation installs and uninstalls the
driver dynamically the currently logged-on user must have administrator
privileges. Otherwise you won't be able to install the driver and it
will disturb the process of monitoring. A way around is to manually
install the driver as an administrator or run HookSrv.exe using offered
by Windows 2K "Run as different user" option.  


Last but
not least, the provided tools can be administered by simply changing the settings of an INI file (i.e. HookTool.ini). This file determines whether to use Windows hooks (for 9x and NT/2K) or CreateRemoteThread()
(only under NT/2K) for injecting. It also offers a way to specify which
process must be hooked up and which shouldn't be intercepted. If you
would like to monitor the process there is an option (Enabled) under
section [Trace] that allows to log system activities. This option
allows you to report rich error information using the methods exposed
by CLogFile class. In fact ClogFile provides thread-safe implementation
and you don't have to take care about synchronization issues related to
accessing shared system resources (i.e. the log file). For more details
see CLogFile and content of HookTool.ini file.

Sample code

The project compiles with VC6++ SP4 and requires Platform SDK. In a
production Windows NT environment you need to provide PSAPI.DLL in
order to use provided
 CTaskManager implementation.

Before you run the
sample code make sure that all the settings in HookTool.ini file have
been set according to your specific needs.


For those that will like the lower-level stuff and are interested in
further development of the kernel-mode driver NTProcDrv code, they must
install Windows DDK.

Out of the scope


For the sake of simplicity these are some of the subjects I intentionally left out of the scope of this article:


  
  Monitoring Native API calls
    
A driver for monitoring process execution on Windows
    9x systems.
    
UNICODE support, although you can still hook UNICODE imported APIs

Conclusion

This article by far doesn't provide a complete guide for the
unlimited API hooking subject and without any doubt it misses some
details. However I tried to fit in this few pages just enough important
information that might help those who are interested in user mode Win32
API spying.

References


[1] 
"Windows 95 System Programming Secrets", Matt Pietrek
[2] "Programming Application for
MS Windows" , Jeffrey Richter


[3] "Windows NT System-Call
Hooking"
, Mark Russinovich and Bryce Cogswell, Dr.Dobb's Journal January 1997

[4] "Debugging applications" , John Robbins


[5] "Undocumented Windows 2000
Secrets" , Sven Schreiber


[6] "Peering
Inside the PE: A Tour of the Win32 Portable Executable File Format"
by Matt Pietrek, March 1994 

[7] MSDN Knowledge base Q197571

[8]
PEview Version 0.67
, Wayne J. Radburn

[9] "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB"  MSJ May 1994

[10] "Programming Windows
Security"
, Keith Brown

[11] "Detecting Windows NT/2K
process execution" Ivo Ivanov, 2002


[12] "Detours" Galen Hunt and Doug Brubacher


[13a] "An In-Depth
Look into the Win32 PE file format"
, part 1, Matt Pietrek, MSJ February 2002

[13b] "An In-Depth
Look into the Win32 PE file format"
, part 2, Matt Pietrek, MSJ March 2002

[14] "Inside MS Windows 2000 Third
Edition" , David Solomon and Mark Russinovich


[15] "Nerditorium",
James Finnegan, MSJ January 1999


[16] "Single interface for
enumerating processes and modules under NT and Win9x/2K."
, Ivo Ivanov, 2001

[17] "Undocumented Windows NT"
, Prasad Dabak, Sandeep Phadke and Milind Borate

[18] Platform SDK: Windows User Interface, Hooks

}

At the end of the InitializeHookManagement() method we provide initializations for the function we actually want to spy on.

Sample code

The project compiles with VC6++ SP4 and requires Platform SDK. In aproduction Windows NT environment you need to provide PSAPI.DLL inorder to use provided CTaskManager implementation.

Before you run thesample code make sure that all the settings in HookTool.ini file havebeen set according to your specific needs.

For those that will like the lower-level stuff and are interested infurther development of the kernel-mode driver NTProcDrv code, they mustinstall Windows DDK.