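Finally, the part we have all been waiting for: writing code, hahaha~
Getting started
First, as we learned in the previous chapter, shellcode must not contain an import table, so none of the shellcode we write may call into system libraries. "May not call" here means that the libraries cannot be bound statically.
So if we cannot call system libraries, how do we get anything done? We can't just sit there and only call "printf", can we? Oh, sorry, "printf" is off limits too, because it also lives in a library, hahaha. Feeling a little desperate yet?
The sharper readers have probably guessed it already: if static binding is out, what about dynamic lookup? Exactly. Shellcode can locate system libraries at run time and then call into them.
Let's review how a function in another library is normally called:
1. First we need a handle to the library, which we get from the "LoadLibraryA" or "LoadLibraryW" API. The former takes an ANSI string and the latter a Unicode string. Which system library exports this API? MSDN tells us it belongs to "kernel32.dll".
2. To get the address of a function inside that library we use the "GetProcAddress" API. Conveniently, it is also exported by "kernel32.dll", which saves us some trouble~
Once this flow is clear, the rest is simple: we only need to find the address at which "kernel32.dll" is loaded in the process. A side note: because it is a system library, practically every process loads and uses it. The unusual case would be a program like our shellcode that imports nothing, but we don't care about that case, because a program that never calls it isn't doing anything useful.
Alright, enough talk. Reading this much text must be getting painful by now (PS: writing this much text is painful too).
Getting the address of "kernel32.dll" in the process
There are two ways to get the in-memory address of "kernel32.dll":
Before looking up a system library's address in a process you need to know what the PEB structure is. Please look it up on your own; it is not covered in detail here.
1. Get the module address directly in assembly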
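__asm {
mov eax,fs:[30h]      ; eax = PEB (32-bit processes keep it at fs:[30h])
mov eax,[eax+0ch]     ; eax = PEB->Ldr
mov eax,[eax+14h]     ; eax = Ldr->InMemoryOrderModuleList.Flink (1st entry: the exe itself)
mov eax,[eax]         ; 2nd entry: ntdll.dll
mov eax,[eax]         ; 3rd entry: kernel32.dll
mov eax,[eax+10h]     ; eax = DllBase of that entry
ret                   ; return with the base in eax (assumes this is the body of a __declspec(naked) function)
}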
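The code above only works in 32-bit processes. 64-bit processes use a different segment register and different structure offsets, so it cannot be used there to find the module address. PS: on XP the modules appear in the order "the exe itself", "ntdll.dll", "kernel32.dll"; on Windows 7 and later the order is "the exe itself", "ntdll.dll", "kernel32.dll", "kernelbase.dll".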
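To see why the last instruction adds 10h and not 18h: the list pointers point at the link field inside each entry, not at the start of the entry. In the 32-bit layout InMemoryOrderLinks sits at offset 08h and DllBase at 18h, hence the 10h. The sketch below is a portable mock of that idea, not the real loader structures; `Link`, `MockModule`, `module_from_link` and `nth_module_base` are names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Mock types (hypothetical): they only mimic the shape of the loader's lists.
struct Link { Link* next; Link* prev; };

struct MockModule {
    Link        load_order;    // first link field, unused in this sketch
    Link        memory_order;  // the list that is actually walked
    const void* base;          // plays the role of DllBase
};

// Recover the enclosing MockModule from a pointer to its memory_order field.
// This is the same idea as the CONTAINING_RECORD macro.
static MockModule* module_from_link(Link* l)
{
    return reinterpret_cast<MockModule*>(
        reinterpret_cast<char*>(l) - offsetof(MockModule, memory_order));
}

// Return the base of the n-th module (0-based) in a circular list,
// or nullptr when the list has fewer than n + 1 entries.
static const void* nth_module_base(Link* head, std::size_t n)
{
    assert(head != nullptr);  // host-side sanity check only
    for (Link* cur = head->next; cur != head; cur = cur->next, --n) {
        if (n == 0)
            return module_from_link(cur)->base;
    }
    return nullptr;
}
```

With a list built in the order exe, ntdll.dll, kernel32.dll, calling `nth_module_base(&head, 2)` yields the third entry's base, which mirrors what the two `mov eax,[eax]` hops do.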
2. Walk the loaded-module list to find the address of "kernel32.dll"
#include <Windows.h>
#include <winternl.h>
HMODULE GetKernel32BaseAddress()
{
HMODULE hKernel32 = NULL;
// Holds the last 12 characters of the module path
WCHAR wszModuleName[MAX_PATH];
#ifdef _WIN64
// 64-bit: the PEB pointer is at gs:[60h]
PPEB lpPeb = (PPEB)__readgsqword(0x60);
#else
// 32-bit: the PEB pointer is at fs:[30h]
PPEB lpPeb = (PPEB)__readfsdword(0x30);
#endif
// Head of the loaded-module list
PLIST_ENTRY pListHead = &lpPeb->Ldr->InMemoryOrderModuleList;
PLIST_ENTRY pListData = pListHead->Flink;
while (pListData != pListHead)
{
PLDR_DATA_TABLE_ENTRY pLDRData = CONTAINING_RECORD(pListData, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
// Length of the module path in characters (Length is in bytes)
DWORD dwLen = pLDRData->FullDllName.Length / 2;
if (dwLen > 12)
{
for (size_t i = 0; i < 12; i++)
{
wszModuleName[11 - i] = pLDRData->FullDllName.Buffer[dwLen - 1 - i];
}
//kernel32.dll
if ((wszModuleName[0] == 'k' || wszModuleName[0] == 'K') &&
(wszModuleName[1] == 'e' || wszModuleName[1] == 'E') &&
(wszModuleName[2] == 'r' || wszModuleName[2] == 'R') &&
(wszModuleName[3] == 'n' || wszModuleName[3] == 'N') &&
(wszModuleName[4] == 'e' || wszModuleName[4] == 'E') &&
(wszModuleName[5] == 'l' || wszModuleName[5] == 'L') &&
wszModuleName[6] == '3' &&
wszModuleName[7] == '2' &&
wszModuleName[8] == '.' &&
(wszModuleName[9] == 'd' || wszModuleName[9] == 'D') &&
(wszModuleName[10] == 'l' || wszModuleName[10] == 'L') &&
(wszModuleName[11] == 'l' || wszModuleName[11] == 'L'))
{
// Base address of kernel32.dll in this process
hKernel32 = (HMODULE)pLDRData->DllBase;
break;
}
}
pListData = pListData->Flink;
}
return hKernel32;
}
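The code above starts from the PEB, walks the module list in a loop, and compares each entry's module name to find the matching base address. Unlike the first method, this one works in 64-bit processes as well as 32-bit ones. It reads the PEB pointer from "gs" (64-bit) or "fs" (32-bit) and then follows the PEB to the module entries.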
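The twelve hand-written comparisons can also be folded into a loop. Here is a minimal sketch of that idea, written so it compiles anywhere: `std::uint16_t` stands in for WCHAR, and `to_lower_ascii` and `ends_with_kernel32` are hypothetical helper names, not part of any Windows header. The name is spelled character by character in the same spirit as the rest of this chapter, though whether the compiler really keeps such an array on the stack depends on its settings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lower-case one ASCII UTF-16 code unit; other values pass through unchanged.
static std::uint16_t to_lower_ascii(std::uint16_t c)
{
    return (c >= 'A' && c <= 'Z')
        ? static_cast<std::uint16_t>(c + ('a' - 'A'))
        : c;
}

// Returns true when the last 12 code units of path spell "kernel32.dll",
// ignoring ASCII case. len is the number of code units in path.
static bool ends_with_kernel32(const std::uint16_t* path, std::size_t len)
{
    // Spelled element by element instead of using a quoted literal.
    const char name[12] = { 'k','e','r','n','e','l','3','2','.','d','l','l' };

    assert(path != nullptr);  // host-side sanity check only; drop it in real shellcode
    if (len < sizeof(name))
        return false;

    for (std::size_t i = 0; i < sizeof(name); ++i) {
        const std::uint16_t c = path[len - sizeof(name) + i];
        if (to_lower_ascii(c) != static_cast<std::uint16_t>(name[i]))
            return false;
    }
    return true;
}
```

Like the original comparison, this only checks the tail of the path, so a module whose file name merely ends in "kernel32.dll" would match too. Checking that the preceding character is a backslash would tighten it.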
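Getting the address of the "GetProcAddress" API
MSDN tells us that this API is exported by "kernel32.dll", and the previous step already gave us the address of "kernel32.dll" in the process. All that is left is to parse its export table to find the address of "GetProcAddress". Parsing a PE export table requires a basic understanding of the PE file format. Please read up on the format yourself; it is not explained in detail here. Straight to the code: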
#include <Windows.h>
#include <winternl.h>
FARPROC GetPorcAddressBaseAddress()
{
FARPROC pGetPorcAddress = NULL;
HMODULE hKernel32 = GetKernel32BaseAddress();
if (!hKernel32)
return pGetPorcAddress;
// DOS header
PIMAGE_DOS_HEADER lpDosHeader = (PIMAGE_DOS_HEADER)hKernel32;
// NT headers
PIMAGE_NT_HEADERS lpNtHeader = (PIMAGE_NT_HEADERS)((unsigned char*)hKernel32 + lpDosHeader->e_lfanew);
// Bail out if the export directory is missing (either field is zero)
if (!lpNtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size ||
!lpNtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress)
{
return pGetPorcAddress;
}
// Export directory
PIMAGE_EXPORT_DIRECTORY lpExports = (PIMAGE_EXPORT_DIRECTORY)((unsigned char*)hKernel32 + lpNtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
// Array of function-name RVAs
PDWORD lpdwFunName = (PDWORD)((unsigned char*)hKernel32 + lpExports->AddressOfNames);
// Array of name ordinals (indexes into the function array)
PWORD lpdwOrd = (PWORD)((unsigned char*)hKernel32 + lpExports->AddressOfNameOrdinals);
// Array of function RVAs
PDWORD lpdwFunAddr = (PDWORD)((unsigned char*)hKernel32 + lpExports->AddressOfFunctions);
for (DWORD dwLoop = 0; dwLoop < lpExports->NumberOfNames; dwLoop++)
{
char* pFunName = (char*)(lpdwFunName[dwLoop] + (unsigned char*)hKernel32);
//GetProcAddress
if (pFunName[0] == 'G' && pFunName[1] == 'e' &&
pFunName[2] == 't' && pFunName[3] == 'P' &&
pFunName[4] == 'r' && pFunName[5] == 'o' &&
pFunName[6] == 'c' && pFunName[7] == 'A' &&
pFunName[8] == 'd' && pFunName[9] == 'd' &&
pFunName[10] == 'r' && pFunName[11] == 'e' &&
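pFunName[12] == 's' && pFunName[13] == 's' &&
pFunName[14] == '\0') // require the terminator so longer names with the same prefix do not match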
{
pGetPorcAddress = (FARPROC)(lpdwFunAddr[lpdwOrd[dwLoop]] + (unsigned char*)hKernel32);
break;
}
}
return pGetPorcAddress;
}
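The lookup above relies on three parallel arrays: the i-th name maps to the i-th name ordinal, and that ordinal is the index into the function array. The following portable mock isolates just that indexing step. It is a simplification under stated assumptions: `MockExports`, `same_name` and `find_export_rva` are hypothetical names, and the names are stored as plain pointers here, whereas a real export directory stores RVAs relative to the module base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mock of the three parallel export arrays (not a Windows structure).
struct MockExports {
    const char* const*   names;       // plays the role of AddressOfNames
    const std::uint16_t* ordinals;    // plays the role of AddressOfNameOrdinals
    const std::uint32_t* functions;   // plays the role of AddressOfFunctions (RVAs)
    std::uint32_t        name_count;  // plays the role of NumberOfNames
};

// Exact, case-sensitive comparison without calling any library function.
static bool same_name(const char* a, const char* b)
{
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Returns the function RVA for wanted, or 0 when the name is not exported.
static std::uint32_t find_export_rva(const MockExports* e, const char* wanted)
{
    assert(e != nullptr && wanted != nullptr);  // host-side sanity check only
    for (std::uint32_t i = 0; i < e->name_count; ++i) {
        if (same_name(e->names[i], wanted)) {
            // name index -> ordinal -> function RVA
            return e->functions[e->ordinals[i]];
        }
    }
    return 0;
}
```

In the real code the returned RVA still has to be added to the module base, which is what the `lpdwFunAddr[lpdwOrd[dwLoop]] + (unsigned char*)hKernel32` expression does.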
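With that, the groundwork is done, and we can move on to writing our own payload function.
Payload code
Let's do something simple: use the shellcode to pop up a message box in the process. Here is the code: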
#pragma comment(linker,"/entry:ShellCodeEntry")
int ShellCodeEntry()
{
typedef FARPROC(WINAPI* FN_GetProcAddress)(_In_ HMODULE hModule, _In_ LPCSTR lpProcName);
typedef HMODULE(WINAPI* FN_LoadLibraryA)(_In_ LPCSTR lpLibFileName);
FN_GetProcAddress fn_GetProcAddress;
fn_GetProcAddress = (FN_GetProcAddress)GetPorcAddressBaseAddress();
if (!fn_GetProcAddress)
return 0;
FN_LoadLibraryA fn_LoadlibraryA;
//LoadLibraryA
char szLoadLibraryA[] = { 'L','o','a','d','L','i','b','r','a','r','y','A',0 };
HMODULE hKernel32Address = GetKernel32BaseAddress();
fn_LoadlibraryA = (FN_LoadLibraryA)fn_GetProcAddress(hKernel32Address, szLoadLibraryA);
if (!fn_LoadlibraryA)
return 0;
typedef int (WINAPI* FM_MessageBoxA)(__in_opt HWND hWnd, __in_opt LPCSTR lpText, __in_opt LPCSTR lpCaption, __in UINT uType);
char szUser32[] = { 'U','s','e','r','3','2','.','d','l','l',0 };
char szMessageBoxA[] = { 'M','e','s','s','a','g','e','B','o','x','A',0 };
FM_MessageBoxA fn_MessageBoxA = (FM_MessageBoxA)(fn_GetProcAddress(fn_LoadlibraryA(szUser32),szMessageBoxA));
if (fn_MessageBoxA)
{
char szText[] = { 'H','e','l','l','o',0 };
fn_MessageBoxA(NULL,szText, 0, 0);
}
return 0;
}
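There is one thing that needs special attention here:
Do not write strings as double-quoted literals, for example: fn_MessageBoxA(NULL, "Hello", 0, 0)
A double-quoted literal is stored as a constant in a data section, and the code refers to it by address. Once the shellcode runs inside another process that address no longer points at the string, so every place that uses the literal fails in unpredictable ways. Be careful with this.
So is there really no way around it? There is. An open-source project on GitHub (ADVobfuscator, referenced in the header below) solves exactly this problem. Here is a version of its code adapted by imbyter: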
obf.h
#pragma once
// reference https://github.com/andrivet/ADVobfuscator
#if defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline
#else
#define ALWAYS_INLINE __attribute__((always_inline))
#endif
#include <iomanip>
#include <iostream>
// std::index_sequence will be available with C++14 (C++1y). For the moment, implement a (very) simplified and partial version. You can find more complete versions on the Internet
// MakeIndex<N>::type generates Indexes<0, 1, 2, 3, ..., N>
namespace andrivet {
namespace ADVobfuscator {
template<int... I>
struct Indexes { using type = Indexes<I..., sizeof...(I)>; };
template<int N>
struct Make_Indexes { using type = typename Make_Indexes<N - 1>::type::type; };
template<>
struct Make_Indexes<0> { using type = Indexes<>; };
}
}
// Very simple compile-time random numbers generator.
// For a more complete and sophisticated example, see:
// http://www.researchgate.net/profile/Zalan_Szgyi/publication/259005783_Random_number_generator_for_C_template_metaprograms/file/e0b49529b48272c5a6.pdf
#include <random>
namespace andrivet {
namespace ADVobfuscator {
namespace
{
// I use current (compile time) as a seed
constexpr char time[] = __TIME__; // __TIME__ has the following format: hh:mm:ss in 24-hour time
// Convert time string (hh:mm:ss) into a number
constexpr int DigitToInt(char c) { return c - '0'; }
const int seed = DigitToInt(time[7]) +
DigitToInt(time[6]) * 10 +
DigitToInt(time[4]) * 60 +
DigitToInt(time[3]) * 600 +
DigitToInt(time[1]) * 3600 +
DigitToInt(time[0]) * 36000;
}
// 1988, Stephen Park and Keith Miller
// "Random Number Generators: Good Ones Are Hard To Find", considered as "minimal standard"
// Park-Miller 31 bit pseudo-random number generator, implemented with G. Carta's optimisation:
// with 32-bit math and without division
template<int N>
struct MetaRandomGenerator
{
private:
static constexpr unsigned a = 16807; // 7^5
static constexpr unsigned m = 2147483647; // 2^31 - 1
static constexpr unsigned s = MetaRandomGenerator<N - 1>::value;
static constexpr unsigned lo = a * (s & 0xFFFF); // Multiply lower 16 bits by 16807
static constexpr unsigned hi = a * (s >> 16); // Multiply higher 16 bits by 16807
static constexpr unsigned lo2 = lo + ((hi & 0x7FFF) << 16); // Combine lower 15 bits of hi with lo's upper bits
static constexpr unsigned hi2 = hi >> 15; // Discard lower 15 bits of hi
static constexpr unsigned lo3 = lo2 + hi;
public:
static constexpr unsigned max = m;
static constexpr unsigned value = lo3 > m ? lo3 - m : lo3;
};
template<>
struct MetaRandomGenerator<0>
{
static constexpr unsigned value = seed;
};
// Note: A bias is introduced by the modulo operation.
// However, I do believe it is negligible in this case (M is far lower than 2^31 - 1)
template<int N, int M>
struct MetaRandom
{
static const int value = MetaRandomGenerator<N + 1>::value % M;
};
}
}
namespace andrivet {
namespace ADVobfuscator {
struct HexChar
{
unsigned char c_;
unsigned width_;
HexChar(unsigned char c, unsigned width) : c_{ c }, width_{ width } {}
};
inline std::ostream& operator<<(std::ostream& o, const HexChar& c)
{
return (o << std::setw(c.width_) << std::setfill('0') << std::hex << (int)c.c_ << std::dec);
}
inline HexChar hex(char c, int w = 2)
{
return HexChar(c, w);
}
}
}
namespace andrivet {
namespace ADVobfuscator {
// Represents an obfuscated string, parametrized with an algorithm number N, a list of indexes Indexes and a key Key
template<int N, char Key, typename Indexes>
struct MetaString;
// Partial specialization with a list of indexes I, a key K and algorithm N = 0
// Each character is encrypted (XOR) with the same key
template<char K, int... I>
struct MetaString<0, K, Indexes<I...>>
{
// Constructor. Evaluated at compile time.
constexpr ALWAYS_INLINE MetaString(const char* str)
: key_{ K }, buffer_{ encrypt(str[I], K)... } { }
// Runtime decryption. Most of the time, inlined
inline const char* decrypt()
{
for (size_t i = 0; i < sizeof...(I); ++i)
buffer_[i] = decrypt(buffer_[i]);
buffer_[sizeof...(I)] = 0;
//LOG("--- Implementation #" << 0 << " with key 0x" << hex(key_));
return const_cast<const char*>(buffer_);
}
private:
// Encrypt / decrypt a character of the original string with the key
constexpr char key() const { return key_; }
constexpr char ALWAYS_INLINE encrypt(char c, int k) const { return c ^ k; }
constexpr char decrypt(char c) const { return encrypt(c, key()); }
volatile int key_; // key. "volatile" is important to avoid uncontrolled over-optimization by the compiler
volatile char buffer_[sizeof...(I)+1]; // Buffer to store the encrypted string + terminating null byte
};
// Partial specialization with a list of indexes I, a key K and algorithm N = 1
// Each character is encrypted (XOR) with an incremented key.
template<char K, int... I>
struct MetaString<1, K, Indexes<I...>>
{
// Constructor. Evaluated at compile time.
constexpr ALWAYS_INLINE MetaString(const char* str)
: key_(K), buffer_{ encrypt(str[I], I)... } { }
// Runtime decryption. Most of the time, inlined
inline const char* decrypt()
{
for (size_t i = 0; i < sizeof...(I); ++i)
buffer_[i] = decrypt(buffer_[i], i);
buffer_[sizeof...(I)] = 0;
//LOG("--- Implementation #" << 1 << " with key 0x" << hex(key_));
return const_cast<const char*>(buffer_);
}
private:
// Encrypt / decrypt a character of the original string with the key
constexpr char key(size_t position) const { return static_cast<char>(key_ + position); }
constexpr char ALWAYS_INLINE encrypt(char c, size_t position) const { return c ^ key(position); }
constexpr char decrypt(char c, size_t position) const { return encrypt(c, position); }
volatile int key_; // key. "volatile" is important to avoid uncontrolled over-optimization by the compiler
volatile char buffer_[sizeof...(I)+1]; // Buffer to store the encrypted string + terminating null byte
};
// Partial specialization with a list of indexes I, a key K and algorithm N = 2
// Shift the value of each character and does not store the key. It is only used at compile-time.
template<char K, int... I>
struct MetaString<2, K, Indexes<I...>>
{
// Constructor. Evaluated at compile time. Key is *not* stored
constexpr ALWAYS_INLINE MetaString(const char* str)
: buffer_{ encrypt(str[I])..., 0 } { }
// Runtime decryption. Most of the time, inlined
inline const char* decrypt()
{
for (size_t i = 0; i < sizeof...(I); ++i)
buffer_[i] = decrypt(buffer_[i]);
//LOG("--- Implementation #" << 2 << " with key 0x" << hex(K));
return const_cast<const char*>(buffer_);
}
private:
// Encrypt / decrypt a character of the original string with the key
// Be sure that the encryption key is never 0.
constexpr char key(char key) const { return 1 + (key % 13); }
constexpr char ALWAYS_INLINE encrypt(char c) const { return c + key(K); }
constexpr char decrypt(char c) const { return c - key(K); }
// Buffer to store the encrypted string + terminating null byte. Key is not stored
volatile char buffer_[sizeof...(I)+1];
};
// Helper to generate a key
template<int N>
struct MetaRandomChar
{
// Use 0x7F as maximum value since most of the time, char is signed (we have however 1 bit less of randomness)
static const char value = static_cast<char>(1 + MetaRandom<N, 0x7F - 1>::value);
};
}
}
// Prefix notation
//#define DEF_OBFUSCATED(str) andrivet::ADVobfuscator::MetaString<andrivet::ADVobfuscator::MetaRandom<__COUNTER__, 3>::value, andrivet::ADVobfuscator::MetaRandomChar<__COUNTER__>::value, andrivet::ADVobfuscator::Make_Indexes<sizeof(str) - 1>::type>(str)
//#define OBFUSCATED(str) (DEF_OBFUSCATED(str).decrypt())
#define DEF(str) andrivet::ADVobfuscator::MetaString<andrivet::ADVobfuscator::MetaRandom<__COUNTER__, 3>::value, andrivet::ADVobfuscator::MetaRandomChar<__COUNTER__>::value, andrivet::ADVobfuscator::Make_Indexes<sizeof(str) - 1>::type>(str)
#define O(str) (DEF(str).decrypt())
To use it, just include this header and write the call like this: fn_MessageBoxA(NULL, O("Hello"), 0, 0)
That way we no longer have to spell out every string one single-quoted character at a time.
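If the header looks intimidating, its core trick is small: XOR every character with a key while the object is being constructed, then XOR again at run time to get the text back. Below is a stripped-down sketch of that idea only. It is not the ADVobfuscator implementation; `XorString` and `make_xor_string` are hypothetical names, it uses C++14's `std::index_sequence` in place of the hand-rolled `Make_Indexes`, and the key is fixed by the caller rather than derived from `__TIME__`. It also omits the `volatile` members, so it makes no promise about what the optimizer leaves in the binary.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Simplified analogue of MetaString<0, ...>: one XOR key for all characters.
template <std::size_t N>
class XorString {
public:
    // Encrypts all N characters (terminator included) during construction.
    constexpr XorString(const char (&s)[N], char key)
        : XorString(s, key, std::make_index_sequence<N>{}) {}

    // Decrypts the buffer in place and returns it. Call this only once:
    // a second call would XOR the text back into its encrypted form.
    const char* decrypt()
    {
        for (std::size_t i = 0; i < N; ++i)
            buf_[i] = static_cast<char>(buf_[i] ^ key_);
        return buf_;
    }

private:
    // The index pack I... expands to one encrypted initializer per character.
    template <std::size_t... I>
    constexpr XorString(const char (&s)[N], char key, std::index_sequence<I...>)
        : key_(key), buf_{ static_cast<char>(s[I] ^ key)... } {}

    char key_;
    char buf_[N];
};

// Deduces N from the literal so the caller does not have to spell it out.
template <std::size_t N>
constexpr XorString<N> make_xor_string(const char (&s)[N], char key)
{
    return XorString<N>(s, key);
}
```

A call such as `auto s = make_xor_string("Hello", 0x5A); const char* text = s.decrypt();` gives back the original characters. The real header goes further by picking the key and the algorithm pseudo-randomly for each string through `__COUNTER__`.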
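OK, at this point we have written a complete piece of shellcode source.
Think that's the end? NO. So far we have only written a PE file that follows the rules for shellcode; it is not shellcode yet. Dropping the generated PE file into a process as it is will not work. We still need to extract the shellcode from it before it can run seamlessly inside a process. How to do that is covered in the next section.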