Contents
- Introduction
- Windows Hooks
- The CreateRemoteThread & LoadLibrary Technique
- The CreateRemoteThread & WriteProcessMemory Technique
- Some Final Words
- Appendixes
- References
- Article History
Introduction
Several password spy tutorials have been posted to The Code Project, but all of them rely on Windows hooks. Is there any other way to make such a utility? Yes, there is. But first, let me review the problem briefly, just to make sure we're all on the same page.
To "read" the contents of any control - either belonging to your application or not - you generally send the WM_GETTEXT message to it. This also applies to edit controls, except in one special case. If the edit control belongs to another process and the ES_PASSWORD style is set, this approach fails. Only the process that "owns" the password control can get its contents via WM_GETTEXT. So, our problem reduces to the following: How to get
::SendMessage( hPwdEdit, WM_GETTEXT, nMaxChars, psBuffer );
executed in the address space of another process.
In general, there are three possibilities to solve this problem:
- Put your code into a DLL; then, map the DLL to the remote process via windows hooks.
- Put your code into a DLL and map the DLL to the remote process using the CreateRemoteThread & LoadLibrary technique.
- Instead of writing a separate DLL, copy your code to the remote process directly - via WriteProcessMemory - and start its execution with CreateRemoteThread. A detailed description of this technique can be found here.
I. Windows Hooks
Demo applications: HookSpy and HookInjEx
The primary role of windows hooks is to monitor the message traffic of some thread. In general there are:
1. Local hooks, where you monitor the message traffic of any thread belonging to your process.
2. Remote hooks, which can be:
   a. thread-specific, to monitor the message traffic of a thread belonging to another process;
   b. system-wide, to monitor the message traffic for all threads currently running on the system.
If the hooked thread belongs to another process (cases 2a & 2b), your hook procedure must reside in a dynamic-link library (DLL). The system then maps the DLL containing the hook procedure into the address space of the hooked thread. Windows will map the entire DLL, not just the hook procedure. That is why Windows hooks can be used to inject code into another process's address space.
While I won't discuss hooks in this article further (take a look at the SetWindowsHookEx API in MSDN for more details), let me give you two more hints that you won't find in the documentation, but might still be useful:
- After a successful call to SetWindowsHookEx, the system maps the DLL into the address space of the hooked thread automatically, but not necessarily immediately. Because Windows hooks are all about messages, the DLL isn't really mapped until an adequate event happens. For example:
If you install a hook that monitors all nonqueued messages of some thread (WH_CALLWNDPROC), the DLL won't be mapped into the remote process until a message is actually sent to (some window of) the hooked thread. In other words, if UnhookWindowsHookEx is called before a message was sent to the hooked thread, the DLL will never be mapped into the remote process (although the call to SetWindowsHookEx itself succeeded). To force an immediate mapping, send an appropriate event to the concerned thread right after the call to SetWindowsHookEx.
The same is true for unmapping the DLL after calling UnhookWindowsHookEx. The DLL isn't really unmapped until an adequate event happens.
- When you install hooks, they can affect the overall system performance (especially system-wide hooks). However, you can easily overcome this shortcoming if you use thread-specific hooks solely as a DLL mapping mechanism, and not to trap messages. Consider the following code snippet:
BOOL APIENTRY DllMain( HANDLE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved )
{
    if( ul_reason_for_call == DLL_PROCESS_ATTACH )
    {
        // Increase reference count via LoadLibrary
        char lib_name[MAX_PATH];
        ::GetModuleFileName( (HMODULE)hModule, lib_name, MAX_PATH );
        ::LoadLibrary( lib_name );
        // Safely remove hook
        // (g_hHook is the hook handle returned by SetWindowsHookEx)
        ::UnhookWindowsHookEx( g_hHook );
    }
    return TRUE;
}
So, what happens? First we map the DLL to the remote process via Windows hooks. Then, right after the DLL has actually been mapped, we unhook it. Normally, the DLL would be unmapped now, too, as soon as the first message to the hooked thread arrived. The dodgy thing is that we prevent this unmapping by increasing the DLL's reference count via LoadLibrary.
The question that remains is: How to unload the DLL now, once we are finished? UnhookWindowsHookEx won't do it because we unhooked the thread already. You could do it this way:
- Install another hook, just before you want to unmap the DLL;
- Send a "special" message to the remote thread;
- Catch this message in your hook procedure; in response, call FreeLibrary & UnhookWindowsHookEx.
Now, hooks are used only while mapping/unmapping the DLL to/from the remote process; there is no influence on the performance of the "hooked" thread in the meantime. Put another way: We get a DLL mapping mechanism that doesn't interfere with the target process more than the LoadLibrary technique discussed below does (see Section II). However, as opposed to the LoadLibrary technique, this solution works on both WinNT and Win9x.
But, when should one use this trick? Whenever the DLL has to be present in the remote process for a longer period of time (e.g. if you subclass a control belonging to another process) and you want to interfere with the target process as little as possible. I didn't use it in HookSpy because the DLL there is injected just for a moment - just long enough to get the password. Instead, I provided another example - HookInjEx - to demonstrate it. HookInjEx maps/unmaps a DLL into "explorer.exe", where it subclasses the Start button. More precisely: It swaps the left and right mouse clicks for the Start button.
You will find HookSpy and HookInjEx as well as their sources in the download package at the beginning of the article.
II. The CreateRemoteThread & LoadLibrary Technique
Demo application: LibSpy
In general, any process can load a DLL dynamically by using the LoadLibrary API. But, how do we force an external process to call this function? The answer is CreateRemoteThread.
Let's take a look at the declaration of the LoadLibrary and FreeLibrary APIs first:
HINSTANCE LoadLibrary(
LPCTSTR lpLibFileName // address of filename of library module
);
BOOL FreeLibrary(
HMODULE hLibModule // handle to loaded library module
);
Now, compare them with the declaration of ThreadProc - the thread routine - passed to CreateRemoteThread:
DWORD WINAPI ThreadProc(
LPVOID lpParameter // thread data
);
As you can see, all functions use the same calling convention and all accept a 32-bit parameter. Also, the size of the returned value is the same. In other words: We may pass a pointer to LoadLibrary/FreeLibrary as the thread routine to CreateRemoteThread.
However, there are two problems (see the description for CreateRemoteThread below):
- The lpStartAddress parameter in CreateRemoteThread must represent the starting address of the thread routine in the remote process.
- If lpParameter - the parameter passed to ThreadFunc - is interpreted as an ordinary 32-bit value (FreeLibrary interprets it as an HMODULE), everything is fine. However, if lpParameter is interpreted as a pointer (LoadLibraryA interprets it as a pointer to a char string), it must point to some data in the remote process.
The first problem is actually solved by itself. Both LoadLibrary and FreeLibrary are functions residing in kernel32.dll. Because kernel32 is guaranteed to be present and at the same load address in every "normal" process (see Appendix A), the address of LoadLibrary/FreeLibrary is the same in every process too. This ensures that a valid pointer is passed to the remote process.
The second problem is also easy to solve: Simply copy the DLL module name (needed by LoadLibrary) to the remote process via WriteProcessMemory.
So, to use the CreateRemoteThread & LoadLibrary technique, follow these steps:
1. Retrieve a HANDLE to the remote process (OpenProcess).
2. Allocate memory for the DLL name in the remote process (VirtualAllocEx).
3. Write the DLL name, including full path, to the allocated memory (WriteProcessMemory).
4. Map your DLL to the remote process via CreateRemoteThread & LoadLibrary.
5. Wait until the remote thread terminates (WaitForSingleObject); that is, until the call to LoadLibrary returns. Put another way, the thread will terminate as soon as our DllMain (called with reason DLL_PROCESS_ATTACH) returns.
6. Retrieve the exit code of the remote thread (GetExitCodeThread). Note that this is the value returned by LoadLibrary, thus the base address (HMODULE) of our mapped DLL.
7. Free the memory allocated in Step #2 (VirtualFreeEx).
8. Unload the DLL from the remote process via CreateRemoteThread & FreeLibrary. Pass the HMODULE handle retrieved in Step #6 to FreeLibrary (via lpParameter in CreateRemoteThread).
   Note: If your injected DLL spawns any new threads, be sure they are all terminated before unloading it.
9. Wait until the thread terminates (WaitForSingleObject).
Also, don't forget to close all the handles once you are finished: To both threads, created in Steps #4 and #8; and the handle to the remote process, retrieved in Step #1.
Let's examine some parts of LibSpy's sources now, to see how the above steps are implemented in reality. For the sake of simplicity, error handling and Unicode support are removed.
HANDLE  hThread;
char    szLibPath[_MAX_PATH];  // The name of our "LibSpy.dll" module
                               // (including full path!);
void*   pLibRemote;            // The address (in the remote process) where
                               // szLibPath will be copied to;
DWORD   hLibModule;            // Base address of loaded module (==HMODULE);
HMODULE hKernel32 = ::GetModuleHandle("Kernel32");

// initialize szLibPath
//...

// 1. Allocate memory in the remote process for szLibPath
// 2. Write szLibPath to the allocated memory
pLibRemote = ::VirtualAllocEx( hProcess, NULL, sizeof(szLibPath),
                               MEM_COMMIT, PAGE_READWRITE );
::WriteProcessMemory( hProcess, pLibRemote, (void*)szLibPath,
                      sizeof(szLibPath), NULL );

// Load "LibSpy.dll" into the remote process
// (via CreateRemoteThread & LoadLibrary)
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
            (LPTHREAD_START_ROUTINE) ::GetProcAddress( hKernel32,
                                                       "LoadLibraryA" ),
            pLibRemote, 0, NULL );
::WaitForSingleObject( hThread, INFINITE );

// Get handle of the loaded module
::GetExitCodeThread( hThread, &hLibModule );

// Clean up
::CloseHandle( hThread );
// With MEM_RELEASE, the size argument must be 0
::VirtualFreeEx( hProcess, pLibRemote, 0, MEM_RELEASE );
Assume our SendMessage - the code that we actually wanted to inject - was placed in DllMain (DLL_PROCESS_ATTACH), so it has already been executed by now. Then, it is time to unload the DLL from the target process:
// Unload "LibSpy.dll" from the target process
// (via CreateRemoteThread & FreeLibrary)
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
            (LPTHREAD_START_ROUTINE) ::GetProcAddress( hKernel32,
                                                       "FreeLibrary" ),
            (void*)hLibModule, 0, NULL );
::WaitForSingleObject( hThread, INFINITE );

// Clean up
::CloseHandle( hThread );
Interprocess Communications
Until now, we only talked about how to inject the DLL into the remote process. However, in most situations the injected DLL will need to communicate with your original application in some way (recall that the DLL is mapped into some remote process now, not into our local application!). Take our Password Spy: The DLL has to know the handle to the control that actually contains the password. Obviously, this value can't be hardcoded into it at compile time. Similarly, once the DLL gets the password, it has to send it back to our application so we can display it appropriately.
Fortunately, there are many ways to deal with this situation: File Mapping, WM_COPYDATA, the Clipboard, and the sometimes very handy #pragma data_seg, to name just a few. I won't describe these techniques here because they are all well documented either in MSDN (see Interprocess Communications) or in other tutorials. Anyway, I used solely the #pragma data_seg in the LibSpy example.
You will find LibSpy and its sources in the download package at the beginning of the article.
III. The CreateRemoteThread & WriteProcessMemory Technique
Demo application: WinSpy
Another way to copy some code to another process's address space and then execute it in the context of this process involves the use of remote threads and the WriteProcessMemory API. Instead of writing a separate DLL, you copy the code to the remote process directly now - via WriteProcessMemory - and start its execution with CreateRemoteThread.
Let's take a look at the declaration of CreateRemoteThread first:
HANDLE CreateRemoteThread(
HANDLE hProcess, // handle to process to create thread in
LPSECURITY_ATTRIBUTES lpThreadAttributes, // pointer to security
// attributes
DWORD dwStackSize, // initial thread stack size, in bytes
LPTHREAD_START_ROUTINE lpStartAddress, // pointer to thread
// function
LPVOID lpParameter, // argument for new thread
DWORD dwCreationFlags, // creation flags
LPDWORD lpThreadId // pointer to returned thread identifier
);
If you compare it to the declaration of CreateThread (MSDN), you will notice the following differences:
- The hProcess parameter is additional in CreateRemoteThread. It is the handle to the process in which the thread is to be created.
- The lpStartAddress parameter in CreateRemoteThread represents the starting address of the thread in the remote process's address space. The function must exist in the remote process, so we can't simply pass a pointer to the local ThreadFunc. We have to copy the code to the remote process first.
- Similarly, the data pointed to by lpParameter must exist in the remote process, so we have to copy it there, too.
Now, we can summarize this technique in the following steps:
1. Retrieve a HANDLE to the remote process (OpenProcess).
2. Allocate memory in the remote process's address space for injected data (VirtualAllocEx).
3. Write a copy of the initialised INJDATA structure to the allocated memory (WriteProcessMemory).
4. Allocate memory in the remote process's address space for injected code.
5. Write a copy of ThreadFunc to the allocated memory.
6. Start the remote copy of ThreadFunc via CreateRemoteThread.
7. Wait until the remote thread terminates (WaitForSingleObject).
8. Retrieve the result from the remote process (ReadProcessMemory or GetExitCodeThread).
9. Free the memory allocated in Steps #2 and #4 (VirtualFreeEx).
10. Close the handles retrieved in Steps #6 and #1 (CloseHandle).
Additional caveats that ThreadFunc has to obey:
- ThreadFunc should not call any functions besides those in kernel32.dll and user32.dll; only kernel32 and user32 are, if present (note that user32 isn't mapped into every Win32 process!), guaranteed to be at the same load address in both the local and the target process (see Appendix A). If you need functions from other libraries, pass the addresses of LoadLibrary and GetProcAddress to the injected code, and let it go and get the rest itself. You could also use GetModuleHandle instead of LoadLibrary, if for one reason or another the DLL in question is already mapped into the target process.
Similarly, if you want to call your own subroutines from within ThreadFunc, copy each routine to the remote process individually and supply their addresses to ThreadFunc via INJDATA.
- Don't use any static strings. Rather pass all strings to ThreadFunc via INJDATA.
Why? The compiler puts all static strings into the ".data" section of an executable and only references (=pointers) remain in the code. Then, the copy of ThreadFunc in the remote process would point to something that doesn't exist (at least not in its address space).
- Remove the /GZ compiler switch; it is set by default in debug builds (see Appendix B).
- Either declare ThreadFunc and AfterThreadFunc as static or disable incremental linking (see Appendix C).
- There must be less than a page-worth (4 KB) of local variables in ThreadFunc (see Appendix D). Note that in debug builds some 10 bytes of the available 4 KB are used for internal variables.
- If you have a switch block with more than three case statements, either split it up like this:
switch( expression ) {
    case constant1: statement1; goto END;
    case constant2: statement2; goto END;
    case constant3: statement3; goto END;
}
switch( expression ) {
    case constant4: statement4; goto END;
    case constant5: statement5; goto END;
    case constant6: statement6; goto END;
}
END:
or modify it into an if-else if sequence (see Appendix E).
You will almost certainly crash the target process if you don't play by those rules. Just remember: Don't assume anything in the target process is at the same address as it is in your process (see Appendix F).
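The rules above boil down to one idea: the injected routine may only touch what it reaches through its single parameter. The following is a minimal local sketch of that pattern, not code taken from WinSpy. It runs inside one process and injects nothing; FakeWindow, GetTextFn, InjDataSketch, threadFuncSketch and fakeGetText are hypothetical names invented for the illustration, with fakeGetText standing in for SendMessageA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a window that owns some text.
struct FakeWindow {
    const char* text;
};

// Hypothetical stand-in for the type of user32!SendMessageA.
typedef std::size_t (*GetTextFn)(void* window, char* buffer, std::size_t capacity);

// Everything the routine needs travels inside this one block (the INJDATA idea).
struct InjDataSketch {
    void*     window;     // plays the role of the HWND
    GetTextFn fnGetText;  // function address resolved by the injecting side
    char      text[128];  // buffer that receives the result
};

// The "injected" routine: no globals, no string literals, no direct calls.
// It only dereferences pData, so it would not depend on its own load address.
std::size_t threadFuncSketch(InjDataSketch* pData) {
    return pData->fnGetText(pData->window, pData->text, sizeof(pData->text));
}

// Lives on the injecting side; copies the window text into the buffer and
// returns the number of characters copied (without the terminator).
std::size_t fakeGetText(void* window, char* buffer, std::size_t capacity) {
    assert(window != nullptr && buffer != nullptr);
    if (capacity == 0) return 0;
    const char* source = static_cast<FakeWindow*>(window)->text;
    std::size_t length = std::strlen(source);
    if (length >= capacity) length = capacity - 1;  // truncate to fit
    std::memcpy(buffer, source, length);
    buffer[length] = '\0';
    return length;
}
```

To try it, fill an InjDataSketch with a FakeWindow pointer and the address of fakeGetText, then call threadFuncSketch on it; the text ends up in the structure's own buffer, which is the part the real technique would read back with ReadProcessMemory.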
GetWindowTextRemote(A/W)
All the functionality you need to get the password from a "remote" edit control is encapsulated in GetWindowTextRemote(A/W):
int GetWindowTextRemoteA( HANDLE hProcess, HWND hWnd, LPSTR lpString );
int GetWindowTextRemoteW( HANDLE hProcess, HWND hWnd, LPWSTR lpString );
Parameters
hProcess
- Handle to the process the edit control belongs to.
hWnd
- Handle to the edit control containing the password.
lpString
- Pointer to the buffer that is to receive the text.
Return Value
The return value is the number of characters copied.
Let's examine some parts of its sources now - especially the injected data and code - to see how GetWindowTextRemote works. Again, Unicode support is removed for the sake of simplicity.
INJDATA
typedef LRESULT (WINAPI *SENDMESSAGE)(HWND,UINT,WPARAM,LPARAM);
typedef struct {
HWND hwnd; // handle to edit control
SENDMESSAGE fnSendMessage; // pointer to user32!SendMessageA
char psText[128]; // buffer that is to receive the password
} INJDATA;
INJDATA is the data structure being injected into the remote process. However, before doing so the structure's pointer to SendMessageA is initialised in our application. The dodgy thing here is that user32.dll is (if present!) always mapped to the same address in every process; thus, the address of SendMessageA is always the same, too. This ensures that a valid pointer is passed to the remote process.
ThreadFunc
static DWORD WINAPI ThreadFunc (INJDATA *pData)
{
pData->fnSendMessage( pData->hwnd, WM_GETTEXT, // Get password
sizeof(pData->psText),
(LPARAM)pData->psText );
return 0;
}
// This function marks the memory address after ThreadFunc.
// int cbCodeSize = (PBYTE) AfterThreadFunc - (PBYTE) ThreadFunc.
static void AfterThreadFunc (void)
{
}
ThreadFunc is the code executed by the remote thread. Point of interest:
- Note how AfterThreadFunc is used to calculate the code size of ThreadFunc. In general this isn't the best idea, because the linker is free to change the order of your functions (i.e. it could place ThreadFunc behind AfterThreadFunc). However, you can be pretty sure that in small projects, like our WinSpy is, the order of your functions will be preserved. If necessary, you could also use the /ORDER linker option to help you out; or better yet: Determine the size of ThreadFunc with a disassembler.
How to Subclass a Remote Control With this Technique
Demo application: InjectEx
Let's explain something more complicated now: How to subclass a control belonging to another process with this technique?
First of all, note that you have to copy two functions to the remote process to accomplish this task:
- ThreadFunc, which actually subclasses the control in the remote process via SetWindowLong, and
- NewProc, the new window procedure of the subclassed control.
However, the main problem is how to pass data to the remote NewProc. Because NewProc is a callback function and thus has to conform to specific guidelines, we can't simply pass a pointer to INJDATA to it as an argument. Fortunately, there are other ways to solve this problem (I found two), but all rely on assembly language. So, although I have tried to save the assembly for the appendixes until now, we can't do without it this time.
Solution 1
[Picture: INJDATA placed immediately before NewProc in the remote process's address space]
Note that INJDATA is placed immediately before NewProc in the remote process. This way NewProc knows the memory location of INJDATA in the remote process's address space at compile time. More precisely: It knows the address of INJDATA relative to its own location, but that's actually all we need. Now NewProc might look like this:
static LRESULT CALLBACK NewProc(
HWND hwnd, // handle to window
UINT uMsg, // message identifier
WPARAM wParam, // first message parameter
LPARAM lParam ) // second message parameter
{
INJDATA* pData = (INJDATA*) NewProc; // pData points to
// NewProc;
pData--; // now pData points to INJDATA;
// recall that INJDATA in the remote
// process is immediately before NewProc;
//-----------------------------
// subclassing code goes here
// ........
//-----------------------------
// call original window procedure;
// fnOldProc (returned by SetWindowLong) was initialised
// by (the remote) ThreadFunc and stored in (the remote) INJDATA;
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam );
}
However, there is still a problem. Observe the first line:
INJDATA* pData = (INJDATA*) NewProc;
This way, a hardcoded value (the memory location of the original NewProc in our process) will be assigned to pData. That is not quite what we want: We need the memory location of the "current" copy of NewProc in the remote process, regardless of the location it (NewProc) has actually been moved to. In other words, we would need some kind of a "this pointer."
While there is no way to solve this in C/C++, it can be done with inline assembly. Consider the modified NewProc:
static LRESULT CALLBACK NewProc(
HWND hwnd, // handle to window
UINT uMsg, // message identifier
WPARAM wParam, // first message parameter
LPARAM lParam ) // second message parameter
{
// calculate location of the INJDATA struct;
// remember that INJDATA in the remote process
// is placed right before NewProc;
INJDATA* pData;
_asm {
call dummy
dummy:
pop ecx // <- ECX contains the current EIP
sub ecx, 9 // <- ECX contains the address of NewProc
mov pData, ecx
}
pData--;
//-----------------------------
// subclassing code goes here
// ........
//-----------------------------
// call original window procedure
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam );
}
So, what's going on? Virtually every processor has a special register that points to the memory location of the next instruction to be executed. That's the so-called instruction pointer, denoted EIP on 32-bit Intel and AMD processors. Because EIP is a special-purpose register, you can't access it programmatically as you can general-purpose registers (EAX, EBX, etc). Put another way: There is no opcode with which you could address EIP and read or change its contents explicitly. However, EIP can still be changed (and is changed all the time) implicitly, by instructions such as JMP, CALL and RET. Let's, for example, explain how the subroutine CALL/RET mechanism works on 32-bit Intel and AMD processors:
When you call a subroutine (via CALL), the address of the subroutine is loaded into EIP. But, even before EIP is modified, its old value is automatically pushed onto the stack (for use later as a return instruction pointer). At the end of a subroutine, the RET instruction automatically pops the top of the stack into EIP.
Now you know how EIP is modified via CALL and RET, but how to get its current value?
Well, remember that CALL pushes EIP onto the stack? So, in order to get its current value, call a "dummy function" and pop the stack right thereafter. Let's explain the whole trick using our compiled NewProc:
Address OpCode/Params Decoded instruction
--------------------------------------------------
:00401000 55 push ebp ; entry point of
; NewProc
:00401001 8BEC mov ebp, esp
:00401003 51 push ecx
:00401004 E800000000 call 00401009 ; *a* call dummy
:00401009 59 pop ecx ; *b*
:0040100A 83E909 sub ecx, 00000009 ; *c*
:0040100D 894DFC mov [ebp-04], ecx ; mov pData, ECX
:00401010 8B45FC mov eax, [ebp-04]
:00401013 83E814 sub eax, 00000014 ; pData--;
.....
.....
:0040102D 8BE5 mov esp, ebp
:0040102F 5D pop ebp
:00401030 C21000 ret 0010
a. A dummy function call; it just jumps to the next instruction and pushes EIP onto the stack.
b. Pop the stack into ECX. ECX then holds EIP; this is exactly the address of the "pop ECX" instruction as well.
c. Note that the "distance" between the entry point of NewProc and the "pop ECX" instruction is 9 bytes; thus, to calculate the address of NewProc, subtract 9 from ECX.
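The arithmetic behind steps a to c can be checked with a toy model. The sketch below is not an emulator and is not part of InjectEx; ToyCpu and entryPointOf are hypothetical names, and the model only knows two facts about x86: a CALL with a 32-bit displacement (opcode E8) is 5 bytes long, and it pushes the address of the instruction that follows it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A toy model of the two registers-and-stack effects used by the trick.
struct ToyCpu {
    std::uint32_t eip = 0;             // instruction pointer
    std::vector<std::uint32_t> stack;  // grows at the back

    // CALL rel32 located at instructionAddress: push the return address
    // (address of the next instruction), then jump relative to it.
    void call(std::uint32_t instructionAddress, std::int32_t displacement) {
        const std::uint32_t returnAddress = instructionAddress + 5;
        stack.push_back(returnAddress);
        eip = returnAddress + static_cast<std::uint32_t>(displacement);
    }

    // POP reg: take the top of the stack.
    std::uint32_t pop() {
        assert(!stack.empty());
        const std::uint32_t value = stack.back();
        stack.pop_back();
        return value;
    }
};

// "call dummy / pop ecx / sub ecx, distance" for a CALL at callAddress.
// distance is the number of bytes between the function's entry point and
// the instruction right after the CALL (9 in the listing above).
std::uint32_t entryPointOf(std::uint32_t callAddress, std::uint32_t distance) {
    ToyCpu cpu;
    cpu.call(callAddress, 0);             // displacement 0: jump to next instruction
    const std::uint32_t ecx = cpu.pop();  // ECX now holds the "current" EIP
    return ecx - distance;
}
```

With the CALL at 00401004 the function yields 00401000, matching the listing. If the same bytes were copied elsewhere, say with the CALL ending up at 008A0018, the result would move along with them, which is the whole point of the trick.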
This way, NewProc can always calculate its own address, regardless of the location it is actually moved to! However, be aware that the distance between the entry point of NewProc and the "pop ECX" instruction might change as you change your compiler/linker options, and is thus different in release and debug builds, too. But, the point is that you still know the exact value at compile time:
- First, compile your function.
- Determine the correct distance with a disassembler.
- Finally, recompile with the correct distance.
That's the solution used in InjectEx. InjectEx, similarly to HookInjEx, swaps the left and right mouse clicks for the Start button.
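Once NewProc knows its own address, finding INJDATA is plain pointer arithmetic. The following sketch illustrates only that layout, using an ordinary local byte buffer in place of remote memory. InjDataSketch, buildBlock, codeOffset and dataBefore are hypothetical names, and the two-field structure is a simplification of the real INJDATA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified, hypothetical stand-in for INJDATA.
struct InjDataSketch {
    std::uint32_t hwnd;       // handle of the subclassed control
    std::uint32_t fnOldProc;  // address of the original window procedure
};

// Lay out one block as [InjDataSketch][code bytes], data first.
std::vector<std::uint8_t> buildBlock(const InjDataSketch& data,
                                     const std::vector<std::uint8_t>& code) {
    std::vector<std::uint8_t> block(sizeof(InjDataSketch) + code.size());
    std::memcpy(block.data(), &data, sizeof(InjDataSketch));
    if (!code.empty())
        std::memcpy(block.data() + sizeof(InjDataSketch), code.data(), code.size());
    return block;
}

// Offset of the first code byte inside the block.
std::size_t codeOffset() {
    return sizeof(InjDataSketch);
}

// What "pData--" achieves: step back one structure from the code start.
// codeStart must point at the code part of a block built by buildBlock.
InjDataSketch dataBefore(const std::uint8_t* codeStart) {
    assert(codeStart != nullptr);
    InjDataSketch data;
    std::memcpy(&data, codeStart - sizeof(InjDataSketch), sizeof(InjDataSketch));
    return data;
}
```

In the remote case the same layout would presumably be produced by writing the structure and the code into one allocated region, so that the address handed to SetWindowLong is the region's base plus sizeof(INJDATA). That is an inference from the description above, not a quote from InjectEx.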
Solution 2
Placing INJDATA right before NewProc in the remote process's address space isn't the only way to solve our problem. Consider the following variant of NewProc:
static LRESULT CALLBACK NewProc(
HWND hwnd, // handle to window
UINT uMsg, // message identifier
WPARAM wParam, // first message parameter
LPARAM lParam ) // second message parameter
{
INJDATA* pData = (INJDATA*) 0xA0B0C0D0; // a dummy value
//-----------------------------
// subclassing code goes here
// ........
//-----------------------------
// call original window procedure
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam );
}
Here, 0xA0B0C0D0 is just a placeholder for the real (absolute!) address of INJDATA in the remote process's address space. Recall that you can't know this address at compile time. However, you do know the location of INJDATA in the remote process right after the call to VirtualAllocEx (for INJDATA) is made.
Our NewProc could compile into something like this:
Address OpCode/Params Decoded instruction
--------------------------------------------------
:00401000 55 push ebp
:00401001 8BEC mov ebp, esp
:00401003 C745FCD0C0B0A0 mov [ebp-04], A0B0C0D0
:0040100A ...
....
....
:0040102D 8BE5 mov esp, ebp
:0040102F 5D pop ebp
:00401030 C21000 ret 0010
Thus, its compiled code (in hexadecimal) would be: 558BECC745FCD0C0B0A0......8BE55DC21000.
Now, you would proceed as follows:
- Copy INJDATA, ThreadFunc and NewProc to the target process.
- Change the code of NewProc, so that pData holds the real address of INJDATA.
For example, let's say the address of INJDATA (the value returned by VirtualAllocEx) in the target process is 0x008a0000. Then you modify the code of NewProc as follows:
558BECC745FCD0C0B0A0......8BE55DC21000 <- original NewProc ¹
558BECC745FC00008A00......8BE55DC21000 <- modified NewProc with real address of INJDATA
Put another way: You replace the dummy value A0B0C0D0 with the real address of INJDATA. ²
- Start execution of the remote ThreadFunc, which in turn subclasses the control in the remote process.
¹ A0B0C0D0 and 008a0000 in the compiled code appear in reverse order. That's because Intel and AMD processors use little-endian notation to represent their (multi-byte) data. In other words: The low-order byte of a number is stored in memory at the lowest address, and the high-order byte at the highest address. Imagine the word UNIX stored in four bytes. In big-endian systems, it would be stored as UNIX. In little-endian systems, it would be stored as XINU.
² Some (bad) cracks modify the code of an executable in a similar way. However, once loaded into memory, a program can't change its own code (the code resides in the ".text" section of an executable, which is write protected). Still we could modify our remote NewProc, because it was previously copied to a piece of memory with PAGE_EXECUTE_READWRITE permission.
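The replacement of the dummy value can be sketched on a local copy of the code bytes. This is an illustration only, not InjectEx's code: toLittleEndian and patchPlaceholder are hypothetical helpers, and the sketch assumes the placeholder byte pattern occurs exactly once in the function and does not collide with any other instruction bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode a 32-bit value the way x86 stores it in memory: low-order byte first.
std::vector<std::uint8_t> toLittleEndian(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF)
    };
}

// Overwrite the first occurrence of placeholder in code with realAddress.
// Returns the byte offset that was patched, or -1 if nothing was found.
long patchPlaceholder(std::vector<std::uint8_t>& code,
                      std::uint32_t placeholder,
                      std::uint32_t realAddress) {
    const std::vector<std::uint8_t> needle = toLittleEndian(placeholder);
    const std::vector<std::uint8_t> patch  = toLittleEndian(realAddress);
    const auto hit = std::search(code.begin(), code.end(),
                                 needle.begin(), needle.end());
    if (hit == code.end()) return -1;
    std::copy(patch.begin(), patch.end(), hit);
    return static_cast<long>(hit - code.begin());
}
```

Applied to the bytes 55 8B EC C7 45 FC D0 C0 B0 A0 ... with the address 0x008a0000, the helper patches offset 6 and leaves 00 00 8A 00 there, as in the modified line above. Patching the local copy before it is written to the target is one possible order of operations; the text above describes patching after the copy, which works because the remote page is writable.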
When to use the CreateRemoteThread & WriteProcessMemory technique
The CreateRemoteThread & WriteProcessMemory technique of code injection is, when compared to the other methods, more flexible in that you don't need an additional DLL. Unfortunately, it is also more complicated and riskier than the other methods. You can (and most probably will) easily crash the remote process, as soon as something is wrong with your ThreadFunc (see Appendix F). Because debugging a remote ThreadFunc can also be a nightmare, you should use this technique only when injecting at most a few instructions. To inject a larger piece of code, use one of the methods discussed in Sections I and II.
Again, WinSpy and InjectEx, as well as their sources, can be found in the download package at the beginning of the article.
Some Final Words
At the end, let's summarize some facts we didn't mention so far:
Technique | OS | Processes
I. Hooks | Win9x and WinNT | only processes that link with USER32.DLL ¹
II. CreateRemoteThread & LoadLibrary | WinNT only ² | all processes ³, including system services ⁴
III. CreateRemoteThread & WriteProcessMemory | WinNT only | all processes, including system services
¹ Obviously you can't hook a thread that has no message queue. Also, SetWindowsHookEx won't work with system services, even if they link against USER32.DLL.
² There is no CreateRemoteThread nor VirtualAllocEx on Win9x. (Actually, they can be emulated on Win9x, too; but that's a story for yet another day.)
³ All processes = All Win32 processes + csrss.exe
Native applications (smss.exe, os2ss.exe, autochk.exe, etc) don't use Win32 APIs, and thus don't link against kernel32.dll either. The only exception is csrss.exe, the Win32 subsystem itself. It's a native application but some of its libraries (~winsrv.dll) require Win32 DLLs, including kernel32.dll.
⁴ If you want to inject code into system services (lsass.exe, services.exe, winlogon.exe, and so on) or into csrss.exe, set the privileges of your process to "SeDebugPrivilege" (AdjustTokenPrivileges) before opening a handle to the remote process (OpenProcess).
That's almost it. There is just one more thing that you should bear in mind: Your injected code can, especially if something is wrong with it, easily pull the target process down to oblivion with it. Just remember: Power comes with responsibility!
Because many examples in this article were about passwords, you might find it interesting to read the article Super Password Spy++, written by Zhefu Zhang, too. There he explains how to get the passwords out of an Internet Explorer password field. What's more, he even shows you how to protect your password controls against such attacks.
Last note: The only reward someone gets for writing and publishing an article is the feedback he gets, so, if you found it useful simply drop in a comment or vote for it.
Acknowledgments
First, thanks to my readers at CodeGuru, where this text was initially published. It is mainly because of your questions that the article grew from its initial 1200 words to what it is today: a 6000-word "animal." However, if there is someone that especially deserves to be singled out, then it is Rado Picha. Parts of the article greatly benefited from his suggestions and explanations to me. Last, but not least, thanks to Susan Moore for helping me through that minefield called the English language, and making my article more readable.
Appendices
A) Why are kernel32.dll and user32.dll always mapped to the same address?
- My presumption: Because Microsoft programmers thought that it could be a useful speed optimization. Let's explain why.
In general, an executable is composed of several sections, including a ".reloc" section.
When the linker creates an EXE or DLL file, it makes an assumptionabout where the file will be mapped into memory. That's the so-calledassumed/preferred load/base address. All the absolute addresses in theimage are based on this linker assumed load address. If for whateverreason the image isn't loaded at this address, the PE - portableexecutable - loader has to fix all the absolute addresses in the image.That is where the ".reloc" section comes in: It contains a list of allthe places in the image, where the difference between the linkerassumed load address and the actual load address needs to be factoredin (anyway, note that most of the instructions produced by the compileruse some kind of relative addressing; as a result, there are not asmany relocations as you might think). If, on the other side, the loaderis able to load the image at the linkers preferred base address, the".reloc" section is completely ignored.
But how do kernel32.dll, user32.dll and their load addresses fit into the story? Because every Win32 application needs kernel32.dll, and most of them need user32.dll too, you can improve the load time of all executables by always mapping them (kernel32 and user32) to their preferred bases. Then the loader never has to fix any (absolute) addresses in kernel32.dll and user32.dll.
Let's close out this discussion with the following example:
Set the image base of some App.exe to KERNEL32's (/base:"0x77e80000") or to USER32's (/base:"0x77e10000") preferred base. If App.exe doesn't import from USER32, just LoadLibrary it. Then compile App.exe and try to run it. An error box pops up ("Illegal System DLL Relocation") and App.exe fails to load.
Why? When creating a process, the loader on Win 2000, Win XP and Win 2003 checks if kernel32.dll and user32.dll (their names are hardcoded into the loader) are mapped at their preferred bases; if not, a hard error is raised. In WinNT 4, ole32.dll was also checked. In WinNT 3.51 and lower such checks were not present, so kernel32.dll and user32.dll could be anywhere. Anyway, the only module that is always at its base is ntdll.dll. The loader doesn't check it, but if ntdll.dll is not at its base, the process just can't be created.
To summarize, on WinNT 4 and higher:
- DLLs that are always mapped to their bases: kernel32.dll, user32.dll and ntdll.dll.
- DLLs that are present in every Win32 application (+ csrss.exe): kernel32.dll and ntdll.dll.
- The only DLL that is present in every process, even in native applications: ntdll.dll.
B) The /GZ compiler switch
- In Debug builds, the /GZ compiler feature is turned on by default. You can use it to catch some errors (see the documentation for details). But what does it mean to our executable?
When /GZ is turned on, the compiler will add some additional code to every function residing in the executable, including a function call (added at the very end of every function) that verifies that the ESP stack pointer hasn't changed through our function. But wait, a function call is added to ThreadFunc? That's the road to disaster. Now the remote copy of ThreadFunc will call a function that doesn't exist in the remote process (at least not at the same address).
C) Static functions vs. incremental linking
- Incremental linking is used to shorten the linking time when building your applications. The difference between normally and incrementally linked executables is that in incrementally linked ones each function call goes through an extra JMP instruction emitted by the linker (functions declared as static are an exception to this rule!). These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference the function. But it's exactly this JMP that causes problems too: now ThreadFunc and AfterThreadFunc will point to the JMP instructions instead of the real code. So, when calculating the size of ThreadFunc this way:
const int cbCodeSize = ((LPBYTE) AfterThreadFunc - (LPBYTE) ThreadFunc);
you will actually calculate the "distance" between the JMPs that point to ThreadFunc and AfterThreadFunc respectively (usually they will appear one right after the other; but don't count on this). Now suppose our ThreadFunc is at address 004014C0 and the accompanying JMP instruction at 00401020.
:00401020 jmp 004014C0
...
:004014C0 push EBP ; real address of ThreadFunc
:004014C1 mov EBP, ESP
...
With this layout,
WriteProcessMemory( .., &ThreadFunc, cbCodeSize, ..);
will copy the "JMP 004014C0" instruction (and all instructions in the range of cbCodeSize that follow it) to the remote process - not the real ThreadFunc. The first thing the remote thread will execute will be a "JMP 004014C0". Well, it will also be among its last instructions - not only for the remote thread, but for the whole process.
However, there is an exception to this JMP instruction "rule." If a function is declared as static, it will be called directly, even if linked incrementally. That's why Rule #4 says to declare ThreadFunc and AfterThreadFunc as static or disable incremental linking. (Some other aspects of incremental linking can be found in the article "Remove Fatty Deposits from Your Applications Using Our 32-bit Liposuction Tools" by Matt Pietrek.)
D) Why can my ThreadFunc have only 4k of local variables?
- Local variables are always stored on the stack. If a function has, say, 256 bytes of local variables, the stack pointer is decreased by 256 when entering the function (more precisely, in the function's prologue). The following function:
void Dummy(void) {
    BYTE var[256];
    var[0] = 0;
    var[1] = 1;
    var[255] = 255;
}
could, for instance, compile into something like this:
:00401000 push ebp
:00401001 mov ebp, esp
:00401003 sub esp, 00000100 ; change ESP as storage for
; local variables is needed
:00401006 mov byte ptr [esp], 00 ; var[0] = 0;
:0040100A mov byte ptr [esp+01], 01 ; var[1] = 1;
:0040100F mov byte ptr [esp+FF], FF ; var[255] = 255;
:00401017 mov esp, ebp ; restore stack pointer
:00401019 pop ebp
:0040101A ret
Note how the stack pointer (ESP) was changed in the above example? But what is different if a function needs more than 4 Kb for its local variables? Well, then the stack pointer isn't changed directly. Rather, another function (a stack probe) is called, which in turn changes it appropriately. But it's exactly this additional function call that makes our ThreadFunc "corrupt," because its remote copy would call something that's not there.
Let's see what the documentation says about stack probes and the /Gs compiler option:
"The /Gssize option is an advanced feature with which you can control stack probes. A stack probe is a sequence of code that the compiler inserts into every function call. When activated, a stack probe reaches benignly into memory by the amount of space required to store the associated function's local variables.
If a function requires more than size stack space for local variables, its stack probe is activated. The default value of size is the size of one page (4 Kb for 80x86 processors). This value allows a carefully tuned interaction between an application for Win32 and the Windows NT virtual-memory manager to increase the amount of memory committed to the program stack at run time."
I'm sure some readers wondered about the above statement: "...a stack probe reaches benignly into memory...". Those compiler options (their descriptions!) are sometimes really irritating, at least until you look under the hood and see what's going on. If, for instance, a function needs 12 Kb of storage for its local variables, the memory on the stack would be "allocated" (more precisely: committed) this way:
sub esp, 0x1000 ; "allocate" first 4 Kb
test [esp], eax ; touches memory in order to commit a
; new page (if not already committed)
sub esp, 0x1000 ; "allocate" second 4 Kb
test [esp], eax ; ...
sub esp, 0x1000
test [esp], eax
Note how the stack pointer is changed in 4 Kb steps now and, more importantly, how the bottom of the stack is "touched" (via test) after each step. This ensures the page containing the bottom of the stack is committed before "allocating" (committing) another page.
After reading ..
"Each new thread receives its own stack space, consisting of both committed and reserved memory. By default, each thread uses 1 Mb of reserved memory, and one page of committed memory. The system will commit one page block from the reserved stack memory as needed." (see MSDN CreateThread > dwStackSize > "Thread Stack Size")
... it should also be clear why the documentation about /Gs says that with stack probes you get a carefully tuned interaction between your application and the Windows NT virtual-memory manager.
Now back to our ThreadFunc and the 4 Kb limit:
Although you could prevent calls to the stack probe routine with /Gs, the documentation warns you about doing so. Further, the documentation says you can turn stack probes on or off by using the #pragma check_stack directive. However, it seems this pragma doesn't affect stack probes at all (either the documentation is buggy, or I am missing some other facts?). Anyway, recall that the CreateRemoteThread & WriteProcessMemory technique should be used only when injecting small pieces of code, so your local variables should rarely consume more than a few bytes and thus not get even close to the 4 Kb limit.
E) Why should I split up my switch block with more than three case statements?
- Again, it is easiest to explain it with an example. Consider the following function:
int Dummy( int arg1 )
{
    int ret = 0;
    switch( arg1 ) {
        case 1: ret = 1; break;
        case 2: ret = 2; break;
        case 3: ret = 3; break;
        case 4: ret = 0xA0B0; break;
    }
    return ret;
}
It would compile into something like this:
Address OpCode/Params Decoded instruction
--------------------------------------------------
; arg1 -> ECX
:00401000 8B4C2404 mov ecx, dword ptr [esp+04]
:00401004 33C0 xor eax, eax ; EAX = 0
:00401006 49 dec ecx ; ECX --
:00401007 83F903 cmp ecx, 00000003
:0040100A 771E ja 0040102A
; JMP to one of the addresses in table ***
; note that ECX contains the offset
:0040100C FF248D2C104000 jmp dword ptr [4*ecx+0040102C]
:00401013 B801000000 mov eax, 00000001 ; case 1: eax = 1;
:00401018 C3 ret
:00401019 B802000000 mov eax, 00000002 ; case 2: eax = 2;
:0040101E C3 ret
:0040101F B803000000 mov eax, 00000003 ; case 3: eax = 3;
:00401024 C3 ret
:00401025 B8B0A00000 mov eax, 0000A0B0 ; case 4: eax = 0xA0B0;
:0040102A C3 ret
:0040102B 90 nop
; Address table ***
:0040102C 13104000 DWORD 00401013 ; jump to case 1
:00401030 19104000 DWORD 00401019 ; jump to case 2
:00401034 1F104000 DWORD 0040101F ; jump to case 3
:00401038 25104000 DWORD 00401025 ; jump to case 4
Note how the switch-case was implemented?
Rather than examining every single case statement separately, an address table is created. Then, we jump to the right case by simply calculating the offset into the address table. If you think about it for a moment, this really is an improvement. Imagine you had a switch with 50 case statements. Without the above trick, you would have to execute 50 CMP and JMP instructions to get to the last case. With the address table, on the contrary, you can jump to any case by a single table look-up. In terms of computer algorithms and time complexity: We replace an O(2n) algorithm by an O(5) one, where:
- O denotes the worst-case time complexity.
- We assume five instructions are necessary to calculate the offset, do the table look-up, and finally jump to the appropriate address.
Now, one might think the above was possible only because the case constants were carefully chosen to be consecutive (1,2,3,4). Fortunately, it turns out the same solution can be applied to most real-world examples; only the offset calculation becomes somewhat more complicated. There are two exceptions, though:
- if there are three or fewer case statements or
- if the case constants are completely unrelated to each other (e.g. "case 1", "case 13", "case 50", and "case 1000")
then the resulting code does it the long way by examining every single case constant separately, with the CMP and JMP instructions. In other words, the resulting code is then essentially the same as if you had an ordinary if-else if sequence.
Point of interest: If you ever wondered why only a constant-expression can accompany a case statement, then you know why by now. In order to create the address table, this value obviously has to be known at compile time.
Now back to the problem!
Notice the JMP instruction at address 0040100C? Let's see what Intel's documentation says about the hex opcode FF:
Opcode Instruction Description
FF /4 JMP r/m32 Jump near, absolute indirect,
address given in r/m32
Oops, the JMP in question uses some kind of absolute addressing? In other words, one of its operands (0040102C in our case) represents an absolute address. Need I say more? Now, the remote ThreadFunc would blindly think the address table for its switch is at 0040102C, JMP to a wrong place, and thus effectively crash the remote process.
F) Why does the remote process crash, anyway?
- When your remote process crashes, it will always be for one of the following reasons:
- You referenced a string inside of ThreadFunc that doesn't exist in the remote process.
- One or more instructions in ThreadFunc use absolute addressing (see Appendix E for an example).
- ThreadFunc calls a function that doesn't exist in the remote process (the call could be added by the compiler/linker). When you look at ThreadFunc in a disassembler in this case, you will see something like this:
:004014C0 push EBP ; entry point of ThreadFunc
:004014C1 mov EBP, ESP
...
:004014C5 call 0041550 ; this will crash the
; remote process
...
:00401502 ret
If the CALL in question was added by the compiler (because some "forbidden" switch, such as /GZ, was turned on), it will be located either somewhere at the beginning or near the end of ThreadFunc.
In any case, you can't be careful enough with the CreateRemoteThread & WriteProcessMemory technique. Especially watch your compiler/linker options. They could easily add something to your ThreadFunc.