Processes, Threads, and objects - whiteowl911/leveleffect GitHub Wiki
So far in this chapter, you’ve seen the structures that are part of a process and the API functions with which you (and the operating system) can manipulate processes. You’ve also found out how you can use tools to view how processes interact with your system. But how did those processes come into being, and how do they exit once they’ve fulfilled their purpose? In the following sections, you’ll discover how a Windows process comes to life.
A Windows subsystem process is created when an application calls one of the process creation functions, such as CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW. Creating a Windows process consists of several stages carried out in three parts of the operating system: the Windows client-side library Kernel32.dll (in the case of the CreateProcessAsUser, CreateProcessWithTokenW, and CreateProcessWithLogonW routines, part of the work is first done in Advapi32.dll), the Windows executive, and the Windows subsystem process (Csrss).
Because of the multiple environment subsystem architecture of Windows, creating an executive process object (which other subsystems can use) is separated from the work involved in creating a Windows subsystem process. So, although the following description of the flow of the Windows CreateProcess function is complicated, keep in mind that part of the work is specific to the semantics added by the Windows subsystem as opposed to the core work needed to create an executive process object.
The following list summarizes the main stages of creating a process with the Windows CreateProcess function. The operations performed in each stage are described in detail in the subsequent sections. Some of these operations may be performed by CreateProcess itself (or other helper routines in user mode), while others will be performed by NtCreateUserProcess or one of its helper routines in kernel mode. In our detailed analysis to follow, we will differentiate between the two at each step required.
NOTE
Many steps of CreateProcess are related to the setup of the process virtual address space and therefore refer to many memory management terms and structures that are defined in Chapter 9.
1. Validate parameters; convert Windows subsystem flags and options to their native counterparts; parse, validate, and convert the attribute list to its native counterpart.
2. Open the image file (.exe) to be executed inside the process.
3. Create the Windows executive process object.
4. Create the initial thread (stack, context, and Windows executive thread object).
5. Perform post-creation, Windows-subsystem-specific process initialization.
6. Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).
7. In the context of the new process and thread, complete the initialization of the address space (such as load required DLLs) and begin execution of the program.
Figure 5-5 shows an overview of the stages Windows follows to create a process.
Figure 5-5. The main stages of process creation
Before opening the executable image to run, CreateProcess performs the following steps:
In CreateProcess, the priority class for the new process is specified as independent bits in the CreationFlags parameter. Thus, you can specify more than one priority class for a single CreateProcess call. Windows resolves the question of which priority class to assign to the process by choosing the lowest-priority class set.
If no priority class is specified for the new process, the priority class defaults to Normal unless the priority class of the process that created it is Idle or Below Normal, in which case the new process inherits the priority class of the creating process.
If a Real-time priority class is specified for the new process and the process’s caller doesn’t have the Increase Scheduling Priority privilege, the High priority class is used instead. In other words, CreateProcess doesn’t fail just because the caller has insufficient privileges to create the process in the Real-time priority class; the new process just won’t have as high a priority as Real-time.
All windows are associated with desktops, the graphical representation of a workspace. If no desktop is specified in CreateProcess, the process is associated with the caller’s current desktop.
If the process is part of a job object, but the creation flags requested a separate virtual DOS machine (VDM), the flag is ignored.
If the caller is sending a handle to a monitor as an output handle instead of a console handle, standard handle flags are ignored.
If the creation flags specify that the process will be debugged, Kernel32 initiates a connection to the native debugging code in Ntdll.dll by calling DbgUiConnectToDbg and gets a handle to the debug object from the thread environment block (TEB) once the function returns.
Kernel32.dll sets the default hard error mode if the creation flags specified one.
The user-specified attribute list is converted from Windows subsystem format to native format, and internal attributes are added to it.
NOTE
The attribute list passed on a CreateProcess call permits passing back to the caller information beyond a simple status code, such as the TEB address of the initial thread or information on the image section. This is necessary for protected processes since the parent cannot query this information after the child is created.
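The priority-class rules in the list above can be modeled in a few lines. This is a minimal sketch, not the Kernel32 implementation: the flag values are the documented Win32 constants, but `resolve_priority_class` and its parameters are hypothetical names introduced here for illustration.

```python
# Documented Win32 creation-flag values for the priority classes.
IDLE_PRIORITY_CLASS = 0x00000040
BELOW_NORMAL_PRIORITY_CLASS = 0x00004000
NORMAL_PRIORITY_CLASS = 0x00000020
ABOVE_NORMAL_PRIORITY_CLASS = 0x00008000
HIGH_PRIORITY_CLASS = 0x00000080
REALTIME_PRIORITY_CLASS = 0x00000100

# Ordered from lowest to highest scheduling priority.
_CLASSES_LOW_TO_HIGH = [
    IDLE_PRIORITY_CLASS,
    BELOW_NORMAL_PRIORITY_CLASS,
    NORMAL_PRIORITY_CLASS,
    ABOVE_NORMAL_PRIORITY_CLASS,
    HIGH_PRIORITY_CLASS,
    REALTIME_PRIORITY_CLASS,
]


def resolve_priority_class(creation_flags, parent_class, has_increase_priority_privilege):
    """Hypothetical helper that models the three rules described in the text."""
    requested = [c for c in _CLASSES_LOW_TO_HIGH if creation_flags & c]
    if not requested:
        # No class specified: default to Normal, unless the creator is
        # Idle or Below Normal, in which case the child matches the creator.
        if parent_class in (IDLE_PRIORITY_CLASS, BELOW_NORMAL_PRIORITY_CLASS):
            return parent_class
        return NORMAL_PRIORITY_CLASS
    # Several bits may be set at once; the lowest-priority class wins.
    chosen = requested[0]
    if chosen == REALTIME_PRIORITY_CLASS and not has_increase_priority_privilege:
        # Insufficient privilege does not fail the call; High is used instead.
        return HIGH_PRIORITY_CLASS
    return chosen
```

For example, passing `HIGH_PRIORITY_CLASS | IDLE_PRIORITY_CLASS` resolves to Idle, and a Real-time request from a caller without the privilege resolves to High.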
Once these steps are completed, CreateProcess will perform the initial call to NtCreateUserProcess to attempt creation of the process. Because Kernel32.dll has no idea at this point whether the application image name is a real Windows application, or if it might be a POSIX, 16-bit, or DOS application, the call may fail, at which point CreateProcess will look at the error reason and attempt to correct the situation.
As illustrated in Figure 5-6, the first stage in NtCreateUserProcess is to find the appropriate Windows image that will run the executable file specified by the caller and to create a section object to later map it into the address space of the new process. If the call failed for any reason, it will return to CreateProcess with a failure state (see Table 5-6) that will cause CreateProcess to attempt execution again.
If the executable file specified is a Windows .exe, NtCreateUserProcess will try to open the file and create a section object for it. The object isn’t mapped into memory yet, but it is opened. Just because a section object has been successfully created doesn’t mean that the file is a valid Windows image, however; it could be a DLL or a POSIX executable. If the file is a POSIX executable, the image to be run changes to Posix.exe, and CreateProcess restarts from the beginning of Stage 1. If the file is a DLL, CreateProcess fails.
Now that NtCreateUserProcess has found a valid Windows executable image, as part of the process creation code described in Stage 3 it looks in the registry under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options to see whether a subkey with the file name and extension of the executable image (but without the directory and path information—for example, Image.exe) exists there. If it does, PspAllocateProcess looks for a value named Debugger for that key. If this value is present, the image to be run becomes the string in that value and CreateProcess restarts at Stage 1.
TIP
You can take advantage of this process creation behavior and debug the startup code of Windows services processes before they start rather than attach the debugger after starting a service, which doesn’t allow you to debug the startup code.
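The Image File Execution Options lookup described above can be sketched as follows. This is an illustrative model only: the registry is represented by a plain dictionary, and `find_debugger` is a hypothetical helper, not a Windows routine. The one behavior taken from the text is that the subkey is matched on the file name and extension alone, without directory information.

```python
import ntpath

IFEO_KEY = (
    r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion"
    r"\Image File Execution Options"
)


def find_debugger(image_path, ifeo_subkeys):
    """Return the Debugger value for an image, or None if there is none.

    ifeo_subkeys is a dict standing in for the subkeys of IFEO_KEY,
    mapping subkey name -> dict of value name -> value data.
    """
    # Only the file name and extension are used, never the directory.
    image_name = ntpath.basename(image_path).lower()
    for subkey, values in ifeo_subkeys.items():
        # Registry key names are compared case-insensitively.
        if subkey.lower() == image_name:
            return values.get("Debugger")
    return None
```

With a model such as `{"Image.exe": {"Debugger": "ntsd.exe -d"}}`, a call for `C:\Apps\Image.exe` returns the debugger string, which in the flow above becomes the image to run when CreateProcess restarts at Stage 1.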
On the other hand, if the image is not a Windows .exe (for example, if it’s an MS-DOS, Win16, or a POSIX application), CreateProcess goes through a series of steps to find a Windows support image to run it. This process is necessary because non-Windows applications aren’t run directly—Windows instead uses one of a few special support images that in turn are responsible for actually running the non-Windows program. For example, if you attempt to run a POSIX application, CreateProcess identifies it as such and changes the image to be run to the Windows executable file Posix.exe. If you attempt to run an MS-DOS or a Win16 executable, the image to be run becomes the Windows executable Ntvdm.exe. In short, you can’t directly create a process that is not a Windows process. If Windows can’t find a way to resolve the activated image as a Windows process (as shown in Table 5-6), CreateProcess fails.
Figure 5-6. Choosing a Windows image to activate
If the Image . . . | Create State Code | This Image Will Run . . . | . . . and This Will Happen |
---|---|---|---|
Is a POSIX executable file | PsCreateSuccess | Posix.exe | CreateProcess restarts Stage 1. |
Is an MS-DOS application with an .exe, a .com, or a .pif extension | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
Is a Win16 application | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
Is a Win64 application on a 32-bit system (or a PPC, MIPS, or Alpha Binary) | PsCreateFailMachineMismatch | N/A | CreateProcess will fail. |
Has a Debugger key with another image name | PsCreateFailExeName | Name specified in the Debugger key | CreateProcess restarts Stage 1. |
Is an invalid or damaged Windows EXE | PsCreateFailExeFormat | N/A | CreateProcess will fail. |
Cannot be opened | PsCreateFailOnFileOpen | N/A | CreateProcess will fail. |
Is a command procedure (application with a .bat or a .cmd extension) | PsCreateFailOnSectionCreate | Cmd.exe | CreateProcess restarts Stage 1. |
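Table 5-6 can be read as a lookup from the kind of image to a create state, a support image, and an outcome. The sketch below encodes the table as data; the dictionary keys and the `resolve` function are hypothetical labels chosen here, while the state codes and support image names are copied from the table.

```python
# kind of image -> (create state code, image that will run instead)
TABLE_5_6 = {
    "posix": ("PsCreateSuccess", "Posix.exe"),
    "msdos": ("PsCreateFailOnSectionCreate", "Ntvdm.exe"),
    "win16": ("PsCreateFailOnSectionCreate", "Ntvdm.exe"),
    "machine_mismatch": ("PsCreateFailMachineMismatch", None),
    "debugger_key": ("PsCreateFailExeName", None),  # image comes from the key
    "damaged_exe": ("PsCreateFailExeFormat", None),
    "cannot_open": ("PsCreateFailOnFileOpen", None),
    "command_procedure": ("PsCreateFailOnSectionCreate", "Cmd.exe"),
}


def resolve(kind, debugger_image=None):
    """Return (state code, replacement image, outcome) for one table row."""
    state, image = TABLE_5_6[kind]
    if kind == "debugger_key":
        # The replacement image is whatever the Debugger value names.
        image = debugger_image
    # A row with a replacement image restarts Stage 1; otherwise the call fails.
    outcome = "restart Stage 1" if image else "fail"
    return state, image, outcome
```

Keeping the rows as data makes the pattern visible: every row that names a replacement image sends CreateProcess back to Stage 1, and every row without one is a hard failure.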
If the image header characteristics IMAGE_FILE_UP_SYSTEM_ONLY flag is set (indicating that the image can run only on a uniprocessor system), a single CPU is chosen for all the threads in this new process to run on. The selection process is performed by simply cycling through the available processors—each time this type of image is run, the next processor is used. In this way, these types of images are spread evenly across the processors.
If the image specifies an explicit processor affinity mask (for example, a field in the configuration header), this value is copied to the PEB and later set as the default process affinity mask.
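The cycling behavior for uniprocessor-only images amounts to a round-robin counter. The following is a toy model under stated assumptions: the class name is hypothetical, and starting at processor 0 is an assumption, since the text only says that each launch uses the next processor.

```python
class UniprocessorImagePlacer:
    """Toy model of spreading IMAGE_FILE_UP_SYSTEM_ONLY images across CPUs."""

    def __init__(self, processor_count):
        if processor_count < 1:
            raise ValueError("at least one processor is required")
        self._count = processor_count
        self._next = 0  # assumption: the cycle starts at processor 0

    def pick(self):
        """Return the CPU index for the next uniprocessor-only image."""
        cpu = self._next
        self._next = (self._next + 1) % self._count
        return cpu

    @staticmethod
    def affinity_mask(cpu):
        """A single-CPU affinity mask has only that CPU's bit set."""
        return 1 << cpu
```

On a four-processor model, six consecutive launches land on processors 0, 1, 2, 3, 0, 1, which is the even spread the text describes.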
Before the handle to the new process can be returned, a few final setup steps must be completed, which are performed by PspInsertProcess and its helper functions:
If systemwide auditing of processes is enabled (either as a result of local policy settings or group policy settings from a domain controller), the process’s creation is written to the Security event log.
If the parent process was contained in a job, the job is recovered from the job level set of the parent and then bound to the session of the newly created process. Finally, the new process is added to the job.
PspInsertProcess inserts the new process block at the end of the Windows list of active processes (PsActiveProcessHead).
The process debug port of the parent process is copied to the new child process, unless the NoDebugInherit flag is set (which can be requested when creating the process). If a debug port was specified, it is attached to the new process at this time.
Finally, PspInsertProcess notifies any registered callback routines, creates a handle for the new process by calling ObOpenObjectByPointer, and then returns this handle to the caller.
At this point, the Windows executive process object is completely set up. It still has no thread, however, so it can’t do anything yet. It’s now time to start that work. Normally, the PspCreateThread routine is responsible for all aspects of thread creation and is called by NtCreateThread when a new thread is being created. However, because the initial thread is created internally by the kernel without user-mode input, the two helper routines that PspCreateThread relies on are used instead: PspAllocateThread and PspInsertThread.
PspAllocateThread handles the actual creation and initialization of the executive thread object itself, while PspInsertThread handles the creation of the thread handle and security attributes and the call to KeStartThread to turn the executive object into a schedulable thread on the system. However, the thread won’t do anything yet—it is created in a suspended state and isn’t resumed until the process is completely initialized (as described in Stage 5).
NOTE
The thread parameter (which can’t be specified in CreateProcess but can be specified in CreateThread) is the address of the PEB. This parameter will be used by the initialization code that runs in the context of this new thread (as described in Stage 6).
PspAllocateThread performs the following steps:
An executive thread block (ETHREAD) is created and initialized.
Before the thread can execute, it needs a stack and a context in which to run, so these are set up. The stack size for the initial thread is taken from the image—there’s no way to specify another size.
The thread environment block (TEB) is allocated for the new thread.
The user-mode thread start address is stored in the ETHREAD. This is the system-supplied thread startup function in Ntdll.dll (RtlUserThreadStart). The user’s specified Windows start address is stored in the ETHREAD block in a different location so that debugging tools such as Process Explorer can query the information.
KeInitThread is called to set up the KTHREAD block. The thread’s initial and current base priorities are set to the process’s base priority, and its affinity and quantum are set to that of the process. This function also sets the initial thread ideal processor. (See the section Ideal and Last Processor for a description of how this is chosen.) KeInitThread next allocates a kernel stack for the thread and initializes the machine-dependent hardware context for the thread, including the context, trap, and exception frames. The thread’s context is set up so that the thread will start in kernel mode in KiThreadStartup. Finally, KeInitThread sets the thread’s state to Initialized and returns to PspAllocateThread.
Once that work is finished, NtCreateUserProcess will call PspInsertThread to perform the following steps:
A thread ID is generated for the new thread.
The thread count in the process object is incremented, and the thread is added into the process thread list.
The thread is put into a suspended state.
The object is inserted and any registered thread callbacks are called.
The handle is created with ObOpenObjectByName.
The thread is readied for execution by calling KeStartThread.
Once NtCreateUserProcess returns with a success code, all the necessary executive process and thread objects have been created. Kernel32.dll will now perform various operations related to Windows subsystem–specific operations to finish initializing the process.
First of all, various checks are made for whether Windows should allow the executable to run. These checks include validating the image version in the header and checking whether Windows application certification has blocked the process (through a group policy). On specialized editions of Windows Server 2008, such as Windows Web Server 2008 and Windows HPC Server 2008, additional checks are made to see if the application imports any disallowed APIs.
If software restriction policies dictate, a restricted token is created for the new process. Afterward, the application compatibility database is queried to see if an entry exists in either the registry or system application database for the process. Compatibility shims will not be applied at this point—the information will be stored in the PEB once the initial thread starts executing (Stage 6).
At this point, Kernel32.dll sends a message to the Windows subsystem so that it can set up SxS information (see the end of this section for more information on side-by-side assemblies) such as manifest files, DLL redirection paths, and out-of-process execution for the new process. It also initializes the Windows subsystem structures for the process and initial thread. The message includes the following information:
Process and thread handles
Entries in the creation flags
ID of the process’s creator
Flag indicating whether the process belongs to a Windows application (so that Csrss can determine whether or not to show the startup cursor)
UI language information
DLL redirection and .local flags
Manifest file information
The Windows subsystem performs the following steps when it receives this message:
CsrCreateProcess duplicates a handle for the process and thread. In this step, the usage count of the process and the thread is incremented from 1 (which was set at creation time) to 2.
If a process priority class isn’t specified, CsrCreateProcess sets it according to the algorithm described earlier in this section.
The Csrss process block is allocated.
The new process’s exception port is set to be the general function port for the Windows subsystem so that the Windows subsystem will receive a message when a second chance exception occurs in the process. (For further information on exception handling, see Chapter 3.)
The Csrss thread block is allocated and initialized.
CsrCreateThread inserts the thread in the list of threads for the process.
The count of processes in this session is incremented.
The process shutdown level is set to 0x280 (the default process shutdown level—see SetProcessShutdownParameters in the MSDN Library documentation for more information).
The new process block is inserted into the list of Windows subsystem-wide processes.
The per-process data structure used by the kernel-mode part of the Windows subsystem (W32PROCESS structure) is allocated and initialized.
The application start cursor is displayed. This cursor is the familiar rolling doughnut shape—the way that Windows says to the user, “I’m starting something, but you can use the cursor in the meantime.” If the process doesn’t make a GUI call after 2 seconds, the cursor reverts to the standard pointer. If the process does make a GUI call in the allotted time, CsrCreateProcess waits 5 seconds for the application to show a window. After that time, CsrCreateProcess will reset the cursor again.
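The timing of the start cursor in the last step can be expressed as a small function. This is a sketch under an explicit assumption: the text does not say whether the 5-second wait is measured from process start or from the first GUI call, and the model below assumes the latter. The function name and parameter are hypothetical.

```python
GUI_CALL_TIMEOUT_S = 2.0  # time allowed for the first GUI call
WINDOW_TIMEOUT_S = 5.0    # further time allowed for a window to appear


def start_cursor_deadline(first_gui_call_s=None):
    """Return the latest time (seconds after start) the start cursor is shown.

    first_gui_call_s is when the process made its first GUI call,
    or None if it never made one.
    """
    if first_gui_call_s is None or first_gui_call_s > GUI_CALL_TIMEOUT_S:
        # No GUI call in time: the cursor reverts to the standard pointer.
        return GUI_CALL_TIMEOUT_S
    # Assumption: the window wait is counted from the GUI call itself.
    return first_gui_call_s + WINDOW_TIMEOUT_S
```

A console program that never touches the GUI loses the start cursor at the 2-second mark, while a program that makes a GUI call half a second in keeps it for at most 5 more seconds under this reading.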
After Csrss has performed these steps, CreateProcess checks whether the process was run elevated (which means it was executed through ShellExecute and elevated by the AppInfo service after the consent dialog box was shown to the user). This includes checking whether the process was a setup program. If it was, the process’s token is opened, and the virtualization flag is turned on so that the application is virtualized. (See the information on UAC and virtualization in Chapter 6.) If the application contained elevation shims or had a requested elevation level in its manifest, the process is destroyed and an elevation request is sent to the AppInfo service. (See Chapter 6 for more information on elevation.)
Note that most of these checks are not performed for protected processes; because these processes must have been designed for Windows Vista or later, there’s no reason why they should require elevation, virtualization, or application compatibility checks and processing. Additionally, allowing mechanisms such as the shim engine to use its usual hooking and memory patching techniques on a protected process would result in a security hole if someone could figure out how to insert arbitrary shims that modify the behavior of the protected process.
At this point, the process environment has been determined, resources for its threads to use have been allocated, the process has a thread, and the Windows subsystem knows about the new process. Unless the caller specified the CREATE_SUSPENDED flag, the initial thread is now resumed so that it can start running and perform the remainder of the process initialization work that occurs in the context of the new process (Stage 7).
The new thread begins life running the kernel-mode thread startup routine KiThreadStartup. KiThreadStartup lowers the thread’s IRQL level from DPC/dispatch level to APC level and then calls the system initial thread routine, PspUserThreadStartup. The user-specified thread start address is passed as a parameter to this routine.
First, this function sets the Locale ID and the ideal processor in the TEB, based on the information present in kernel-mode data structures, and then it checks if thread creation actually failed. Next it calls DbgkCreateThread, which checks if image notifications were sent for the new process. If they weren’t, and notifications are enabled, an image notification is sent first for the process and then for the image load of Ntdll.dll. Note that this is done in this stage rather than when the images were first mapped, because the process ID (which is required for the callouts) is not yet allocated at that time.
Once those checks are completed, another check is performed to see whether the process is a debuggee. If it is, then PspUserThreadStartup checks if the debugger notifications have already been sent for this process. If not, then a create process message is sent through the debug object (if one is present) so that the process startup debug event (CREATE_PROCESS_DEBUG_INFO) can be sent to the appropriate debugger process. This is followed by a similar thread startup debug event and by another debug event for the image load of Ntdll.dll. DbgkCreateThread then waits for the Windows subsystem to get the reply from the debugger (via the ContinueDebugEvent function).
Now that the debugger has been notified, PspUserThreadStartup looks at the result of the initial check on the thread’s life. If it was killed on startup, the thread is terminated. This check is done after the debugger and image notifications to be sure that the kernel-mode and user-mode debuggers don’t miss information on the thread, even if the thread never got a chance to run.
Otherwise, the routine checks whether application prefetching is enabled on the system and, if so, calls the prefetcher (and Superfetch) to process the prefetch instruction file (if it exists) and prefetch pages referenced during the first 10 seconds the last time the process ran. (For details on the prefetcher and Superfetch, see Chapter 9.)
PspUserThreadStartup then checks if the systemwide cookie in the SharedUserData structure has been set up yet. If it hasn’t, it generates it based on a hash of system information such as the number of interrupts processed, DPC deliveries, and page faults. This systemwide cookie is used in the internal decoding and encoding of pointers, such as in the heap manager (for more information on heap manager security, see Chapter 9), to protect against certain classes of exploitation.
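To make the idea of a cookie-protected pointer concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the text says only that the cookie is derived from a hash of system counters and used to encode and decode pointers, so the SHA-256 hash, the 32-bit width, and the XOR scheme below are stand-ins, not the actual Windows algorithm.

```python
import hashlib
import struct

POINTER_MASK = 0xFFFFFFFF  # assumption: 32-bit pointers for the example


def make_cookie(interrupts, dpc_deliveries, page_faults):
    """Derive a cookie from system counters (stand-in hash, not the real one)."""
    data = struct.pack("<QQQ", interrupts, dpc_deliveries, page_faults)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


def encode_pointer(pointer, cookie):
    """Obscure a pointer so its stored form is useless without the cookie."""
    return (pointer ^ cookie) & POINTER_MASK


def decode_pointer(encoded, cookie):
    """XOR is its own inverse, so decoding repeats the same operation."""
    return (encoded ^ cookie) & POINTER_MASK
```

The protective value comes from the cookie being unpredictable: an attacker who overwrites a stored, encoded pointer without knowing the cookie cannot choose the address it will decode to.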
Finally, PspUserThreadStartup sets up the initial thunk context to run the image loader initialization routine (LdrInitializeThunk in Ntdll.dll), as well as the systemwide thread startup stub (RtlUserThreadStart in Ntdll.dll). These steps are done by editing the context of the thread in place and then issuing an exit from system service operation, which will load the specially crafted user context. The LdrInitializeThunk routine initializes the loader, heap manager, NLS tables, thread-local storage (TLS) and fiber-local storage (FLS) array, and critical section structures. It then loads any required DLLs and calls the DLL entry points with the DLL_PROCESS_ATTACH function code. (See the sidebar “Side-by-Side Assemblies” for a description of a mechanism Windows uses to address DLL versioning problems.)
Once the function returns, NtContinue restores the new user context and returns to user mode; thread execution now truly starts.
RtlUserThreadStart will use the address of the actual image entry point and the start parameter and call the application. These two parameters have also already been pushed onto the stack by the kernel. This complicated series of events has two purposes. First of all, it allows the image loader inside Ntdll.dll to set up the process internally and behind the scenes so that other user-mode code can run properly (otherwise, it would have no heap, no thread local storage, and so on).
Second, having all threads begin in a common routine allows them to be wrapped in exception handling, so that when they crash, Ntdll.dll is aware of that and can call the unhandled exception filter inside Kernel32.dll. It is also able to coordinate thread exit on return from the thread’s start routine and to perform various cleanup work. Application developers can also call SetUnhandledExceptionFilter to add their own unhandled exception handling code.
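The second purpose, a common start routine that wraps every thread in exception handling, follows a familiar pattern that can be sketched outside of Windows. The code below is an analogy rather than the Ntdll.dll implementation: `user_thread_start` stands in for RtlUserThreadStart, and the filter functions are hypothetical stand-ins for the unhandled exception filter and SetUnhandledExceptionFilter.

```python
def default_unhandled_exception_filter(exc):
    """Stand-in for the system filter; returns an exit code for the thread."""
    return 1


_unhandled_filter = default_unhandled_exception_filter


def set_unhandled_exception_filter(new_filter):
    """Install an application filter and return the previous one."""
    global _unhandled_filter
    previous = _unhandled_filter
    _unhandled_filter = new_filter
    return previous


def user_thread_start(entry_point, parameter):
    """Common wrapper: every thread's real entry point is called from here."""
    try:
        exit_code = entry_point(parameter)
    except Exception as exc:
        # The wrapper sees the crash and routes it to the installed filter.
        exit_code = _unhandled_filter(exc)
    # Returning from the entry point lands back here, so thread exit and
    # cleanup can be coordinated in one place.
    return exit_code
```

Because the application's entry point never runs directly, both outcomes (a normal return and an unhandled exception) pass through the same wrapper, which is what lets a single routine handle exit and crash reporting for every thread.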
Side-by-Side Assemblies
In order to isolate DLLs distributed with applications from DLLs that ship with the operating system, Windows allows applications to use private copies of these core DLLs. To use a private copy of a DLL instead of the one in the system directory, an application’s installation must include a file named Application.exe.local (where Application is the name of the application’s executable), which directs the loader to first look for DLLs in that directory. Note that any DLLs that are loaded from the list of KnownDLLs (DLLs that are permanently mapped into memory) or that are loaded by those DLLs cannot be redirected using this mechanism.
To further address application and DLL compatibility while allowing sharing, Windows implements the concept of shared assemblies. An assembly consists of a group of resources, including DLLs, and an XML manifest file that describes the assembly and its contents. An application references an assembly through the existence of its own XML manifest. The manifest can be a file in the application’s installation directory that has the same name as the application with “.manifest” appended (for example, application.exe.manifest), or it can be linked into the application as a resource. The manifest describes the application and its dependence on assemblies.
There are two types of assemblies: private and shared. The difference between the two is that shared assemblies are digitally signed so that corruption or modification of their contents can be detected. In addition, shared assemblies are stored under the \Windows\Winsxs directory, whereas private assemblies are stored in an application’s installation directory. Thus, shared assemblies also have an associated catalog file (.cat) that contains their digital signature information. Shared assemblies can be “side-by-side” assemblies because multiple versions of a DLL can reside on a system simultaneously, with applications dependent on a particular version of a DLL always using that particular version.
An assembly’s manifest file typically has a name that includes the name of the assembly, version information, some text that represents a unique signature, and the extension “.manifest”. The manifests are stored in \Windows\Winsxs\Manifests, and the rest of the assembly’s resources are stored in subdirectories of \Windows\Winsxs that have the same name as the corresponding manifest files, with the exception of the trailing .manifest extension.
An example of a shared assembly is version 6 of the Windows common controls DLL, comctl32.dll. Its manifest file is named \Windows\Winsxs\Manifests\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.0.0_x-ww_1382d70a.manifest. It has an associated catalog file (which is the same name with the .cat extension) and a subdirectory of Winsxs that includes comctl32.dll.
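The naming convention can be checked by taking the example manifest name apart. This is a sketch that assumes the underscore-separated layout seen in this one example; the field labels (`arch`, `signature`, `culture`, `suffix`) are my reading of the components, and only the assembly name, version, and "unique signature" text are named by the description above.

```python
import ntpath


def parse_manifest_name(manifest_path):
    """Split a Winsxs manifest path into its apparent components."""
    file_name = ntpath.basename(manifest_path)
    if not file_name.endswith(".manifest"):
        raise ValueError("not a manifest file name")
    stem = file_name[: -len(".manifest")]
    # First field, then the last four fields; whatever remains is the
    # assembly name.
    arch, rest = stem.split("_", 1)
    name, signature, version, culture, suffix = rest.rsplit("_", 4)
    winsxs_dir = ntpath.dirname(ntpath.dirname(manifest_path))
    return {
        "arch": arch,
        "name": name,
        "signature": signature,
        "version": version,
        "culture": culture,
        "suffix": suffix,
        # Same name as the manifest, minus the extension, under Winsxs.
        "resource_dir": ntpath.join(winsxs_dir, stem),
        # Same name as the manifest, with a .cat extension.
        "catalog": manifest_path[: -len(".manifest")] + ".cat",
    }
```

Applied to the common controls example, the name field is `Microsoft.Windows.Common-Controls`, the version is `6.0.0.0`, and the resource directory is the Winsxs subdirectory that holds comctl32.dll.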
Version 6 of Comctl32.dll added integration with Windows themes, and because applications not written with theme support in mind might not appear correctly with the new DLL, it’s available only to applications that explicitly reference the shared assembly containing it—the version of Comctl32.dll installed in \Windows\System32 is an instance of version 5.x, which is not theme aware. When an application loads, the loader looks for the application’s manifest, and if one exists, loads the DLLs from the assemblies specified. DLLs not included in assemblies referenced in the manifest are loaded in the traditional way. Legacy applications, therefore, link against the version in \Windows\System32, whereas theme-aware applications can specify the new version in their manifest.
A final advantage that shared assemblies have is that a publisher can issue a publisher configuration, which can redirect all applications that use a particular assembly to use an updated version. Publishers would do this if they were preserving backward compatibility while addressing bugs. Ultimately, however, because of the flexibility inherent in the assembly model, an application could decide to override the new setting and continue to use an older version.
EXPERIMENT: Tracing Process Startup
Now that we’ve looked in detail at how a process starts up and the different operations required to begin executing an application, we’re going to use Process Monitor to take a look at some of the file I/O and registry keys that are accessed during this process.
Although this experiment will not provide a complete picture of all the internal steps we’ve described, you’ll be able to see several parts of the system in action, notably Prefetch and Superfetch, image file execution options and other compatibility checks, and the image loader’s DLL mapping.
We’re going to be looking at a very simple executable—Notepad.exe—and we will be launching it from a Command Prompt window (Cmd.exe). It’s important that we look both at the operations inside Cmd.exe and those inside Notepad.exe. Recall that a lot of the user-mode work is performed by CreateProcess, which is called by the parent process before the kernel has created a new process object.
To set things up correctly, add two filters to Process Monitor: one for Cmd.exe, and one for Notepad.exe—these are the only two processes we want to include. It will be helpful to be sure that you don’t have any currently running instances of these two processes so that you know you’re looking at the right events. The filter window should look like this:
Next, make sure that event logging is currently disabled (clear File, Capture Events), and then start up the command prompt. Enable event logging (using the File menu again, or simply press CTRL+E or click the magnifying glass icon on the toolbar) and then enter Notepad.exe and press Enter. On a typical Windows Vista system, you should see anywhere between 500 and 1500 events appear. Go ahead and hide the Sequence and Time Of Day columns so that we can focus our attention on the columns of interest. Your window should look similar to the one shown next.
Just as described in Stage 1 of the CreateProcess flow, one of the first things to notice is that just before the process is started and the first thread is created, Cmd.exe does a registry read at HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options. Because there were no image execution options associated with Notepad.exe, the process was created as is.
As with this and any other event in Process Monitor’s log, you have the ability to see whether each part of the process creation flow was performed in user mode or kernel mode, and by which routines, by looking at the stack of the event. To do this, double-click on the RegOpenKey event mentioned and switch to the Stack tab. The following screen shows the standard stack on a 32-bit Windows Vista machine.
This stack shows that we have already reached the part of process creation performed in kernel mode (through NtCreateUserProcess) and that the helper routine PspAllocateProcess is responsible for this check.
Going down the list of events after the thread and process have been created, you will notice three groups of events. The first is a simple check for application compatibility flags, which will let the user-mode process creation code know if checks inside the application compatibility database are required through the shim engine.
This check is followed by multiple reads to Side-By-Side, Manifest, and MUI/Language keys, which are part of the assembly framework mentioned earlier. Finally, you may see file I/O to one or more .sdb files, which are the application compatibility databases on the system. This I/O is where additional checks are done to see if the shim engine needs to be invoked for this application. Since Notepad is a well-behaved Microsoft program, it doesn’t require any shims.
The following screen shows the next series of events, which happen inside the Notepad process itself. These actions are initiated in kernel mode by the startup wrapper for user-mode threads, which performs the actions described earlier. The first two are the Notepad.exe and Ntdll.dll image load debug notification messages, which can only be generated now that code is running inside Notepad’s process context and not the context for the command prompt.
Next, the prefetcher kicks in, looking for a prefetch database file that has already been generated for Notepad. (For more information on the prefetcher, see Chapter 9.) On a system where Notepad has already been run at least once, this database will exist, and the prefetcher will begin executing the commands specified inside it. If this is the case, scrolling down you will see multiple DLLs being read and queried. Unlike typical DLL loading, which is done by the user-mode image loader by looking at the import tables or when an application manually loads a DLL, these events are being generated by the prefetcher, which is already aware of the libraries that Notepad will require. Typical image loading of the required DLLs happens next, and you will see events similar to the ones shown here.
These events are now being generated from code running inside user mode, which was called once the kernel-mode wrapper function finished its work. Therefore, these are the first events coming from LdrpInitializeProcess, which we mentioned is the internal system wrapper function for any new process, before the start address wrapper is called. You can confirm this on your own by looking at the stack of these events; for example, the kernel32.dll image load event, which is shown in the next screen.
Further events are generated by this routine and its associated helper functions until you finally reach events generated by the WinMain function inside Notepad, which is where code under the developer’s control is now being executed. Describing in detail all the events and user-mode components that come into play during process execution would fill up this entire chapter, so exploration of any further events is left as an exercise for the reader.
Processes, Threads, and Jobs in the Windows Operating System
6/17/2009
Flow of CreateProcess
So far in this chapter, you’ve seen the structures that are part of a process and the API functions with which you (and the operating system) can manipulate processes. You’ve also found out how you can use tools to view how processes interact with your system. But how did those processes come into being, and how do they exit once they’ve fulfilled their purpose? In the following sections, you’ll discover how a Windows process comes to life.
A Windows subsystem process is created when an application calls one of the process creation functions, such as CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW. Creating a Windows process consists of several stages carried out in three parts of the operating system: the Windows client-side library Kernel32.dll (in the case of the CreateProcessAsUser, CreateProcessWithTokenW, and CreateProcessWithLogonW routines, part of the work is first done in Advapi32.dll), the Windows executive, and the Windows subsystem process (Csrss).
Because of the multiple environment subsystem architecture of Windows, creating an executive process object (which other subsystems can use) is separated from the work involved in creating a Windows subsystem process. So, although the following description of the flow of the Windows CreateProcess function is complicated, keep in mind that part of the work is specific to the semantics added by the Windows subsystem as opposed to the core work needed to create an executive process object.
The following list summarizes the main stages of creating a process with the Windows CreateProcess function. The operations performed in each stage are described in detail in the subsequent sections. Some of these operations may be performed by CreateProcess itself (or other helper routines in user mode), while others will be performed by NtCreateUserProcess or one of its helper routines in kernel mode. In our detailed analysis to follow, we will differentiate between the two at each step required.
NOTE
Many steps of CreateProcess are related to the setup of the process virtual address space and therefore refer to many memory management terms and structures that are defined in Chapter 9.
1. Validate parameters; convert Windows subsystem flags and options to their native counterparts; parse, validate, and convert the attribute list to its native counterpart.
2. Open the image file (.exe) to be executed inside the process.
3. Create the Windows executive process object.
4. Create the initial thread (stack, context, and Windows executive thread object).
5. Perform post-creation, Windows-subsystem-specific process initialization.
6. Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).
7. In the context of the new process and thread, complete the initialization of the address space (such as loading required DLLs) and begin execution of the program.
Figure 5-5 shows an overview of the stages Windows follows to create a process.
Figure 5-5. The main stages of process creation
Stage 1: Converting and Validating Parameters and Flags
Before opening the executable image to run, CreateProcess performs the following steps:
In CreateProcess, the priority class for the new process is specified as independent bits in the CreationFlags parameter. Thus, you can specify more than one priority class for a single CreateProcess call. Windows resolves the question of which priority class to assign to the process by choosing the lowest-priority class set.
If no priority class is specified for the new process, the priority class defaults to Normal unless the priority class of the process that created it is Idle or Below Normal, in which case the new process inherits the priority class of its creator.
If a Real-time priority class is specified for the new process and the process’s caller doesn’t have the Increase Scheduling Priority privilege, the High priority class is used instead. In other words, CreateProcess doesn’t fail just because the caller has insufficient privileges to create the process in the Real-time priority class; the new process just won’t have as high a priority as Real-time.
All windows are associated with desktops, the graphical representation of a workspace. If no desktop is specified in CreateProcess, the process is associated with the caller’s current desktop.
If the process is part of a job object, but the creation flags requested a separate virtual DOS machine (VDM), the flag is ignored.
If the caller is sending a handle to a monitor as an output handle instead of a console handle, standard handle flags are ignored.
If the creation flags specify that the process will be debugged, Kernel32 initiates a connection to the native debugging code in Ntdll.dll by calling DbgUiConnectToDbg and gets a handle to the debug object from the thread environment block (TEB) once the function returns.
Kernel32.dll sets the default hard error mode if the creation flags specified one.
The user-specified attribute list is converted from Windows subsystem format to native format, and internal attributes are added to it.
NOTE
The attribute list passed on a CreateProcess call permits passing back to the caller information beyond a simple status code, such as the TEB address of the initial thread or information on the image section. This is necessary for protected processes since the parent cannot query this information after the child is created.
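The three priority-class rules at the start of this list can be modeled in a few lines. The following is only a sketch of the rules as described above, not the code Kernel32.dll runs; the constants are the documented Win32 priority-class flag values, while the function name and its parameters are hypothetical.

```python
# Win32 priority-class bits that can be set in the CreateProcess creation flags.
IDLE_PRIORITY_CLASS = 0x00000040
BELOW_NORMAL_PRIORITY_CLASS = 0x00004000
NORMAL_PRIORITY_CLASS = 0x00000020
ABOVE_NORMAL_PRIORITY_CLASS = 0x00008000
HIGH_PRIORITY_CLASS = 0x00000080
REALTIME_PRIORITY_CLASS = 0x00000100

# Ordered from lowest to highest priority.
CLASSES_LOW_TO_HIGH = [
    IDLE_PRIORITY_CLASS,
    BELOW_NORMAL_PRIORITY_CLASS,
    NORMAL_PRIORITY_CLASS,
    ABOVE_NORMAL_PRIORITY_CLASS,
    HIGH_PRIORITY_CLASS,
    REALTIME_PRIORITY_CLASS,
]


def resolve_priority_class(creation_flags, parent_class, has_increase_priority_privilege):
    """Hypothetical helper: pick the priority class for a new process."""
    requested = [c for c in CLASSES_LOW_TO_HIGH if creation_flags & c]
    if requested:
        # Several class bits may be set; the lowest-priority one wins.
        chosen = requested[0]
    elif parent_class in (IDLE_PRIORITY_CLASS, BELOW_NORMAL_PRIORITY_CLASS):
        # No class requested: a low-priority creator passes its class on.
        chosen = parent_class
    else:
        chosen = NORMAL_PRIORITY_CLASS
    if chosen == REALTIME_PRIORITY_CLASS and not has_increase_priority_privilege:
        # Insufficient privilege downgrades Real-time to High instead of failing.
        chosen = HIGH_PRIORITY_CLASS
    return chosen
```

For example, passing both the High and Idle bits yields Idle, and requesting Real-time without the privilege yields High.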
Once these steps are completed, CreateProcess will perform the initial call to NtCreateUserProcess to attempt creation of the process. Because Kernel32.dll has no idea at this point whether the application image name is a real Windows application, or if it might be a POSIX, 16-bit, or DOS application, the call may fail, at which point CreateProcess will look at the error reason and attempt to correct the situation.
Stage 2: Opening the Image to Be Executed
As illustrated in Figure 5-6, the first stage in NtCreateUserProcess is to find the appropriate Windows image that will run the executable file specified by the caller and to create a section object to later map it into the address space of the new process. If the call fails for any reason, it returns to CreateProcess with a failure state (see Table 5-6) that causes CreateProcess to attempt execution again.
If the executable file specified is a Windows .exe, NtCreateUserProcess will try to open the file and create a section object for it. The object isn’t mapped into memory yet, but it is opened. Just because a section object has been successfully created doesn’t mean that the file is a valid Windows image, however; it could be a DLL or a POSIX executable. If the file is a POSIX executable, the image to be run changes to Posix.exe, and CreateProcess restarts from the beginning of Stage 1. If the file is a DLL, CreateProcess fails.
Now that NtCreateUserProcess has found a valid Windows executable image, as part of the process creation code described in Stage 3 it looks in the registry under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options to see whether a subkey with the file name and extension of the executable image (but without the directory and path information—for example, Image.exe) exists there. If it does, PspAllocateProcess looks for a value named Debugger for that key. If this value is present, the image to be run becomes the string in that value and CreateProcess restarts at Stage 1.
TIP
You can take advantage of this process creation behavior and debug the startup code of Windows services processes before they start rather than attach the debugger after starting a service, which doesn’t allow you to debug the startup code.
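To make the lookup concrete, here is a minimal sketch of how the subkey name is derived from the image path and how a Debugger value redirects the image to run. It is a model only: a plain dictionary stands in for the registry, and the function names are hypothetical rather than anything exported by Windows.

```python
import ntpath

IFEO_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def ifeo_subkey(image_path):
    """Key checked for an image: file name and extension only, no directory."""
    return IFEO_KEY + "\\" + ntpath.basename(image_path)


def image_to_run(image_path, registry):
    """Return the Debugger string if one is configured, else the original image.

    `registry` is a dict of lowercased key path -> {value name: data}; it is a
    stand-in for the real registry, whose key names are case-insensitive.
    """
    values = registry.get(ifeo_subkey(image_path).lower(), {})
    return values.get("Debugger", image_path)
```

With a fake registry that holds a Debugger value under the Notepad.exe subkey, asking for C:\Windows\System32\Notepad.exe returns the debugger string, while any other image is returned unchanged.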
On the other hand, if the image is not a Windows .exe (for example, if it’s an MS-DOS, Win16, or a POSIX application), CreateProcess goes through a series of steps to find a Windows support image to run it. This process is necessary because non-Windows applications aren’t run directly; instead, Windows uses one of a few special support images that in turn are responsible for actually running the non-Windows program. For example, if you attempt to run a POSIX application, CreateProcess identifies it as such and changes the image to be run to the Windows executable file Posix.exe. If you attempt to run an MS-DOS or a Win16 executable, the image to be run becomes the Windows executable Ntvdm.exe. In short, you can’t directly create a process that is not a Windows process. If Windows can’t find a way to resolve the activated image as a Windows process (as shown in Table 5-6), CreateProcess fails.
Figure 5-6. Choosing a Windows image to activate
Table 5-6. Decision Tree for Stage 1 of CreateProcess

| If the Image . . . | Create State Code | This Image Will Run . . . | . . . and This Will Happen |
|---|---|---|---|
| Is a POSIX executable file | PsCreateSuccess | Posix.exe | CreateProcess restarts Stage 1. |
| Is an MS-DOS application with an .exe, a .com, or a .pif extension | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
| Is a Win16 application | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
| Is a Win64 application on a 32-bit system (or a PPC, MIPS, or Alpha binary) | PsCreateFailMachineMismatch | N/A | CreateProcess will fail. |
| Has a Debugger key with another image name | PsCreateFailExeName | Name specified in the Debugger key | CreateProcess restarts Stage 1. |
| Is an invalid or damaged Windows EXE | PsCreateFailExeFormat | N/A | CreateProcess will fail. |
| Cannot be opened | PsCreateFailOnFileOpen | N/A | CreateProcess will fail. |
| Is a command procedure (application with a .bat or a .cmd extension) | PsCreateFailOnSectionCreate | Cmd.exe | CreateProcess restarts Stage 1. |
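Table 5-6 can also be read as a simple lookup. The sketch below encodes its rows as data so that you can query them; the row labels (such as "msdos" or "batch") and both function names are hypothetical and exist only for this illustration.

```python
# One entry per row of Table 5-6:
# image kind -> (create state code, image that runs, outcome).
# The kind labels are hypothetical names, not Windows identifiers.
TABLE_5_6 = {
    "posix": ("PsCreateSuccess", "Posix.exe", "restart"),
    "msdos": ("PsCreateFailOnSectionCreate", "Ntvdm.exe", "restart"),
    "win16": ("PsCreateFailOnSectionCreate", "Ntvdm.exe", "restart"),
    "machine_mismatch": ("PsCreateFailMachineMismatch", None, "fail"),
    "debugger_key": ("PsCreateFailExeName", "<Debugger value>", "restart"),
    "bad_exe": ("PsCreateFailExeFormat", None, "fail"),
    "cannot_open": ("PsCreateFailOnFileOpen", None, "fail"),
    "batch": ("PsCreateFailOnSectionCreate", "Cmd.exe", "restart"),
}


def handle_create_state(kind):
    """Return (image_to_run, restarts_stage_1) for a classified image."""
    _state, image, outcome = TABLE_5_6[kind]
    return image, outcome == "restart"


def kinds_for_state(state):
    """List every image kind that reports the given create state code."""
    return sorted(k for k, (s, _, _) in TABLE_5_6.items() if s == state)
```

Querying kinds_for_state("PsCreateFailOnSectionCreate") returns three kinds, which shows that the state code by itself does not identify the support image; the table lists MS-DOS, Win16, and command-procedure images under the same code.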
Specifically, the decision tree that CreateProcess goes through to run an image is as follows:
If the image is an MS-DOS application with an .exe, a .com, or a .pif extension, a message is sent to the Windows subsystem to check whether an MS-DOS support process (Ntvdm.exe, specified in the registry value HKLM\SYSTEM\CurrentControlSet\Control\WOW\cmdline) has already been created for this session. If a support process has been created, it is used to run the MS-DOS application. (The Windows subsystem sends the message to the VDM [Virtual DOS Machine] process to run the new image.) Then CreateProcess returns. If a support process hasn’t been created, the image to be run changes to Ntvdm.exe and CreateProcess restarts at Stage 1.
If the file to run has a .bat or a .cmd extension, the image to be run becomes Cmd.exe, the Windows command prompt, and CreateProcess restarts at Stage 1. (The name of the batch file is passed as the first parameter to Cmd.exe.)
If the image is a Win16 (Windows 3.1) executable, CreateProcess must decide whether a new VDM process must be created to run it or whether it should use the default sessionwide shared VDM process (which might not yet have been created). The CreateProcess flags CREATE_SEPARATE_WOW_VDM and CREATE_SHARED_WOW_VDM control this decision. If these flags aren’t specified, the registry value HKLM\SYSTEM\CurrentControlSet\Control\WOW\DefaultSeparateVDM dictates the default behavior. If the application is to be run in a separate VDM, the image to be run changes to the value of HKLM\SYSTEM\CurrentControlSet\Control\WOW\wowcmdline and CreateProcess restarts at Stage 1. Otherwise, the Windows subsystem sends a message to see whether the shared VDM process exists and can be used. (If the VDM process is running on a different desktop or isn’t running under the same security as the caller, it can’t be used and a new VDM process must be created.) If a shared VDM process can be used, the Windows subsystem sends a message to it to run the new image and CreateProcess returns. If the VDM process hasn’t yet been created (or if it exists but can’t be used), the image to be run changes to the VDM support image and CreateProcess restarts at Stage 1.
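The Win16 branch of this decision tree is the most involved, so a small model may help. This sketch follows the description above under two labeled assumptions: when both flags are passed, the separate-VDM flag is checked first (the text doesn't say which wins), and the registry default and the usability of the shared VDM are passed in as plain Booleans. The flag values are the documented Win32 constants; the function names are hypothetical.

```python
CREATE_SEPARATE_WOW_VDM = 0x00000800
CREATE_SHARED_WOW_VDM = 0x00001000


def wants_separate_vdm(creation_flags, default_separate_vdm):
    """Decide between a separate and the shared VDM for a Win16 image."""
    if creation_flags & CREATE_SEPARATE_WOW_VDM:
        return True  # assumption: checked first if both flags are set
    if creation_flags & CREATE_SHARED_WOW_VDM:
        return False
    # Neither flag given: the DefaultSeparateVDM registry value decides.
    return default_separate_vdm


def win16_action(creation_flags, default_separate_vdm, shared_vdm_usable):
    """Return a short description of what CreateProcess does next."""
    if wants_separate_vdm(creation_flags, default_separate_vdm):
        return "restart Stage 1 with the wowcmdline image"
    if shared_vdm_usable:
        return "run the image in the shared VDM and return"
    # Shared VDM missing, on another desktop, or under different security.
    return "restart Stage 1 with the VDM support image"
```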
Stage 3: Creating the Windows Executive Process Object (PspAllocateProcess)
At this point, NtCreateUserProcess has opened a valid Windows executable file and created a section object to map it into the new process address space. Next it creates a Windows executive process object to run the image by calling the internal system function PspAllocateProcess. Creating the executive process object (which is done by the creating thread) involves the following substages:
Setting up the EPROCESS block
Creating the initial process address space
Initializing the kernel process block (KPROCESS)
Setting up the PEB
Concluding the setup of the process address space (which includes initializing the working set list and virtual address space descriptors and mapping the image into address space)
NOTE
The only time there won’t be a parent process is during system initialization. After that point, a parent process is always required to provide a security context for the new process.
Stage 3A: Setting Up the EPROCESS Block
This substage involves the following steps:
Allocate and initialize the Windows EPROCESS block.
Inherit the Windows device namespace (including the definition of drive letters, COM ports, and so on).
Inherit the process affinity mask and page priority from the parent process. If there is no parent process, the default page priority (5) is used, and an affinity mask of all processors (KeActiveProcessors) is used.
Set the new process’s quota block to the address of its parent process’s quota block, and increment the reference count for the parent’s quota block. If the process was created through CreateProcessAsUser, this step won’t occur.
The process minimum and maximum working set size are set to the values of PspMinimumWorkingSet and PspMaximumWorkingSet, respectively. These values can be overridden if performance options were specified in the PerfOptions key part of Image File Execution Options, in which case the maximum working set is taken from there.
Store the parent process’s process ID in the InheritedFromUniqueProcessId field in the new process object.
Attach the process to the session of the parent process.
Initialize the KPROCESS part of the process object. (See Stage 3C.)
Create the process’s primary access token (a duplicate of its parent’s primary token). New processes inherit the security profile of their parents. If the CreateProcessAsUser function is being used to specify a different access token for the new process, the token is then changed appropriately.
The process handle table is initialized. If the inherit handles flag is set for the parent process, any inheritable handles are copied from the parent’s object handle table into the new process. (For more information about object handle tables, see Chapter 3.) A process attribute can also be used to specify only a subset of handles, which is useful when you are using CreateProcessAsUser to restrict which objects should be inherited by the child process.
If performance options were specified through the PerfOptions key, these are now applied. The PerfOptions key includes overrides for the working set limit, I/O priority, page priority, and CPU priority class of the process.
The process priority class and quantum are computed and set.
Set the new process’s exit status to STATUS_PENDING.
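A few of these steps amount to copying values from the parent or falling back to defaults, which the following toy model illustrates. It covers only the affinity mask, page priority, session, parent process ID, and exit status; the class and function names are hypothetical, and the session used when there is no parent is simply a parameter here because the text doesn't specify it.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_PENDING = 0x00000103
DEFAULT_PAGE_PRIORITY = 5


@dataclass
class ProcessModel:
    """Toy stand-in for a handful of EPROCESS fields."""
    pid: int
    affinity_mask: int
    page_priority: int
    session_id: int
    inherited_from_unique_process_id: Optional[int] = None
    exit_status: int = STATUS_PENDING


def allocate_process(pid, parent, active_processors, session_id=0):
    """Model the inheritance rules for a new process."""
    if parent is None:
        # No parent (system initialization): use the defaults.
        return ProcessModel(pid, active_processors, DEFAULT_PAGE_PRIORITY, session_id)
    return ProcessModel(
        pid,
        parent.affinity_mask,   # inherited from the parent
        parent.page_priority,   # inherited from the parent
        parent.session_id,      # attached to the parent's session
        inherited_from_unique_process_id=parent.pid,
    )
```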
Stage 3B: Creating the Initial Process Address Space
The initial process address space consists of the following pages:
Page directory (there may be more than one on systems whose page tables have more than two levels, such as x86 systems in PAE mode or 64-bit systems)
Hyperspace page
Working set list
To create these three pages, the following steps are taken:
Page table entries are created in the appropriate page tables to map the initial pages.
The number of pages is deducted from the kernel variable MmTotalCommittedPages and added to MmProcessCommit.
The systemwide default process minimum working set size (PsMinimumWorkingSet) is deducted from MmResidentAvailablePages.
The page table pages for the nonpaged portion of system space and the system cache are mapped into the process.
Stage 3C: Creating the Kernel Process Block
The next stage of PspAllocateProcess is the initialization of the KPROCESS block. This work is performed by KeInitializeProcess, which initializes the following:
A pointer to a list of kernel threads. (The kernel has no knowledge of handles, so it bypasses the object table.)
A pointer to the process’s page table directory (which is used to keep track of the process’s virtual address space).
The total time the process’s threads have executed.
The number of clock cycles the process’s threads have consumed.
The process’s default base-scheduling priority (which starts as Normal, or 8, unless the parent process was set to Idle or Below Normal, in which case the setting is inherited).
The default processor affinity for the threads in the process.
The process swapping state (resident, out-swapped, or in transition).
The NUMA ideal node (initially set to 0).
The thread seed, based on the ideal processor that the kernel has chosen for this process (which is based on the previously created process’s ideal processor, effectively randomizing this in a round-robin manner). Creating a new process will update the seed in KeNodeBlock (the initial NUMA node block) so that the next new process will get a different ideal processor seed.
The initial value (or reset value) of the process default quantum (which is described in more detail in the Thread Scheduling section later in the chapter), which is hard-coded to 6 until it is initialized later (by PspComputeQuantumAndPriority).
NOTE
The default initial quantum differs between Windows client and server systems. For more information on thread quantums, turn to their discussion in the section Thread Scheduling.
Stage 3D: Concluding the Setup of the Process Address Space
Setting up the address space for a new process is somewhat complicated, so let’s look at what’s involved one step at a time. To get the most out of this section, you should have some familiarity with the internals of the Windows memory manager, which are described in Chapter 9.
The virtual memory manager sets the value of the process’s last trim time to the current time. The working set manager (which runs in the context of the balance set manager system thread) uses this value to determine when to initiate working set trimming.
The memory manager initializes the process’s working set list—page faults can now be taken.
The section (created when the image file was opened) is now mapped into the new process’s address space, and the process section base address is set to the base address of the image.
Ntdll.dll is mapped into the process.
NOTE
POSIX processes clone the address space of their parents, so they don’t have to go through these steps to create a new address space. In the case of POSIX applications, the new process’s section base address is set to that of its parent process and the parent’s PEB is cloned for the new process.
Stage 3E: Setting Up the PEB
NtCreateUserProcess calls MmCreatePeb, which first maps the systemwide national language support (NLS) tables into the process’s address space. It next calls MiCreatePebOrTeb to allocate a page for the PEB and then initializes a number of fields, which are described in Table 5-7.
Table 5-7. Initial Values of the Fields of the PEB

| Field | Initial Value |
|---|---|
| ImageBaseAddress | Base address of section |
| NumberOfProcessors | KeNumberProcessors kernel variable |
| NtGlobalFlag | NtGlobalFlag kernel variable |
| CriticalSectionTimeout | MmCriticalSectionTimeout kernel variable |
| HeapSegmentReserve | MmHeapSegmentReserve kernel variable |
| HeapSegmentCommit | MmHeapSegmentCommit kernel variable |
| HeapDeCommitTotalFreeThreshold | MmHeapDeCommitTotalFreeThreshold kernel variable |
| HeapDeCommitFreeBlockThreshold | MmHeapDeCommitFreeBlockThreshold kernel variable |
| NumberOfHeaps | 0 |
| MaximumNumberOfHeaps | (Size of a page – size of a PEB) / 4 |
| ProcessHeaps | First byte after PEB |
| MinimumStackCommit | MmMinimumStackCommitInBytes kernel variable |
| ImageProcessAffinityMask | KeActiveProcessors or 1 << MmRotatingUniprocessorNumber kernel variable (for uniprocessor-only images) |
| SessionId | Result of MmGetSessionId |
| ImageSubSystem | OptionalHeader.Subsystem |
| ImageSubSystemMajorVersion | OptionalHeader.MajorSubsystemVersion |
| ImageSubSystemMinorVersion | OptionalHeader.MinorSubsystemVersion |
| OSMajorVersion | NtMajorVersion kernel variable |
| OSMinorVersion | NtMinorVersion kernel variable |
| OSBuildNumber | NtBuildNumber kernel variable & 0x3FFF, combined with CmNtCSDVersion for service packs |
| OSPlatformId | 2 |
However, if the image file specifies explicit Windows version or affinity values, this information replaces the initial values shown in Table 5-7. The mapping from image information fields to PEB fields is described in Table 5-8.
Table 5-8. Windows Replacements for Initial PEB Values

| Field Name | Value Taken from Image Header |
|---|---|
| OSMajorVersion | OptionalHeader.Win32VersionValue & 0xFF |
| OSMinorVersion | (OptionalHeader.Win32VersionValue >> 8) & 0xFF |
| OSBuildNumber | (OptionalHeader.Win32VersionValue >> 16) & 0x3FFF, combined with ImageLoadConfigDirectory.CSDVersion |
| OSPlatformId | (OptionalHeader.Win32VersionValue >> 30) ^ 0x2 |
| ImageProcessAffinityMask | ImageLoadConfigDirectory.ProcessAffinityMask |
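The bit arithmetic in Table 5-8 is easier to follow with a worked example. The sketch below unpacks a 32-bit Win32VersionValue exactly as the table describes; the function name is hypothetical, and the sketch leaves out the CSDVersion that the table says is combined into OSBuildNumber.

```python
def peb_version_overrides(win32_version_value):
    """Decode OptionalHeader.Win32VersionValue into the PEB version fields."""
    v = win32_version_value & 0xFFFFFFFF  # treat the field as an unsigned 32-bit value
    return {
        "OSMajorVersion": v & 0xFF,            # bits 0-7
        "OSMinorVersion": (v >> 8) & 0xFF,     # bits 8-15
        "OSBuildNumber": (v >> 16) & 0x3FFF,   # bits 16-29
        "OSPlatformId": (v >> 30) ^ 0x2,       # bits 30-31, XORed with 2
    }


# Pack version 6.0, build 6001, with the top two bits clear.
example = 6 | (0 << 8) | (6001 << 16)
fields = peb_version_overrides(example)
```

Because the top two bits of the example are clear, the XOR produces an OSPlatformId of 2, which matches the default value listed in Table 5-7.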
If the image header characteristics IMAGE_FILE_UP_SYSTEM_ONLY flag is set (indicating that the image can run only on a uniprocessor system), a single CPU is chosen for all the threads in this new process to run on. The selection process is performed by simply cycling through the available processors—each time this type of image is run, the next processor is used. In this way, these types of images are spread evenly across the processors.
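The cycling described here can be pictured as a counter that wraps around the processor count. The class below is only an illustration: its counter plays the role that Table 5-7 assigns to MmRotatingUniprocessorNumber, but the class name, the starting value of 0, and the choice to advance the counter after each use are assumptions.

```python
class UniprocessorRotor:
    """Hands out single-CPU affinity masks in round-robin order."""

    def __init__(self, processor_count):
        self.processor_count = processor_count
        self.next_cpu = 0  # models the rotating processor number (assumed to start at 0)

    def next_affinity_mask(self):
        mask = 1 << self.next_cpu  # exactly one bit set: one CPU for all threads
        self.next_cpu = (self.next_cpu + 1) % self.processor_count
        return mask


rotor = UniprocessorRotor(4)
masks = [rotor.next_affinity_mask() for _ in range(5)]  # fifth call wraps to CPU 0
```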
If the image specifies an explicit processor affinity mask (for example, a field in the configuration header), this value is copied to the PEB and later set as the default process affinity mask.
Stage 3F: Completing the Setup of the Executive Process Object (PspInsertProcess)
Before the handle to the new process can be returned, a few final setup steps must be completed, which are performed by PspInsertProcess and its helper functions:
If systemwide auditing of processes is enabled (either as a result of local policy settings or group policy settings from a domain controller), the process’s creation is written to the Security event log.
If the parent process was contained in a job, the job is recovered from the job level set of the parent and then bound to the session of the newly created process. Finally, the new process is added to the job.
PspInsertProcess inserts the new process block at the end of the Windows list of active processes (PsActiveProcessHead).
The process debug port of the parent process is copied to the new child process, unless the NoDebugInherit flag is set (which can be requested when creating the process). If a debug port was specified, it is attached to the new process at this time.
Finally, PspInsertProcess notifies any registered callback routines, creates a handle for the new process by calling ObOpenObjectByPointer, and then returns this handle to the caller.
Stage 4: Creating the Initial Thread and Its Stack and Context
At this point, the Windows executive process object is completely set up. It still has no thread, however, so it can’t do anything yet. It’s now time to start that work. Normally, the PspCreateThread routine is responsible for all aspects of thread creation and is called by NtCreateThread when a new thread is being created. However, because the initial thread is created internally by the kernel without user-mode input, the two helper routines that PspCreateThread relies on are used instead: PspAllocateThread and PspInsertThread.
PspAllocateThread handles the actual creation and initialization of the executive thread object itself, while PspInsertThread handles the creation of the thread handle and security attributes and the call to KeStartThread to turn the executive object into a schedulable thread on the system. However, the thread won’t do anything yet—it is created in a suspended state and isn’t resumed until the process is completely initialized (as described in Stage 5).
NOTE
The thread parameter (which can’t be specified in CreateProcess but can be specified in CreateThread) is the address of the PEB. This parameter will be used by the initialization code that runs in the context of this new thread (as described in Stage 6).
PspAllocateThread performs the following steps:
An executive thread block (ETHREAD) is created and initialized.
Before the thread can execute, it needs a stack and a context in which to run, so these are set up. The stack size for the initial thread is taken from the image—there’s no way to specify another size.
The thread environment block (TEB) is allocated for the new thread.
The user-mode thread start address is stored in the ETHREAD. This is the system-supplied thread startup function in Ntdll.dll (RtlUserThreadStart). The user’s specified Windows start address is stored in the ETHREAD block in a different location so that debugging tools such as Process Explorer can query the information.
KeInitThread is called to set up the KTHREAD block. The thread’s initial and current base priorities are set to the process’s base priority, and its affinity and quantum are set to that of the process. This function also sets the initial thread ideal processor. (See the section Ideal and Last Processor for a description of how this is chosen.) KeInitThread next allocates a kernel stack for the thread and initializes the machine-dependent hardware context for the thread, including the context, trap, and exception frames. The thread’s context is set up so that the thread will start in kernel mode in KiThreadStartup. Finally, KeInitThread sets the thread’s state to Initialized and returns to PspAllocateThread.
Once that work is finished, NtCreateUserProcess will call PspInsertThread to perform the following steps:
A thread ID is generated for the new thread.
The thread count in the process object is incremented, and the thread is added into the process thread list.
The thread is put into a suspended state.
The object is inserted and any registered thread callbacks are called.
The handle is created with ObOpenObjectByName.
The thread is readied for execution by calling KeStartThread.
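The split between allocating a thread and inserting it can be pictured with a small model. This is only a sketch: the class and function names below are hypothetical stand-ins, not kernel structures or routines, and the ID generator is invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

# Hypothetical ID source; real thread IDs come from a kernel handle table.
_next_tid = count(start=4, step=4)


@dataclass
class Process:
    base_priority: int
    affinity: int
    threads: List["Thread"] = field(default_factory=list)


@dataclass
class Thread:
    process: Process
    start_address: str
    priority: int
    affinity: int
    state: str = "Initialized"      # state named in the text after initialization
    tid: Optional[int] = None
    suspend_count: int = 0
    schedulable: bool = False

    @property
    def can_run(self) -> bool:
        # A thread only runs once it is schedulable and no longer suspended.
        return self.schedulable and self.suspend_count == 0


def allocate_thread(process: Process, start_address: str) -> Thread:
    """Allocation phase: build the object and inherit the process defaults."""
    return Thread(process=process, start_address=start_address,
                  priority=process.base_priority, affinity=process.affinity)


def insert_thread(thread: Thread) -> int:
    """Insertion phase: assign an ID, link into the process, suspend, make schedulable."""
    thread.tid = next(_next_tid)
    thread.process.threads.append(thread)
    thread.suspend_count += 1       # created suspended
    thread.schedulable = True
    return thread.tid


def resume_thread(thread: Thread) -> None:
    """Model of the later resume that lets the initial thread start."""
    if thread.suspend_count > 0:
        thread.suspend_count -= 1
```

In this model the thread exists and is linked into its process after insertion, yet `can_run` stays false until `resume_thread` is called, which mirrors why the initial thread does nothing until process initialization is complete.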
Stage 5: Performing Windows Subsystem–Specific Post-Initialization
Once NtCreateUserProcess returns with a success code, all the necessary executive process and thread objects have been created. Kernel32.dll will now perform various operations related to Windows subsystem–specific operations to finish initializing the process.
First of all, various checks are made for whether Windows should allow the executable to run. These checks include validating the image version in the header and checking whether Windows application certification has blocked the process (through a group policy). On specialized editions of Windows Server 2008, such as Windows Web Server 2008 and Windows HPC Server 2008, additional checks are made to see if the application imports any disallowed APIs.
If software restriction policies dictate, a restricted token is created for the new process. Afterward, the application compatibility database is queried to see if an entry exists in either the registry or system application database for the process. Compatibility shims will not be applied at this point—the information will be stored in the PEB once the initial thread starts executing (Stage 6).
At this point, Kernel32.dll sends a message to the Windows subsystem so that it can set up SxS information (see the end of this section for more information on side-by-side assemblies) such as manifest files, DLL redirection paths, and out-of-process execution for the new process. It also initializes the Windows subsystem structures for the process and initial thread. The message includes the following information:
Process and thread handles
Entries in the creation flags
ID of the process’s creator
Flag indicating whether the process belongs to a Windows application (so that Csrss can determine whether or not to show the startup cursor)
UI language information
DLL redirection and .local flags
Manifest file information
The Windows subsystem performs the following steps when it receives this message:
CsrCreateProcess duplicates a handle for the process and thread. In this step, the usage count of the process and the thread is incremented from 1 (which was set at creation time) to 2.
If a process priority class isn’t specified, CsrCreateProcess sets it according to the algorithm described earlier in this section.
The Csrss process block is allocated.
The new process’s exception port is set to be the general function port for the Windows subsystem so that the Windows subsystem will receive a message when a second chance exception occurs in the process. (For further information on exception handling, see Chapter 3.)
The Csrss thread block is allocated and initialized.
CsrCreateThread inserts the thread in the list of threads for the process.
The count of processes in this session is incremented.
The process shutdown level is set to 0x280 (the default process shutdown level—see SetProcessShutdownParameters in the MSDN Library documentation for more information).
The new process block is inserted into the list of Windows subsystem-wide processes.
The per-process data structure used by the kernel-mode part of the Windows subsystem (W32PROCESS structure) is allocated and initialized.
The application start cursor is displayed. This cursor is the familiar rolling doughnut shape—the way that Windows says to the user, “I’m starting something, but you can use the cursor in the meantime.” If the process doesn’t make a GUI call after 2 seconds, the cursor reverts to the standard pointer. If the process does make a GUI call in the allotted time, CsrCreateProcess waits 5 seconds for the application to show a window. After that time, CsrCreateProcess will reset the cursor again.
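The start-cursor timing can be expressed as a small function. This is a sketch under two assumptions that the text does not settle: the 5-second wait is measured from the first GUI call, and showing a window ends the wait early. The function and parameter names are hypothetical.

```python
from typing import Optional

GUI_CALL_TIMEOUT = 2.0   # seconds allowed for the first GUI call (from the text)
WINDOW_TIMEOUT = 5.0     # seconds allowed for a window to appear (from the text)


def cursor_reset_time(first_gui_call: Optional[float] = None,
                      window_shown: Optional[float] = None) -> float:
    """Return the time, in seconds after process start, at which the start cursor is reset."""
    # No GUI call within the first window: revert to the standard pointer.
    if first_gui_call is None or first_gui_call > GUI_CALL_TIMEOUT:
        return GUI_CALL_TIMEOUT

    # Assumption: the window timeout starts at the first GUI call.
    deadline = first_gui_call + WINDOW_TIMEOUT

    # Assumption: a window shown before the deadline resets the cursor immediately.
    if window_shown is not None and first_gui_call <= window_shown <= deadline:
        return window_shown
    return deadline
```

For example, `cursor_reset_time()` models a console program that never makes a GUI call, while `cursor_reset_time(0.5)` models an application that touches the GUI early but never shows a window.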
After Csrss has performed these steps, CreateProcess checks whether the process was run elevated (which means it was executed through ShellExecute and elevated by the AppInfo service after the consent dialog box was shown to the user). This includes checking whether the process was a setup program. If it was, the process’s token is opened, and the virtualization flag is turned on so that the application is virtualized. (See the information on UAC and virtualization in Chapter 6.) If the application contained elevation shims or had a requested elevation level in its manifest, the process is destroyed and an elevation request is sent to the AppInfo service. (See Chapter 6 for more information on elevation.)
Note that most of these checks are not performed for protected processes; because these processes must have been designed for Windows Vista or later, there’s no reason why they should require elevation, virtualization, or application compatibility checks and processing. Additionally, allowing mechanisms such as the shim engine to use its usual hooking and memory patching techniques on a protected process would result in a security hole if someone could figure out how to insert arbitrary shims that modify the behavior of the protected process.
Stage 6: Starting Execution of the Initial Thread
At this point, the process environment has been determined, resources for its threads to use have been allocated, the process has a thread, and the Windows subsystem knows about the new process. Unless the caller specified the CREATE_SUSPENDED flag, the initial thread is now resumed so that it can start running and perform the remainder of the process initialization work that occurs in the context of the new process (Stage 7).
Stage 7: Performing Process Initialization in the Context of the New Process
The new thread begins life running the kernel-mode thread startup routine KiThreadStartup. KiThreadStartup lowers the thread’s IRQL level from DPC/dispatch level to APC level and then calls the system initial thread routine, PspUserThreadStartup. The user-specified thread start address is passed as a parameter to this routine.
First, this function sets the Locale ID and the ideal processor in the TEB, based on the information present in kernel-mode data structures, and then it checks if thread creation actually failed. Next it calls DbgkCreateThread, which checks if image notifications were sent for the new process. If they weren’t, and notifications are enabled, an image notification is sent first for the process and then for the image load of Ntdll.dll. Note that this is done in this stage rather than when the images were first mapped, because the process ID (which is required for the callouts) is not yet allocated at that time.
Once those checks are completed, another check is performed to see whether the process is a debuggee. If it is, then PspUserThreadStartup checks if the debugger notifications have already been sent for this process. If not, then a create process message is sent through the debug object (if one is present) so that the process startup debug event (CREATE_PROCESS_DEBUG_INFO) can be sent to the appropriate debugger process. This is followed by a similar thread startup debug event and by another debug event for the image load of Ntdll.dll. DbgkCreateThread then waits for the Windows subsystem to get the reply from the debugger (via the ContinueDebugEvent function).
Now that the debugger has been notified, PspUserThreadStartup looks at the result of the initial check on the thread’s life. If it was killed on startup, the thread is terminated. This check is done after the debugger and image notifications to be sure that the kernel-mode and user-mode debuggers don’t miss information on the thread, even if the thread never got a chance to run.
Otherwise, the routine checks whether application prefetching is enabled on the system and, if so, calls the prefetcher (and Superfetch) to process the prefetch instruction file (if it exists) and prefetch pages referenced during the first 10 seconds the last time the process ran. (For details on the prefetcher and Superfetch, see Chapter 9.)
PspUserThreadStartup then checks if the systemwide cookie in the SharedUserData structure has been set up yet. If it hasn’t, it generates it based on a hash of system information such as the number of interrupts processed, DPC deliveries, and page faults. This systemwide cookie is used in the internal decoding and encoding of pointers, such as in the heap manager (for more information on heap manager security, see Chapter 9), to protect against certain classes of exploitation.
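To make the idea of cookie-based pointer encoding concrete, here is a minimal sketch. It is not the kernel's algorithm: the hash (SHA-256), the 32-bit cookie width, and the XOR-then-rotate scheme are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1


def make_cookie(interrupts: int, dpc_deliveries: int, page_faults: int) -> int:
    """Derive a 32-bit cookie from a hash of system counters (illustrative hash choice)."""
    data = struct.pack("<QQQ", interrupts, dpc_deliveries, page_faults)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


def _rotr64(value: int, shift: int) -> int:
    shift %= 64
    return ((value >> shift) | (value << (64 - shift))) & MASK64


def _rotl64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def encode_pointer(ptr: int, cookie: int) -> int:
    """Obscure a pointer: XOR with the cookie, then rotate by a cookie-derived amount."""
    return _rotr64((ptr ^ cookie) & MASK64, cookie & 0x3F)


def decode_pointer(encoded: int, cookie: int) -> int:
    """Invert encode_pointer: undo the rotation, then XOR with the same cookie."""
    return _rotl64(encoded, cookie & 0x3F) ^ cookie
```

The point of such a scheme is that a stored pointer is useless to an attacker who overwrites it without knowing the cookie, because decoding a forged value yields an unpredictable address.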
Finally, PspUserThreadStartup sets up the initial thunk context to run the image loader initialization routine (LdrInitializeThunk in Ntdll.dll), as well as the systemwide thread startup stub (RtlUserThreadStart in Ntdll.dll). These steps are done by editing the context of the thread in place and then issuing an exit from system service operation, which will load the specially crafted user context. The LdrInitializeThunk routine initializes the loader, heap manager, NLS tables, thread-local storage (TLS) and fiber-local storage (FLS) array, and critical section structures. It then loads any required DLLs and calls the DLL entry points with the DLL_PROCESS_ATTACH function code. (See the sidebar “Side-by-Side Assemblies” for a description of a mechanism Windows uses to address DLL versioning problems.)
Once the function returns, NtContinue will restore the new user context and return back to user mode—thread execution now truly starts.
RtlUserThreadStart will use the address of the actual image entry point and the start parameter and call the application. These two parameters have also already been pushed onto the stack by the kernel. This complicated series of events has two purposes. First of all, it allows the image loader inside Ntdll.dll to set up the process internally and behind the scenes so that other user-mode code can run properly (otherwise, it would have no heap, no thread local storage, and so on).
Second, having all threads begin in a common routine allows them to be wrapped in exception handling, so that when they crash, Ntdll.dll is aware of that and can call the unhandled exception filter inside Kernel32.dll. It is also able to coordinate thread exit on return from the thread’s start routine and to perform various cleanup work. Application developers can also call SetUnhandledExceptionFilter to add their own unhandled exception handling code.
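The benefit of a common start routine can be sketched as a wrapper that runs the real start function inside an exception handler. The names below are hypothetical analogues, not the Ntdll.dll or Kernel32.dll implementations, and the exit code used after a crash is an arbitrary choice for the example.

```python
from typing import Any, Callable, Optional

# Process-wide handler, analogous in spirit to the unhandled exception filter.
_unhandled_filter: Optional[Callable[[BaseException], None]] = None


def set_unhandled_exception_filter(handler):
    """Install a new handler and return the previous one."""
    global _unhandled_filter
    previous, _unhandled_filter = _unhandled_filter, handler
    return previous


def user_thread_start(start_address: Callable[[Any], int], parameter: Any) -> int:
    """Common entry stub: every thread's start routine runs inside this wrapper."""
    try:
        exit_code = start_address(parameter)
    except Exception as exc:
        # The wrapper sees every crash, so it can route it to the filter.
        if _unhandled_filter is not None:
            _unhandled_filter(exc)
        exit_code = 1  # illustrative failure code, not a Windows status value
    # A real stub would coordinate thread exit and cleanup at this point.
    return exit_code
```

Because every thread enters through the same stub, installing one filter covers all of them, which is the design advantage the text describes.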
Side-by-Side Assemblies
In order to isolate DLLs distributed with applications from DLLs that ship with the operating system, Windows allows applications to use private copies of these core DLLs. To use a private copy of a DLL instead of the one in the system directory, an application’s installation must include a file named Application.exe.local (where Application is the name of the application’s executable), which directs the loader to first look for DLLs in that directory. Note that any DLLs that are loaded from the list of KnownDLLs (DLLs that are permanently mapped into memory) or that are loaded by those DLLs cannot be redirected using this mechanism.
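The effect of a .local file on DLL lookup can be sketched as a lookup function. This is a simplified model with hypothetical names: it covers only the rule that KnownDLLs are never redirected and that the application directory is tried first when the .local file exists, and it ignores DLLs loaded transitively by KnownDLLs as well as the rest of the real search order.

```python
from typing import Set


def resolve_dll(dll_name: str, app_dir: str, app_dir_files: Set[str],
                system_dir: str, known_dlls: Set[str], has_local_file: bool) -> str:
    """Return the path a DLL would be loaded from under .local redirection.

    app_dir_files and known_dlls are expected to hold lowercase file names.
    """
    name = dll_name.lower()

    # KnownDLLs are already mapped by the system and cannot be redirected.
    if name in known_dlls:
        return system_dir + "\\" + dll_name

    # With Application.exe.local present, a private copy takes precedence.
    if has_local_file and name in app_dir_files:
        return app_dir + "\\" + dll_name

    # Otherwise fall back to the system copy.
    return system_dir + "\\" + dll_name
```

Calling it with and without `has_local_file` for the same DLL shows the redirection switching between the private and system copies.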
To further address application and DLL compatibility while allowing sharing, Windows implements the concept of shared assemblies. An assembly consists of a group of resources, including DLLs, and an XML manifest file that describes the assembly and its contents. An application references an assembly through the existence of its own XML manifest. The manifest can be a file in the application’s installation directory that has the same name as the application with “.manifest” appended (for example, application.exe.manifest), or it can be linked into the application as a resource. The manifest describes the application and its dependence on assemblies.
There are two types of assemblies: private and shared. The difference between the two is that shared assemblies are digitally signed so that corruption or modification of their contents can be detected. In addition, shared assemblies are stored under the \Windows\Winsxs directory, whereas private assemblies are stored in an application’s installation directory. Because they are signed, shared assemblies also have an associated catalog file (.cat) that contains their digital signature information. Shared assemblies can be “side-by-side” assemblies because multiple versions of a DLL can reside on a system simultaneously, with applications dependent on a particular version of a DLL always using that particular version.
An assembly’s manifest file typically has a name that includes the name of the assembly, version information, some text that represents a unique signature, and the extension “.manifest”. The manifests are stored in \Windows\Winsxs\Manifests, and the rest of the assembly’s resources are stored in subdirectories of \Windows\Winsxs that have the same name as the corresponding manifest files, with the exception of the trailing .manifest extension.
An example of a shared assembly is version 6 of the Windows common controls DLL, comctl32.dll. Its manifest file is named \Windows\Winsxs\Manifests\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.0.0_x-ww_1382d70a.manifest. It has an associated catalog file (which is the same name with the .cat extension) and a subdirectory of Winsxs that includes comctl32.dll.
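The naming convention can be explored by splitting the example manifest name into its parts. This is a sketch: the field labels are one reading of the name's layout rather than documented terms, and the function names are hypothetical. The directory rule follows the description above (same name as the manifest, minus the .manifest extension).

```python
from typing import NamedTuple


class ManifestName(NamedTuple):
    architecture: str
    assembly: str
    signature: str    # the "unique signature" text mentioned above
    version: str
    locale_tag: str   # label is a guess at the meaning of this field
    suffix: str       # trailing hash-like text; meaning assumed


def _stem(manifest_path: str) -> str:
    """Strip the directory and the trailing .manifest extension."""
    file_name = manifest_path.rsplit("\\", 1)[-1]
    if not file_name.lower().endswith(".manifest"):
        raise ValueError("not a manifest file name: " + file_name)
    return file_name[: -len(".manifest")]


def parse_manifest_name(manifest_path: str) -> ManifestName:
    """Split a shared-assembly manifest name on underscores."""
    arch, rest = _stem(manifest_path).split("_", 1)
    # Split from the right so underscores inside the assembly name survive.
    assembly, signature, version, locale_tag, suffix = rest.rsplit("_", 4)
    return ManifestName(arch, assembly, signature, version, locale_tag, suffix)


def assembly_directory(manifest_path: str) -> str:
    """Directory holding the assembly's files: the manifest name without its extension."""
    return "\\Windows\\Winsxs\\" + _stem(manifest_path)
```

Applied to the common controls manifest named above, the parser yields the architecture `x86`, the assembly name `Microsoft.Windows.Common-Controls`, and the version `6.0.0.0`.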
Version 6 of Comctl32.dll added integration with Windows themes, and because applications not written with theme support in mind might not appear correctly with the new DLL, it’s available only to applications that explicitly reference the shared assembly containing it—the version of Comctl32.dll installed in \Windows\System32 is an instance of version 5.x, which is not theme aware. When an application loads, the loader looks for the application’s manifest, and if one exists, loads the DLLs from the assemblies specified. DLLs not included in assemblies referenced in the manifest are loaded in the traditional way. Legacy applications, therefore, link against the version in \Windows\System32, whereas theme-aware applications can specify the new version in their manifest.
A final advantage that shared assemblies have is that a publisher can issue a publisher configuration, which can redirect all applications that use a particular assembly to use an updated version. Publishers would do this if they were preserving backward compatibility while addressing bugs. Ultimately, however, because of the flexibility inherent in the assembly model, an application could decide to override the new setting and continue to use an older version.
EXPERIMENT: Tracing Process Startup
Now that we’ve looked in detail at how a process starts up and the different operations required to begin executing an application, we’re going to use Process Monitor to take a look at some of the file I/O and registry keys that are accessed during this process.
Although this experiment will not provide a complete picture of all the internal steps we’ve described, you’ll be able to see several parts of the system in action, notably Prefetch and Superfetch, image file execution options and other compatibility checks, and the image loader’s DLL mapping.
We’re going to be looking at a very simple executable—Notepad.exe—and we will be launching it from a Command Prompt window (Cmd.exe). It’s important that we look both at the operations inside Cmd.exe and those inside Notepad.exe. Recall that a lot of the user-mode work is performed by CreateProcess, which is called by the parent process before the kernel has created a new process object.
To set things up correctly, add two filters to Process Monitor: one for Cmd.exe, and one for Notepad.exe; these are the only two processes we want to include. It will be helpful to be sure that you don’t have any currently running instances of these two processes so that you know you’re looking at the right events. The filter window should look like this:
[Screenshot: Process Monitor filter window with the Cmd.exe and Notepad.exe filters]
Next, make sure that event logging is currently disabled (clear File, Capture Events), and then start up the command prompt. Enable event logging (using the File menu again, or simply press CTRL+E or click the magnifying glass icon on the toolbar) and then enter Notepad.exe and press Enter. On a typical Windows Vista system, you should see anywhere between 500 and 1500 events appear. Go ahead and hide the Sequence and Time Of Day columns so that we can focus our attention on the columns of interest. Your window should look similar to the one shown next.
[Screenshot: Process Monitor event list after launching Notepad.exe]
Just as described in Stage 1 of the CreateProcess flow, one of the first things to notice is that just before the process is started and the first thread is created, Cmd.exe does a registry read at HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options. Because there were no image execution options associated with Notepad.exe, the process was created as is.
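The same registry location can be inspected from a script. The sketch below builds the per-image subkey and, on Windows only, tries to read the well-known `Debugger` value with the standard `winreg` module; the helper names are hypothetical, and on other platforms or when no entry exists the function simply returns `None`.

```python
from typing import Optional

IFEO_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def ifeo_subkey(image_name: str) -> str:
    """Build the per-image subkey path under Image File Execution Options."""
    return IFEO_KEY + "\\" + image_name


def read_ifeo_debugger(image_name: str) -> Optional[str]:
    """Return the Debugger value configured for an image, or None if absent."""
    try:
        import winreg  # only available on Windows
    except ImportError:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ifeo_subkey(image_name)) as key:
            value, _value_type = winreg.QueryValueEx(key, "Debugger")
            return value
    except OSError:
        # Missing key or value: no execution options apply to this image.
        return None
```

A result of `None` for `read_ifeo_debugger("Notepad.exe")` corresponds to the case in this experiment, where the process is created as is.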
As with this and any other event in Process Monitor’s log, you have the ability to see whether each part of the process creation flow was performed in user mode or kernel mode, and by which routines, by looking at the stack of the event. To do this, double-click on the RegOpenKey event mentioned and switch to the Stack tab. The following screen shows the standard stack on a 32-bit Windows Vista machine.
[Screenshot: Stack tab for the RegOpenKey event]
This stack shows that we have already reached the part of process creation performed in kernel mode (through NtCreateUserProcess) and that the helper routine PspAllocateProcess is responsible for this check.
Going down the list of events after the thread and process have been created, you will notice three groups of events. The first is a simple check for application compatibility flags, which will let the user-mode process creation code know if checks inside the application compatibility database are required through the shim engine.
This check is followed by multiple reads to Side-By-Side, Manifest, and MUI/Language keys, which are part of the assembly framework mentioned earlier. Finally, you may see file I/O to one or more .sdb files, which are the application compatibility databases on the system. This I/O is where additional checks are done to see if the shim engine needs to be invoked for this application. Since Notepad is a well-behaved Microsoft program, it doesn’t require any shims.
The following screen shows the next series of events, which happen inside the Notepad process itself. These are actions initiated by the user-mode thread startup wrapper in kernel mode, which performs the actions described earlier. The first two are the Notepad.exe and Ntdll.dll image load debug notification messages, which can only be generated now that code is running inside Notepad’s process context and not the context for the command prompt.
[Screenshot: first events generated inside the Notepad process]
Next, the prefetcher kicks in, looking for a prefetch database file that has already been generated for Notepad. (For more information on the prefetcher, see Chapter 9.) On a system where Notepad has already been run at least once, this database will exist, and the prefetcher will begin executing the commands specified inside it. If this is the case, scrolling down you will see multiple DLLs being read and queried. Unlike typical DLL loading, which is done by the user-mode image loader by looking at the import tables or when an application manually loads a DLL, these events are being generated by the prefetcher, which is already aware of the libraries that Notepad will require. Typical image loading of the DLLs required happens next, and you will see events similar to the ones shown here.
[Screenshot: image load events for the DLLs required by Notepad]
These events are now being generated from code running inside user mode, which was called once the kernel-mode wrapper function finished its work. Therefore, these are the first events coming from LdrpInitializeProcess, which we mentioned is the internal system wrapper function for any new process, before the start address wrapper is called. You can confirm this on your own by looking at the stack of these events; for example, the kernel32.dll image load event, which is shown in the next screen.
[Screenshot: stack of the kernel32.dll image load event]
Further events are generated by this routine and its associated helper functions until you finally reach events generated by the WinMain function inside Notepad, which is where code under the developer’s control is now being executed. Describing in detail all the events and user-mode components that come into play during process execution would fill up this entire chapter, so exploration of any further events is left as an exercise for the reader.