Vlad Ioan Topan

My playground

Posts Tagged ‘C

Compiling a tiny executable (or DLL) in FASM (or MSVC) + templates – part II: MSVC

with one comment

Since creating tiny executables and DLLs in FASM was pretty straight-forward, we’ll challenge ourselves a bit: we’ll do it in MSVC. To be fully honest, creating a tiny executable using MSVC is actually more than a bit of a challenge, but it is doable.

An empty “int main(…) {return 0;}” program compiled with Microsoft’s Visual C++ compiler is already around 38 KB (using VS 2008, but other versions should yield similar results) due to the CRT (also calledcrt0, which implements (parts of) the C language runtime) code included in the executable. The CRT is responsible for things such as implementing the C API for working with files (fopen & co.), strings (strcpy & co.) etc., the malloc-based memory allocator, the well-known int main(…) entrypoint, etc.

[Sidenote] All Windows (PE) executables actually have a WinMain entrypoint, where a CRT stub is placed which, among many other things, parses the single-string command line and splits it into “parameters” passed as argv[] to main().

If we remove the CRT, we have to use Windows API functions to replace it (on Windows, obviously).

[Sidenote] We could alternatively link against a dynamic version of the runtime, but the code would still be much bigger than it needs to be, with the added pain of having a dependency on the Microsoft Visual C++ Redistributables, which have caused plenty of annoyance due to the somewhat cryptic error messages reported about some MSVCR##.DLL missing when the packages aren’t installed on the user’s machine.

The file management functions have good equivalents in the Windows API (start from CreateFile). The strings APIs are somewhat harder to find, but present. Memory allocation APIs are plentiful (LocalAlloc or HeapAlloc are good starts, or even VirtualAlloc for more complicated things). The fact that the WinMain() standard entrypoint does not provide “digested” arguments (like C’s argv/argc main() arguments) can also be handled using Windows API calls, but for some reason only the WideChar variant of CommandLineToArgv (i.e. CommandLineToArgvW) is implemented, so we’ll work with WideChar functions in the example.

Let’s create a 64-bit basic DLL loader and a sample DLL. First, the loader’s source code:

#include <windows.h>

    _In_ HINSTANCE hInstance,
    _In_ HINSTANCE hPrevInstance,
    _In_ LPSTR lpCmdLine,
    _In_ int nCmdShow
    WCHAR *cmdline;
    int argc;
    WCHAR **argv;
    /// we aren't interested in any of the standard WinMain parameters
    /// get the WideChar command line, split it
    cmdline = GetCommandLineW();
    argv = CommandLineToArgvW(cmdline, &argc);
    /// assume the first parameter is a DLL name, load it

    /// free the "digested" parameters

In the interest of keeping things simple, this code is fugly and evil (it doesn’t check error codes), but it does work.

We have a standard WinMain entrypoint. We get the WideChar version of the command line using GetCommandLineW(), then
split it using CommandLineToArgvW() into the more familiar (to C devs at least) argc/argv pair. We call LoadLibraryW on
the first argument, which we assume is the DLL name, then free the argv obtained from CommandLineToArgvW() and exit.

The DLL is as basic as possible, just a header and a source (the header is given to serve as a template, it’s not actually necessary in this case).

The win64dll.h file:

#pragma once

#define AWESOME_API __declspec(dllexport)
#define AWESOME_API __declspec(dllimport)

And the .c:

#include "win64dll.h"
#include <windows.h>

    HMODULE hModule,
    DWORD  dwReason,
    LPVOID lpReserved

    switch (dwReason)
        case DLL_PROCESS_ATTACH:
            MessageBox(0, "Hello!", "Hello!", MB_OK);
        case DLL_THREAD_ATTACH:
        case DLL_THREAD_DETACH:
        case DLL_PROCESS_DETACH:
    return TRUE;

Now for the hard part: compiling without the CRT. Since it’s easier to compile from the command line, or at least easier to write command line arguments than project settings/checkboxes in an article, let’s start off with compiling the loader above (presumably called loader.c) with the CRT and go from there:

cl.exe loader.c /link shell32.lib

Assuming your environment is set up correctly (see sidenote below about setting up the environment), this will produce a ~32 KB executable (on VS 2010 at least). The /link shell32.lib parameter is necessary because that lib contains the CommandLineToArgvW function we used. To get rid of the CRT, the /NODEFAULTLIB parameter is used. We’ll also need to explicitly define WinMain as the entrypoint, and add kernel32.lib to the library list, since it’s one of the “default” libs which is no longer automatically linked:

cl.exe loader.c /link shell32.lib kernel32.lib /nodefaultlib /entry:WinMain

[Sidenote] The linker parameters (the ones after /link) are NOT case-sensitive, either /nodefaultlib or /NODEFAULTLIB will work.

This should produce a 2560 byte executable, more than 12 times smaller than the CRT-based one, and doing exactly the same thing.

[Sidenote Q&A] Q. How to tell which .lib files are needed?
A. When the necessary .lib is not given as a command line argument, the compiler complains with an error message such as:

loader.obj : error LNK2019: unresolved external symbol __imp__LoadLibraryW@4 referenced in function _WinMain@16

This means the LoadLibraryW() function is missing it’s .lib; to find it, simply search for the LoadLibraryW text in the lib folder of the SDK (in a path similar to: “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1\Lib“) and use the lib it’s found in (in this case, kernel32.lib). As an alternative to searching in the (binary) .lib files, you can check the MSDN reference for the given function (search online for something like “MSDN LoadLibraryW”), which gives the lib filename in the “Library” section of the “Requirements” table at the bottom.

Compiling the DLL needs a few more paramters:

cl.exe win64dll.c /link shell32.lib kernel32.lib user32.lib /nodefaultlib /entry:DllMain /dll /out:win64dll.dll
  • user32.lib is needed for MessageBoxA
  • the /dll switch tells the linker to create a DLL
  • the exact name must be specified (via /out:win64dll.dll), because the linker defaults to creating .exe-s

This should yield a 3072 byte DLL. It’s slightly larger than the executable because of the export section (and the fact that each section’s size is rounded up to the FileAlignment (PE header) value, which defaults to 0x200 == 512, exactly the difference which we got, which also covers the small difference in size between the actual code produced).

After building the executable and the DLL, running:

loader.exe win64dll.dll

should pop up a “Hello!” message box, then exit.

[Sidenote] Setting up the cl.exe environment involves:

  • adding the VC\BIN subfolder of the MS Visual Studio installation to the %PATH% environment variable (should look like “C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin”)
  • adding to the %INCLUDE% environment variable the INCLUDE subfolders of:
    • the MS Platform SDK installation (e.g. “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include”)
    • the MS Visual Studio installation (e.g. “C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\Include”)
  • adding to the %LIB% environment variable the LIB subfolders of the VS and PSDK

The easiest way to do this is via a setupvc.bat file containing:

@echo off
SET VS_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0
SET SDK_PATH=C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A

After adjusting the paths as needed by your setup, run this before running cl.exe from the command line.

The first part of this text deals with tiny executables in FASM.

Written by vtopan

February 17, 2014 at 10:23 PM

Posted in C, programming, Snippets

Tagged with , , , , ,

Listing the modules (DLLs) of the current process without API functions

with one comment

There is a simple way to walk the list of loaded modules (DLLs) of the current Windows process without calling any API functions. It also works with other processes, but it obviously involves reading the respective process’ memory space, which in turn involves using the aptly-named ReadProcessMemory function, defeating the whole purpose of not using APIs.

The PEB and the TIB

The PEB (Process Enviroment Block), briefly documented here (the page which passes as “documentation” on MSDN just… isn’t) is a structure (stored for each process in it’s own memory space) which holds various process parameters used by the OS such as the PID, a flag set if the process is being debugged, some localization information etc. The PEB is pointed to by the TIB (Thread Information Block), which is always located at FS:[0] (if that sounds like black magic, either pick up a book on assembly language and/or x86 processor architecture, or ignore that piece of information and use the code below to get to it).

One of the PEB entries is a list of, well, the loaded modules of the process. Actually, a PEB member points to a structure called LoaderData (it’s type being PEB_LDR_DATA), again “almost documented” by MS, which points to the list we’re interested in, looking something like this:

typedef struct LDR_DATA_ENTRY {
  LIST_ENTRY              InMemoryOrderModuleList;
  PVOID                   BaseAddress;
  PVOID                   EntryPoint;
  ULONG                   SizeOfImage;
  UNICODE_STRING          FullDllName;
  UNICODE_STRING          BaseDllName;
  ULONG                   Flags;
  SHORT                   LoadCount;
  SHORT                   TlsIndex;
  LIST_ENTRY              HashTableEntry;
  ULONG                   TimeDateStamp;

It’s a linked list (the LIST_ENTRY structure contains the forward pointer called Flink) which contains serveral useful pieces of information about each module, among which it’s image base (BaseAddress), virtual address of the entrypoint (EntryPoint) (NOT the RVA you find in the PE header, but the actual computed VA) and DLL name with (FullDllName) and without (BaseDllName) the path, as UNICODE_STRINGs. The list is circularly linked and has a sentinel element with the BaseAddress member set to 0.

The list entries should be of the LDR_DATA_TABLE_ENTRY data type described by MS here, but the actual structure found in-memory doesn’t have the first member (BYTE Reserved1[2]). An alternative (and more complete) definition of the structure can be found here, but it has three LIST_ENTRY members at the beginning instead of just one.

Getting a pointer to the list

We retrieve the pointer to the list from the PEB using inline assembly from the C source code:

PLDR_DATA_ENTRY firstLdrDataEntry() {
   __asm {
      mov eax, fs:[0x30]  // PEB
      mov eax, [eax+0x0C] // PEB_LDR_DATA
      mov eax, [eax+0x1C] // InInitializationOrderModuleList

There are actually three lists of modules, sorted by three different criterias; the one we’re using is sorted by the order in which the modules were loaded.


Using the structure defined above and the firstLdrDataEntry function, it becomes trivial to walk the list of loaded modules:

void main() {
   PLDR_DATA_ENTRY cursor;
   cursor = firstLdrDataEntry();
   while (cursor->BaseAddress) {
      printf( "Module [%S] loaded at [%p] with entrypoint at [%p]\n",
              cursor->BaseDllName.Buffer, cursor->BaseAddress,
      cursor = (PLDR_DATA_ENTRY)cursor->InMemoryOrderModuleList.Flink;

Written by vtopan

May 27, 2009 at 11:22 PM

Translating headers: struct / record field alignment in C / Delphi

with 4 comments

If you’ve ever taken a shot at porting headers from C to Delphi, you may have noticed the different way in which the struct/record fields are aligned, but most likely you havent. And if you had the bad luck of stumbling over a struct that gets aligned differently when ported to Delphi, even though the data types were translated correctly, you’re probably interested in what happens behind the scenes and how to fix it. Well, here it goes.

Why variable addresses are aligned in memory

As explained in depth here, because of it’s internal structure, the time required by the CPU to access data from the RAM depends on the alignment of the address it requests (the “aligned” memory addresses being accessed faster). Aligned means that the address should be a multiple of the CPU’s memory word (not to be confused with the WORD data type in C/Delphi, which is always 2 bytes), which is the least amount of memory the CPU can get from RAM at one time. It’s also the size of the general purpose registers on the CPU, and it represents the “chunk” of data the CPU is most “comfortable” working with. On x86 architectures it’s 32 bits, which means 4 bytes (on 64 bit architectures it’s… well, 64 bits = 8 bytes). So, the memory addresses should be multiples of 4 on x86 systems for faster access.

Nitpicking corner: why is the WORD type 2 bytes long then, if the memory word on x86 is 4 bytes? Well, it’s not actually *all* the x86 architectures that have a 4 byte memory word, just the 386-and-higher ones (the aptly-named 32-bit architectures). But back when dinosaurs still roamed the Earth and “640K of memory were enough for anybody” (to paraphrase (?) one of the foremost computer visionaries), the computer word was just 16 bits, or 2 bytes, and that’s when the WORD data type was coined (without very much foresight, truth be told).

Struct field alignment in C

According to this paragraph on Wikipedia, struct member fields are aligned based on their size by padding them with, basically, their size in bytes. So, assuming we declare the structure:

typedef struct {
} foo;

you might be tempted to assume it will take 8 bytes of memory. But, due to the afore-mentioned alignment, b will be aligned on a 2-byte boundary, and c on a 4-byte boundary, so that structure will actually be equivalent to this one:

typedef struct {
BYTE padding_for_b[1]; // 1-byte padding to align .b
BYTE padding_for_d[3]; // 3-byte padding to align .d
} foo;

and take precisely 12 bytes. You can force C’s hand not to align the members using the #pragma pack(1) directive, but (a) the code will be slower and (b) you can’t do that to standard structures in the Windows API for obvious reasons.

Record field alignment in Delphi

As explained in the “align fields compiler directive” help entry, in Delphi the record field alignment is not based on the field’s data type size, but on a constant specified by the ALIGN (A) directive (which can also be set for the entire project from Project->Options->Compiler->Record field alignment.
The default is {$ALIGN 8} (short: {$A8}), which means alignment to an 8-byte boundary. However, that’s how things work only in theory. In practice, for some mysterious reason, Delphi does exactly what C does: unless the alignment is set to 1 (eg. by {$A1}), it aligns record fields based on their size.

The difference

But if Delphi behaves just like C, why do things sometimes still crash? They do because of one exception to the rule: enumerations are treated differently. In C, they are 4 bytes in size by default (this can be changed using the #pragma enum directive). In Delphi, the enumeration size is controlled by the $Z directive, and by default it’s set to 1. Therefore, in order to get the proper translation of API structures from C to Delphi, you should use the {$Z4} directive.

Written by vtopan

March 10, 2009 at 11:37 PM

Posted in Delphi

Tagged with , , , , ,

Converting C structs to Delphi

leave a comment »

In the process of writing a Process Explorer meets HijackThis one-size-fits-all monster app, I faced the rather tedious task of porting Windows SDK headers (some less documented than others) to Delphi. As expected, I ran into annoying gotchas; here are some of them.

Boolean type in C / Windows SDK

The C specs didn’t define a boolean type up until C99 (which introduced the bool data type), which led to a variety of “standards”. The SDK uses the BOOLEAN type, which is actually a BYTE, and should be mapped to the Boolean type in Delphi, not BOOL/LongBool (yes, I am well aware that this may be a “duh!” for some).

Other type mappings

UCHAR is unsigned char, not “unicode char” as some might expect, so it translates to BYTE, not WORD.

Struct to record mappings

See this post.

[to be continued]

Written by vtopan

March 10, 2009 at 2:49 AM

Posted in Delphi

Tagged with , , ,