Analyzing x64 and x86 executables - itzjac/cpplearning GitHub Wiki

This section is completely optional and exclusive to Windows.

We provide the tools and methods to use AT&T (Linux default) and Intel (Windows default) syntax inside Windows. Similar approach can be followed if running in Linux. That's left as an exercise.

Reviewing the process to compile a program

Compilation

we will analyze the assembly code generated through the obj files. This process known as dissassembly because we are recovering from a binary file, the obj, the assembly code that was generated to produce it. We keep using the objdump command as in previous lesson.

Analyzing empty body in main

Pre-processed files contain same results for both platforms.

# 1 "main.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "main.cpp"
int main()
{
    return 0;
}

Disassembled objs. emptrydissasembly

Skimming through the assembly code.

  1. Save a BasePointer BP in a stack, when entering to the main scope (EBP and RBP respectively)

  2. Assign the new BP with the current StackPointer SP (ESP, RSP respectively)

  3. BP <= SP, remember with AT&T syntax the assignation is done on the right side

  4. The SP is aligned to a memory address (which can vary on different compilers)

  5. ESP is aligned to multiple of 16 addresses

  6. RSP allocates 32 bytes of memory, which is also done in order to align to multiple of 32 addresses

  7. Executes next instruction after the CALL with realignment in place. Intel manual E8 opcode IntelManual opcodes page 180 the E8 cd opcode (the opcode E8 00 00 00 00 is 5 bytes long). A great explanaition on how to interpret opcode using Intel manuals Instruction-Encoding-Revealed-Bit-Twiddling

a. In x86 the relative address to main is a 1 byte number + 1 byte of offset: the result is specified in E8 like => E8 00 00 00 0B b. In x64 the relative address is a 2 byte number+ 1 byte of offset: the result is specified in E8 => E8 00 00 00 0D

  1. The EAX register is cleaned with zero, that's our return value in main (as an experiment, change the return value which will definitely be reflected in this section of the asm, remember that returning any other number than zero from main, is considered an abnormal termination or an error code)

  2. Finally when leaving the function, ret

  3. In x86, leave instruction restores the stack frame SP<=BP and returns from this call by restoring a previously saved EIP

  4. RSP releases 32 bytes of memory those needed for the alignment, pops from the stack pointer BP resulting in SP<=BP and returns from this call by restoring a previously saved EIP

A notorious difference is in the step 3, where the RSP allocates 32 bytes of memory (0x20 in hexadecimal) while ESP perfoms an AND operation to achieve similar behavior. This alignment specifications can be found in the www.x86-64.org section 3.2.2 Stack Frame. The goal of both alignments, ESP and RSP, is to provide memory alignment in one case a 16 byte and in the other a 32 byte alignment. Not all compilers follow this specifications, be aware you will find different types of alignments for other compilers.

Another notorious difference is the executables sizes

x64 129Kb

x86 102Kb

Can you answer without further analyzing the dissassembled code why this happens?

Main using iostream

#include <iostream>
int main()
{
    std::cout<<"Comparing platforms"<<std::endl;
    return 0;
}

Pre-processor files are now using different libraries, therefore

preprocessor

Dissasembled objs

dissasembled

And lastly the executables sizes will again show differences.

x64 2.645Kb

x86 1,583Kb

Answering the question in previous analysis, remember that the size of pointers on each architecture and different, 4 bytes and 8 bytes respectively, this can add up quickly and

is obviously reflected in the resulting executable.

objdump: Changing AT&T syntax to Intel

If you feel annoyed by the AT&T syntax (I did), it is possible to switch to the Intel syntax --disassembler-options=intel. Add this option in your bat file where the objdump is produced.

objdump  -D --disassembler-options=intel main"%platformname%".obj > main"%platformname%".asm

The resulting asm.

mainx64.obj:     file format pe-x86-64
Disassembly of section .text:
0000000000000000 <main>:
   0:	55                   	push   rbp
   1:	48 89 e5             	mov    rbp,rsp
   4:	48 83 ec 20          	sub    rsp,0x20
   8:	e8 00 00 00 00       	call   d <main+0xd>
   d:	48 8d 15 00 00 00 00 	lea    rdx,[rip+0x0]        # 14 <main+0x14>
  14:	48 8b 0d 00 00 00 00 	mov    rcx,QWORD PTR [rip+0x0]        # 1b <main+0x1b>
  1b:	e8 00 00 00 00       	call   20 <main+0x20>
  20:	48 8b 15 00 00 00 00 	mov    rdx,QWORD PTR [rip+0x0]        # 27 <main+0x27>
  27:	48 89 c1             	mov    rcx,rax
  2a:	e8 00 00 00 00       	call   2f <main+0x2f>
  2f:	b8 00 00 00 00       	mov    eax,0x0
  34:	48 83 c4 20          	add    rsp,0x20
  38:	5d                   	pop    rbp
  39:	c3                   	ret    

Compiler flags

Optimizing flags

g++ contains 4 levels of optimization levels, o0 to o3. Let's look how the optimized code level -O3 specified through the g++ command line. We will generate optimized assembly code for the following program.

#include <iostream>
int main()
{
    std::cout<<"Optimization flag"<<std::endl;
    return 0;
}

To generate the level -O3 of optimization the g++ needs to use the following syntax


BATCH
g++ -c -O3 main.cpp -o main.obj
objdump  -D --disassembler-options=intel main.obj > main.asm
g++ -o main.exe main.obj 

From the Switching Platform section in the previous chapter, you can extend this lines to create an script to generate both x86 and x64.

optimization

For such a short program, we can easily identify an optimization at assembly level. The operation mov eax, 0x0 5 bytes size, is replaced with xor eax, eax, 2 bytes size. A rule of thumb is optimized code means less code. But also, optimized versions of opcode are provided if possible, such as in the example above.

Debugging symbols

Warnings

⚠️ **GitHub.com Fallback** ⚠️