x86 assembly on a 64 bit Linux Part 2 - itzjac/cpplearning GitHub Wiki
Analyze the x86 assembly that TCC generates when compiling a C file, and learn how to create executables from assembly language.
Requirements
- Basic Linux and terminal usage
- Configuring, compiling, and linking C programs
- objdump, nasm
TCC cannot generate a disassembly listing directly, but the objdump utility can. Using the 32-bit ELF file generated in Part 1, we can produce the disassembly:
$objdump -d example1 > example1.asm
The asm file contains the full disassembly code of the executable example1.
example1: file format elf32-i386
Disassembly of section .text:
# removed code for readability
..
08048245 <sub>:
8048245: 55 push %ebp
8048246: 89 e5 mov %esp,%ebp
8048248: 81 ec 00 00 00 00 sub $0x0,%esp
804824e: 8b 45 08 mov 0x8(%ebp),%eax
8048251: c1 e0 01 shl $0x1,%eax
8048254: 8b 4d 0c mov 0xc(%ebp),%ecx
8048257: 01 c8 add %ecx,%eax
8048259: c9 leave
804825a: c3 ret
0804825b <main>:
804825b: 55 push %ebp
804825c: 89 e5 mov %esp,%ebp
804825e: 81 ec 04 00 00 00 sub $0x4,%esp
8048264: 8b 45 0c mov 0xc(%ebp),%eax
8048267: 83 c0 04 add $0x4,%eax
804826a: 8b 08 mov (%eax),%ecx
804826c: 51 push %ecx
804826d: e8 ee 01 00 00 call 8048460 <atoi@plt>
8048272: 83 c4 04 add $0x4,%esp
8048275: 89 45 fc mov %eax,-0x4(%ebp)
8048278: 8b 45 fc mov -0x4(%ebp),%eax
804827b: 50 push %eax
804827c: 8b 45 08 mov 0x8(%ebp),%eax
804827f: 50 push %eax
8048280: e8 c0 ff ff ff call 8048245 <sub>
8048285: 83 c4 08 add $0x8,%esp
8048288: c9 leave
8048289: c3 ret
804828a: 00 00 add %al,(%eax)
804828c: 00 00 add %al,(%eax)
...
The first line of the dump (file format elf32-i386) confirms once more that the ELF is a 32-bit executable.
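The byte column of the dump can also be read by hand. As a small illustration, the sketch below decodes the destination of a near call (opcode e8), whose four operand bytes hold a little-endian displacement relative to the address of the next instruction. The helper name call_target is invented for this example; the addresses and bytes it is meant for are the ones shown in the listing.

```c
#include <stdint.h>

/* Hypothetical helper (not part of objdump): compute the destination of a
 * 5-byte near call "e8 rel32" located at address addr.
 * insn[0] is the opcode 0xe8; insn[1..4] is the little-endian rel32. */
uint32_t call_target(uint32_t addr, const uint8_t insn[5])
{
    uint32_t rel = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;

    /* The displacement counts from the next instruction (addr + 5).
     * Unsigned wrap-around takes care of negative displacements. */
    return addr + 5u + rel;
}
```

For the call at 8048280 (bytes e8 c0 ff ff ff) this yields 8048245, the address of sub; for the call at 804826d (bytes e8 ee 01 00 00) it yields 8048460, the atoi PLT entry.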
Have you noticed the AT&T syntax? On Linux, objdump prints assembly instructions in AT&T syntax by default. If Intel syntax is more convenient to read, objdump can produce the same disassembly in that form:
$ objdump -M intel -d example1 > example1.asm
The machine-code bytes for each instruction (the second column) are exactly the same as in the previous dump file; only the textual form of the instructions changes, so both files describe exactly the same executable code.
example1: file format elf32-i386
Disassembly of section .text:
# removed code for readability
....
08048245 <sub>:
8048245: 55 push ebp
8048246: 89 e5 mov ebp,esp
8048248: 81 ec 00 00 00 00 sub esp,0x0
804824e: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
8048251: c1 e0 01 shl eax,0x1
8048254: 8b 4d 0c mov ecx,DWORD PTR [ebp+0xc]
8048257: 01 c8 add eax,ecx
8048259: c9 leave
804825a: c3 ret
0804825b <main>:
804825b: 55 push ebp
804825c: 89 e5 mov ebp,esp
804825e: 81 ec 04 00 00 00 sub esp,0x4
8048264: 8b 45 0c mov eax,DWORD PTR [ebp+0xc]
8048267: 83 c0 04 add eax,0x4
804826a: 8b 08 mov ecx,DWORD PTR [eax]
804826c: 51 push ecx
804826d: e8 ee 01 00 00 call 8048460 <atoi@plt>
8048272: 83 c4 04 add esp,0x4
8048275: 89 45 fc mov DWORD PTR [ebp-0x4],eax
8048278: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
804827b: 50 push eax
804827c: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
804827f: 50 push eax
8048280: e8 c0 ff ff ff call 8048245 <sub>
8048285: 83 c4 08 add esp,0x8
8048288: c9 leave
8048289: c3 ret
804828a: 00 00 add BYTE PTR [eax],al
804828c: 00 00 add BYTE PTR [eax],al
...
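Reading either listing back into C gives roughly the following. This is a reconstruction from the disassembly, not the original Part 1 source: the parameter and variable names are guesses, and the body of main is given the made-up name main_body so it can be called from other code.

```c
#include <stdlib.h>  /* atoi */

/* sub loads its first argument from [ebp+0x8], shifts it left by one
 * (a multiplication by two), then adds the second argument from [ebp+0xc]. */
int sub(int a, int b)
{
    return a * 2 + b;
}

/* Body of main as read from the listing (hypothetical name). */
int main_body(int argc, char **argv)
{
    /* [ebp+0xc] is argv; adding 4 selects argv[1]; the result of atoi
     * is stored in the local variable at [ebp-0x4]. */
    int x = atoi(argv[1]);

    /* Arguments are pushed right to left: x first, then argc.
     * The value left in eax by sub becomes the return value. */
    return sub(argc, x);
}
```

The `add esp,0x4` and `add esp,0x8` instructions after each call remove the pushed arguments again, which is the caller-cleanup convention used for these calls.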
Let's create a simple assembly program that prints the CPU vendor ID. Because we are assembling with TCC on Linux, the source needs to be written in AT&T syntax; with more complete toolchains such as gcc, Intel syntax is also possible.
The file cpuid.s:
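# extract the processor Vendor ID
.section .data
output:
    .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"
.section .text
.globl main
main:
    # CPUID leaf 0 returns the vendor string in EBX, EDX, ECX
    movl $0, %eax
    cpuid
    # overwrite the 12 placeholder characters at byte offset 28 of output
    movl $output, %edi
    movl %ebx, 28(%edi)
    movl %edx, 32(%edi)
    movl %ecx, 36(%edi)
    # write(1, output, 42) is system call 4 on 32-bit Linux
    movl $4, %eax
    movl $1, %ebx
    movl $output, %ecx
    movl $42, %edx
    int $0x80
    # exit(0) is system call 1
    movl $1, %eax
    movl $0, %ebx
    int $0x80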
Use TCC to assemble the cpuid source file into an object file:
$tcc -m32 -c -o cpuid.obj cpuid.s
Then link the object file into an executable:
$tcc -m32 -o cpuid cpuid.obj
To confirm that this is again a 32-bit ELF file, run objdump with Intel syntax (objdump -M intel -d cpuid) to get the full disassembly:
cpuid: file format elf32-i386
Disassembly of section .text:
# removed code for readability
...
08048228 <main>:
8048228: b8 00 00 00 00 mov eax,0x0
804822d: 0f a2 cpuid
804822f: bf 44 94 04 08 mov edi,0x8049444
8048234: 89 5f 1c mov DWORD PTR [edi+0x1c],ebx
8048237: 89 57 20 mov DWORD PTR [edi+0x20],edx
804823a: 89 4f 24 mov DWORD PTR [edi+0x24],ecx
804823d: b8 04 00 00 00 mov eax,0x4
8048242: bb 01 00 00 00 mov ebx,0x1
8048247: b9 44 94 04 08 mov ecx,0x8049444
804824c: ba 2a 00 00 00 mov edx,0x2a
8048251: cd 80 int 0x80
8048253: b8 01 00 00 00 mov eax,0x1
8048258: bb 00 00 00 00 mov ebx,0x0
804825d: cd 80 int 0x80
804825f: 00 f3 add bl,dh
8048261: 0f 1e fb nop ebx
8048264: 55 push ebp
8048265: e8 a2 01 00 00 call 804840c <_init+0x48>
804826a: 81 c5 5a 12 00 00 add ebp,0x125a
8048270: 57 push edi
8048271: 56 push esi
8048272: 53 push ebx
8048273: 83 ec 0c sub esp,0xc
8048276: 89 eb mov ebx,ebp
8048278: 8b 7c 24 28 mov edi,DWORD PTR [esp+0x28]
804827c: e8 af 01 00 00 call 8048430 <_init@plt>
8048281: 8d 9d 4c ef ff ff lea ebx,[ebp-0x10b4]
8048287: 8d 85 4c ef ff ff lea eax,[ebp-0x10b4]
804828d: 29 c3 sub ebx,eax
804828f: c1 fb 02 sar ebx,0x2
8048292: 74 29 je 80482bd <main+0x95>
8048294: 31 f6 xor esi,esi
8048296: 8d b4 26 00 00 00 00 lea esi,[esi+eiz*1+0x0]
804829d: 8d 76 00 lea esi,[esi+0x0]
80482a0: 83 ec 04 sub esp,0x4
80482a3: 57 push edi
80482a4: ff 74 24 2c push DWORD PTR [esp+0x2c]
80482a8: ff 74 24 2c push DWORD PTR [esp+0x2c]
80482ac: ff 94 b5 4c ef ff ff call DWORD PTR [ebp+esi*4-0x10b4]
80482b3: 83 c6 01 add esi,0x1
80482b6: 83 c4 10 add esp,0x10
80482b9: 39 f3 cmp ebx,esi
80482bb: 75 e3 jne 80482a0 <main+0x78>
80482bd: 83 c4 0c add esp,0xc
80482c0: 5b pop ebx
80482c1: 5e pop esi
80482c2: 5f pop edi
80482c3: 5d pop ebp
80482c4: c3 ret
80482c5: 8d b4 26 00 00 00 00 lea esi,[esi+eiz*1+0x0]
80482cc: 8d 74 26 00 lea esi,[esi+eiz*1+0x0]
80482d0: f3 0f 1e fb endbr32
80482d4: c3 ret
cpuid execution
The result is an executable of 2388 bytes, and running it confirms that the vendor ID is GenuineIntel.
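The order in which cpuid.s stores the registers (EBX, then EDX, then ECX) is what makes the text come out readable. The sketch below shows the same byte layout in plain C. The function name vendor_string is made up for this illustration, and it takes the register values as parameters rather than executing CPUID itself.

```c
#include <stdint.h>

/* Hypothetical helper: lay out the three CPUID leaf-0 result registers as
 * the 12-character vendor string, in the order EBX, EDX, ECX (the same
 * order cpuid.s uses for offsets 28, 32 and 36).
 * out must have room for 13 bytes (12 characters plus the terminator). */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    for (int r = 0; r < 3; r++) {
        for (int i = 0; i < 4; i++) {
            /* movl stores the low byte first, so character i of each
             * register is bits 8*i .. 8*i+7. */
            out[4 * r + i] = (char)((regs[r] >> (8 * i)) & 0xff);
        }
    }
    out[12] = '\0';
}
```

With the values an Intel processor returns for leaf 0 (EBX = 0x756e6547, EDX = 0x49656e69, ECX = 0x6c65746e), the buffer holds GenuineIntel.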
If you would rather use ordinary C functions such as printf to print the CPU vendor ID, instead of the raw system call (int 0x80), that is possible by linking against the C library. Interoperability with C is left as an exercise. Further information is available in the Linux system call documentation.
Using Intel asm syntax is possible with NASM
$sudo apt install nasm
hello.asm source with Intel syntax
section .data
    msg db 'Owned!!',0xa
section .text
    global _start
_start:
    ;write(int fd, char *msg, unsigned int len)
    mov eax, 4
    mov ebx, 1
    mov ecx, msg
    mov edx, 8
    int 0x80
    ;exit(int ret)
    mov eax, 1
    mov ebx, 0
    int 0x80
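For comparison, this is roughly what the first of those two system calls looks like from C when it goes through the libc wrapper instead of int 0x80. The function name say_owned is invented for this sketch; write is the standard POSIX call declared in unistd.h.

```c
#include <unistd.h>  /* write */

/* Same message as hello.asm: 7 characters plus the newline (0xa) = 8 bytes. */
static const char msg[] = "Owned!!\n";

/* Hypothetical wrapper around the first system call in hello.asm.
 * write(fd, msg, 8) corresponds to eax=4 (write), ebx=fd, ecx=msg, edx=8
 * in the 32-bit int 0x80 convention.
 * Returns the number of bytes written, or -1 on error. */
long say_owned(int fd)
{
    return (long)write(fd, msg, sizeof msg - 1);
}
```

Calling say_owned(1) writes the message to standard output. The second system call in the listing (eax=1, ebx=0) corresponds to _exit(0), which is also declared in unistd.h.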
Assemble using nasm
$nasm -f elf32 -o hello.obj hello.asm
Linking
$ld -m elf_i386 -o hello hello.obj
Note that disassembling hello with objdump still produces AT&T syntax by default; pass -M intel, as before, to get Intel syntax.