x86 assembly on a 64 bit Linux Part 2 - itzjac/cpplearning GitHub Wiki
This part analyzes the x86 asm that TCC generates when compiling a C file, and shows how to create executables directly from assembly language.
Requirements
- Basic Linux and terminal usage
- Configuring, compiling, and linking C programs
- objdump, nasm
TCC cannot emit an assembly listing directly, but the objdump utility can disassemble its output. Using the 32-bit ELF file generated in Part 1, we can produce the disassembly:
$objdump -d example1 > example1.asm
The asm file contains the full disassembly code of the executable example1.
example1:     file format elf32-i386
Disassembly of section .text:
# removed code for readability
..
08048245 <sub>:
 8048245:	55                   	push   %ebp
 8048246:	89 e5                	mov    %esp,%ebp
 8048248:	81 ec 00 00 00 00    	sub    $0x0,%esp
 804824e:	8b 45 08             	mov    0x8(%ebp),%eax
 8048251:	c1 e0 01             	shl    $0x1,%eax
 8048254:	8b 4d 0c             	mov    0xc(%ebp),%ecx
 8048257:	01 c8                	add    %ecx,%eax
 8048259:	c9                   	leave  
 804825a:	c3                   	ret    
0804825b <main>:
804825b:	55                   	push   %ebp
804825c:	89 e5                	mov    %esp,%ebp
804825e:	81 ec 04 00 00 00    	sub    $0x4,%esp
8048264:	8b 45 0c             	mov    0xc(%ebp),%eax
8048267:	83 c0 04             	add    $0x4,%eax
804826a:	8b 08                	mov    (%eax),%ecx
804826c:	51                   	push   %ecx
804826d:	e8 ee 01 00 00       	call   8048460 <atoi@plt>
8048272:	83 c4 04             	add    $0x4,%esp
8048275:	89 45 fc             	mov    %eax,-0x4(%ebp)
8048278:	8b 45 fc             	mov    -0x4(%ebp),%eax
804827b:	50                   	push   %eax
804827c:	8b 45 08             	mov    0x8(%ebp),%eax
804827f:	50                   	push   %eax
8048280:	e8 c0 ff ff ff       	call   8048245 <sub>
8048285:	83 c4 08             	add    $0x8,%esp
8048288:	c9                   	leave  
8048289:	c3                   	ret    
804828a:	00 00                	add    %al,(%eax)
804828c:	00 00                	add    %al,(%eax)
  	...
The first line of the asm file confirms, once again, that the ELF is a 32-bit executable.
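Reading the listing back into C is a useful check of what each instruction does. The C source from Part 1 is not reproduced on this page, so the following is only a guessed reconstruction from the disassembly: the parameter names and the function name `example1_main` are made up for illustration, and the original may have written `a << 1` instead of `a * 2`.

```c
#include <stdlib.h>  /* atoi */

/* Guessed reconstruction of <sub>.
 * 0x8(%ebp) is the first argument, 0xc(%ebp) the second. */
int sub(int a, int b)
{
    /* shl $0x1,%eax doubles a; add %ecx,%eax then adds b. */
    return a * 2 + b;
}

/* Stand-in for <main>, renamed (hypothetical name) so that it can be
 * called from other code. */
int example1_main(int argc, char **argv)
{
    /* mov 0xc(%ebp),%eax ; add $0x4,%eax ; mov (%eax),%ecx loads argv[1],
     * which is pushed for atoi; the result is stored at -0x4(%ebp). */
    int n = atoi(argv[1]);

    /* Arguments are pushed right to left: n first, then argc (0x8(%ebp)).
     * The value left in %eax by sub becomes the return value. */
    return sub(argc, n);
}
```

The `add $0x4,%esp` and `add $0x8,%esp` after each call are the caller removing the pushed arguments from the stack, which has no visible counterpart in the C code.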
Have you noticed the AT&T syntax? On Linux, objdump prints assembly instructions in AT&T syntax by default. If Intel syntax is easier for you to read, objdump can produce the same disassembly in that form:
$ objdump -M intel -d example1 > example1.asm
The opcode bytes for each instruction (the second column) are exactly the same as in the previous dump, which confirms that both files describe the same executable code.
example1:     file format elf32-i386
Disassembly of section .text:
# removed code for readability
....
08048245 <sub>:
 8048245:	55                   	push   ebp
 8048246:	89 e5                	mov    ebp,esp
 8048248:	81 ec 00 00 00 00    	sub    esp,0x0
 804824e:	8b 45 08             	mov    eax,DWORD PTR [ebp+0x8]
 8048251:	c1 e0 01             	shl    eax,0x1
 8048254:	8b 4d 0c             	mov    ecx,DWORD PTR [ebp+0xc]
 8048257:	01 c8                	add    eax,ecx
 8048259:	c9                   	leave  
 804825a:	c3                   	ret    
0804825b <main>:
 804825b:	55                   	push   ebp
 804825c:	89 e5                	mov    ebp,esp
 804825e:	81 ec 04 00 00 00    	sub    esp,0x4
 8048264:	8b 45 0c             	mov    eax,DWORD PTR [ebp+0xc]
 8048267:	83 c0 04             	add    eax,0x4
 804826a:	8b 08                	mov    ecx,DWORD PTR [eax]
 804826c:	51                   	push   ecx
 804826d:	e8 ee 01 00 00       	call   8048460 <atoi@plt>
 8048272:	83 c4 04             	add    esp,0x4
 8048275:	89 45 fc             	mov    DWORD PTR [ebp-0x4],eax
 8048278:	8b 45 fc             	mov    eax,DWORD PTR [ebp-0x4]
 804827b:	50                   	push   eax
 804827c:	8b 45 08             	mov    eax,DWORD PTR [ebp+0x8]
 804827f:	50                   	push   eax
 8048280:	e8 c0 ff ff ff       	call   8048245 <sub>
 8048285:	83 c4 08             	add    esp,0x8
 8048288:	c9                   	leave  
 8048289:	c3                   	ret    
 804828a:	00 00                	add    BYTE PTR [eax],al
 804828c:	00 00                	add    BYTE PTR [eax],al
    	...
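Both dumps show the same bytes because the syntax is only a way of printing them; the operands themselves are encoded in the machine code. As a rough illustration, the sketch below splits the ModRM byte that follows many opcodes into its three fields. The names `struct modrm` and `decode_modrm` are invented for this example and are not part of any tool used here.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte:
 * mod = bits 7-6, reg = bits 5-3, rm = bits 2-0. */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* Hypothetical helper: split one ModRM byte into its fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}
```

For `8b 45 08`, the byte `0x45` decodes to mod 1 (a register plus an 8-bit displacement), reg 0 (eax) and rm 5 (ebp), and the trailing `08` is the displacement. That is `mov 0x8(%ebp),%eax` in AT&T syntax and `mov eax,DWORD PTR [ebp+0x8]` in Intel syntax. For `89 e5`, the byte `0xe5` gives mod 3 (register to register), reg 4 (esp) and rm 5 (ebp).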
Let's create a simple asm program that prints the CPU vendor ID. Because we are assembling with TCC on Linux, the source has to be written in AT&T syntax. More complete toolchains such as gcc also accept Intel syntax.
The file cpuid.s:
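# extract the processor Vendor ID
.section .data
output:
    .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

.section .text
.globl main
main:
    # CPUID leaf 0 returns the vendor string in EBX, EDX, ECX (in that order)
    movl $0, %eax
    cpuid
    # overwrite the 12 placeholder x's, which start at byte offset 28
    movl $output, %edi
    movl %ebx, 28(%edi)
    movl %edx, 32(%edi)
    movl %ecx, 36(%edi)
    # write(1, output, 42): system call number 4
    movl $4, %eax
    movl $1, %ebx
    movl $output, %ecx
    movl $42, %edx
    int $0x80
    # exit(0): system call number 1
    movl $1, %eax
    movl $0, %ebx
    int $0x80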
Use TCC to assemble the cpuid asm source file into an obj file
$tcc -m32 -c -o cpuid.obj cpuid.s
Then create the executable with the obj file
$tcc -m32 -o cpuid cpuid.obj
Once again, objdump with Intel syntax confirms that the result is another 32-bit ELF file and gives the full disassembly.
cpuid:     file format elf32-i386
Disassembly of section .text:
# removed code for readability
...
08048228 <main>:
 8048228:	b8 00 00 00 00       	mov    eax,0x0
 804822d:	0f a2                	cpuid  
 804822f:	bf 44 94 04 08       	mov    edi,0x8049444
 8048234:	89 5f 1c             	mov    DWORD PTR [edi+0x1c],ebx
 8048237:	89 57 20             	mov    DWORD PTR [edi+0x20],edx
 804823a:	89 4f 24             	mov    DWORD PTR [edi+0x24],ecx
 804823d:	b8 04 00 00 00       	mov    eax,0x4
 8048242:	bb 01 00 00 00       	mov    ebx,0x1
 8048247:	b9 44 94 04 08       	mov    ecx,0x8049444
 804824c:	ba 2a 00 00 00       	mov    edx,0x2a
 8048251:	cd 80                	int    0x80
 8048253:	b8 01 00 00 00       	mov    eax,0x1
 8048258:	bb 00 00 00 00       	mov    ebx,0x0
 804825d:	cd 80                	int    0x80
 804825f:	00 f3                	add    bl,dh
 8048261:	0f 1e fb             	nop    ebx
 8048264:	55                   	push   ebp
 8048265:	e8 a2 01 00 00       	call   804840c <_init+0x48>
 804826a:	81 c5 5a 12 00 00    	add    ebp,0x125a
 8048270:	57                   	push   edi
 8048271:	56                   	push   esi
 8048272:	53                   	push   ebx
 8048273:	83 ec 0c             	sub    esp,0xc
 8048276:	89 eb                	mov    ebx,ebp
 8048278:	8b 7c 24 28          	mov    edi,DWORD PTR [esp+0x28]
 804827c:	e8 af 01 00 00       	call   8048430 <_init@plt>
 8048281:	8d 9d 4c ef ff ff    	lea    ebx,[ebp-0x10b4]
 8048287:	8d 85 4c ef ff ff    	lea    eax,[ebp-0x10b4]
 804828d:	29 c3                	sub    ebx,eax
 804828f:	c1 fb 02             	sar    ebx,0x2
 8048292:	74 29                	je     80482bd <main+0x95>
 8048294:	31 f6                	xor    esi,esi
 8048296:	8d b4 26 00 00 00 00 	lea    esi,[esi+eiz*1+0x0]
 804829d:	8d 76 00             	lea    esi,[esi+0x0]
 80482a0:	83 ec 04             	sub    esp,0x4
 80482a3:	57                   	push   edi
 80482a4:	ff 74 24 2c          	push   DWORD PTR [esp+0x2c]
 80482a8:	ff 74 24 2c          	push   DWORD PTR [esp+0x2c]
 80482ac:	ff 94 b5 4c ef ff ff 	call   DWORD PTR [ebp+esi*4-0x10b4]
 80482b3:	83 c6 01             	add    esi,0x1
 80482b6:	83 c4 10             	add    esp,0x10
 80482b9:	39 f3                	cmp    ebx,esi
 80482bb:	75 e3                	jne    80482a0 <main+0x78>
 80482bd:	83 c4 0c             	add    esp,0xc
 80482c0:	5b                   	pop    ebx
 80482c1:	5e                   	pop    esi
 80482c2:	5f                   	pop    edi
 80482c3:	5d                   	pop    ebp
 80482c4:	c3                   	ret    
 80482c5:	8d b4 26 00 00 00 00 	lea    esi,[esi+eiz*1+0x0]
 80482cc:	8d 74 26 00          	lea    esi,[esi+eiz*1+0x0]
 80482d0:	f3 0f 1e fb          	endbr32 
 80482d4:	c3                   	ret    
cpuid execution

The resulting executable is 2388 bytes, and running it confirms that the processor reports GenuineIntel.
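The vendor string is assembled from three 32-bit registers, stored in the order EBX, EDX, ECX. The sketch below repeats those three stores in C with plain values in place of the real CPUID instruction, so it only illustrates the byte layout. The helper name `fill_vendor` and the macro `VENDOR_TEMPLATE` are made up for this example, and it assumes a little-endian host, as x86 is.

```c
#include <stdint.h>
#include <string.h>

/* Same 42-byte template as the `output` label in cpuid.s. */
#define VENDOR_TEMPLATE "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

/* Hypothetical helper: copies three register values into the template
 * at the offsets used by the assembly (28, 32, 36).
 * buf must hold at least sizeof(VENDOR_TEMPLATE) bytes. */
void fill_vendor(char *buf, uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    memcpy(buf, VENDOR_TEMPLATE, sizeof(VENDOR_TEMPLATE)); /* includes the NUL */
    memcpy(buf + 28, &ebx, 4);  /* movl %ebx, 28(%edi) */
    memcpy(buf + 32, &edx, 4);  /* movl %edx, 32(%edi) */
    memcpy(buf + 36, &ecx, 4);  /* movl %ecx, 36(%edi) */
}
```

On an Intel processor the three registers hold 0x756e6547, 0x49656e69 and 0x6c65746e, which are the little-endian encodings of "Genu", "ineI" and "ntel".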
If you would rather use ordinary C functions such as printf to print the vendor ID, instead of the system call (int 0x80), you can do so by linking against the C library. Interoperability with C is left as an exercise. Further information on Linux system calls
Using Intel asm syntax is possible with NASM
$sudo apt install nasm
hello.asm source with Intel syntax
section .data
msg db 'Owned!!',0xa
section .text
global _start
_start:
;write(int fd, char *msg, unsigned int len)
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, 8
int 0x80
;exit(int ret)
mov eax,1
mov ebx,0
int 0x80
Assembling with nasm
$nasm -f elf32 -o hello.obj hello.asm
Linking
$ld -m elf_i386 -o hello hello.obj
Note that disassembling hello with objdump still produces AT&T syntax by default; pass -M intel, as before, to get Intel syntax.
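For comparison, the write system call made by hello.asm can be sketched in C through the libc `syscall()` wrapper. This is a Linux-only illustration under a few assumptions: the helper name `write_owned` is made up, and the call number comes from the `SYS_write` macro rather than the literal 4, because 4 is the 32-bit (i386) number and a 64-bit build uses a different one.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper mirroring the first int 0x80 in hello.asm:
 * eax = write, ebx = 1 (stdout), ecx = msg, edx = 8.
 * Returns the number of bytes written, or -1 on error. */
long write_owned(void)
{
    static const char msg[] = "Owned!!\n";  /* 8 bytes without the NUL */
    return syscall(SYS_write, 1, msg, sizeof(msg) - 1);
}
```

The second int 0x80 in hello.asm corresponds to `syscall(SYS_exit, 0)` in the same style. It is left out of the helper because it would end the calling process.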