Assembly - AbhiAgarwal/notes GitHub Wiki

History

x86 assembly language is a family of backward-compatible assembly languages, which provide some level of compatibility all the way back to the Intel 8008. x86 assembly languages are used to produce object code for the x86 class of processors.

Syntax

x86 assembly language has two main syntax branches: Intel syntax, originally used for documentation of the x86 platform, and AT&T syntax. Intel syntax is dominant in the MS-DOS and Windows world, and AT&T syntax is dominant in the Unix world, since Unix was created at AT&T Bell Labs.

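A primary difference between the two is the order in which the operands are written:

AT&T (Unix): source before destination
Intel (MS-DOS, Windows): destination before source

For example, loading the constant 5 into EAX is written as movl $5, %eax in AT&T syntax and as mov eax, 5 in Intel syntax.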

x86-64

x86-64 (also known as x64, x86_64 and AMD64) is the 64-bit version of the x86 instruction set. It supports vastly larger amounts of virtual memory and physical memory than is possible on its predecessors, allowing programs to store larger amounts of data in memory. x86-64 also provides 64-bit general purpose registers and numerous other enhancements. The original specification was created by AMD, and has been implemented by AMD, Intel, VIA, and others. It is fully backwards compatible with 16-bit and 32-bit x86 code.

x64 is a generic name for the 64-bit extensions to Intel's and AMD's 32-bit x86 instruction set architecture (ISA). AMD introduced the first version of x64, initially called x86-64 and later renamed AMD64. Intel named its implementation IA-32e and then EM64T. x64, AMD64, and x86-64 are names for the same processor type. It is often called AMD64 because AMD designed it. Mainstream 64-bit x86 desktops and servers use a processor of this type. The naming is different, but the type is the same.

Prior to launch, "x86-64" and "x86_64" were used to refer to the instruction set. Upon release, AMD named it AMD64. Intel initially used the names IA-32e and EM64T before finally settling on Intel 64 for its implementation. Some in the industry, including Apple, use x86-64 and x86_64. Others, notably Sun Microsystems (now Oracle Corporation) and Microsoft, use x64, while the BSD family of OSs and several Linux distributions use AMD64.

Regarded as a programming language, assembly coding is machine-specific and low level. Assembly languages are more typically used for detailed and/or time critical applications such as small real-time embedded systems or operating system kernels and device drivers.

x86

x86 assembly has 8 primary registers that we are concerned with: EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI. Just as a quick aside, since these are x86 registers, these can only store up to 32-bit values. Anyway, the first four of these registers (EAX, ECX, EDX, and EBX) are what we call general purpose registers and are known as the accumulator, counter, data, and base registers, respectively. These mainly act as temporary storage locations when the CPU is executing a program.

EAX - Accumulator
ECX - Counter
EDX - Data
EBX - Base

The other four registers (ESP, EBP, ESI, EDI) are also general purpose registers, but they are usually referred to as pointers and indexes. They are known as the stack pointer, base pointer, source index, and destination index, respectively. The first two of these registers (ESP and EBP) are especially important to program execution. The stack pointer (ESP) points to the last item pushed onto the stack. It is also worth mentioning that the stack grows towards lower memory addresses. The base pointer (EBP) deals with stack frames.

ESP - Stack Pointer
EBP - Base Pointer
ESI - Source Index
EDI - Destination Index

EBP, base pointer, deals with stack frames.
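The downward growth of the stack can be illustrated with a toy model. This is only a sketch in Python, not real machine behavior: the ToyStack class, its starting address 0x1000, and the 4-byte slot size are assumptions chosen for illustration.

```python
class ToyStack:
    """Toy model of a 32-bit x86 stack (illustrative only)."""

    SLOT = 4  # bytes occupied by one pushed 32-bit value

    def __init__(self, top=0x1000):
        self.esp = top    # stack pointer starts at the (hypothetical) top
        self.memory = {}  # maps address -> stored value

    def push(self, value):
        # PUSH decrements ESP first, then stores the value at the new ESP.
        self.esp -= self.SLOT
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # POP reads the value at ESP, then increments ESP.
        value = self.memory.pop(self.esp)
        self.esp += self.SLOT
        return value


stack = ToyStack()
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.esp))  # ESP is now 8 bytes below the starting address
```

After two pushes ESP holds a lower address than it started with, and it always points to the most recently pushed item.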

Another extremely important register is the instruction pointer (EIP) register. This register points to the next instruction that the CPU will execute.

EIP - Next Instruction to execute

x86-64

Since the 64-bit registers allow access for many sizes and locations, we define a byte as 8 bits, a word as 16 bits, a double word as 32 bits, a quadword as 64 bits, and a double quadword as 128 bits.

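x86 processors store multi-byte values "little endian," meaning less significant bytes are stored at lower memory addresses. The stack also grows downwards: it starts at a high memory address (in principle as high as 2^64 - 1, although operating systems typically place it lower) and moves toward lower addresses as items are pushed.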
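Little-endian byte order can be seen by serializing an integer. The following is a small Python sketch; the value 0x11223344 and the "base" address label are arbitrary choices for illustration.

```python
value = 0x11223344

# Little-endian: the least significant byte comes first (lowest address).
little = value.to_bytes(4, byteorder="little")
# Big-endian, shown only for contrast.
big = value.to_bytes(4, byteorder="big")

for offset, byte in enumerate(little):
    print(f"address base+{offset}: 0x{byte:02x}")
```

The byte 0x44, the least significant one, lands at the lowest address, and 0x11 lands at the highest.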

When we move from x86, which is 32-bit, to x86_64, which is 64-bit, the registers are widened to 64 bits and their names change: the leading E is replaced with an R. They become RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, and RIP.

By replacing the initial R with an E on the first eight registers, it is possible to access the lower 32 bits (EAX for RAX). Similarly, for RAX, RBX, RCX, and RDX, access to the lower 16 bits is possible by removing the initial R (AX for RAX), to the lower byte of these by switching the X for an L (AL for AX), and to the higher byte of the low 16 bits by using an H (AH for AX).
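The relationship between RAX and its narrower aliases can be modeled with bit masks. This is a Python sketch of what each name reads, not of how the hardware implements it; the helper sub_registers and the sample value are hypothetical.

```python
def sub_registers(rax):
    """Return the value visible through each alias of a 64-bit RAX value."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep only 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,  # lower 32 bits
        "AX": rax & 0xFFFF,       # lower 16 bits
        "AL": rax & 0xFF,         # lowest byte
        "AH": (rax >> 8) & 0xFF,  # high byte of the low 16 bits
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name}: {value:#x}")
```

The same masks apply to RBX, RCX, and RDX with their respective names.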

The second eight are named R8-R15.

The new registers R8 to R15 can be accessed in a similar manner like this: R8 (qword), R8D (lower dword), R8W (lowest word), R8B (lowest byte MASM style, Intel style R8L). Note there is no R8H.
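The naming scheme for R8 to R15 is regular enough to generate. The sketch below is in Python and uses the D/W/B suffixes described above; the function extended_register_names is a hypothetical helper, not part of any assembler.

```python
def extended_register_names(n):
    """Return the alias names for register Rn, keyed by width in bits."""
    if not 8 <= n <= 15:
        raise ValueError("extended registers are numbered 8 through 15")
    base = f"R{n}"
    return {
        64: base,        # full quadword
        32: base + "D",  # lower doubleword
        16: base + "W",  # lower word
        8: base + "B",   # lowest byte (there is no high-byte alias)
    }


names = extended_register_names(8)
print(names)
```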

MIPS

MIPS (Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems, Inc.). The early MIPS architectures were 32-bit, with 64-bit versions added later.
