Emulation - adava/DECAF-Selective GitHub Wiki

DECAF performs information and dynamic taint analysis by emulating execution. Emulation here means that, instead of executing a program directly on hardware, we update the program state inside an environment we control. In other words, the CPU registers, memory, network devices, and so on are just software data structures that are updated to reflect the state of the emulated system. System emulation has been used extensively to run operating systems on an architecture other than their target, e.g. running an Android operating system (ARM architecture) on an x86 machine. In this example, the emulated system (Android on ARM) is called the guest and the machine that runs the emulator (x86) is called the host.
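To make the idea of "state as software data structures" concrete, here is a minimal sketch of an interpreter for a made-up four-register guest. The instruction names (`movi`, `add`, `store`, `halt`), the `GuestState` class, and the memory size are all assumptions chosen for illustration; none of them come from DECAF or QEMU.

```python
from dataclasses import dataclass, field


@dataclass
class GuestState:
    """All guest-visible state lives in ordinary host data structures."""
    regs: list = field(default_factory=lambda: [0] * 4)  # hypothetical r0..r3
    pc: int = 0                                          # guest program counter
    memory: bytearray = field(default_factory=lambda: bytearray(256))
    halted: bool = False


def step(state, program):
    """Emulate one guest instruction by updating the state object."""
    op, *args = program[state.pc]
    state.pc += 1
    if op == "movi":        # movi rd, imm  -> rd = imm
        rd, imm = args
        state.regs[rd] = imm & 0xFFFFFFFF
    elif op == "add":       # add rd, ra, rb -> rd = ra + rb (32-bit wraparound)
        rd, ra, rb = args
        state.regs[rd] = (state.regs[ra] + state.regs[rb]) & 0xFFFFFFFF
    elif op == "store":     # store rs, addr -> memory[addr] = low byte of rs
        rs, addr = args
        state.memory[addr] = state.regs[rs] & 0xFF
    elif op == "halt":
        state.halted = True
    else:
        raise ValueError(f"unknown guest instruction: {op}")


def run(program):
    """Run the guest program until it halts and return the final state."""
    state = GuestState()
    while not state.halted:
        step(state, program)
    return state


PROGRAM = [
    ("movi", 0, 40),
    ("movi", 1, 2),
    ("add", 2, 0, 1),
    ("store", 2, 16),
    ("halt",),
]
final = run(PROGRAM)
```

Because the state is plain data, an analysis tool can inspect or snapshot `final.regs` and `final.memory` at any point between two calls to `step`, which is the property that makes emulation attractive for analysis.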

There are several challenges in emulating a system on a different platform. First, the guest instruction set is completely different from the host instruction set, so the host hardware cannot understand the instructions in the guest binary. Second, since we are emulating, the IO devices the guest expects are not physically available to it. For instance, an Android operating system expects input from a touchscreen keyboard, and every Android phone also has a camera. It goes without saying that a Linux host does not have these IO devices. Third, since the guest does not have direct access to the host memory, guest memory virtualization fails unless the host mediates.

To address the first challenge, we must somehow translate the guest instructions. Translation is not as straightforward as it may sound. First, we must find an equivalent host instruction, or sequence of instructions, for every guest instruction. However, the hardware register set of the guest, for instance, may differ from that of the host. How should we address this mismatch? Second, our goal in emulation is to observe the execution and possibly record different program states during execution. This means the guest must run in the same address space as the emulator, which brings another difficulty: we must switch back and forth between the emulator code and the guest code within that address space. Finally, an operating system usually runs at the highest privilege level (Ring 0), while an application (the emulator, in this case) runs at the lowest privilege level (Ring 3). This means some instructions are unavailable to the guest even if they exist on the host platform, simply because the guest is not allowed to run them. This is just a brief list of the problems an emulator must address; there are many other engineering problems as well.
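The three translation problems above can be illustrated with a toy dynamic translator. This is only a sketch under stated assumptions: the guest instruction names (`movi`, `addi`, `cli`, `jnz`, `halt`), the `env` dictionary, and all helper names are hypothetical, and Python closures stand in for generated host code. Guest registers are kept in a memory-resident structure rather than in host registers, each translated block returns control to the emulator loop, and the privileged guest instruction is replaced by a call into an emulator helper instead of being executed.

```python
def make_movi(rd, imm):
    def host_op(env):
        env["regs"][rd] = imm & 0xFFFFFFFF
    return host_op


def make_addi(rd, ra, imm):
    def host_op(env):
        env["regs"][rd] = (env["regs"][ra] + imm) & 0xFFFFFFFF
    return host_op


def make_privileged(name):
    # The guest believes it runs in Ring 0; the emulator only records the
    # effect in software instead of executing a privileged host instruction.
    def host_op(env):
        env["helper_log"].append(name)
    return host_op


def translate_block(program, start_pc):
    """Translate guest instructions from start_pc up to a branch or halt."""
    host_ops = []
    pc = start_pc
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "movi":
            host_ops.append(make_movi(*args))
        elif op == "addi":
            host_ops.append(make_addi(*args))
        elif op == "cli":
            host_ops.append(make_privileged("cli"))
        elif op == "jnz":   # jnz reg, target: ends the block
            reg, target = args
            fallthrough = pc
            return host_ops, (lambda env, r=reg, t=target, f=fallthrough:
                              t if env["regs"][r] != 0 else f)
        elif op == "halt":  # ends the block and stops the guest
            return host_ops, (lambda env: None)
        else:
            raise ValueError(f"unknown guest instruction: {op}")


def run(program):
    """Emulator main loop: look up or translate a block, run it, repeat."""
    env = {"regs": [0] * 4, "helper_log": [], "translations": 0, "executions": 0}
    cache = {}  # guest pc -> translated block
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            env["translations"] += 1
        host_ops, next_pc = cache[pc]
        for host_op in host_ops:   # "guest code" runs here
            host_op(env)
        env["executions"] += 1
        pc = next_pc(env)          # control is back in the emulator
    return env


PROGRAM = [
    ("cli",),            # 0: privileged guest instruction
    ("movi", 0, 3),      # 1: r0 = loop counter
    ("movi", 1, 0),      # 2: r1 = accumulator
    ("addi", 1, 1, 5),   # 3: r1 += 5
    ("addi", 0, 0, -1),  # 4: r0 -= 1
    ("jnz", 0, 3),       # 5: loop while r0 != 0
    ("halt",),           # 6
]
result = run(PROGRAM)
```

The loop body starting at guest address 3 is translated once and then reused from the cache on later iterations, so the number of translations stays below the number of block executions. The point between two blocks, where `next_pc(env)` is evaluated, is where an analysis tool can observe or record guest state.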

To address the second and third challenges, the guest's requests for IO and memory must be handled by software instead of hardware. To do this, the requests intended for the hardware must be redirected to the emulator. The general mechanism is to trap the IO and memory requests and add a virtualization layer that handles them. In practice, there are further engineering challenges.
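The trap-and-redirect mechanism can be sketched as a single software "bus" through which every guest load and store passes. The `GuestBus` and `SerialPort` classes, the address `0x1000`, and the register layout below are assumptions invented for this example; they do not describe any real device or the QEMU memory API.

```python
class SerialPort:
    """Hypothetical device model: a data register at offset 0 and a
    status register at offset 4 that always reports 'ready'."""

    def __init__(self):
        self.output = []

    def read(self, offset):
        return 0x1 if offset == 4 else 0

    def write(self, offset, value):
        if offset == 0:
            self.output.append(chr(value & 0xFF))


class GuestBus:
    """Virtualization layer: routes guest accesses to RAM or a device model."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)  # guest RAM is plain host memory
        self.devices = []               # list of (base, size, device)

    def map_device(self, base, size, device):
        self.devices.append((base, size, device))

    def _find_device(self, addr):
        for base, size, device in self.devices:
            if base <= addr < base + size:
                return device, addr - base
        return None, None

    def load8(self, addr):
        device, offset = self._find_device(addr)
        if device is not None:          # trapped: handled by the device model
            return device.read(offset)
        return self.ram[addr]           # ordinary guest memory access

    def store8(self, addr, value):
        device, offset = self._find_device(addr)
        if device is not None:          # trapped: handled by the device model
            device.write(offset, value)
        else:
            self.ram[addr] = value & 0xFF


bus = GuestBus(ram_size=0x100)
uart = SerialPort()
bus.map_device(0x1000, 8, uart)

for ch in "hi":
    bus.store8(0x1000, ord(ch))  # guest "writes to hardware"
bus.store8(0x10, 0xAB)           # guest writes to its own RAM
```

From the guest's point of view both stores look the same; only the emulator knows that one of them ended up in a device model and the other in a byte array.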

After this brief introduction to emulation, the reader can continue with the Wiki pages about QEMU, the emulator underlying DECAF.