QEMU Data flow - adava/DECAF-Selective GitHub Wiki

In this section, we explain how the data needed for emulation (guest instruction opcodes, operands, etc.) flows through QEMU. During translation, the disassembler reads the guest code and must store each instruction's opcode and operands for later processing. Opcodes are stored in an architecture-independent form; an example is INDEX_op_mov_i32. The opcodes of a block are stored in an array, described further in the Opcodes container section. Operands are stored less directly. Binary code generation needs more than the value of an operand, e.g. whether the operand is a register, a memory location or a constant. Hence, each operand is wrapped in a TCGTemp data structure, described in the TCGTemp section. What the operands container stores is an offset to the TCGTemp object that holds the operand of an operation. We discuss the operands container in the corresponding section.

Later, during binary code generation, we read the opcodes one by one from the opcodes container and their operands from the operands container. Generating host binary from the opcodes is not very complicated: we only need to know the corresponding opcode for the host architecture. The operands, however, need further processing. For memory operands and constants we pass the address offset and the value, respectively. Register operands need more work, because every instruction uses a register (or a set of registers) that has to be allocated. For instance, we need to remember that the previous instruction used the register EAX, so the current instruction should use another register. We elaborate more on this in Registers allocation.

TCG data structures

In this section, we talk about the central data structures used in the QEMU data flow. We focus especially on the guest memory, although QEMU uses many other important data structures for translation and binary code generation.

Guest memory

Every memory reference made while executing guest code on the host is allocated from the guest memory data structure, TCGContext. This data structure is a pool of TCGTemp objects. In the following sections, we talk about these two data structures.

TCGContext

The definition and memory layout of TCGContext are shown below. TCGContext saves the guest variables in TCGTemp objects. Guest variables include global, register and local variables. The first two are stored at the lower offsets of the temps array and the local variables at the higher offsets. Initialization of the global variables, including the registers, happens in cpu_x86_init. More precisely, registers are allocated from the global pool in optimize_flags_init. We further elaborate on this in Registers allocation. Global variables of a program are initialized in the tcg_context_init function before we start the translation. Hence, nb_globals is final by the time translation starts. For local variables, QEMU allocates TCGTemp objects after the global variables, per function. For each translation, nb_temps is initialized to nb_globals in tcg_func_start (reached from cpu_gen_code), so the first local variable takes index nb_globals and nb_temps becomes nb_globals plus 1.

Figure 9. Guest memory simulated by TCGContext

	struct TCGContext {
		    uint8_t *pool_cur, *pool_end;
		    TCGPool *pool_first, *pool_current;
		    TCGLabel *labels;
		    int nb_labels;
		    TCGTemp *temps; /* globals first, temps after */ //sina: the name is misleading; the temps store the globals and the registers. Could be that for shadow memory we use larger indexes.
		    int nb_globals;
		    int nb_temps;
		    /* index of free temps, -1 if none */
		    int first_free_temp[TCG_TYPE_COUNT * 2]; 

		    /* goto_tb support */
		    uint8_t *code_buf;
		    unsigned long *tb_next;
		    uint16_t *tb_next_offset;
		    uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */

		    /* liveness analysis */
		    uint16_t *op_dead_args; /* for each operation, each bit tells if the
					       corresponding argument is dead */
		    
		    /* tells in which temporary a given register is. It does not take
		       into account fixed registers */
		    int reg_to_temp[TCG_TARGET_NB_REGS];
		    TCGRegSet reserved_regs;
		    tcg_target_long current_frame_offset;
		    tcg_target_long frame_start;
		    tcg_target_long frame_end;
		    int frame_reg;

		    uint8_t *code_ptr;
		    TCGTemp static_temps[TCG_MAX_TEMPS];

		    TCGHelperInfo *helpers;
		    int nb_helpers;
		    int allocated_helpers;
		    int helpers_sorted;

		#ifdef CONFIG_PROFILER
		    /* profiling info */
		    int64_t tb_count1;
		    int64_t tb_count;
		    int64_t op_count; /* total insn count */
		    int op_count_max; /* max insn per TB */
		    int64_t temp_count;
		    int temp_count_max;
		    int64_t del_op_count;
		    int64_t code_in_len;
		    int64_t code_out_len;
		    int64_t interm_time;
		    int64_t code_time;
		    int64_t la_time;
		    int64_t restore_count;
		    int64_t restore_time;
		#endif

		#ifdef CONFIG_DEBUG_TCG
		    int temps_in_use;
		#endif
		};
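The "globals first, temps after" layout can be sketched in a few lines of C. This is a simplified model under our own assumptions, not QEMU's code: MiniContext, mini_global_new, mini_func_start and mini_temp_new are hypothetical names, and the pool size is arbitrary.

```c
#include <assert.h>

#define MINI_MAX_TEMPS 16   /* arbitrary pool size for this sketch */

/* Hypothetical, stripped-down stand-in for TCGContext. */
typedef struct {
    const char *names[MINI_MAX_TEMPS]; /* globals first, temps after */
    int nb_globals;
    int nb_temps;
} MiniContext;

/* Register a global (done once, before any translation starts). */
static int mini_global_new(MiniContext *s, const char *name)
{
    int idx = s->nb_globals;
    assert(idx < MINI_MAX_TEMPS);
    s->names[idx] = name;
    s->nb_globals++;
    return idx;
}

/* Start translating a block: discard the temps of the previous block. */
static void mini_func_start(MiniContext *s)
{
    s->nb_temps = s->nb_globals;
}

/* Allocate a block-local temp in the first slot after the globals in use. */
static int mini_temp_new(MiniContext *s)
{
    int idx = s->nb_temps;
    assert(idx < MINI_MAX_TEMPS);
    s->names[idx] = "tmp";
    s->nb_temps++;
    return idx;
}
```

With two globals registered, the first temp of every translation gets index 2, because mini_func_start rewinds nb_temps to nb_globals each time.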

TCGTemp

Variables, or more precisely the operands of an opcode, are managed by QEMU in the TCGTemp data structure: for any variable in the guest executable, a TCGTemp object is allocated. This data structure is allocated and initialized through one of these two functions:

tcg_temp_new_internal

tcg_global_mem_new_internal

The former is for local variables, i.e. stack variables, and the latter is for global memory variables. Below is the TCGTemp struct:

  typedef struct TCGTemp {
      TCGType base_type;
      TCGType type;
      int val_type;
      int reg;
      tcg_target_long val;
      int mem_reg;
      tcg_target_long mem_offset;
      unsigned int fixed_reg:1;
      unsigned int mem_coherent:1;
      unsigned int mem_allocated:1;
      unsigned int temp_local:1; /* If true, the temp is saved across
  		                  basic blocks. Otherwise, it is not
  		                  preserved across basic blocks. */
      unsigned int temp_allocated:1; /* never used for code gen */
      /* index of next free temp of same base type, -1 if end */
      int next_free_temp;
      const char *name;
  } TCGTemp;

Below is the complete source code for the tcg_global_mem_new_internal function:

	static inline int tcg_global_mem_new_internal(TCGType type, int reg,//sina: reg is TCG_AREG0 for registers
			                              tcg_target_long offset,
			                              const char *name) //global variables initialization
	{
	    TCGContext *s = &tcg_ctx;
	    TCGTemp *ts;
	    int idx;

	    idx = s->nb_globals;
	#if TCG_TARGET_REG_BITS == 32
	    if (type == TCG_TYPE_I64) {
		char buf[64];
		tcg_temp_alloc(s, s->nb_globals + 2);
		ts = &s->temps[s->nb_globals];
		ts->base_type = type;
		ts->type = TCG_TYPE_I32;
		ts->fixed_reg = 0;
		ts->mem_allocated = 1;
		ts->mem_reg = reg; //sina: TCG_REG_R27
	#ifdef TCG_TARGET_WORDS_BIGENDIAN
		ts->mem_offset = offset + 4;
	#else
		ts->mem_offset = offset; //sina: in case of registers, the offset points to the CPUState register offset like env->regs[R_EAX]
	#endif
		pstrcpy(buf, sizeof(buf), name);
		pstrcat(buf, sizeof(buf), "_0");
		ts->name = strdup(buf); //sina: register or global variable name
		ts++;

		ts->base_type = type;
		ts->type = TCG_TYPE_I32;
		ts->fixed_reg = 0;
		ts->mem_allocated = 1;
		ts->mem_reg = reg;
	#ifdef TCG_TARGET_WORDS_BIGENDIAN
		ts->mem_offset = offset;
	#else
		ts->mem_offset = offset + 4;
	#endif
		pstrcpy(buf, sizeof(buf), name);
		pstrcat(buf, sizeof(buf), "_1");
		ts->name = strdup(buf);

		s->nb_globals += 2;
	    } else
	#endif
	    {
		tcg_temp_alloc(s, s->nb_globals + 1);
		ts = &s->temps[s->nb_globals];
		ts->base_type = type;
		ts->type = type;
		ts->fixed_reg = 0;
		ts->mem_allocated = 1;
		ts->mem_reg = reg;
		ts->mem_offset = offset;
		ts->name = name;
		s->nb_globals++; //sina: increasing this counter
	    }
	    return idx;
	}
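The 32-bit host branch above splits one 64-bit global into two 32-bit halves whose offsets depend on the host byte order. The following sketch isolates just that rule. HalfTemp and split_i64_global are hypothetical names, and the byte order is passed as a runtime flag here rather than selected by a preprocessor macro.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reduced view of one half of a split 64-bit global. */
typedef struct {
    long mem_offset;   /* offset of this half inside the CPU state */
    char name[16];     /* "<name>_0" is the low half, "<name>_1" the high half */
} HalfTemp;

/* Fill lo/hi for a 64-bit value stored at `offset`.
   big_endian != 0 models a host that stores the high word first. */
static void split_i64_global(long offset, const char *name, int big_endian,
                             HalfTemp *lo, HalfTemp *hi)
{
    int n;

    lo->mem_offset = big_endian ? offset + 4 : offset;
    hi->mem_offset = big_endian ? offset : offset + 4;

    n = snprintf(lo->name, sizeof(lo->name), "%s_0", name);
    assert(n > 0 && (size_t)n < sizeof(lo->name)); /* name must fit */
    n = snprintf(hi->name, sizeof(hi->name), "%s_1", name);
    assert(n > 0 && (size_t)n < sizeof(hi->name));
}
```

On a little-endian host the low half keeps the original offset; on a big-endian host the two offsets are swapped, which matches the two #ifdef branches in the function above.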

Note that in TCGTemp, we store both the kind of value a variable currently holds (register, constant, or memory address) and its data type. The val_type field of a TCGTemp is initialized in tcg_reg_alloc_start: for globals it is either TEMP_VAL_REG (when fixed_reg is 1) or TEMP_VAL_MEM (when fixed_reg is 0, which is the case for global memory variables). tcg_reg_alloc_start is called in tcg_gen_code_common, which does the binary code generation. Below is the piece of code for local variable allocation:

	tcg_temp_new_internal(...){
		    tcg_temp_alloc(s, s->nb_temps + 2); //sina: this is only a check; it aborts if s->nb_temps + 2 exceeds the maximum number of temps
		    ts = &s->temps[s->nb_temps];
		    ts->base_type = type;
		    ts->type = TCG_TYPE_I32;
		    ts->temp_allocated = 1;
		    ts->temp_local = temp_local;
		    ts->name = NULL;
		    ts++;
		    ts->base_type = TCG_TYPE_I32;
		    ts->type = TCG_TYPE_I32;
		    ts->temp_allocated = 1;
		    ts->temp_local = temp_local;
		    ts->name = NULL;
		    s->nb_temps += 2; //sina: note that here is where we make the allocation official by increasing the nb_temps.
	}

In summary, all the wrapper functions that allocate local memory go through tcg_temp_new_internal. The logic is to first find an available slot in tcg_ctx->temps and then return its index. Along the way, we also initialize the TCGTemp object for this variable.
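The "find an available slot, then return its index" logic can be sketched with a free list, which is the role the first_free_temp and next_free_temp fields suggest. The model below is our simplification: Pool, Slot, pool_alloc and pool_free are hypothetical names, and a single free list is kept instead of one per type.

```c
#include <assert.h>

#define POOL_SIZE 8   /* arbitrary capacity for this sketch */

typedef struct {
    int in_use;       /* plays the role of temp_allocated */
    int next_free;    /* plays the role of next_free_temp, -1 if end */
} Slot;

typedef struct {
    Slot slots[POOL_SIZE];
    int nb_slots;     /* plays the role of nb_temps */
    int first_free;   /* head of the free list, -1 if none */
} Pool;

static void pool_init(Pool *p)
{
    p->nb_slots = 0;
    p->first_free = -1;
}

/* Reuse a previously freed slot if there is one, otherwise take a new one. */
static int pool_alloc(Pool *p)
{
    int idx = p->first_free;
    if (idx != -1) {
        p->first_free = p->slots[idx].next_free;
    } else {
        assert(p->nb_slots < POOL_SIZE);
        idx = p->nb_slots++;
    }
    p->slots[idx].in_use = 1;
    return idx;
}

/* Return a slot to the head of the free list. */
static void pool_free(Pool *p, int idx)
{
    assert(idx >= 0 && idx < p->nb_slots && p->slots[idx].in_use);
    p->slots[idx].in_use = 0;
    p->slots[idx].next_free = p->first_free;
    p->first_free = idx;
}
```

The caller only ever sees an index, never a pointer, which is also what ends up in the operands container.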

Opcodes and Operands container

The opcodes container is in fact an array of integers, written through a global pointer named gen_opc_ptr. The operands container is also an array of integers, written through a global pointer named gen_opparam_ptr.
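To make the relation between the two containers concrete, here is a small sketch of one stream of opcodes and one stream of operands. The names (OpStream, stream_emit, stream_args), the opcode numbers and the arity table are all assumptions made for illustration; QEMU's real containers are plain global buffers and its opcode values are generated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPC_CAP   32
#define PARAM_CAP 96

/* Hypothetical opcode numbers and operand counts. */
enum { OP_END = 0, OP_MOV = 1, OP_ADD = 2, OP_COUNT };
static const int op_nb_args[OP_COUNT] = { 0, 2, 3 };

typedef struct {
    uint16_t opc[OPC_CAP];     /* opcodes container */
    long     param[PARAM_CAP]; /* operands container: indexes into the temps array */
    size_t   nb_opc;
    size_t   nb_param;
} OpStream;

/* Write side (translator): append one operation and its operands. */
static void stream_emit(OpStream *s, int opc, const long *args)
{
    assert(opc > OP_END && opc < OP_COUNT);
    assert(s->nb_opc < OPC_CAP);
    assert(s->nb_param + (size_t)op_nb_args[opc] <= PARAM_CAP);
    s->opc[s->nb_opc++] = (uint16_t)opc;
    for (int i = 0; i < op_nb_args[opc]; i++) {
        s->param[s->nb_param++] = args[i];
    }
}

/* Read side (code generator): find the operands of the n-th operation by
   skipping the operands of all earlier operations. */
static const long *stream_args(const OpStream *s, size_t n)
{
    size_t pos = 0;
    assert(n < s->nb_opc);
    for (size_t i = 0; i < n; i++) {
        pos += (size_t)op_nb_args[s->opc[i]];
    }
    return &s->param[pos];
}
```

The two arrays carry no explicit link between an opcode and its operands; the reader recovers it from the number of operands each opcode takes, so both streams must be consumed in the same order they were written.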

Registers allocation

In addition to the executable's global variables, QEMU treats the registers as global variables as well. This means that for every register reference in the guest executable, QEMU has a memory reference in the translated code. The TCGTemp references to registers are in fact references to the elements of the cpu_regs array. As stated previously, TCGTemp objects for registers are allocated in the optimize_flags_init function during the initialization phase. For an x86 register, optimize_flags_init issues a call to tcg_global_mem_new_i64 such as the one below, which leads to the execution of tcg_global_mem_new_internal:

cpu_regs[R_EAX] = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUState, regs[R_EAX]), "rax");

TCG_AREG0 is assigned to the mem_reg field of the TCGTemp, and mem_offset is set to the offset of the corresponding register within CPUState. This means that for every architecture register there is a field in the CPUState object, and the value of the guest register is kept there and only there at all times. This design has an advantage and a drawback. The advantage is that whenever there is an exception or interrupt, we do not need to save the guest status (the register values) because they are already stored in memory. The drawback is a large slowdown, because every operation involves at least one memory reference, namely the emulated register.
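The base-plus-offset addressing of guest registers can be illustrated as follows. MiniCPUState, RegGlobal and the helper functions are hypothetical; the real CPUState has many more fields, and the real load is emitted as host code rather than performed by a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { R_EAX, R_ECX, R_EDX, R_EBX, NB_REGS };

/* Hypothetical miniature CPU state. */
typedef struct {
    uint32_t regs[NB_REGS];
    uint32_t eip;
} MiniCPUState;

/* Reduced view of a register global: where it lives and what it is called. */
typedef struct {
    size_t mem_offset;   /* offset of the guest register inside the state */
    const char *name;
} RegGlobal;

static RegGlobal reg_global_new(size_t offset, const char *name)
{
    RegGlobal g = { offset, name };
    return g;
}

/* What a generated load amounts to: base pointer (the role of TCG_AREG0)
   plus the offset recorded in the global. */
static uint32_t guest_reg_load(const MiniCPUState *env, const RegGlobal *g)
{
    uint32_t v;
    assert(g->mem_offset + sizeof(v) <= sizeof(*env));
    memcpy(&v, (const unsigned char *)env + g->mem_offset, sizeof(v));
    return v;
}

static void guest_reg_store(MiniCPUState *env, const RegGlobal *g, uint32_t v)
{
    assert(g->mem_offset + sizeof(v) <= sizeof(*env));
    memcpy((unsigned char *)env + g->mem_offset, &v, sizeof(v));
}
```

A global for EBX would be created with reg_global_new(offsetof(MiniCPUState, regs[R_EBX]), "ebx"), mirroring the offsetof call shown above. Every access then goes through memory, which is exactly the cost the paragraph above describes.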

The host register for an operation is chosen at binary code generation time, based on the TCGTemp object of the operand. Register allocation happens in the tcg_reg_alloc function, which is called by tcg_reg_alloc_op. The function is shown below:

	/* Allocate a register belonging to reg1 & ~reg2 */
	static int tcg_reg_alloc(TCGContext *s, TCGRegSet reg1, TCGRegSet reg2)
	{
	    int i, reg;
	    TCGRegSet reg_ct;

	    tcg_regset_andnot(reg_ct, reg1, reg2); //sina: arg_ct->u.regs, allocated_regs (TCG_REG_ESP)

	    /* first try free registers */
	    for(i = 0; i < ARRAY_SIZE(tcg_target_reg_alloc_order); i++) {
		reg = tcg_target_reg_alloc_order[i];
		if (tcg_regset_test_reg(reg_ct, reg) && s->reg_to_temp[reg] == -1)
		    return reg;
	    }

	    /* XXX: do better spill choice */
	    for(i = 0; i < ARRAY_SIZE(tcg_target_reg_alloc_order); i++) {
		reg = tcg_target_reg_alloc_order[i];
		if (tcg_regset_test_reg(reg_ct, reg)) {
		    tcg_reg_free(s, reg);
		    return reg;
		}
	    }

	    tcg_abort();
	}

The whole purpose of this function is to return a register code that QEMU can use for the operation, e.g. MOV. It picks a register based on three things: the limitations of the operation, the reserved registers and the registers in use. The operation limitations are predetermined for the architecture. The following call trace sets up the limitation for each operation (opcode):

	tcg_target_init -> 
		tcg_add_target_add_op_defs -> 
			target_parse_constraint

In tcg_add_target_add_op_defs, QEMU sets the constraints of tcg_op_defs from the constraints for the operations of the target architecture that are passed to this function. Each constraint is encapsulated in a TCGArgConstraint object. One field of TCGArgConstraint determines whether there is a limitation on the registers that can be used for the opcode. Below is a piece of code from target_parse_constraint that sets this field. It simply means that any register can be used for the register operand of the opcode:

	…
	    case 'r':
		ct->ct |= TCG_CT_REG;
		if (TCG_TARGET_REG_BITS == 64) {
		    tcg_regset_set32(ct->u.regs, 0, 0xffff);
		} else {
		    tcg_regset_set32(ct->u.regs, 0, 0xff);
		}
		break;

Anyhow, opcodes usually don't have any limitations. For instance, for mov the first parameter to the tcg_reg_alloc function (reg1) is 0xFF, which means we can use all registers: each bit of the mask stands for one register, so eight set bits allow all eight. The second parameter (reg2) is usually the OR of all the reserved registers. For instance, for most operations the second parameter is 0x30, a mask with bits 4 and 5 set that covers the stack pointer. The third constraint comes from the reg_to_temp array: we mark registers as in use in this array as we allocate them (-1 means the register is not in use). The first two constraints are combined as reg1 & ~reg2. In the above example, 0xff & ~0x30 gives 0xcf, which is used as the mask for selecting a register from tcg_target_reg_alloc_order (each element of this array holds the code of a register). Marking the register as in use happens not in this function but in the following line of the tcg_reg_alloc_op function:

s->reg_to_temp[reg] = arg
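The mask arithmetic and the first loop of tcg_reg_alloc can be replayed in isolation. In this sketch the allocation order is made up (the real tcg_target_reg_alloc_order is host specific), and candidate_mask and pick_free_reg are hypothetical helper names.

```c
#include <assert.h>

#define NB_HOST_REGS 8

/* Hypothetical allocation order, listed by register code. */
static const int alloc_order[NB_HOST_REGS] = { 3, 6, 7, 5, 1, 2, 0, 4 };

/* Candidate mask: registers allowed by the constraint minus the excluded ones. */
static unsigned candidate_mask(unsigned allowed, unsigned excluded)
{
    return allowed & ~excluded;
}

/* Return the first free candidate in allocation order, or -1 if none is free.
   reg_to_temp[r] == -1 means host register r does not hold any temp. */
static int pick_free_reg(unsigned mask, const int reg_to_temp[NB_HOST_REGS])
{
    for (int i = 0; i < NB_HOST_REGS; i++) {
        int reg = alloc_order[i];
        assert(reg >= 0 && reg < NB_HOST_REGS);
        if ((mask & (1u << reg)) && reg_to_temp[reg] == -1) {
            return reg;
        }
    }
    return -1;
}
```

candidate_mask(0xff, 0x30) yields 0xcf, so registers 4 and 5 are never returned regardless of their position in the order. The spill path of the real function (its second loop) is left out here.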

Example

In order to better illustrate the data flow in the translation and code generation process, we review an example. Let’s assume that during translation the disassembler encounters the code 0xca.

Translation data flow

For this code, disas_insn calls the gen_op_addl_A0_im function, which emits:

tcg_gen_addi_tl(cpu_A0, cpu_A0, val)

Let’s see what the parameters are and how they flow through the translation and code generation. cpu_A0 and cpu_A1 are global variables and val is a constant. The allocation of cpu_A0 and cpu_A1 happens earlier, in gen_intermediate_code_internal, in this manner:

	cpu_A0 = tcg_temp_new();
	cpu_A1 = tcg_temp_new();

For a 32-bit target, tcg_gen_addi_tl maps to tcg_gen_addi_i32, which is defined as below:

	static inline void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
	{
	    /* some cases can be optimized here */
	    if (arg2 == 0) {
		tcg_gen_mov_i32(ret, arg1);
	    } else {
		TCGv_i32 t0 = tcg_const_i32(arg2);
		tcg_gen_add_i32(ret, arg1, t0);
		tcg_temp_free_i32(t0);
	    }
	}

The first two arguments to the tcg_gen_addi_i32 function are indeed CPU registers, implemented as global (static in C++) variables. The last argument is a constant and, as you can see, a variable allocation function is called for it. The global variables are passed as they are. The constant is wrapped with tcg_const_i32, which internally calls tcg_temp_new_internal. tcg_gen_add_i32 only calls tcg_gen_op3_i32(INDEX_op_add_i32, ret, arg1, arg2), a generic function that takes the operation (in this case INDEX_op_add_i32) and the operands. Now let’s look at tcg_gen_op3_i32:

	static inline void tcg_gen_op3_i32(TCGOpcode opc, TCGv_i32 arg1, TCGv_i32 arg2,
			                   TCGv_i32 arg3)
	{
	    *gen_opc_ptr++ = opc;
	    *gen_opparam_ptr++ = GET_TCGV_I32(arg1); //sina: GET_TCGV_I32 does nothing
	    *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
	    *gen_opparam_ptr++ = GET_TCGV_I32(arg3);
	}

gen_opc_ptr holds the operations for the tcg block and gen_opparam_ptr holds the parameters as explained above.
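The decision made by the add-immediate helper can be modelled without the emission machinery. The sketch below returns a description of what would be emitted instead of writing to the containers. EmittedOp and gen_addi are hypothetical names, the opcode numbers are made up, and freeing the constant temp afterwards is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_MOV_I32 = 1, OP_ADD_I32 = 2 };  /* hypothetical opcode numbers */

typedef struct {
    int opc;           /* which operation would be emitted */
    int args[3];       /* temp indexes; args[2] stays -1 for a mov */
    int32_t const_val; /* value placed in the constant temp, if any */
    int used_const;    /* 1 if a temp was allocated to hold the immediate */
} EmittedOp;

/* Model of the add-immediate helper. *next_temp is the index the next
   allocated temp would receive (the role of nb_temps). */
static EmittedOp gen_addi(int ret, int arg1, int32_t imm, int *next_temp)
{
    EmittedOp op = { 0, { ret, arg1, -1 }, 0, 0 };
    assert(next_temp != NULL);
    if (imm == 0) {
        /* adding zero degenerates to a move; no temp is needed */
        op.opc = OP_MOV_I32;
    } else {
        op.opc = OP_ADD_I32;
        op.args[2] = (*next_temp)++;  /* wrap the constant in a fresh temp */
        op.const_val = imm;
        op.used_const = 1;
    }
    return op;
}
```

Note that the operands recorded for the add are all temp indexes: even the immediate reaches the operands container only indirectly, through the temp that holds it.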

Code generation data flow

The code generation process at some point (from tcg_gen_code_common, tcg_out_op) reaches a function such as tcg_reg_alloc_call or tcg_reg_alloc_mov. These functions read the TCGTemp object from tcg_ctx->temps using the index stored in gen_opparam_ptr. Then, they prepare the operands for the current opcode based on the TCGTemp object. The argument could be a memory offset, a constant or a register. The first two are accessible directly from the TCGTemp; for a register, we have to go through the Registers allocation process. Finally, the opcode and the operands are written to the code cache (in binary). For most x86 opcodes, this happens in tcg_out_opc.
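The lookup from an operand index to a host operand can be sketched as below. This is an illustration only: MiniTemp, HostOperand and resolve_operand are hypothetical, the three value kinds are loosely modelled on the TEMP_VAL_* states, and the register case assumes a register has already been assigned, so register allocation itself is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VAL_REG, VAL_MEM, VAL_CONST } ValKind;

/* Hypothetical reduced TCGTemp. */
typedef struct {
    ValKind kind;
    int reg;          /* host register number, valid for VAL_REG */
    long mem_offset;  /* offset from the base register, valid for VAL_MEM */
    long val;         /* constant value, valid for VAL_CONST */
} MiniTemp;

typedef enum { ENC_REGISTER, ENC_BASE_PLUS_OFFSET, ENC_IMMEDIATE } Encoding;

typedef struct {
    Encoding enc;
    long payload;     /* register number, offset, or immediate value */
} HostOperand;

/* Turn an index read from the operands container into a host operand. */
static HostOperand resolve_operand(const MiniTemp *temps, size_t nb_temps,
                                   long index)
{
    HostOperand op;
    const MiniTemp *ts;

    assert(index >= 0 && (size_t)index < nb_temps);
    ts = &temps[index];
    switch (ts->kind) {
    case VAL_MEM:
        op.enc = ENC_BASE_PLUS_OFFSET;
        op.payload = ts->mem_offset;
        break;
    case VAL_CONST:
        op.enc = ENC_IMMEDIATE;
        op.payload = ts->val;
        break;
    case VAL_REG:
    default:
        op.enc = ENC_REGISTER;
        op.payload = ts->reg;
        break;
    }
    return op;
}
```

The code generator would call this once per operand of the current opcode, in the order the operands were written during translation.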