ENIGMA_compiler - hpgDesigns/hpgdesigns-dev.io GitHub Wiki

ENIGMA's compiler takes EDL and compiles it to C++.

Process

Before the process begins, the compiler is already aware of the target platform, the necessary make calls, and the variables, functions, and other important definitions from the C++ engine. The process begins with this information tucked away in global memory.

First, the compiler collects resource names and makes declarations for them, adding them to a new virtual namespace allocated for this compile. This includes minor code generation for instances. From there, it lexes all the code and performs some preliminary parse operations such as adding semicolons (see the page on Parser for details on the lexer and preliminary parsing), and then takes note of all the types that are declared locally in each object and script. At this point, the compiler has a structure for each event in each object and for each script, containing the code, the lex string, and a list of the variables it declares for each scope: globally, instance-locally, and via dot-access.

From there, it looks at which objects make what calls to what scripts, and which scripts call what scripts, in a complex resolution pass that results in a list of every script that could possibly be invoked by an object. Using that resolved list, the compiler scopes the scripts into the appropriate objects and then starts at the bottom and works its way up, gathering variables used by any script or event. The results of this pass are a comprehensive list of both scripts invoked by and variables used in each object.
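The resolution pass described above amounts to a reachability walk over a call graph. The following is a minimal sketch of that idea, not ENIGMA's actual implementation; the names CallGraph and resolve_scripts, and the use of script names as strings, are assumptions made for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: each script name maps to the scripts it calls directly.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Returns every script that could be invoked, starting from the scripts
// an object's events call directly and following script-to-script calls.
std::set<std::string> resolve_scripts(const CallGraph& calls,
                                      const std::set<std::string>& direct) {
  std::set<std::string> reached;
  std::vector<std::string> pending(direct.begin(), direct.end());
  while (!pending.empty()) {
    std::string name = pending.back();
    pending.pop_back();
    if (!reached.insert(name).second) continue;  // already visited; also breaks cycles
    auto it = calls.find(name);
    if (it == calls.end()) continue;  // this script calls no other scripts
    for (const auto& callee : it->second) pending.push_back(callee);
  }
  return reached;
}
```

Because visited scripts are skipped, mutually recursive scripts do not cause the walk to loop. The same walk, run once per object, would give the per-object script list that the variable-gathering step then works from.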

Using the list of used variables and scripts for each object, the compiler can make choices on where to scope scripts and objects, be it solely at the global scope (using with() where necessary), at the parent-object scope (where all objects will inherit it), or at the individual object scope.

From there, the compiler conducts a second pass, using its newly gathered information to resolve access routines and other heavily context-dependent mechanisms of EDL, many of which involve heavy code generation. Dot-based access of the form a.b, where a is an integer, resolves to either enigma::glaccess(a)->b in the case of shared or "global" locals, or enigma::varaccess_b(a) for strict locals. It is up to the compiler to generate these functions.

  • First, the compiler must isolate a type that will represent b. It does this by crawling objects to find any that declare it explicitly.
    • If all objects agree on one type, it writes an access routine for it and allocates a dummy to be returned to prevent a segfault.
    • If they do not agree, it complains and writes the access routine anyway.
  • The accessor function switch()es on the object index. It then makes a case for each object index whose object contains the correct definition of the variable.
  • The default case returns the anti-segfault dummy for that type; it is declared exactly once, in the form static someType dummy_someType;
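As a rough illustration of the steps above, a generated accessor for a strict local named b might look something like the following. This is a sketch under stated assumptions: the object structures (object_basic, OBJ_player, OBJ_enemy, OBJ_wall), the instances table, and the choice of double as the agreed type are all hypothetical, and the integer a is assumed to be an instance id. Only the enigma::varaccess_b name and the static dummy form come from the description above.

```cpp
#include <map>

namespace enigma {

// Hypothetical stand-ins for the object structures the compiler generates.
struct object_basic {
  int object_index;
  explicit object_basic(int index) : object_index(index) {}
  virtual ~object_basic() = default;
};
struct OBJ_player : object_basic { double b = 0; OBJ_player() : object_basic(0) {} };
struct OBJ_enemy  : object_basic { double b = 0; OBJ_enemy()  : object_basic(1) {} };
struct OBJ_wall   : object_basic { OBJ_wall() : object_basic(2) {} };  // never declares b

// Hypothetical instance table keyed by instance id.
std::map<int, object_basic*> instances;

// Anti-segfault dummy, declared exactly once for this type.
static double dummy_double;

// Sketch of a generated accessor for the strict local "b".
double& varaccess_b(int a) {
  auto it = instances.find(a);
  if (it == instances.end()) return dummy_double;  // no such instance
  object_basic* inst = it->second;
  switch (inst->object_index) {
    case 0: return static_cast<OBJ_player*>(inst)->b;
    case 1: return static_cast<OBJ_enemy*>(inst)->b;
    default: return dummy_double;  // this object has no b
  }
}

}  // namespace enigma
```

Returning a reference lets the same accessor serve both reads and writes, so a statement such as a.b = 5 on an object without b silently lands in the dummy instead of crashing.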

After that, the hard work is basically done on ENIGMA's part; it uses the lex buffer to dump the code buffer into the specific files under ENIGMAsystem/SHELL/Preprocessor_Environment_Editable/ so it looks nice, meanwhile adding the strings and other collapsed sections back in. At this point, the game has been compiled to C++, and it is just a matter of invoking GCC on the produced code. Native compiler invocation is done through Make; when that process finishes, the game is officially natively compiled.

From there, the compiler simply tacks resource data onto the end of the executable, or where requested by Compilers/*/compiler.ey. If requested, the compiler will then invoke the game.

Code Generation

Getting GML to play well with a C++ compiler is impossible without generating some additional code to be compiled with the game. The code often fills large gaps in the ENIGMA engine. Among the various code pieces generated to get a variety of games to compile are the following:

  1. A switch statement is generated for use by instance_create(). Since class ids cannot be enumerated in an array in C++, the switch statement pairs each object index with its own new statement.
  2. A framework of structures is generated for each object. Locals and scripts are then scoped into each structure as appropriate.
  3. An accessor function is generated for each local variable accessed as object.local_variable.
  4. A common-class cast is generated to allow instance_change to be implemented; each object has a method to cast to the common class and a constructor from it.
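To make the first item concrete, here is a minimal sketch of the kind of switch that could back instance_create(). The structure names (object_basic, OBJ_player, OBJ_enemy), the function name instance_create_sketch, and the use of std::unique_ptr are assumptions for illustration; the real generated code and its return type may differ.

```cpp
#include <memory>

// Hypothetical stand-ins for the per-object structures the compiler generates.
struct object_basic {
  double x = 0, y = 0;
  virtual ~object_basic() = default;
  virtual int object_index() const = 0;
};
struct OBJ_player : object_basic { int object_index() const override { return 0; } };
struct OBJ_enemy  : object_basic { int object_index() const override { return 1; } };

// C++ cannot index into an array of class types, so each object index
// is paired with its own new statement inside a switch.
std::unique_ptr<object_basic> instance_create_sketch(double x, double y, int object) {
  std::unique_ptr<object_basic> inst;
  switch (object) {
    case 0: inst.reset(new OBJ_player()); break;
    case 1: inst.reset(new OBJ_enemy());  break;
    default: return nullptr;  // unknown object index
  }
  inst->x = x;
  inst->y = y;
  return inst;
}
```

The switch has to be generated per game because the set of object indices, and the class that belongs to each, is only known once the compiler has read the game's resources.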

What needs done to the compiler

  1. Template type tracking: The C Parser needs to keep track of all template instantiations. This may involve creating an instantiation scope in each template, or creating an instantiation parameters list in each object.
  2. Default flag: All searchable objects need to have a flag set so a special case doesn't need made for 0xFFFFFFFF flag search in the C Parser.
  3. Constants and enums need flagged as such: For future items on this list to work, the "const" keyword needs acknowledged.
  4. Flag pair "local const" needs special treatment: Local constants should be initialized in the constructor instead of set inline to avoid errors.
  5. Local array bounds need coerced: To permit having a local array of variable-sized dimension, array subscripts should be determined to be constant or variable. Constant subscripts should remain in the declaration; variable subscripts should be replaced with * and allocated in the constructor.
  6. Switch statements need coerced: To allow for a more efficient switch statement, the types of the switch value and of each case label should be coerced. There is only one switch value type, the key type. Since there are typically multiple case labels, the worst type used in any of the switch()'s case labels will represent them all. From best to worst, the ranking is: the smallest integer type is the "best", then the largest integer type, then any floating point type, and the "worst" is any string or variant type. The case type is considered const if and only if all of the case label types are constant. Scenarios for (key:case) type pairs are as follows (??? indicates that the type is irrelevant; all const types are denoted as such):
    • (int:const int): The statement is left alone completely.
    • (???:const ???): The statement is replaced with a hash function and integral keys as the case labels. An if() is placed in each case to make sure the hash was accurate.
    • (???:???): Regardless of switch value type, if the case types are not all constant, the switch() must be replaced with consecutive if()s.
  7. Locally- and globally-declared array subscripts need special treatment. Variables marked "const" need to be declared first; of those, local consts need initialized via () in the constructor. It'd be a good idea to allow = for in-place construction and () for in-constructor construction.
  8. eYAML files of locals need acted upon: Ism presently has a mechanism by which she can look up alarms in separate sources. Files like the one she created manually need generated automatically by ENIGMA in accordance with the eYAML files under Extensions/.
  9. Variable tracking mechanism needs implemented: In accordance with the eYAML files mentioned above, a system needs implemented that can execute certain code at the end of events in which it is possible that a value may have changed. This is useful for establishing spatial containers for speeding up the collision system.
  10. The options in the LGM ENIGMA settings pane (and the ones that were requested but aren't there) need implemented. This is actually relatively trivial and not worth naming, but a couple not listed are as follows:
    • Scripts should have two modes for max efficiency; either being placed in the global scope and var accessed via a with(), or being scoped into each object that uses them (this is the current behavior)
    • Global array types should have two type options: pointer or var (many people use view_xview without an array subscript, which will error for int* but not for var).
    • Switch() should have an option to use strictly GML or strictly C methods.
    • There needs to be an option for = vs == treatment in conditionals and parameters.
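The (???:const ???) scenario from item 6 can be illustrated with a switch over constant string labels. The sketch below shows one way the rewritten statement could look; the choice of FNV-1a as the hash, and the names fnv1a and handle_sketch, are assumptions for illustration and not something the compiler is known to emit.

```cpp
#include <cstdint>
#include <string>

// FNV-1a hash, written as a single-return constexpr function so it can be
// evaluated at compile time for case labels. The hash choice is an assumption.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
  return *s ? fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 16777619u) : h;
}

// Stands in for EDL along the lines of:
//   switch (key) { case "left": return -1; case "right": return 1; default: return 0; }
int handle_sketch(const std::string& key) {
  switch (fnv1a(key.c_str())) {
    // Each case re-checks the key with an if(), in case the hash collided.
    case fnv1a("left"):  if (key == "left")  return -1; break;
    case fnv1a("right"): if (key == "right") return 1;  break;
  }
  return 0;  // default case, also reached when a hash matched by accident
}
```

If any case label were not constant, this rewrite would not be possible, because case labels must be compile-time constants in C++; that is the (???:???) scenario, where the switch() has to become consecutive if()s instead.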

Toolchain Calls

To allow compilation of games for all platforms, and to allow cross-compilation, a system needed incorporated for compiler management. Though the About.ey files allow for some specification of system dependencies, compilers need to be organized so that they can be looked up by the name of one of the three operating systems on which the IDE can run. In other words, a directory called Compilers/ must be kept containing a folder for each of Windows, Linux, and MacOSX. In each of those child folders, an eYAML file must be kept specifying fundamental information needed to call the toolchain executables.