ENIGMA_compiler - hpgDesigns/hpgdesigns-dev.io GitHub Wiki
ENIGMA's compiler takes EDL and compiles it to C++.
Process
Before the process begins, the compiler is already aware of the target platform, the necessary make calls, and the variables, functions, and other important definitions from the C++ engine. The process begins with this information tucked away in global memory.
First and foremost, the compiler gathers the resource names and makes declarations for them, adding them to a new virtual namespace allocated for this compile. This includes minor code generation for instances. Next, it lexes all the code and performs some preliminary parse operations, such as adding semicolons (see the page on the Parser for details on the lexer and preliminary parsing), and then takes note of all the types that are declared locally in each object and script. At this point, the compiler has a structure for each event in each object and for each script, containing the code, the lex string, and a list of the variables it declares for each scope: globally, instance-locally, and via dot-access.
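The per-event record described above can be pictured roughly as follows. This is a minimal C++ sketch, not ENIGMA's real data structure: the struct names, field names, and the `all_locals()` helper are hypothetical, and the encoding of the lex string (described on the Parser page) is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record kept for each event (or script) after lexing.
// Field names are illustrative; the lex-string encoding is not modelled.
struct parsed_code {
  std::string code;                        // EDL code, semicolons already added
  std::string lex_string;                  // lexer output for the same code
  std::set<std::string> declared_global;   // variables declared globally
  std::set<std::string> declared_local;    // variables declared instance-locally
  std::set<std::string> declared_dot;      // variables declared via dot-access
};

// Hypothetical per-object container: one record per event.
struct parsed_object {
  std::string name;
  std::vector<parsed_code> events;
};

// Union of the instance-local declarations across all of an object's
// events, e.g. as input when deciding what its generated structure holds.
std::set<std::string> all_locals(const parsed_object& obj) {
  std::set<std::string> out;
  for (const parsed_code& ev : obj.events)
    out.insert(ev.declared_local.begin(), ev.declared_local.end());
  return out;
}
```

Keeping the three scopes in separate sets lets later passes ask "who declares this name, and where?" without re-scanning the code.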
From there, it determines which objects call which scripts, and which scripts call which other scripts, in a complex resolution pass that produces a list of every script that could possibly be invoked by each object. Using that resolved list, the compiler scopes the scripts into the appropriate objects, then starts at the bottom and works its way up, gathering the variables used by any script or event. The result of this pass is a comprehensive list of both the scripts invoked by and the variables used in each object.
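The resolution pass is essentially a transitive closure over the "calls" relation. The following is a minimal sketch, assuming a hypothetical `CallGraph` map from each object or script name to the scripts it calls directly; it illustrates the idea and is not ENIGMA's actual implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: each object or script name maps to the
// scripts it calls directly. Not ENIGMA's real data structures.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Returns every script that could possibly be invoked, directly or
// indirectly, starting from `root` (an object or a script).
std::set<std::string> reachable_scripts(const CallGraph& calls,
                                        const std::string& root) {
  std::set<std::string> seen;
  std::vector<std::string> stack{root};
  while (!stack.empty()) {
    std::string current = stack.back();
    stack.pop_back();
    auto it = calls.find(current);
    if (it == calls.end()) continue;  // calls no scripts
    for (const std::string& callee : it->second) {
      // insert() reports whether the script is new; skipping known ones
      // keeps recursive scripts from looping forever.
      if (seen.insert(callee).second) stack.push_back(callee);
    }
  }
  return seen;
}
```

The `seen` set doubles as the visited set, so mutually recursive scripts terminate, and each script is expanded at most once per root.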
Using the lists of used variables and scripts for each object, the compiler can decide where to scope scripts and variables: solely at the global scope (using with() where necessary), at the parent-object scope (from which all objects inherit them), or at the individual object scope.
From there, the compiler conducts a second pass, using its newly gathered information to resolve access routines and other heavily context-dependent mechanisms of EDL, many of which involve heavy code generation. Dot-based access of the form `a.b`, where `a` is an integer, resolves to either `enigma::glaccess(a)->b` in the case of shared or "global" locals, or `enigma::varaccess_b(a)` for strict locals. It is up to the compiler to generate these functions.
- First, the compiler must isolate a type that will represent `b`. It does this by crawling the objects to find any that declare it explicitly.
- If all objects agree on one type, it writes an access routine for that type and allocates a dummy to be returned, preventing a segfault.
- If they do not agree, it complains and does so anyway.
- The accessor function switch()es on the object index, with a case for each index whose object contains a definition of the variable (`b` in this example).
- The default case returns the anti-segfault dummy for that type, which is declared exactly once in the form `static someType dummy_someType;`.
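As an illustration of the steps above, here is a rough sketch of what a generated strict-local accessor might look like for a variable `b` of type double. The names `enigma::varaccess_b` and `dummy_double` follow the patterns given above; the object structures, the object indices, and the `instances` table used to resolve `a` as an instance id are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>

namespace enigma {
  // Hypothetical object indices assigned by the compiler.
  enum { obj_player = 0, obj_enemy = 1, obj_wall = 2 };

  // Hypothetical generated structures; only the first two declare `b`.
  struct object_basic {
    int object_index;
    explicit object_basic(int idx) : object_index(idx) {}
    virtual ~object_basic() = default;
  };
  struct OBJ_obj_player : object_basic {
    double b = 0;
    OBJ_obj_player() : object_basic(obj_player) {}
  };
  struct OBJ_obj_enemy : object_basic {
    double b = 0;
    OBJ_obj_enemy() : object_basic(obj_enemy) {}
  };
  struct OBJ_obj_wall : object_basic {
    OBJ_obj_wall() : object_basic(obj_wall) {}
  };

  // Hypothetical table mapping instance ids to live instances.
  std::map<int, object_basic*> instances;

  // Anti-segfault dummy, declared exactly once for type double.
  static double dummy_double;

  // Generated accessor for `a.b` where `b` is a strict local.
  double& varaccess_b(int a) {
    auto it = instances.find(a);
    if (it == instances.end()) return dummy_double;  // no such instance
    object_basic* inst = it->second;
    switch (inst->object_index) {
      case obj_player: return static_cast<OBJ_obj_player*>(inst)->b;
      case obj_enemy:  return static_cast<OBJ_obj_enemy*>(inst)->b;
      default:         return dummy_double;  // object does not declare `b`
    }
  }
}
```

Because the accessor returns a reference, `a.b = value` can be emitted as `enigma::varaccess_b(a) = value`; a write through an instance whose object lacks `b` lands harmlessly in the shared dummy instead of crashing.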
After that, the hard work is basically done on ENIGMA's part. It uses the lex buffer to write the code buffer out, neatly formatted, into specific files under ENIGMAsystem/SHELL/Preprocessor_Environment_Editable/, restoring the strings and other collapsed sections along the way. At this point, the game has been compiled to C++, and it is just a matter of invoking GCC on the produced code. Native compiler invocation is done through Make; when that process finishes, the game is officially natively compiled.
From there, the compiler simply tacks the resource data onto the end of the executable, or wherever Compilers/*/compiler.ey requests. If requested, the compiler then runs the game.
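The exact layout ENIGMA uses for the appended data is not described here. One common way to make appended data findable at run time is to end the file with a fixed-size trailer that records the payload size, so the game can seek backwards from the end of its own executable. The sketch below models the executable as an in-memory `std::string` for simplicity; the trailer layout, the `"RSRC"` marker, and the function names are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical layout: [executable][payload][size: 8 bytes LE][magic: 4 bytes]
const std::string kResourceMagic = "RSRC";  // hypothetical marker

void append_u64(std::string& out, std::uint64_t v) {
  for (int i = 0; i < 8; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

std::uint64_t read_u64(const std::string& in, std::size_t pos) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos + i]))
         << (8 * i);
  return v;
}

// Build side: append the payload and a trailer to the finished executable.
void attach_resources(std::string& exe, const std::string& payload) {
  exe += payload;
  append_u64(exe, payload.size());
  exe += kResourceMagic;
}

// Run-time side: locate the payload by reading the trailer from the end.
// Returns an empty string if no valid trailer is present.
std::string find_resources(const std::string& exe) {
  const std::size_t trailer = 8 + kResourceMagic.size();
  if (exe.size() < trailer ||
      exe.compare(exe.size() - kResourceMagic.size(), kResourceMagic.size(),
                  kResourceMagic) != 0)
    return "";
  const std::uint64_t size = read_u64(exe, exe.size() - trailer);
  if (size > exe.size() - trailer) return "";  // corrupt or foreign trailer
  return exe.substr(exe.size() - trailer - size, size);
}
```

Putting the size at the very end means the reader never needs to know how large the native code part is, which is what makes "tacking on" work regardless of the toolchain that produced the binary.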
Code Generation
Getting GML to compile as C++ is impossible without generating some additional code to be compiled with the game. This code often fills large gaps in the ENIGMA engine. Among the pieces of code generated so that a variety of games can compile are the following:
- A switch statement is generated for use by instance_create(). Since C++ classes cannot be stored in an array and selected by index at run time, the switch statement pairs each object index with its own `new` statement.
- A framework of structures is generated for each object. Locals and scripts are then scoped into each structure as appropriate.
- An accessor function is generated for each local variable accessed as `object.local_variable`.
- A common-class cast is generated to allow instance_change() to be implemented; each object has a method to cast to the common class and a constructor from it.
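The instance_create() switch from the first item might be generated along these lines. The object names, the `object_basic` base structure, and the id counter are hypothetical, and for brevity this sketch returns a pointer where GML's instance_create() returns an instance id.

```cpp
#include <cassert>

// Hypothetical object indices and generated structures.
enum { obj_player = 0, obj_wall = 1 };

struct object_basic {
  int object_index, id;
  double x, y;
  object_basic(int idx, int id_, double x_, double y_)
      : object_index(idx), id(id_), x(x_), y(y_) {}
  virtual ~object_basic() = default;
};
struct OBJ_obj_player : object_basic {
  OBJ_obj_player(int id, double x, double y)
      : object_basic(obj_player, id, x, y) {}
};
struct OBJ_obj_wall : object_basic {
  OBJ_obj_wall(int id, double x, double y)
      : object_basic(obj_wall, id, x, y) {}
};

static int next_instance_id = 100001;  // hypothetical id counter

// A C++ type cannot be picked out of an array by a run-time index,
// so the compiler emits one `new` per object index instead.
object_basic* instance_create(double x, double y, int object) {
  switch (object) {
    case obj_player: return new OBJ_obj_player(next_instance_id++, x, y);
    case obj_wall:   return new OBJ_obj_wall(next_instance_id++, x, y);
    default:         return nullptr;  // unknown object index
  }
}
```

Since every object index is a compile-time constant, the switch can usually be turned into a jump table by the C++ compiler, so the lookup stays cheap even with many objects.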
What needs to be done to the compiler
- Template type tracking: The C Parser needs to keep track of all template instantiations. This may involve creating an instantiation scope in each template, or creating an instantiation parameters list in each object.
- Default flag: All searchable objects need to have a flag set so that a special case does not need to be made for 0xFFFFFFFF flag searches in the C Parser.
- Constants and enums need to be flagged as such: For future items on this list to work, the "const" keyword needs to be acknowledged.
- Flag pair "local const" needs special treatment: Local constants should be initialized in the constructor instead of set inline to avoid errors.
- Local array bounds need to be coerced: To permit a local array with a variable-sized dimension, array subscripts should be classified as constant or variable. Constant subscripts should remain in the declaration; variable subscripts should be replaced with * and the array allocated in the constructor.
- Switch statements need to be coerced: To allow for a more efficient switch statement, the types of the switch value and of each case label should be coerced. There is only one switch value type, the key type. Since there are typically multiple case labels, the worst type used in any of the switch()'s case labels will represent them all. The "best" type is the smallest integer type, followed by the largest integer type; any floating-point type is bad, and the "worst" is any string or variant type. The case type is considered `const` if and only if all of the case label types are constant. The scenarios for (key, case) type pairs are as follows (??? indicates that the type is irrelevant; all const types are denoted as such):
  - (int : const int): The statement is left alone completely.
  - (??? : const ???): The statement is replaced with a hash function, with integral keys as the case labels. An if() is placed in each case to make sure the hash was accurate.
  - (??? : ???): Regardless of the switch value type, if the case types are not all constant, the switch() must be replaced with consecutive if()s.
- Locally- and globally-declared array subscripts need special treatment. Variables marked "const" need to be declared first; of those, local consts need to be initialized via () in the constructor. It would be a good idea to allow = for in-place construction and () for in-constructor construction.
- eYAML files of locals need to be acted upon: Ism presently has a mechanism by which she can look up alarms in separate sources. Files like the one she created manually need to be generated automatically by ENIGMA in accordance with the eYAML files under Extensions/.
- A variable tracking mechanism needs to be implemented: In accordance with the eYAML files mentioned above, a system needs to be implemented that can execute certain code at the end of any event in which a value may have changed. This is useful for maintaining spatial containers that speed up the collision system.
- The options in the LGM ENIGMA settings pane (and the ones that were requested but are not there) need to be implemented. This is relatively trivial and not worth itemizing in full, but a couple of options not listed elsewhere are as follows:
  - Scripts should have two modes for maximum efficiency: either being placed in the global scope, with variables accessed via with(), or being scoped into each object that uses them (the current behavior).
  - Global array types should have two type options: pointer or var (many people use view_xview without an array subscript, which errors for int* but not for var).
  - Switch() should have an option to use strictly GML or strictly C methods.
  - There needs to be an option for = vs == treatment in conditionals and parameters.
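Several items above ("local const" initialization, local array bounds, and in-constructor construction via ()) concern how generated object structures are initialized. The sketch below shows one possible shape for a generated structure, given the hypothetical EDL declarations in its comments; all names are illustrative. A const data member cannot be assigned in a constructor body, only initialized, which is why `max_hp` appears in the member initializer list. Note also that C++ only allows the first (outermost) bound of a new[] expression to be non-constant, so this sketch places the variable bound first.

```cpp
#include <cassert>

// Hypothetical EDL locals for one object (names are illustrative):
//   local const int max_hp = 100;
//   local int node_count = 8;
//   local int path_nodes[node_count][2];   // first bound is not constant
// One way the generated C++ structure could look:
struct OBJ_obj_enemy {
  const int max_hp;        // const local: must be initialized, not assigned
  int node_count;
  int (*path_nodes)[2];    // variable bound became '*'; constant [2] kept

  OBJ_obj_enemy()
      : max_hp(100),       // in-constructor construction via ()
        node_count(8),     // declared before path_nodes, so set first
        path_nodes(new int[node_count][2]()) {}  // zero-filled allocation
  ~OBJ_obj_enemy() { delete[] path_nodes; }

  // Copying would double-free path_nodes, so this sketch forbids it.
  OBJ_obj_enemy(const OBJ_obj_enemy&) = delete;
  OBJ_obj_enemy& operator=(const OBJ_obj_enemy&) = delete;
};
```

Members are initialized in declaration order, not initializer-list order, so emitting the declarations in dependency order (consts and bounds first) is what makes the allocation safe.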
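For the (??? : const ???) scenario of the switch coercion item, lowered code for string case labels might look like the following. FNV-1a is used purely as a stand-in hash; which hash ENIGMA would use is not specified, and `fnv1a()` and `direction()` are hypothetical names. Because the labels are computed by a constexpr function, two labels that happen to collide are caught at compile time as duplicate case values, while the if() in each case guards against a key that merely shares a hash with a label.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a (32-bit), a stand-in hash; constexpr so case labels are
// computed at compile time.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
  return *s == '\0'
             ? h
             : fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 16777619u);
}

// Hypothetical EDL input:
//   switch (name) { case "left": return -1; case "right": return 1; }
//   return 0;
int direction(const std::string& name) {
  switch (fnv1a(name.c_str())) {
    case fnv1a("left"):
      if (name == "left") return -1;  // verify: hashes alone may collide
      break;
    case fnv1a("right"):
      if (name == "right") return 1;
      break;
  }
  return 0;  // default path, taken when no verified case matched
}
```

The integral switch keeps the jump-table speed of the (int : const int) case, and the single string comparison per matched label is the cost of correctness.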
Toolchain Calls
To allow compilation of games for all platforms, including cross-compilation, a system for compiler management needed to be incorporated. Though the About.ey files allow some specification of system dependencies, compilers need to be organized so that they can be looked up by the name of one of the three operating systems on which the IDE can run. In other words, a directory called Compilers/ must be kept, containing a folder for each of Windows, Linux, and MacOSX. In each of those child folders, an eYAML file must be kept specifying the fundamental information needed to call the toolchain executables.
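As a small illustration of the lookup, the host's folder under Compilers/ can be chosen with the usual predefined compiler macros. This is a sketch only: it assumes the lookup happens in C++ code built for the host, `host_compiler_dir()` is a hypothetical name, and it deliberately does not guess at the name of the eYAML file inside each folder.

```cpp
#include <cassert>
#include <string>

// Maps the host operating system to its folder under Compilers/.
// Folder names come from the text above; detection relies on the
// standard predefined macros of each platform's compilers.
std::string host_compiler_dir() {
#if defined(_WIN32)
  return "Compilers/Windows/";
#elif defined(__APPLE__)
  return "Compilers/MacOSX/";
#elif defined(__linux__)
  return "Compilers/Linux/";
#else
  return "";  // host is not one of the three supported systems
#endif
}
```

Keying the folders by host rather than by target is what enables cross-compilation: each host folder can list toolchains that emit code for other platforms.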