Corrections to GameMaker Studio 1.4 data.win format and VM bytecode, .yydebug format and debugger instructions - UnderminersTeam/UndertaleModTool GitHub Wiki

These are some great resources about GM:S data file format and VM bytecode:

This page is a list of corrections to things that have either changed in newer versions, or were just plain wrong. Some things may be missing, because this list was assembled from ad hoc research. It is valid for bytecode versions 15 and 16; older versions have not been researched yet.

Unpacking / data structure

SimpleList vs PointerList

https://pcy.ulyssis.be/undertale/unpacking-corrected sometimes specifies List for things that are simple lists, not lists of pointers. If in doubt, check the UndertaleModTool source code.

GEN8.Debug

This is a flag that disables the debugger. If you clear it (set it to 0), the game will wait on the splash screen for the debugger to connect, listening on a port...

GEN8.SteamAppID

... configured here! This is not the SteamID; no actual SteamID can be found anywhere.

GEN8.PaddingNumbers

This is actually a list of rooms. It seems to be used to control the order of the rooms: without it, the room_previous and room_next GML functions start to misbehave, while room_goto still works fine.

OPTN.Info

Doesn't seem to actually be a duplicate of GEN8.Info. Unknown for now.

LANG

This is a new chunk in bytecode 16 (TODO: verify). Structure is as follows:

class UndertaleChunkLANG
{
    uint Unknown1 = 1;
    uint Unknown2 = 0;
    uint Unknown3 = 0;
}

SPRT.Unknown

These are collision masks. The first uint32 is the count, then a stream of standard 1 bit-per-pixel bitmaps follows (i.e. width rounded up to fill full byte). After all of the images are done, everything is padded with zeroes so that the next entry starts aligned to a 4-byte boundary.

So far, no relationship between the collision masks and the SPRT.SepMasks field has been found.
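The size arithmetic described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names are made up here, and it assumes every mask of a sprite uses the same width and height.

```python
def mask_row_bytes(width: int) -> int:
    """Bytes per row of a 1 bit-per-pixel mask (width rounded up to a full byte)."""
    return (width + 7) // 8


def mask_bytes(width: int, height: int) -> int:
    """Size of a single collision mask in bytes."""
    return mask_row_bytes(width) * height


def padded_masks_size(width: int, height: int, count: int) -> int:
    """Size of all masks together, rounded up to a 4-byte boundary.

    Assumption: all `count` masks share the same dimensions.
    """
    total = mask_bytes(width, height) * count
    return (total + 3) & ~3
```

For example, a single 20x30 mask needs 3 bytes per row and 90 bytes in total, which is padded to 92.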

FONT.Glyphs.Unknown

It's apparently:

ushort Shift;
uint Offset;

OBJT.ShapePoints

This is a list of events and associated event handlers. The structure is as follows (yes, this is one list inside another, containing objects that have a third list in them...):

    public UndertalePointerList<UndertalePointerList<Event>> Events;

    public class Event
    {
        uint EventSubtype; // (the same as the ID at the end of name)
        UndertalePointerList<EventAction> Actions; // seems to always have 1 entry, maybe the games using drag-and-drop code are different
    }

    public class EventAction
    {
        uint Unknown1; // always 1
        uint Unknown2; // always 603
        uint Unknown3; // always 7
        uint Unknown4; // always 0
        uint Unknown5; // always 0
        uint Unknown6; // always 1
        uint Unknown7; // always 2
        UndertaleString Unknown8; // always "" (pointer to empty string)
        UndertaleResourceById<UndertaleCode> CodeId; // ID in CODE
        uint Unknown10; // always 1
        int Unknown11; // always -1
        uint Unknown12; // always 0
        uint Unknown13; // always 0
        uint Unknown14; // always 0
    }

The outermost list always has 13 items: one for each event type

    public enum EventType : uint
    {
        Create = 0, // no subtypes, always subtype=0
        Destroy = 1, // no subtypes, always subtype=0
        Alarm = 2, // subtype is alarm id (0-11)
        Step = 3, // subtype is EventSubtypeStep
        Collision = 4, // subtype is other game object ID
        Keyboard = 5, // subtype is key ID, see EventSubtypeKey
        Mouse = 6, // TODO: subtypes (see game maker studio for possible values)
        Other = 7, // subtype is EventSubtypeOther
        Draw = 8, // subtype is EventSubtypeDraw
        KeyPress = 9, // subtype is key ID, see EventSubtypeKey
        KeyRelease = 10, // subtype is key ID, see EventSubtypeKey
        Gesture = 11, // TODO: mapping is a guess // TODO: subtypes
        Asynchronous = 12, // TODO: mapping is a guess // TODO: subtypes
    }

The inner list can contain any number of elements (including 0), depending on what event handlers are added to the object. The Event class contains a field (EventSubtype) that determines the subtype. The possible subtypes are as follows (names taken from the GMS interface):

    public enum EventSubtypeStep : uint
    {
        Step = 0,
        BeginStep = 1,
        EndStep = 2,
    }

    public enum EventSubtypeDraw : uint
    {
        Draw = 0,
        DrawGUI = 64,
        Resize = 65,
        DrawBegin = 72,
        DrawEnd = 73,
        DrawGUIBegin = 74,
        DrawGUIEnd = 75,
        PreDraw = 76,
        PostDraw = 77,
    }

    public enum EventSubtypeKey : uint
    {
        // one of http://cherrytree.at/misc/vk.htm or just chr(value)
    }

    public enum EventSubtypeOther : uint
    {
        OutsideRoom = 0,
        IntersectBoundary = 1,
        OutsideView0 = 40,
        OutsideView1 = 41,
        OutsideView2 = 42,
        OutsideView3 = 43,
        OutsideView4 = 44,
        OutsideView5 = 45,
        OutsideView6 = 46,
        OutsideView7 = 47,
        BoundaryView0 = 50,
        BoundaryView1 = 51,
        BoundaryView2 = 52,
        BoundaryView3 = 53,
        BoundaryView4 = 54,
        BoundaryView5 = 55,
        BoundaryView6 = 56,
        BoundaryView7 = 57,
        GameStart = 2,
        GameEnd = 3,
        RoomStart = 4,
        RoomEnd = 5,
        NoMoreLives = 6,
        NoMoreHealth = 9,
        AnimationEnd = 7,
        AnimationUpdate = 58,
        AnimationEvent = 59,
        EndOfPath = 8,
        User0 = 10,
        User1 = 11,
        User2 = 12,
        User3 = 13,
        User4 = 14,
        User5 = 15,
        User6 = 16,
        User7 = 17,
        User8 = 18,
        User9 = 19,
        User10 = 20,
        User11 = 21,
        User12 = 22,
        User13 = 23,
        User14 = 24,
        User15 = 25,
    }

The innermost Event.Actions list seems to always contain exactly one entry. This is probably because, AFAIK, the newer versions of GMS compile drag-and-drop blocks into GML bytecode anyway; the list was probably used before that.

For possible meaning of the values, see https://github.com/WarlockD/GMdsam/blob/26aefe3e90a7a7a1891cb83f468079546f32b4b7/GMdsam/GameMaker/ChunkTypes.cs#L466 (although I didn't check it)

[Screenshot: the event lists in visual form, a fragment of the obj_mainchara editor from UndertaleModTool.]

ROOM.Backgrounds.ObjectId

Could be ID to OBJT. The object name has something to do with backgrounds.

ROOM.Views.ObjectId

ID into OBJT, reference to player object for this view?

ROOM.GameObjects

Contains a new unknown uint32 field in bytecode >= 16

TPAG.SpritesheetId

ID into TXTR.

CODE.Unknown (the first one)

This matches LocalsCount (see the second part of FUNC).

CODE.Unknown (the second one)

Seems to always be 0.

VARI

There is still a lot of doubt about this one. This chunk is NOT a simple ListChunk. It doesn't even have an item count, so you have to read until you reach the length of the chunk. The chunk can be defined as follows:

public class UndertaleChunkVARI : UndertaleChunk
{
    uint InstanceVarCount { get; set; }
    uint InstanceVarCountAgain { get; set; } // the same as InstanceVarCount
    uint MaxLocalVarCount { get; set; }
    ListWithNoCount<UndertaleVariable> List;
}

public class UndertaleVariable
{
    UndertaleString Name;
    InstanceType InstanceType; // (int32)
    int VarID;
    uint Occurrences;
    Ptr<UndertaleInstruction> FirstAddress; // this is a pointer to the instruction, NOT the reference object after it, and the offsets later follow this rule too
}

The variables have two ID pools, and the ID is stored in UndertaleVariable.VarID:

  • global and instance variables are in one pool. The max ID is stored in both InstanceVarCount and InstanceVarCountAgain fields of the VARI chunk
  • local variables have a separate pool. The ID always matches their index on the CodeLocals list for a given code block. The max ID in this pool is stored in the MaxLocalVarCount field of the VARI chunk.
  • argument0..argumentN and other builtin variables (x, y, etc.) are special, they get an ID of -6, always. This matches InstanceType.Unknown, so I guess that value means 'builtin' but it's used only in this context and not anywhere else.
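Since the chunk has no item count, reading it boils down to "header, then fixed-size entries until the chunk body runs out". A minimal sketch of that loop, assuming little-endian data and that every field listed above (including the string and instruction pointers) is stored as a 32-bit value; the names parse_vari and Variable are made up for this example:

```python
import struct
from typing import NamedTuple


class Variable(NamedTuple):
    name_ptr: int        # pointer to the name string (assumed uint32)
    instance_type: int   # int32
    var_id: int          # int32, -6 for builtins
    occurrences: int     # uint32
    first_address: int   # pointer to the first instruction (assumed uint32)


HEADER = struct.Struct("<III")   # InstanceVarCount, InstanceVarCountAgain, MaxLocalVarCount
ENTRY = struct.Struct("<IiiII")  # one UndertaleVariable, 20 bytes under the assumptions above


def parse_vari(body: bytes):
    """Parse the body of a VARI chunk (the bytes after the chunk name and length)."""
    header = HEADER.unpack_from(body, 0)
    variables = []
    offset = HEADER.size
    # No item count is stored: keep reading until the chunk body is exhausted.
    while offset + ENTRY.size <= len(body):
        variables.append(Variable(*ENTRY.unpack_from(body, offset)))
        offset += ENTRY.size
    return header, variables
```

Any trailing bytes shorter than one entry are silently ignored here; a stricter reader might want to treat them as an error.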

FUNC

Another non-standard chunk. It contains two simple, item count-prefixed lists. The definitions are as follows:

public class UndertaleChunkFUNC : UndertaleChunk
{
    public UndertaleSimpleList<UndertaleFunction> Functions;
    public UndertaleSimpleList<UndertaleCodeLocals> CodeLocals;
}

public class UndertaleFunction
{
    UndertaleString Name;
    uint Occurrences;
    Ptr<UndertaleInstruction> FirstAddress;
}

// Seems to be unused. You can remove all entries and the game still works normally.
// But it exactly matches the structure in .yydebug, maybe the debugger uses it?
public class UndertaleCodeLocals
{
    uint LocalsCount;
    UndertaleString Name;
    ListWithNoLength<LocalVar> Locals; // but the length is defined above, the name being in the middle interferes with my naming scheme
}

public class LocalVar
{
    uint Index;
    UndertaleString Name;
}

All ARGB fields

They are actually ABGR...
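A small sketch of what that means when decoding a color. It assumes "ABGR" names the channels from the most significant byte of the uint32 to the least significant one (the same convention "ARGB" normally uses); unpack_abgr is a name made up for this example.

```python
def unpack_abgr(value: int):
    """Split a 0xAABBGGRR color into an (r, g, b, a) tuple."""
    r = value & 0xFF
    g = (value >> 8) & 0xFF
    b = (value >> 16) & 0xFF
    a = (value >> 24) & 0xFF
    return r, g, b, a
```

Under that assumption, 0xFF0000FF is opaque red rather than opaque blue.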

TXTR

Has no lengths anywhere as far as I can tell, so you have to guess by parsing the PNG inside and looking for IEND. The chunk starts with a normal PointerList, followed by the actual PNG blobs.
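One way to find the end of each blob is to walk the PNG chunk headers (4-byte big-endian length, 4-byte type, data, 4-byte CRC) until IEND, instead of searching for the raw bytes "IEND", which could in principle also appear inside compressed image data. A minimal sketch; png_length is a name made up here and CRCs are not verified:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_length(data: bytes, start: int) -> int:
    """Return the total length in bytes of the PNG that begins at data[start]."""
    if data[start:start + 8] != PNG_SIGNATURE:
        raise ValueError("no PNG signature at the given offset")
    pos = start + 8
    while True:
        if pos + 8 > len(data):
            raise ValueError("truncated PNG: ran out of data before IEND")
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        pos += 8 + length + 4  # chunk header + chunk data + CRC
        if chunk_type == b"IEND":
            return pos - start
```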

AUDO

The original document asks: "or was it Length + 4?"

It wasn't. But keep in mind the paddings (see below).

Paddings

The last piece of the puzzle for recreating data.win byte-for-byte from decompiled data. It probably doesn't matter too much for the actual runner (other than possibly very minor performance hits), but I decided to implement them anyway. In all cases, the padding is included in the length of the chunk (or of whatever other object it belongs to).

Most of these can probably be refactored into some more generic rules...

  • SPRT: as mentioned before, the masks are padded so that they end on 4-byte boundary.
  • FONT: the padding is 512 bytes, written as:
    for (ushort i = 0; i < 0x80; i++)
        writer.Write(i);
    for (ushort i = 0; i < 0x80; i++)
        writer.Write((ushort)0x3f);
  • STRG: padded to end on 128-byte boundary (or TXTR padded to start on 128-byte?)
  • TXTR: texture blobs are padded to always start on 128-byte boundaries
  • AUDO: every entry (except the last one) is padded to end on 4-byte boundaries

For GMS versions >= 1.9999 (which includes 2.x) there is one additional rule:

  • Every chunk has to start on a 16-byte boundary. Padding is added to the length of the previous chunk.

$ md5sum data.win newdata.win
5903fc5cb042a728d4ad8ee9e949c6eb  data.win
5903fc5cb042a728d4ad8ee9e949c6eb  newdata.win

Success!
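Most of the rules above reduce to "pad until the current position is a multiple of N". A generic helper could look like the sketch below. The function names are made up for this example, and filling with zero bytes is an assumption for everything except the SPRT masks (and clearly does not apply to the FONT padding, which has its own pattern).

```python
def padding_needed(position: int, alignment: int) -> int:
    """Number of bytes required to move `position` up to the next multiple of `alignment`."""
    return (-position) % alignment


def pad_to(buffer: bytearray, alignment: int) -> None:
    """Append zero bytes so that len(buffer) becomes a multiple of `alignment`."""
    buffer.extend(b"\x00" * padding_needed(len(buffer), alignment))
```

With such a helper, the rules become calls like pad_to(buf, 4) after the SPRT masks or an AUDO entry, pad_to(buf, 128) before a texture blob, and pad_to(buf, 16) at the end of a chunk on newer versions.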

Decompilation / VM bytecode

The POP instruction

The https://pcy.ulyssis.be/undertale/decompilation-corrected description of the POP instruction is wrong: TypePair and InstanceType are swapped. The correct definition is as follows:

0x45: Pop
    Block1
        Instance : InstanceType(int16)
        Types : TypePair(uint4, uint4)
        OpCode : OpCode(uint8)
    Block2
        Destination : Reference<Variable>
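As a sketch, Block1 can be decoded from its 4 bytes like this. It assumes the fields are stored little-endian in the order listed above (instance first, opcode in the last byte) and that Types.First sits in the low nibble of the types byte; the nibble order is a guess for illustration, and decode_pop_block1 is a made-up name.

```python
import struct


def decode_pop_block1(block: bytes):
    """Decode the first 4 bytes of a Pop instruction.

    Returns (opcode, type1, type2, instance).
    Assumption: type1 is the low nibble and type2 the high nibble.
    """
    instance, types, opcode = struct.unpack("<hBB", block)
    type1 = types & 0x0F
    type2 = types >> 4
    return opcode, type1, type2, instance
```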

Moreover, the description ("Types.First is Int32, the value will be on top of the stack, even before array access parameters!") is a little bit vague, and Altar.NET seems to have gotten it wrong too. Here is a piece of code that properly executes it:

    // ...
    Debug.Assert(instr.Type1 == UndertaleInstruction.DataType.Int32 || instr.Type1 == UndertaleInstruction.DataType.Variable);
    if (instr.Type1 == UndertaleInstruction.DataType.Int32)
        val = stack.Pop();
    if (target.NeedsInstanceParameters)
        target.InstanceIndex = stack.Pop();
    if (target.NeedsArrayParameters)
    {
        target.ArrayIndex = stack.Pop();
        target.InstType = stack.Pop();
    }
    if (instr.Type1 == UndertaleInstruction.DataType.Variable)
        val = stack.Pop();
    statements.Add(new AssignmentStatement(target, val));

TODO: "magic array pop"? Decoding by hand gives pop.v.e obj_solidexwide.somevariable which would make sense if only obj_solidexwide had any instance variables... (unless that came from a different game or something)

The DUP instruction

The DUP instruction takes one additional argument (where 'padding' normally goes) that specifies how many items will be duplicated. I assume it always copies padding+1 items, but values other than padding=0 and padding=1 have not been tested (TODO: check what other values would do).

This means, if your stack looks like this:

1 2 3 4 5

After DUP 0 you have:

1 2 3 4 5 5

but after DUP 1:

1 2 3 4 5 4 5

This is often used when working with arrays, as to access the array you need two values on the stack (index and InstanceType).
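The behavior shown above can be modeled with a plain list as the stack. This sketch follows the "copies padding+1 items" assumption from the text, which is only confirmed for 0 and 1; dup and extra are names chosen for the example.

```python
def dup(stack: list, extra: int) -> None:
    """Duplicate the top (extra + 1) stack items, preserving their order."""
    count = extra + 1
    stack.extend(stack[-count:])
```

Starting from [1, 2, 3, 4, 5], dup(stack, 0) appends a single 5, while dup(stack, 1) appends 4 and 5.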

The BREAK instruction

Actually, https://github.com/donkeybonks/acolyte/wiki/Bytecode#0xff-break has a reasonable description. The thing to note here is that while BREAK looks at the top of the stack, it doesn't modify the stack in any way and can usually be ignored if you are just trying to decompile bytecode to high-level representation.

PUSHENV/POPENV

Again, explained quite well at https://github.com/donkeybonks/acolyte/wiki/Bytecode#0xbb-pushenv

It turns out that the with() statement is actually a loop. If there are multiple objects matching your query, popenv will jump back for another iteration, and if there are no objects matching your query, pushenv will skip your code entirely.

B/BT/BF offsets are Int23?

The offset of these instructions is specified to be Int24 (3 bytes), but in fact, if you read it like that, negative offsets don't work properly. It turns out the value is actually an Int23, with the most significant bit being some kind of flag that is set for only one instruction in the whole bytecode (no idea what it means).
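A sketch of reading the offset as a signed 23-bit value with the top bit of the 24-bit field split off as the unknown flag. It assumes the field occupies the low 24 bits of the 32-bit little-endian instruction word, with the opcode in the top byte; decode_branch_offset is a made-up name.

```python
def decode_branch_offset(word: int):
    """Return (offset, flag) for a B/BT/BF instruction word."""
    raw = word & 0x00FFFFFF        # drop the opcode byte
    flag = bool(raw & 0x800000)    # bit 23: the flag of unknown meaning
    value = raw & 0x7FFFFF         # bits 0..22: the actual offset
    if value & 0x400000:           # bit 22 is the sign bit of an Int23
        value -= 0x800000
    return value, flag
```

The top byte in the test values used with this function is arbitrary, since it is masked out before decoding.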

InstanceType.StackTopOrGlobal

It doesn't exist. A value of 0 means that either you are actually referring to a game object with ID 0, or the instruction just doesn't use the instance type at all (see info about array push/pop below).

Accessing arrays

The original description says: "When parsing a Variable, if the Type is VariableType.Array: the index is at the stack top and has to be popped, and is followed by the Int16 -5, which also has to be popped. When an array is pushed and the Dup instruction occurs, the index is also duplicated."

First, it's not -5 but it can be any InstanceType (where -5 = global). For arrays, the push/pop instruction doesn't contain an InstanceType at all, so just override it with what you find on the stack.

Second, the DUP instruction was already explained - this behavior is not connected with that being an array index, it has to do with the additional parameter to DUP that was hiding in 'padding'.

Accessing instance parameters

The original description says: "When Pop's or Push's InstanceType is InstanceType.StackTopOrGlobal, if Type is VariableType.StackTop then one additional value, representing the instance, will be popped from the stack."

The part about InstanceType.StackTopOrGlobal is wrong, see the two points above. Perhaps something in the old bytecode worked like that.

Final value in reference chains

The NextReferenceOffset at the end of the chain (you can count which one it is by using Occurrences from FUNC or VARI block) seems to have some meaningful value. Not sure what meaning it has though. It seems to be always increasing when you are iterating over the list, but the pattern is not clear. I've noticed that if you want to add references to additional builtin functions, you can just put 0 there and it works without any problems.

Casts to/from type Variable

There are a lot of casts to/from variable type in the code. It seems that when executing a function, all passed parameters need to be of type VARIABLE and the return also is VARIABLE. If you forget a cast into variable type before calling a function, the VM will read something you didn't expect it to (usually 0).

.yydebug file format

This is a binary file format used by the debugger. If you want to attach using the debugger, you'll have to generate it.

To run the debugger, first, rebuild data.win with DisableDebugger=0, then run GMDebug using commandline as follows:

C:\Users\krzys\AppData\Roaming\GameMaker-Studio\GMDebug\GMDebug.exe -d=data.yydebug -t="127.0.0.1" -tp=6502 -p="C:\Users\krzys\Documents\GameMaker\Projects\Project4.gmx\Project4.project.gmx"

The project file can actually be just an empty project, but it needs to be there, otherwise the debugger crashes. Note that you have to use the same (or similar) version to the one that was used to build the game, 1.4.1773 seems to work fine for Undertale.

The format itself is quite similar to data.win, but the data in the chunks is obviously different.

FORM

Didn't change at all, but of course contains different chunks.

SCPT

Contains high-level source code for scripts. There are exactly as many of these as there are entries in the CODE chunk of the main data.win, and each entry contains just one string pointer to the code.

public class UndertaleScriptSource
{
     UndertaleString SourceCode;
}

public class UndertaleDebugChunkSCPT : UndertaleListChunk<UndertaleScriptSource>
{
}

DBGI

Debug info - contains mappings between offsets in the bytecode and source code. If you can't or don't want to generate these, you don't need to, but then you won't be able to use breakpoints. Add at least one 0-0 mapping so that you can at least set a breakpoint at the start of a script.

public class UndertaleDebugInfo
{
    public class DebugInfoPair
    {
        uint BytecodeOffset;
        uint SourceCodeOffset;
    }

    uint Count;
    DebugInfoPair Data[Count/2]; // Note the /2 here!
}

public class UndertaleDebugChunkDBGI : UndertaleListChunk<UndertaleDebugInfo>
{
}
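The /2 means that Count counts individual uint32 values rather than pairs. A minimal sketch of reading one UndertaleDebugInfo entry, assuming little-endian data; parse_debug_info is a name made up for this example:

```python
import struct


def parse_debug_info(data: bytes, offset: int = 0):
    """Read one debug info entry starting at `offset`.

    Returns (pairs, end_offset), where each pair is
    (bytecode_offset, source_code_offset).
    """
    (count,) = struct.unpack_from("<I", data, offset)
    offset += 4
    pairs = []
    for _ in range(count // 2):  # Count is the number of uints, not the number of pairs
        pairs.append(struct.unpack_from("<II", data, offset))
        offset += 8
    return pairs, offset
```

The minimal "one 0-0 mapping" entry mentioned above would then be a Count of 2 followed by two zero uints.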

INST

Was always empty during testing; my guess is that it stands for "instance variables". Everything seems to work if it's empty. If you know how to fill this, please get in touch.

public class UndertaleInstanceVars
{
    // unknown
}

public class UndertaleDebugChunkINST : UndertaleListChunk<UndertaleInstanceVars>
{
}

LOCL

Local variables - matches exactly with the second part of FUNC in data.win (but this time it's a normal PointerListChunk<>)

public class UndertaleDebugChunkLOCL : UndertaleListChunk<UndertaleCodeLocals>
{
}

STRG

Works the same as in the normal data.win.
