Xemu snapshot format - lgblgblgb/xemu GitHub Wiki
Xemu snapshot format definition
Xemu snapshots have a "standard skeleton" which should be the same for every emulator, though of course this does not mean that a VIC-20 emulator can load a snapshot taken from a C65 :) This "skeleton" format is called "framing", and it divides the file into "blocks". The file itself has no file-level header for identification, but there is a block-level equivalent: since the first block of a file must be the "Ident block", the effect is almost the same. The other blocks in a snapshot may vary depending on which emulator saved it.
Each block has two parts: a header and a data part. The header has the same structure for all blocks in the file; the interpretation of the data part, however, is up to the emulator target and the purpose of the block.
Warnings and definitions
Definitions: BE32 means a big-endian 32-bit machine word; BE16 and BE64 are defined analogously. The format uses a fixed byte order so that snapshot files are portable between architectures with different native byte order.
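A minimal sketch of portable BE16/BE32/BE64 helpers in C (the helper names are hypothetical, not Xemu's own functions). Decoding and encoding byte by byte means the result does not depend on the host's native byte order or on non-standard byte-swap functions.

```c
#include <stdint.h>

/* Decode big-endian integers byte by byte, so the result is the same
   regardless of the host's native byte order. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    /* high 32 bits first, then low 32 bits */
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Encode in big-endian order: most significant byte at the lowest address. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}
```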
Warning: the exact format will certainly change, especially the block data payloads. The current goal is a "barely working" snapshot feature without too much precise state save/restore; more and more information should be added later. Since Xemu is in an early development stage, you generally can't expect the emulator to load a snapshot taken with a previous version. Later, once the format and the emulator are mature enough, the version information can be used to support older (but not very old ...) snapshots as well.
Block headers
All blocks in the file have the following header:
Header offset | Size | Field | Comments |
---|---|---|---|
+0 | 8 bytes | block-framing-id | the ASCII values of "XemuSnap", without quotes and without NUL terminator |
+8 | BE32 | framing-version | version of the framing; every block must use the current version. This number only changes if the block header specification changes in the future |
+12 | BE32 | block-flags | block flags; currently all bits must be zero (in the future, bit 0 will signal a compressed block payload). A loader must refuse a block/file that has any bit unknown to it set to '1' |
+16 | BE32 | block-version | version of the block data payload interpretation (not of the framing) for the given block! |
+20 | byte | header-size | size of the header of the current block, including the block-identify string |
+21 | header-size - 21 bytes | block-identify | identifier of the block as an ASCII string, without NUL terminator. Its length is header-size minus the 21-byte fixed part of the header. This field identifies the purpose of the block. |
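To illustrate the header layout, here is a sketch of parsing one block header from a memory buffer. The fixed part is 8 + 4 + 4 + 4 + 1 = 21 bytes, so the block-identify length is header-size - 21. The structure and function names are hypothetical; framing-version is only stored, not checked, since its current value is not stated on this page.

```c
#include <stdint.h>
#include <string.h>

#define XSNAP_FIXED_HDR 21   /* 8 + 4 + 4 + 4 + 1 bytes before block-identify */

/* Hypothetical decoded form of a block header. */
struct xsnap_block_header {
    uint32_t framing_version;
    uint32_t block_flags;
    uint32_t block_version;
    uint8_t  header_size;
    char     ident[256];     /* NUL-terminated copy of block-identify */
};

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 on a malformed or unsupported header. */
static int xsnap_parse_header(const uint8_t *buf, size_t len,
                              struct xsnap_block_header *h)
{
    if (len < XSNAP_FIXED_HDR || memcmp(buf, "XemuSnap", 8) != 0)
        return -1;
    h->framing_version = rd_be32(buf + 8);
    h->block_flags     = rd_be32(buf + 12);
    h->block_version   = rd_be32(buf + 16);
    h->header_size     = buf[20];
    if (h->block_flags != 0)          /* no flag bits are known yet: refuse */
        return -1;
    if (h->header_size < XSNAP_FIXED_HDR || h->header_size > len)
        return -1;
    /* block-identify length follows from header-size; at most 255 - 21 bytes */
    size_t id_len = (size_t)h->header_size - XSNAP_FIXED_HDR;
    memcpy(h->ident, buf + XSNAP_FIXED_HDR, id_len);
    h->ident[id_len] = '\0';
    return 0;
}
```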
Data part of blocks, sub-blocks
After the header, each block has its data part. Every block has one, though it's possible that a block effectively carries no information in it. The theory is the following:
The data part of a block is divided into primitive sub-blocks. Each sub-block starts with a single BE32 "header" giving the size of the sub-block, excluding the BE32 itself. The data part of the block must end with a BE32 of value zero; that is, a block with no data payload at all has a single zero BE32 after the block header as its data. The structure of the sub-block data is up to the given block type (the handler may also use an internal, header-like structure inside a sub-block, which the core snapshot parser does not handle).
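A sketch of walking the data part of a block that is already in memory, assuming a hypothetical callback type for the block handler. Each sub-block is a BE32 length followed by that many bytes, and a zero length ends the block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler callback: gets one sub-block, returns non-zero on error. */
typedef int (*xsnap_sub_cb)(const uint8_t *data, uint32_t size, void *user);

static uint32_t sub_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walks the data part of one block. Returns the number of bytes consumed
   (including the terminating zero BE32), or -1 if the data is truncated
   or the callback reports an error. */
static long xsnap_walk_subblocks(const uint8_t *p, size_t len,
                                 xsnap_sub_cb cb, void *user)
{
    size_t pos = 0;
    for (;;) {
        if (len - pos < 4)
            return -1;                 /* no room for the size BE32 */
        uint32_t size = sub_be32(p + pos);
        pos += 4;
        if (size == 0)
            return (long)pos;          /* end-of-block marker */
        if (len - pos < size)
            return -1;                 /* sub-block runs past the buffer */
        if (cb != NULL && cb(p + pos, size, user) != 0)
            return -1;
        pos += size;
    }
}
```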
Sub-blocks can be used to encode the state of several similar emulated hardware elements, like the two CIAs in the C64. Another example is the memory content of different parts of the system memory map: without sub-blocks this would require multiple blocks, but this way a single block with multiple sub-blocks is enough.
Note that a given block type (identify string and version) is only meaningful within the scope of an Ident block. Two Xemu emulation targets may use the same block type (same identify string, even the same block version), but that does not mean the block is interpreted the very same way!!!
Block of Ident
There is one strict format-level rule though: the first block must be the Ident block, to help the emulator decide whether the file is in its format or not. This block has, of course, the standard block header, with the block-identify field set to Ident:UUID (ASCII, not NUL-terminated). The current block-version is zero.
The "UUID" part should identify the emulator uniquely and globally; for example, the C65 and M65 emulators in my Xemu project use:
- Ident:github.com/lgblgblgb/xemu@c65
- Ident:github.com/lgblgblgb/xemu@m65
From the point of view of the format, the only important part identifying the block type is the "Ident:" prefix, not the rest! The rest should be handled by the emulator itself, not by the format parser. This allows the format to be used by multiple emulators, and maybe by projects other than Xemu as well.
The data of each sub-block (sub-blocks are optional!) consists of a single byte identifying the purpose of the sub-block, followed by some data. An Ident block with NO data at all is allowed (that is, its data payload is a single BE32 of value zero). Sub-block types for Ident:
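As a sketch (hypothetical function names), writing a block header and recognizing an Ident block by its "Ident:" prefix alone could look like this. The framing-version is passed in by the caller, since its current value is not given on this page, and block-flags is written as zero because no flag bits are defined yet. An Ident block without any sub-blocks would be this header followed by a single zero BE32.

```c
#include <stdint.h>
#include <string.h>

static void hdr_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Writes a block header into out; returns its length (the header-size
   value) or -1 if it does not fit. */
static int xsnap_write_header(uint8_t *out, size_t cap, uint32_t framing_version,
                              uint32_t block_version, const char *ident)
{
    size_t total = 21 + strlen(ident);
    if (total > 255 || total > cap)      /* header-size is a single byte */
        return -1;
    memcpy(out, "XemuSnap", 8);          /* block-framing-id, no NUL */
    hdr_be32(out + 8, framing_version);
    hdr_be32(out + 12, 0);               /* block-flags: no bits defined yet */
    hdr_be32(out + 16, block_version);
    out[20] = (uint8_t)total;
    memcpy(out + 21, ident, total - 21); /* block-identify, no NUL */
    return (int)total;
}

/* Only the "Ident:" prefix matters to the format; the rest is for the emulator. */
static int xsnap_is_ident(const char *ident)
{
    return strncmp(ident, "Ident:", 6) == 0;
}
```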
id-byte | content | meaning |
---|---|---|
0 | BE64 | 64 bit Unix timestamp of the creation of the snapshot |
1 | ? bytes | non-NUL terminated UTF-8 string of free-format textual comment |
2 | ? bytes | non-NUL terminated UTF-8 string of free-format HTML-like encoding comment, info |
Currently, I don't implement these at all. Since the sub-block "headers" contain the size, they can simply be skipped without any parsing!
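Although Xemu does not implement these sub-blocks yet, a reader for them could be sketched as follows (the names are hypothetical). The first byte selects the type; unknown types can be ignored, because the framing-level size already tells where the next sub-block starts. Decoding the timestamp as a signed 64-bit value is an assumption; the table above only says "64 bit Unix timestamp".

```c
#include <stdint.h>

enum {                      /* Ident sub-block id-bytes from the table above */
    IDENT_SUB_TIMESTAMP = 0,
    IDENT_SUB_COMMENT   = 1,
    IDENT_SUB_HTML      = 2
};

/* Splits an Ident sub-block into its id-byte (return value) and the payload
   after it. Returns -1 for an empty sub-block. Comment payloads are not
   NUL-terminated, so callers must use *plen. */
static int ident_sub_split(const uint8_t *d, uint32_t size,
                           const uint8_t **payload, uint32_t *plen)
{
    if (size < 1)
        return -1;
    *payload = d + 1;
    *plen = size - 1;
    return d[0];
}

/* Decodes the BE64 payload of an id 0 (timestamp) sub-block. */
static int ident_sub_timestamp(const uint8_t *d, uint32_t size, int64_t *ts)
{
    const uint8_t *p;
    uint32_t n;
    if (ident_sub_split(d, size, &p, &n) != IDENT_SUB_TIMESTAMP || n != 8)
        return -1;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];    /* big-endian: most significant byte first */
    *ts = (int64_t)v;           /* assumption: signed seconds since the epoch */
    return 0;
}
```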
Reading blocks
The snapshot file is divided into blocks; no extra information may appear at the end "outside of any block". However, for future purposes, concatenating multiple snapshot files is supported. That is, currently, parsing must end when an Ident block other than the very first one is encountered (or when the end of the file is reached).
Reading a block means reading the fixed-size part of the header, sanity checking it ("XemuSnap", version, etc.), then calculating the length of the identify string from the header-size field and reading that too. The first block in the file must be an Ident block, otherwise it's an error. If an Ident block is found when there was already one before, it marks the end of the snapshot without any format violation, and the rest of the file should not be read or parsed. Otherwise the block handler is selected based on the identify string. The block handler (now in the data section!) reads a BE32: if it is zero, the block ends; if not, that many bytes are read and passed to the sub-block parser (or something like that), and so on.
Surely, a mature implementation of this format should use a "read cache" to avoid many short reads, like reading a single BE32. Xemu currently does not use a very sophisticated algorithm for this :) One of the design goals was to be able to read and write the file linearly, without "going back" or too much write caching (i.e. there is no data size in the block header, since with many sub-blocks you may not know it in advance). Currently, there is not much CRC or other protection ...
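The reading procedure above can be sketched as a loop over an in-memory snapshot that counts the blocks of the first snapshot in the file. This is a hypothetical helper, not Xemu's loader: it skips sub-blocks instead of dispatching them to handlers, and it does not check framing-version, whose current value is not given on this page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the number of blocks belonging to the first snapshot in buf,
   or -1 on a format error. */
static int xsnap_count_blocks(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int blocks = 0;
    while (pos < len) {
        /* fixed part of the header: magic, three BE32s, header-size byte */
        if (len - pos < 21 || memcmp(buf + pos, "XemuSnap", 8) != 0)
            return -1;
        if (be32(buf + pos + 12) != 0)   /* unknown block-flags bits set */
            return -1;
        size_t hsize = buf[pos + 20];
        if (hsize < 21 || len - pos < hsize)
            return -1;
        int is_ident = hsize - 21 >= 6 && memcmp(buf + pos + 21, "Ident:", 6) == 0;
        if (blocks == 0 && !is_ident)
            return -1;                   /* the first block must be an Ident block */
        if (blocks > 0 && is_ident)
            break;                       /* an appended snapshot starts: stop here */
        pos += hsize;
        /* data part: skip sub-blocks until the terminating zero BE32 */
        for (;;) {
            if (len - pos < 4)
                return -1;
            uint32_t size = be32(buf + pos);
            pos += 4;
            if (size == 0)
                break;
            if (len - pos < size)
                return -1;
            pos += size;                 /* a real loader would pass these bytes to the handler */
        }
        blocks++;
    }
    return blocks > 0 ? blocks : -1;     /* an empty file has no Ident block */
}
```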
65xx-like CPU state block
The current block-version is 0. The block-identify string is "CPU:65xx". The data payload format of a single sub-block is:
offs | type | name | comment |
---|---|---|---|
0 | byte | cpu | cpu identifier: 0=NMOS 6502, 1=CMOS 65C02, 2=65CE02/4510 |
1 | BE16 | PC | PC |
3 | byte | A | Accumulator of the CPU |
4 | byte | X | X-register of the CPU |
5 | byte | Y | Y-register of the CPU |
6 | byte | S | stack pointer of the CPU |
7 | byte | P | flags of the CPU |
32 | BE32 | IRQ | IRQ input info |
36 | BE32 | NMI | NMI input info |
40 | BE16 | PC-old | PC value of the previous opcode (may not be used?) |
42 | byte | cyc | Execution time (in cycles) of the previous opcode |
43 | byte | opc | Previous opcode |
If cpu is 2, it continues as follows (note that the offsets are not contiguous; there is a "hole"):
offs | type | name | comment |
---|---|---|---|
64 | byte | Z | Z register |
65 | byte | B | base-page |
66 | byte | SPHI | stack-page |
96 | BE32 | NOIRQ | IRQ inhibit state |
Currently, only one sub-block is expected. Other information, like the MAP memory mapping of the 4510, is not stored here: from the emulator's point of view it is part of the machine state, not of the CPU. (Xemu's design is based on load/save state callbacks for components, so the CPU emulator can only restore its own state, not that of other parts.) The CPU 65xx state sub-block is currently expected to be exactly 256 bytes long, though most of it is not really used.
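A sketch of packing and unpacking the 256-byte CPU:65xx sub-block using the offsets from the tables above. The struct is a hypothetical in-memory representation, and zero-filling the unused bytes (the holes) is an assumption, since the page does not say what they should contain.

```c
#include <stdint.h>
#include <string.h>

#define CPU65_SUB_SIZE 256   /* expected size of the CPU:65xx sub-block */

/* Hypothetical in-memory CPU state; field names follow the tables above. */
struct cpu65_state {
    uint8_t  cpu;            /* 0=NMOS 6502, 1=CMOS 65C02, 2=65CE02/4510 */
    uint16_t pc, pc_old;
    uint8_t  a, x, y, s, p, cyc, opc;
    uint32_t irq, nmi;
    uint8_t  z, b, sphi;     /* only stored when cpu == 2 */
    uint32_t noirq;          /* only stored when cpu == 2 */
};

static void w16(uint8_t *d, uint16_t v) { d[0] = (uint8_t)(v >> 8); d[1] = (uint8_t)v; }
static void w32(uint8_t *d, uint32_t v) { w16(d, (uint16_t)(v >> 16)); w16(d + 2, (uint16_t)v); }
static uint16_t r16(const uint8_t *d) { return (uint16_t)((d[0] << 8) | d[1]); }
static uint32_t r32(const uint8_t *d) { return ((uint32_t)r16(d) << 16) | r16(d + 2); }

static void cpu65_pack(const struct cpu65_state *c, uint8_t out[CPU65_SUB_SIZE])
{
    memset(out, 0, CPU65_SUB_SIZE);  /* assumption: unused bytes are zero */
    out[0] = c->cpu;
    w16(out + 1, c->pc);
    out[3] = c->a; out[4] = c->x; out[5] = c->y; out[6] = c->s; out[7] = c->p;
    w32(out + 32, c->irq);
    w32(out + 36, c->nmi);
    w16(out + 40, c->pc_old);
    out[42] = c->cyc;
    out[43] = c->opc;
    if (c->cpu == 2) {               /* 65CE02/4510 extension */
        out[64] = c->z; out[65] = c->b; out[66] = c->sphi;
        w32(out + 96, c->noirq);
    }
}

static void cpu65_unpack(const uint8_t in[CPU65_SUB_SIZE], struct cpu65_state *c)
{
    memset(c, 0, sizeof *c);
    c->cpu = in[0];
    c->pc = r16(in + 1);
    c->a = in[3]; c->x = in[4]; c->y = in[5]; c->s = in[6]; c->p = in[7];
    c->irq = r32(in + 32);
    c->nmi = r32(in + 36);
    c->pc_old = r16(in + 40);
    c->cyc = in[42];
    c->opc = in[43];
    if (c->cpu == 2) {
        c->z = in[64]; c->b = in[65]; c->sphi = in[66];
        c->noirq = r32(in + 96);
    }
}
```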
Memory content block
Identify string is "Memory".
General-purpose memory content block. Each sub-block has a short header (though from the format's point of view this already belongs to the data part, not to the sub-block header, so its size is included in the sub-block size BE32):
data | description |
---|---|
BE32 | Memory block address |
BE32 | Memory block "type" |
BE32 | Memory block flags |
BE32 | Sub-block data "real" size |
.... | the data itself: sub-block length - 16 bytes follow |
The sub-block data "real" size is the size of the actual data, which can differ from the number of bytes actually stored after these BE32s if the data is compressed (in the future?). It is the original size after decompression and must not be used for decoding the format; for that, as always, the format-level sub-block length field must be used!
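A sketch of splitting a Memory sub-block into its 16-byte header and payload (hypothetical names). The stored payload length is derived from the framing-level sub-block length, not from the real-size field, as required above.

```c
#include <stdint.h>

/* Hypothetical decoded form of one "Memory" sub-block. */
struct mem_sub {
    uint32_t address;      /* memory block address */
    uint32_t type;         /* memory block "type" */
    uint32_t flags;        /* memory block flags */
    uint32_t real_size;    /* original (uncompressed) size: informational only */
    const uint8_t *data;   /* payload following the 16-byte header */
    uint32_t stored_size;  /* bytes actually stored = sub-block length - 16 */
};

static uint32_t mem_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* d/size are the sub-block contents and the framing-level sub-block length.
   Returns 0 on success, -1 if the sub-block is too short for the header. */
static int mem_sub_parse(const uint8_t *d, uint32_t size, struct mem_sub *m)
{
    if (size < 16)
        return -1;
    m->address   = mem_be32(d);
    m->type      = mem_be32(d + 4);
    m->flags     = mem_be32(d + 8);
    m->real_size = mem_be32(d + 12);
    m->data = d + 16;
    m->stored_size = size - 16;   /* never taken from real_size */
    return 0;
}
```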
Other blocks
Surely, these blocks are not enough to encode the full state of an emulator, but the other block types are too emulator-specific to give a cross-emulator description of them.