Xemu snapshot format - lgblgblgb/xemu GitHub Wiki

Xemu snapshot format definition

Every Xemu snapshot has a "standard skeleton" that should be the same for all emulators, but of course this does not mean that a VIC-20 emulator can load a snapshot taken from a C65 :) This "skeleton" format is called "framing", and it divides the file into "blocks". The file itself has no file-level header for identification, but there is a block-level value for this. Since the first block of a file must be the "ident block", the effect is almost the same. The other blocks in the snapshot may vary, depending on which emulator saved the snapshot.

Each block has two parts: header and data. The header has the same structure for all blocks in the file. Interpretation of the data part, however, is up to the emulator target and the purpose of the block.

Warnings and definitions

Definitions: BE32 means a big-endian 32-bit machine word; BE16 and BE64 are defined the same way. The format uses a fixed byte order so that snapshot files are portable between architectures with different endianness.
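As a small illustration of these definitions, here is a sketch in Python using the standard `struct` module. The helper names (`be16`, `be32`, `be64`, `read_be32`) are made up for this example and are not part of Xemu.

```python
import struct

# Hypothetical helper names, used only for illustration.
def be16(value: int) -> bytes:
    """Encode an unsigned 16-bit value, most significant byte first."""
    return struct.pack(">H", value)

def be32(value: int) -> bytes:
    """Encode an unsigned 32-bit value, most significant byte first."""
    return struct.pack(">I", value)

def be64(value: int) -> bytes:
    """Encode an unsigned 64-bit value, most significant byte first."""
    return struct.pack(">Q", value)

def read_be32(data: bytes, offset: int = 0) -> int:
    """Decode an unsigned BE32 found at the given offset."""
    return struct.unpack_from(">I", data, offset)[0]
```

The same bytes are produced on little-endian and big-endian hosts, which is the point of fixing the byte order in the format.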

Warning: I am sure the exact format will change, especially the block data payloads. The current goal is to have a "barely working" snapshot feature without very precise state save/restore. More and more information should be added later. Generally, you can't expect the emulator to load a snapshot taken with a previous version, since it's in an early development stage. Later, once the format and the emulator are mature enough, the version information can be used to support older (but not very old ...) versions as well.

Block headers

Every block in the file has the following header:

Header offs	Size	Field	Comments
+0	8 bytes	block-framing-id	the ASCII values of "XemuSnap" without quotes and NUL terminator
+8	BE32	framing-version	version of the framing; every block must use the current version. This number changes only if the block header specification changes in the future
+12	BE32	block-flags	block flags; currently all bits should be zero (in the future, bit 0 will signal a compressed block payload). A loader must refuse a block/file that has any bit set to '1' that is unknown to it
+16	BE32	block-version	version of the block data payload interpretation (not the framing) for the given block!
+20	byte	header-size	size of the header of the current block
+21	? bytes	block-identify	the identity of the block as an ASCII string, without NUL terminator. Its size is calculated by subtracting the size of the fixed part of the header from the header-size field. This field identifies the purpose of the block.
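The header layout above can be sketched in Python as follows. This is only an illustration, not Xemu's actual code. It assumes that the fixed part of the header is 21 bytes (offsets +0 to +20 in the table) and that header-size counts the whole header, identify string included. The names `BlockHeader`, `parse_block_header` and `build_block_header` are hypothetical, and the framing version value 0 is just a placeholder, since the current value is not stated here.

```python
import struct
from typing import NamedTuple

FRAMING_ID = b"XemuSnap"
# Assumption: 8 (framing id) + 3 * 4 (BE32 fields) + 1 (header-size byte).
FIXED_HEADER_SIZE = 21

class BlockHeader(NamedTuple):
    framing_version: int
    flags: int
    block_version: int
    header_size: int
    identify: str

def parse_block_header(buf: bytes) -> BlockHeader:
    """Decode one block header found at the start of buf."""
    if len(buf) < FIXED_HEADER_SIZE:
        raise ValueError("truncated block header")
    magic, framing_version, flags, block_version, header_size = \
        struct.unpack_from(">8sIIIB", buf, 0)
    if magic != FRAMING_ID:
        raise ValueError("not a XemuSnap block")
    if flags != 0:
        # No flag bits are defined yet, so any set bit is unknown to us.
        raise ValueError("unknown block flags set")
    if header_size < FIXED_HEADER_SIZE or header_size > len(buf):
        raise ValueError("bad header-size field")
    identify = buf[FIXED_HEADER_SIZE:header_size].decode("ascii")
    return BlockHeader(framing_version, flags, block_version,
                       header_size, identify)

def build_block_header(identify: str, block_version: int = 0,
                       framing_version: int = 0) -> bytes:
    """Encode a header; framing_version=0 is a placeholder value."""
    ident = identify.encode("ascii")
    header_size = FIXED_HEADER_SIZE + len(ident)
    if header_size > 255:
        raise ValueError("identify string too long for a one-byte size")
    return struct.pack(">8sIIIB", FRAMING_ID, framing_version, 0,
                       block_version, header_size) + ident
```

Since header-size is a single byte, the identify string is limited to 234 bytes under these assumptions.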

Data part of blocks, sub-blocks

After the header, each block has a data part. Every block has one, though it's possible that the block effectively does not carry any information in it. The theory is the following:

The data part of a block is divided into primitive sub-blocks. Each sub-block starts with a single BE32 "header" giving the size of the sub-block, excluding the BE32 itself. The data part of the block must end with a BE32 of value zero. That is, a block with no data payload at all has a single BE32 with zero value after the block header as its block data. The structure of the sub-block data is up to the given block type (it's also possible that the handler uses internal header-like info, not handled by the core snapshot parser, for the given sub-block).
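The sub-block framing described above can be sketched like this in Python. The function names `encode_sub_blocks` and `decode_sub_blocks` are hypothetical. The sketch also assumes that an empty sub-block cannot be stored, since a zero size would be read as the end of the block.

```python
import struct

def encode_sub_blocks(sub_blocks) -> bytes:
    """Frame each payload with a BE32 size, then add the zero terminator."""
    out = bytearray()
    for payload in sub_blocks:
        if len(payload) == 0:
            # A zero size would be indistinguishable from the terminator.
            raise ValueError("empty sub-block cannot be encoded")
        out += struct.pack(">I", len(payload))
        out += payload
    out += struct.pack(">I", 0)  # end of the block data
    return bytes(out)

def decode_sub_blocks(data: bytes, offset: int = 0):
    """Return (list of payloads, offset just after the terminator)."""
    payloads = []
    while True:
        if offset + 4 > len(data):
            raise ValueError("missing zero terminator")
        (size,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if size == 0:
            return payloads, offset
        if offset + size > len(data):
            raise ValueError("truncated sub-block")
        payloads.append(data[offset:offset + size])
        offset += size
```

A block without any payload is then just the four zero bytes returned by `encode_sub_blocks([])`.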

Sub-blocks can be used to encode the state of several similar emulated hardware elements, like the two CIAs in the C64. Another example: the memory content of different parts of the system memory map. Without sub-blocks this would require multiple blocks; with them, one block with multiple sub-blocks is enough.

Note that a given block (with its identify string and version) is only valid within the scope of an Ident block. That is, two Xemu emulation targets may use the same block type (same identify string, even the same block version), but this does not mean that the block is interpreted the same way by both!

Block of Ident

There is one strict format-level rule though: the first block must be the Ident block, to help the emulator decide whether the file is in its own format or not. This block has, of course, the standard block header, with the block-identify field set to Ident:UUID (non-NUL terminated ASCII values). The current block-version is zero.

The "UUID" value should identify the emulator in a globally unique way. For example, for the C65 and M65 emulators in my Xemu project:

Ident:github.com/lgblgblgb/xemu@c65
Ident:github.com/lgblgblgb/xemu@m65

From the viewpoint of the format, the only important part which identifies the block type is the "Ident:" part, not the rest! The rest should be handled by the emulator itself, not the format parser. This allows the format to be used by multiple emulators, and maybe by projects other than Xemu as well.
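This split of responsibilities could look like the following Python sketch. The function names `ident_target` and `accepts_snapshot` are invented for this example. The format parser only looks at the "Ident:" prefix, and the emulator compares the rest with its own identifier.

```python
from typing import Optional

def ident_target(identify: str) -> Optional[str]:
    """Parser side: return the part after "Ident:", or None if the
    block is not an Ident block at all."""
    prefix, sep, rest = identify.partition(":")
    if prefix != "Ident" or sep != ":":
        return None
    return rest

def accepts_snapshot(identify: str, my_uuid: str) -> bool:
    """Emulator side: accept only snapshots written for this target."""
    return ident_target(identify) == my_uuid
```

For example, a C65 emulator would call `accepts_snapshot(identify, "github.com/lgblgblgb/xemu@c65")` on the identify string of the first block.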

The data of each sub-block, if there is any (it's not compulsory!), contains a single byte identifying the purpose of the sub-block, followed by some data. It's allowed to have an Ident block with NO data at all (that is, the block data payload is a single BE32 of zero value). Sub-block types for Ident:

id-byte	content	meaning
0	BE64	64 bit Unix timestamp of the creation of the snapshot
1	? bytes	non-NUL terminated UTF-8 string of free-format textual comment
2	? bytes	non-NUL terminated UTF-8 string of free-format HTML-like encoding comment, info

Currently, I don't implement these at all. Since the sub-block "headers" contain the size, these sub-blocks can simply be skipped without any parsing!
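For a loader that does want to look at these sub-blocks, a decoder could be sketched as below in Python. The function name `decode_ident_sub_block` and the returned tuple shape are made up for this example. Treating the timestamp as an unsigned BE64 is an assumption, since signedness is not specified above.

```python
import struct

def decode_ident_sub_block(payload: bytes):
    """Decode one Ident sub-block payload (without its BE32 size)."""
    if not payload:
        raise ValueError("Ident sub-block without id-byte")
    kind, body = payload[0], payload[1:]
    if kind == 0:
        if len(body) != 8:
            raise ValueError("timestamp must be a BE64")
        # Assumption: the timestamp is stored as an unsigned value.
        (timestamp,) = struct.unpack(">Q", body)
        return ("timestamp", timestamp)
    if kind == 1:
        return ("comment", body.decode("utf-8"))
    if kind == 2:
        return ("html_comment", body.decode("utf-8"))
    # Unknown types are reported rather than rejected, so they can be
    # skipped by the caller.
    return ("unknown", body)
```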

Reading blocks

The snapshot file is divided into blocks; no extra information may follow at the end "outside of any block". However, for future purposes, appending several snapshot files together is supported. That is, currently the parsing must end at an Ident block other than the very first one, or at the end of the file.

Reading a given block means reading the fixed-size part of the header, sanity checking it ("XemuSnap", version, etc.), then calculating the size of the identify string from the header-size field, and reading that too. The first block in the file must be the Ident block; if it is not, it's an error. If the block is an Ident block and there was already another Ident block before, it means the end of the snapshot without any format violation, and the rest of the file should not be read or parsed. The block handler is selected based on the identify string. The block handler (this is already the data section!) should read a BE32: if it's zero, that is the end of the block. If it's not zero, that many bytes should be read and passed to the sub-block parser, and so on until the zero BE32 is reached.
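The reading procedure can be sketched as one linear loop in Python. This is a simplified illustration, not Xemu's implementation: it collects the sub-blocks into a list instead of dispatching to block handlers. It assumes a 21-byte fixed header part and uses 0 as a placeholder framing version. The names `read_snapshot`, `read_exact` and `make_block` are hypothetical.

```python
import io
import struct

FRAMING_ID = b"XemuSnap"
FIXED_HEADER_SIZE = 21  # assumption: offsets +0..+20 of the header table

def read_exact(stream, count: int) -> bytes:
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("unexpected end of snapshot")
    return data

def read_snapshot(stream, framing_version: int = 0):
    """Return a list of (identify, block_version, [sub-block payloads])."""
    blocks = []
    while True:
        fixed = stream.read(FIXED_HEADER_SIZE)
        if not fixed:
            break  # end of file on a block boundary
        if len(fixed) != FIXED_HEADER_SIZE:
            raise ValueError("truncated block header")
        magic, fver, flags, bver, hsize = struct.unpack(">8sIIIB", fixed)
        if magic != FRAMING_ID or fver != framing_version or flags != 0:
            raise ValueError("bad block header")
        if hsize < FIXED_HEADER_SIZE:
            raise ValueError("bad header-size field")
        identify = read_exact(stream, hsize - FIXED_HEADER_SIZE).decode("ascii")
        is_ident = identify.startswith("Ident:")
        if not blocks and not is_ident:
            raise ValueError("first block must be the Ident block")
        if blocks and is_ident:
            break  # second Ident block: an appended snapshot, stop here
        subs = []
        while True:
            (size,) = struct.unpack(">I", read_exact(stream, 4))
            if size == 0:
                break  # zero BE32 ends the block data
            subs.append(read_exact(stream, size))
        blocks.append((identify, bver, subs))
    if not blocks:
        raise ValueError("empty snapshot")
    return blocks

def make_block(identify: str, subs=()) -> bytes:
    """Helper for trying the reader: build one block with version 0."""
    ident = identify.encode("ascii")
    out = struct.pack(">8sIIIB", FRAMING_ID, 0, 0, 0,
                      FIXED_HEADER_SIZE + len(ident)) + ident
    for payload in subs:
        out += struct.pack(">I", len(payload)) + payload
    return out + struct.pack(">I", 0)
```

Usage: `read_snapshot(io.BytesIO(make_block("Ident:example@c65") + make_block("Memory", [b"abc"])))` returns the two blocks in file order. The ident string `example@c65` is only a made-up value for the demonstration.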

Surely, a mature implementation of this format should use a "read cache" to avoid reading short data, like a single BE32. Xemu currently does not use a very sophisticated algorithm for this :) One of the design goals was to be able to read and write this file linearly, without "going back" or too much write caching (i.e. there is no data size in the block header, since with many sub-blocks you may not know it in advance). Currently, there is not much in the way of CRC or other protection ...

65xx-like CPU state block

Current block-version is 0. Block-ident string is "CPU:65xx". Data payload format of a single sub-block is:

offs	type	name	comment
0	byte	cpu	cpu identifier: 0=NMOS 6502, 1=CMOS 65C02, 2=65CE02/4510
1	BE16	PC	PC
3	byte	A	Accumulator of the CPU
4	byte	X	X-register of the CPU
5	byte	Y	Y-register of the CPU
6	byte	S	stack pointer of the CPU
7	byte	P	flags of the CPU
32	BE32	IRQ	IRQ input info
36	BE32	NMI	NMI input info
40	BE16	PC-old	PC value of the previous opcode (may not be used?)
42	byte	cyc	Execution time (in cycles) of the previous opcode
43	byte	opc	Previous opcode

If cpu is 2, the sub-block continues as follows (note that the offset numbering is not continuous, there is a "hole"):

offs	type	name	comment
64	byte	Z	Z register
65	byte	B	base-page
66	byte	SPHI	stack-page
96	BE32	NOIRQ	IRQ inhibit state

Currently, only one sub-block is expected. Other information, like the MAP memory mapping of e.g. the 4510, is not stored here, since from the viewpoint of the emulator it is not part of the CPU but of the machine state (and the Xemu design is about having callbacks for load/save state per component, so the CPU emulator can only restore its own state, not that of other parts). The CPU 65xx state info is currently expected to be exactly 256 bytes of sub-block size, though most of it is not used.
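Decoding such a sub-block with the offsets from the two tables above could look like the following Python sketch. The function name `decode_cpu_state` and the dictionary keys are chosen for this example only. Bytes not listed in the tables are simply ignored.

```python
import struct

CPU_STATE_SIZE = 256  # expected sub-block size, as stated above

def decode_cpu_state(payload: bytes) -> dict:
    """Decode one "CPU:65xx" sub-block payload into a dictionary."""
    if len(payload) != CPU_STATE_SIZE:
        raise ValueError("CPU state sub-block must be 256 bytes")
    # Offsets 0..7: cpu, PC (BE16), A, X, Y, S, P
    cpu, pc, a, x, y, s, p = struct.unpack_from(">BHBBBBB", payload, 0)
    # Offsets 32..43: IRQ, NMI (BE32), PC-old (BE16), cyc, opc
    irq, nmi, pc_old, cyc, opc = struct.unpack_from(">IIHBB", payload, 32)
    state = {
        "cpu": cpu, "pc": pc, "a": a, "x": x, "y": y, "s": s, "p": p,
        "irq": irq, "nmi": nmi, "pc_old": pc_old, "cyc": cyc, "opc": opc,
    }
    if cpu == 2:
        # 65CE02/4510 extension: Z, B, SPHI at 64..66, NOIRQ at 96
        z, b, sphi = struct.unpack_from(">BBB", payload, 64)
        (noirq,) = struct.unpack_from(">I", payload, 96)
        state.update({"z": z, "b": b, "sphi": sphi, "noirq": noirq})
    return state
```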

Memory content block

Identify string is "Memory".

General purpose memory content block. Each sub-block has a short header (but, from the point of view of the format, this is already part of the data, not the sub-block header, so its size is included in the sub-block size BE32):

data	description
BE32	Memory block address
BE32	Memory block "type"
BE32	Memory block flags
BE32	Sub-block data "real" size
....	this is where data begins, that is `sub_block_length - 16` bytes of data follows

The sub-block data "real" size is the size of the actual data, which can differ from the number of bytes actually stored after these BE32s if the data is compressed (in the future?). It's the original size, after decompression, and it must not be used for decoding the format; for that, as always, the format-level sub-block length field must be used!
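A Python sketch of splitting a "Memory" sub-block into its four BE32 fields and the stored data is shown below. The names `MemorySubBlock` and `decode_memory_sub_block` are hypothetical. Note that the stored data length is derived from the sub-block length, never from the "real" size field.

```python
import struct
from typing import NamedTuple

MEMORY_HEADER_SIZE = 16  # four BE32 fields

class MemorySubBlock(NamedTuple):
    address: int
    mem_type: int
    flags: int
    real_size: int
    data: bytes

def decode_memory_sub_block(payload: bytes) -> MemorySubBlock:
    """Split one "Memory" sub-block payload into header fields and data."""
    if len(payload) < MEMORY_HEADER_SIZE:
        raise ValueError("Memory sub-block too short")
    address, mem_type, flags, real_size = struct.unpack_from(">IIII", payload, 0)
    # The stored bytes are everything after the 16-byte header, that is
    # sub_block_length - 16 bytes, regardless of real_size.
    data = payload[MEMORY_HEADER_SIZE:]
    return MemorySubBlock(address, mem_type, flags, real_size, data)
```

As long as no compression exists, a loader could additionally check that `real_size == len(data)`, but that check is an assumption of this sketch and is not required by the text above.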

Other blocks

Surely, this is not enough to encode the state of an emulator, but other block types are too emulator-specific to give a cross-emulator description of them.