EEF

ENIGMA's Extensible Enumeration Format (EEF) is a plain-text format for storing a variety of data in a linear fashion, designed so that the format can be extended later without invalidating outdated readers.

Specification

Structure

The format is line-based and length-prefixed. Information stored in this format is organized into Data Blocks, each of which declares its length up front.

Comments

EEF comments are defined to commence with any single consistent combination of the following symbols: # ; % / ', and to be terminated by the end of the line. A file defines what comment combination it will be using by starting the file with those symbols as the very first symbols in the file, or if none are specified, it is assumed the file has no comments. The remainder of the file must consistently use that comment combination. This is to allow for the storage of multi-lingual code-snippets without alienating generic EEF readers as a parse option.

A valid comment symbol combination can be as simple as ; (as in Assembly), # (as in bash) or // (as in C), or as complex as /%#%/ (for the express purpose of not conflicting with the aforementioned languages). In the latter case, remember that the comment begins with the full /%#%/ and ends with a newline. Multi-line comments are not available; instead, simply precede each comment line with the comment sequence.

As of version one of the EEF specification, only one comment symbol pattern can be defined per file. This restriction may be revised later.
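The detection rule above can be sketched as follows. This is a minimal illustration, not a reference reader: the function names are hypothetical, and it assumes the pattern is the longest run of comment symbols at the very start of the file. The comment stripper is deliberately naive and does not account for the pattern appearing inside quotes.

```python
# Symbols the spec allows in a comment pattern.
COMMENT_SYMBOLS = set("#;%/'")

def detect_comment_pattern(text):
    """Return the leading run of comment symbols, or None if the file declares none.

    Assumption: the pattern is the maximal run of allowed symbols at offset 0.
    """
    pattern = ""
    for ch in text:
        if ch in COMMENT_SYMBOLS:
            pattern += ch
        else:
            break
    return pattern or None

def strip_comment(line, pattern):
    """Drop everything from the comment pattern to the end of the line.

    Naive: a pattern that occurs inside a quoted attribute is also cut.
    """
    if pattern is None:
        return line
    idx = line.find(pattern)
    return line if idx < 0 else line[:idx].rstrip()
```

For example, a file whose first line is `## Sample EEF file` yields the pattern `##`, and a file that starts directly with `Items{3}` is treated as having no comments.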

After the first non-comment line in the file, no further lines can be left entirely commented unless said lines would be otherwise blank. Hence, these lines will be exclusively attribute lines: a comment cannot appear as the complete line anywhere that a reader would expect to find a Data Block indicator. This is done to promote ease of skipping unknown Data Blocks.

Quotes

The EEF specification defines quotes for use in Small Attribute listing. The double quote symbol (") or single quote symbol (') may be used except where the latter is defined as a Comment pattern. The escape sequences \\, \" and \' are defined; all others are to be handled application-side.
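As a rough sketch of the escape rule, the function below resolves only the three sequences the specification defines and passes every other backslash sequence through untouched for the application to interpret. The name `unescape` is hypothetical.

```python
def unescape(body):
    """Resolve \\\\, \\" and \\' inside a quoted attribute; leave other escapes as-is."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Only a backslash followed by \, " or ' is an EEF-level escape.
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "\\\"'":
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A sequence such as `\n` is therefore returned unchanged, since the spec leaves it to the application.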

Indentation

Indentation is optional, and should be done as desired to make the format visually pleasing and computationally simple. Where indentation is unimportant, readers will generally discard these characters. Many indentation characters are supported, including tab ('\t'), space (' '), and any combination thereof.

Data Blocks

Similar to binary data chunks in the PNG specification, EEF Data Blocks are multi-line sections of text defined to contain at most one of the following:

  1. A set of {N} subsequent Data Entries
  2. A set of [N] Attribute Lines.

Data Block Indicator

These blocks are set off by a single line which names the Data Type. The format for a data block indicator is as follows:

  Type Name (plural) {Number of contained Data Blocks}

The data block indicator may appear on its own at the root of the file or after any Data Entry Indicators.
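A reader might recognize a Data Block Indicator with a regular expression along these lines. This is only a sketch: the names are hypothetical, and it assumes type names are identifier-like (letters, digits, underscores), which the specification does not state.

```python
import re

# Assumption: a type name looks like an identifier; the count sits in {}.
BLOCK_INDICATOR = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)\s*\{\s*(?P<count>\d+)\s*\}"
)

def parse_block_indicator(line):
    """Return (plural type name, count) or None if the line is not a block indicator."""
    m = BLOCK_INDICATOR.match(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("count"))
```

Anything after the closing brace, such as a trailing comment, is ignored by this pattern.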

Data Entries

Data entries are typically members of data blocks, but are permitted to appear at the root of the file if no higher level is required. Data entries define an instance of their Data

Data Entry Indicator

Data entry indicators take the following form:

  Type Name(primary id): Small Attributes or [Number of additional Line Attributes]
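The indicator form above could be split into its parts roughly as follows. The names are hypothetical, and the sketch makes two assumptions: type names are identifier-like, and the primary ID contains no closing parenthesis. The ID is treated as optional because some entries are written without one.

```python
import re

# Assumptions: identifier-like type name; primary ID has no ")" inside it.
ENTRY_INDICATOR = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)\s*(?:\((?P<id>[^)]*)\))?\s*:\s*(?P<rest>.*)$"
)

def parse_entry_indicator(line):
    """Return (type name, raw primary ID or None, remainder of the line), or None."""
    m = ENTRY_INDICATOR.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("id"), m.group("rest")
```

The primary ID is returned raw, quotes included, so that the caller can decide how to interpret it.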

Attributes

EEF specifies two methods of associating attributes with each Data Block. Small data is typically stored in Small Attributes, while larger snippets of data are better suited to storage in Line Attributes.

Small Attributes

Small Attributes are those which appear after a Data Entry Indicator; they can be completely scalar, or can be given a value using a pair of parentheses. For example, take this code:

   Examples{1}
     Example: "yellow" Shape(Round)

Our Example entry is declared with two attributes. The first is the valueless attribute "yellow", which speaks for itself. The second is a Shape attribute, which has been set to the value "Round". Quotes may appear around any attribute, and are completely optional except where the attribute would otherwise be ambiguous (for instance, the attribute "lime green" must be quoted even though "yellow" may be left alone). Since attribute values are surrounded by parentheses, quotes are generally unnecessary in them.
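One way to tokenize a run of Small Attributes is sketched below. It is an illustration under simplifying assumptions: names are hypothetical, escape sequences inside quotes are not handled, values may not contain a closing parenthesis, and a Line Attribute count in square brackets is expected to have been removed beforehand.

```python
import re

# An attribute name is double-quoted, single-quoted, or a bare word.
NAME = r'''(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^\s()"']+))'''
# A name may be followed directly by a parenthesized value.
ATTRIBUTE = re.compile(r"\s*" + NAME + r"(?:\((?P<value>[^)]*)\))?")

def parse_small_attributes(text):
    """Return a list of (name, value) pairs; value is None for valueless attributes."""
    attrs = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = ATTRIBUTE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse attribute at column {pos}")
        name = next(g for g in (m.group("dq"), m.group("sq"), m.group("bare"))
                    if g is not None)
        attrs.append((name, m.group("value")))
        pos = m.end()
    return attrs
```

Applied to `"yellow" Shape(Round)`, this yields a valueless `yellow` attribute and a `Shape` attribute with the value `Round`.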

Line Attributes

Line Attributes follow any Data Entries which define a non-zero count inside a pair of square brackets []. Line Attributes are Attributes which are given an entire line to themselves. They may, semantically, instead be a single very large attribute which spans multiple lines, but this is up to the application to determine; this spec only mandates that the number of lines used is given up front for the purposes of skipping or isolating.
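Because the line count is given up front, isolating the Line Attributes of an entry takes no knowledge of their content. The sketch below uses hypothetical names and assumes the first bracketed number on the indicator line is the count; a bracketed number inside a quoted attribute would confuse it.

```python
import re

# Assumption: the first "[N]" on the indicator line is the line count.
LINE_COUNT = re.compile(r"\[\s*(\d+)\s*\]")

def line_attribute_count(indicator):
    """Return the number of Line Attributes an entry indicator declares (0 if none)."""
    m = LINE_COUNT.search(indicator)
    return int(m.group(1)) if m else 0

def take_line_attributes(lines, index):
    """Return (line attributes with indentation removed, index of the next unread line)."""
    count = line_attribute_count(lines[index])
    start = index + 1
    body = [ln.strip() for ln in lines[start:start + count]]
    return body, start + count
```

A reader that does not understand an entry can discard the returned lines and continue from the returned index.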

Type Name

The name of the members of the Data Block. When used in the Data Block Indicator, it should be pluralized. When used in a Data Entry Indicator, it should be singular. The difference between the singular and plural form shall be implementation-specific. By convention, the reader should warn if the plural string doesn't match the singular string according to some regular expression generated from the singular string. This generated regular expression can be anything from /singulare?s?/ to /.*/. Since this requirement is purely conventional, an implementation that uses the latter expression need not actually perform any check.

More advanced plurality checks can be accomplished loosely by starting with the expression /singulare?s?/, and applying any expressions in the following table to the singular string:

Find    Replace with    To support
y       [yi]            Property, Entry, Assembly...
f       [fv]            Leaf, Hoof, Elf, Self, Knife...
ex      (ex|ic)         Vertex, Index
ix      (ix|ic)         Matrix, Appendix
ous     (ous|ic)        Mouse, Louse
oo      (oo|ee)         Goose, Tooth, Foot
on$     (on|a)          Criterion, Phenomenon
um$     (um|a)          Bacterium, Curriculum, Medium, Datum
is$     (is)?           Axis, Analysis, Thesis
us$     (us|i|.ra)      Cactus, Fungus, Octopus, Alumnus, Succubus... Genus, Corpus, Viscus
an$     [ae]n           Man, Woman
$       (r?en)?         Child, Ox
$       x?              Beau, Bureau, Tableau

The only plural not covered by the entirety of the above table is "person" -> "people".
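A loose plurality check built from the table might look like the sketch below. The spec does not say how the rules should be combined; applying them in a single left-to-right pass so that replacements cannot interfere with each other is a choice made here, as are the names `RULES`, `plural_pattern`, and `plural_matches`.

```python
import re

# (find, replace) pairs from the table; the two "$" rows are appended as suffixes.
RULES = [
    ("ous", "(ous|ic)"), ("oo", "(oo|ee)"), ("ex", "(ex|ic)"), ("ix", "(ix|ic)"),
    ("on$", "(on|a)"), ("um$", "(um|a)"), ("is$", "(is)?"), ("us$", "(us|i|.ra)"),
    ("an$", "[ae]n"), ("y", "[yi]"), ("f", "[fv]"),
]
# One capture group per rule, so m.lastindex identifies which rule matched.
COMBINED = re.compile("|".join(f"({find})" for find, _ in RULES), re.IGNORECASE)

def plural_pattern(singular):
    """Build a case-insensitive regex that accepts loose plurals of `singular`."""
    parts = []
    pos = 0
    for m in COMBINED.finditer(singular):
        parts.append(re.escape(singular[pos:m.start()]))  # untouched text, escaped
        parts.append(RULES[m.lastindex - 1][1])            # table replacement
        pos = m.end()
    parts.append(re.escape(singular[pos:]))
    # Suffix rows of the table, then the base e?s? ending.
    return re.compile("".join(parts) + "(r?en)?x?e?s?", re.IGNORECASE)

def plural_matches(singular, plural):
    """True if `plural` is an acceptable plural of `singular` under these rules."""
    return plural_pattern(singular).fullmatch(plural) is not None
```

A reader following the convention would emit a warning, rather than an error, when `plural_matches` returns False for a block name and its entry name.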

Primary ID

Each Data Entry should give itself a name or ID, or some primary data affiliated with that particular entry. It does not necessarily have to be unique, or even scalar. The primary ID is simply a place to put the most important information about the entry.

Root

Any line which lies outside the scope of lines claimed by any of the above is considered to be at the root of the file.

Length-prefixing

In other extensible formats, such as PNG, length-prefixed chunks and chunk-IDs (or similar devices) are used to allow various readers to ignore sets of data they are incapable of reading. In EEF, different brackets are used to indicate various lengths.

  • Curly braces {} are used to indicate a number of repetitions of Data Blocks. The data that is repeated can be multiple lines, and the number of additional lines can vary. See the next point.
  • Square brackets [] are used in Data Entries to indicate that a fixed number of lines follows which describe or belong to the current data block.
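Taken together, the two bracket kinds let a reader step over a block it does not understand. The following sketch assumes a flat block: each of the N entries is one indicator line followed by its declared Line Attributes, with no nested blocks or blank lines in between. The function name and regular expressions are hypothetical.

```python
import re

BLOCK_COUNT = re.compile(r"\{\s*(\d+)\s*\}")  # {N} on a Data Block indicator
LINE_COUNT = re.compile(r"\[\s*(\d+)\s*\]")   # [N] on a Data Entry indicator

def skip_block(lines, index):
    """Return the index of the first line after the block starting at lines[index].

    Assumption: entries are not nested and no blank lines separate them.
    """
    m = BLOCK_COUNT.search(lines[index])
    if m is None:
        raise ValueError("not a Data Block indicator")
    index += 1
    for _ in range(int(m.group(1))):
        entry = LINE_COUNT.search(lines[index])
        extra = int(entry.group(1)) if entry else 0
        index += 1 + extra  # the entry indicator plus its Line Attributes
    return index
```

The reader never inspects the skipped lines themselves, which is what allows older readers to tolerate block types added later.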

Example File

## Sample EEF file
Items{3} ## The {} tells us we'll be reading 3 fields of varying size before this block is over.
  Item("Sword"): "Weapon" Damage(4) "Magic" Text[2] ## Note the use of [] to indicate lines we can skip if we don't know what this is.
    A magical sword
    Given to you by an elf or something.
  Item("Pen"): "Weapon" Damage(.5) Text[2]
    An ordinary pen
    Just a regular pen. Doesn't do much damage, but it's still mightier than the sword, right?
  Item("Apple"): "Food" Health(10) Text[2]
    A delicious apple
    Probably came from a tree, or something. Who knows. Restores 10HP and doesn't afraid of anything.
Collectables{1} ## Now that all three Items have been defined, we can end the file or declare new. We'll do the latter.
  Collectable("Watch fob"): Text[2]
    A beautiful golden watch fob
    It has absolutely no use.