TOSEC - cressie176/Load64 GitHub Wiki

Overview

TOSEC (The Old School Emulation Center) is a retrocomputing initiative dedicated to the cataloguing and preservation of software, firmware, and resources for retro systems. The TOSEC DAT files for the Commodore 64 are used to seed the LoadC64 Catalogue — matching ROM files by SHA-1 hash and extracting structured metadata from TOSEC filenames.

This page documents the TOSEC Naming Convention (TNC v4, 2015-03-23) as it applies to C64 software, with notes on how each field maps to LoadC64 concepts.

The authoritative specification is at https://www.tosecdev.org/tosec-naming-convention.

Filename Structure

A TOSEC filename encodes all metadata about a ROM image. The full structure is:

Title version (demo) (date)(publisher)(system)(video)(country)(language)(copyright)(devstatus)(media type)(media label)[dump flags][more info].ext

Only Title, (date), and (publisher) are mandatory. All other fields are optional and appear in the order shown above when present.

Two bracket types are used throughout:

Brackets Purpose
( ) Classification metadata (who, when, what)
[ ] Dump information and supplementary detail

Worked example

Last Ninja, The (1987)(System 3)(PAL)[cr Steve][!].d64

| Segment | Field | Value |
| --- | --- | --- |
| Last Ninja, The | Title | Last Ninja, The |
| (1987) | Date | 1987 |
| (System 3) | Publisher | System 3 |
| (PAL) | Video | PAL |
| [cr Steve] | Dump flag | Cracked by Steve |
| [!] | Dump flag | Verified good dump |

Title Field

Mandatory. The display title of the software.

  • Articles (The, A, An, De, Die, Le, La, Les) are moved to the end, preceded by a comma and space: The Last Ninja → Last Ninja, The.
  • Subtitles are appended after - (space-hyphen-space): Robocop - The Last Chapter.
  • Forbidden characters are never present: / \ ? : * " < > |
  • Apostrophes and hyphens are permitted.
  • Numbers and symbols are permitted.

Article normalisation

The seeding utility must reverse article normalisation to recover a natural display title:

TOSEC title Recovered title
Last Ninja, The The Last Ninja
Legend of Tosec, A A Legend of Tosec
Atic Atac Atic Atac

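Detection rule: if the title ends with , The or , A or , An, move the suffix to the front. Titles ending in one of the other articles listed under Title Field (De, Die, Le, La, Les) would need the same treatment if they are to be recovered too.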
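The detection rule can be sketched as a single regular expression. This is a minimal illustration, not the seeding utility's actual code: the names `ARTICLES` and `recover_title` are hypothetical, and only the three articles named in the rule are handled.

```python
import re

# Articles covered by the detection rule; extend this tuple if other
# articles (for example "Die" or "Les") should also be reversed.
ARTICLES = ("The", "A", "An")

_ARTICLE_SUFFIX = re.compile(
    r"^(?P<body>.+), (?P<article>" + "|".join(ARTICLES) + r")$"
)


def recover_title(tosec_title: str) -> str:
    """Move a trailing ', The' / ', A' / ', An' back to the front."""
    match = _ARTICLE_SUFFIX.match(tosec_title)
    if match is None:
        return tosec_title  # no normalised article, keep the title as-is
    return f"{match['article']} {match['body']}"
```

Titles without a trailing article, such as Atic Atac, pass through unchanged.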

Version Field

Optional. Appears immediately after the title, before the first parenthesis.

Format Example
v x.yy v1.0
v x.yyb v1.03b
Rev N Rev 1
vYYYYMMDD v20000101

The seeding utility should capture this as part of the title or store it separately for reference, but it does not map to a LoadC64 Game field.
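One way to store the version separately is to split it off the end of the title text. The sketch below assumes the version is always the last space-separated element before the first parenthesis and covers only the four formats in the table; `split_version` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Trailing "v1.0", "v1.03b", "v20000101" or "Rev 1" after the title proper.
_VERSION = re.compile(r"^(?P<title>.+?) (?P<version>v\d[\w.]*|Rev \w+)$")


def split_version(text: str) -> Tuple[str, Optional[str]]:
    """Return (title, version); version is None when no version is present."""
    text = text.strip()
    match = _VERSION.match(text)
    if match is None:
        return text, None
    return match["title"], match["version"]
```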

Demo Field

Optional. Appears after version, before the date.

Value Meaning
(demo) General demonstration
(demo-kiosk) Demo intended for kiosk/retail
(demo-playable) Playable portion of a game
(demo-rolling) Non-interactive rolling demo
(demo-slideshow) Non-interactive slideshow

Seeding note: Demo entries should be excluded from the LoadC64 Catalogue — they are not complete games.

Date Field

Mandatory. Always the first parenthesised field after title/version/demo.

Format Meaning
(19xx) Year unknown, 1900s
(200x) Year unknown, 2000s
(1986) Known year
(2001-01) Known year and month
(1986-06-21) Full date
(19xx-12-Dx) Partial information (day unknown within month)

Seeding mapping: Extract the four-digit year where known. If the year contains x (e.g. 19xx, 198x), treat as unknown and omit the year field.
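A minimal sketch of this mapping follows. The function name `extract_year` is an assumption; it accepts the date field with or without its outer parentheses and returns `None` whenever the year portion contains an x.

```python
import re
from typing import Optional

# Four literal digits, optionally followed by "-month" or "-month-day".
_KNOWN_YEAR = re.compile(r"\(?(\d{4})(?:-.*)?\)?")


def extract_year(date_field: str) -> Optional[int]:
    """Return the year as an int, or None when it is unknown (e.g. 19xx)."""
    match = _KNOWN_YEAR.fullmatch(date_field.strip())
    return int(match.group(1)) if match else None
```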

Publisher Field

Mandatory. Always the second parenthesised field.

Value Meaning
(-) Publisher unknown
(Devstudio) Single publisher
(Delphine - U.S. Gold) Multiple publishers, alphabetical order
(Smith, Robert) Individual person

Seeding mapping: Map to publisher. If the value is (-), set publisher to null. Strip outer parentheses.
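The publisher mapping is small enough to show in full. This is an illustrative sketch; `parse_publisher` is a hypothetical name, and Python's `None` stands in for the catalogue's null.

```python
from typing import Optional


def parse_publisher(field: str) -> Optional[str]:
    """Strip the outer parentheses; an unknown publisher "(-)" becomes None."""
    value = field.strip()
    if value.startswith("(") and value.endswith(")"):
        value = value[1:-1]
    return None if value == "-" else value
```

Multiple publishers are kept as a single string here; splitting on " - " is left to the caller.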

System Field

Optional. Used in multi-system DATs to indicate which hardware variant an image is for.

Common C64-relevant values: (C64), (C128), (+4), (VIC-20).

Seeding note: For C64-specific DATs this field is typically absent. If present and not C64, the entry should be excluded from the C64 catalogue.

Video Field

Optional. Specifies the TV standard when it cannot be inferred from the country.

Value Meaning
(NTSC) NTSC
(PAL) PAL
(PAL-60) PAL-60
(NTSC-PAL) Dual standard
(PAL-NTSC) Dual standard

Seeding mapping: Map to colour_encoding:

TOSEC video LoadC64 colour_encoding
PAL pal
PAL-60 pal
NTSC ntsc
NTSC-PAL unknown
PAL-NTSC unknown
absent infer from country (see below)

Country Field

Optional. ISO 3166-1 alpha-2 codes, uppercased. Multiple countries separated by - in alphabetical order: (DE-FR), (EU-US).

Selected codes relevant to C64 software:

Code Country/Region
AT Austria
AU Australia
BE Belgium
CA Canada
DE Germany
DK Denmark
ES Spain
EU Europe
FI Finland
FR France
GB Great Britain
IT Italy
JP Japan
NL Netherlands
NO Norway
NZ New Zealand
PL Poland
PT Portugal
SE Sweden
US United States
ZA South Africa

Seeding mapping — colour_encoding inference from country when video field is absent:

Country code(s) Inferred colour_encoding
US, CA, JP ntsc
EU, GB, DE, FR, ES, IT, NL, SE, NO, DK, FI, AT, BE, PT, AU, NZ, ZA, PL pal
Multiple mixed regions unknown
Absent unknown
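The inference table can be expressed as two sets and a small function. The sketch below makes two assumptions that the table does not state outright: a multi-country field whose codes all agree (for example DE-FR) yields that shared value, and any code missing from the table yields unknown. The names are hypothetical.

```python
from typing import Optional

NTSC_CODES = frozenset({"US", "CA", "JP"})
PAL_CODES = frozenset({
    "EU", "GB", "DE", "FR", "ES", "IT", "NL", "SE", "NO",
    "DK", "FI", "AT", "BE", "PT", "AU", "NZ", "ZA", "PL",
})


def infer_colour_encoding(country_field: Optional[str]) -> str:
    """Infer "pal", "ntsc" or "unknown" from a TOSEC country field."""
    if not country_field:
        return "unknown"
    inferred = set()
    for code in country_field.strip("()").split("-"):
        if code in NTSC_CODES:
            inferred.add("ntsc")
        elif code in PAL_CODES:
            inferred.add("pal")
        else:
            inferred.add("unknown")  # assumption: unlisted codes are unknown
    # Mixed regions produce more than one value and are treated as unknown.
    return inferred.pop() if len(inferred) == 1 else "unknown"
```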

Language Field

Optional. ISO 639-1 codes, lowercased. Two languages are listed in alphabetical order: (en-fr). More than two languages are given as a count instead: (M3), (M4), etc.

English is the default — absence of a language field implies English or language-neutral.

Seeding note: Language is not a LoadC64 Game field. Capture in TOSEC notes only.

Copyright Status Field

Optional.

Code Meaning
(CW) Cardware
(FW) Freeware
(GW) Giftware
(LW) Licenceware
(PD) Public Domain
(SW) Shareware
(SW-R) Shareware Registered

Seeding note: Not a LoadC64 field. May be useful for filtering — PD and Freeware titles are unambiguously safe to include.

Development Status Field

Optional.

Value Meaning
(alpha) Early test build
(beta) Feature-complete test
(preview) Near-complete
(pre-release) Near-complete
(proto) Unreleased prototype

Seeding note: Prototypes and betas represent distinct ROM variants and should be included as separate ROMSet entries under their game, but flagged in the label field (e.g. "Beta (1986)").

Media Type Field

Optional. Describes multi-part media.

| Value | Format example | Meaning |
| --- | --- | --- |
| Disk | (Disk 1 of 3) | Magnetic disk |
| Disc | (Disc 1 of 2) | Optical disc |
| Tape | (Tape 1 of 2) | Magnetic tape |
| Side | (Side A) / (Side B) | Tape or disk side |
| File | (File 1 of 2) | Individual file |
| Part | (Part 1 of 3) | Numbered part |

Seeding mapping: The total count (of N) determines how many ROM files belong to a ROMSet. Individual files with the same title and publisher that form a set should be grouped into a single ROMSet. The media type and number map to roms[].label (e.g. "Disk 1", "Side A").

Media Label Field

Optional. The last parenthesised field before square brackets. Describes the label printed on a physical disk, used to identify which disk to insert at runtime.

Examples:

  • (Disk 1 of 2)(Program)
  • (Disk 2 of 2)(Data)
  • (Disk 3 of 3)(Character Disk)

Seeding mapping: Append to roms[].label for disambiguation where needed: "Disk 1 – Program".
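The media type and media label mappings could be combined along these lines. This is a sketch under assumptions: `rom_label` and `set_size` are hypothetical helpers, and the separator is the en dash used in the "Disk 1 – Program" example.

```python
import re
from typing import Optional

_MEDIA_TYPE = re.compile(
    r"^(Disk|Disc|Tape|File|Part|Side) (\w+)(?: of (\d+))?$"
)
LABEL_SEPARATOR = " \u2013 "  # en dash, as in the "Disk 1 – Program" example


def rom_label(media_type: str, media_label: Optional[str] = None) -> str:
    """Build a roms[].label such as "Disk 1" or "Disk 1 – Program"."""
    match = _MEDIA_TYPE.match(media_type.strip("()"))
    if match is None:
        raise ValueError(f"not a media type field: {media_type!r}")
    kind, index, _total = match.groups()
    label = f"{kind} {index}"
    if media_label:
        label += LABEL_SEPARATOR + media_label.strip("()")
    return label


def set_size(media_type: str) -> Optional[int]:
    """Return N from "(Disk 1 of N)", or None when no total is given."""
    match = _MEDIA_TYPE.match(media_type.strip("()"))
    total = match.group(3) if match else None
    return int(total) if total else None
```

`set_size` gives the number of ROM files to expect when grouping files with the same title and publisher into one ROMSet.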

Dump Info Flags

Square bracket flags [ ] describing the image's condition or modification history. Multiple flags on a single file are ordered: modification flags first (alphabetically), then dump process flags.

Modification flags

| Flag | Full form examples | Meaning |
| --- | --- | --- |
| [cr] | [cr], [cr Cracker] | Copy protection removed (cracked) |
| [f] | [f], [f Fix], [f Fix Fixer] | Fixed to run in a non-standard environment |
| [h] | [h], [h Hack], [h Hack Hacker] | Intro, sprites, or text altered (hacked) |
| [m] | [m], [m Modification] | Unintended modification (e.g. save state) |
| [p] | [p], [p Pirate] | Unlicensed copy |
| [t] | [t], [t +3 Trainer] | Cheat/trainer added |
| [tr] | [tr fr], [tr de-partial Translator] | Translated to another language |

Dump process flags

| Flag | Full form examples | Meaning |
| --- | --- | --- |
| [o] | [o] | Over dump: more data than expected |
| [u] | [u] | Under dump: less data than expected |
| [v] | [v], [v Virus Name] | Image contains a virus |
| [b] | [b], [b Descriptor] | Bad dump: known damage |
| [a] | [a], [a2], [a3] | Alternate dump: variant of the original |
| [!] | [!] | Verified good dump |

Numbering: Multiple instances of the same flag are numbered from the second: [a], [a2], [a3]. There is no [a1].

Multiple crackers/hackers: Separated by - within the flag: [h PDX - TRSi].
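A single dump flag token can be broken into its code, instance number, and free-text detail. The sketch below is an assumption about how a parser might do this; `DumpFlag`, `parse_dump_flag`, and `split_names` are hypothetical names. Tokens that do not start with one of the flag codes listed above (for example a more info field) return `None`.

```python
import re
from typing import List, NamedTuple, Optional


class DumpFlag(NamedTuple):
    code: str    # e.g. "cr", "a", "!"
    number: int  # 1 for the unnumbered first instance, 2 for [a2], ...
    detail: str  # free text after the code, "" if absent


# Two-letter codes come first so "[tr fr]" is not read as "[t ...]".
_DUMP_FLAG = re.compile(r"^\[(cr|tr|[fhmptouvba!])(\d*)(?: (.+))?\]$")


def parse_dump_flag(token: str) -> Optional[DumpFlag]:
    match = _DUMP_FLAG.match(token)
    if match is None:
        return None
    code, number, detail = match.groups()
    return DumpFlag(code, int(number) if number else 1, detail or "")


def split_names(detail: str) -> List[str]:
    """Split "PDX - TRSi" into its individual cracker/hacker names."""
    return [name for name in detail.split(" - ") if name]
```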

Seeding filtering rules

The seeding utility should apply the following rules based on dump flags:

Flag present Action
[!] Include — verified good dump; preferred ROMSet
[b] Exclude — bad dump; unreliable data
[v] Exclude — virus present
[o] Exclude — over dump; corrupt or inaccurate
[u] Exclude — under dump; incomplete
[cr] Include — cracked; common on C64, often the only dump available; use as separate ROMSet
[t] Include — trained; use as separate ROMSet with label noting trainer
[h] Include with caution — hacked; include only if no clean dump exists; label accordingly
[f] Include — fixed; often required to run on modern emulators; use as separate ROMSet
[tr] Include — translated; use as separate ROMSet with language noted in label
[m] Exclude — unintended modification; data unreliable
[p] Include — pirated copy may be the only dump; use as separate ROMSet
[a] Include — alternate dump; use as separate ROMSet
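The table reduces to a three-way decision over the set of flag codes on a file. The sketch below is one possible encoding; the action strings and function names are illustrative assumptions, and the "only if no clean dump exists" check for [h] is left to the caller because it needs knowledge of the game's other dumps.

```python
from typing import Iterable

EXCLUDE_CODES = frozenset({"b", "v", "o", "u", "m"})
CAUTION_CODES = frozenset({"h"})


def romset_action(flag_codes: Iterable[str]) -> str:
    """Return "exclude", "include-with-caution" or "include"."""
    codes = set(flag_codes)
    if codes & EXCLUDE_CODES:
        return "exclude"  # any excluding flag wins over everything else
    if codes & CAUTION_CODES:
        return "include-with-caution"
    return "include"


def is_preferred(flag_codes: Iterable[str]) -> bool:
    """A verified good dump ([!]) with no excluding flag is preferred."""
    codes = set(flag_codes)
    return "!" in codes and romset_action(codes) == "include"
```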

More Info Field

Square bracket field [ ] appearing after all dump flags. Free-text supplementary information not covered by other fields.

Examples:

  • [aka House of TOSEC] — alternate name
  • [Req TRS-DOS] — software requirement
  • [source code] — source code image
  • [data disk] — data-only disk
  • [docs] — documentation disk

Seeding mapping: [aka ...] values are useful for matching games that are known under alternate titles. [data disk], [docs], and [source code] entries should be excluded from the game catalogue as they are not playable software.

Multi-Image Sets (Compilations)

Multiple programs in one filename, separated by & (space-ampersand-space):

Amidar (19xx)(Devstudio) & Amigos (1987)(Mr. Tosec)

Each segment carries its own metadata. Global flags (shared by all programs in the set) are separated by - before the flag:

Amidar (19xx)(Devstudio) & Amigos (1987)(Mr. Tosec) -(PD)[!]

Seeding note: Multi-image compilations in a single file represent a different use case from a multi-disk game. A compilation file should not be added to the catalogue as a game — its component titles may already exist as individual entries. Skip compilation entries during seeding.

ROM File Extensions

Relevant extensions for C64 software:

Extension Media type
.d64 1541 disk image
.d71 1571 disk image
.d81 1581 disk image
.t64 Tape image (container)
.tap Raw tape image
.prg Executable program file
.crt Cartridge image
.g64 GCR-encoded disk image
.nib Nibble-encoded disk image
.p00 PC64 program file

Parsing Algorithm

The seeding utility should parse a TOSEC filename as follows:

  1. Strip the file extension.
  2. Split on & — if more than one segment results, this is a compilation; skip it.
  3. Extract dump info flags: match all [...] tokens from the right of the string. Remove them from the working string.
  4. Extract more info flags: these are [...] tokens that follow all dump flags. In practice, extract all [...] tokens; classify by content.
  5. Extract parenthesised fields: match all (...) tokens from the string in order. Remove them from the working string.
  6. The remaining text is the title (plus optional version and demo fields).
  7. Parse the title for a trailing version string (v\d, Rev \d, vYYYYMMDD).
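  8. Check for a demo flag ((demo), (demo-playable), etc.). It was already removed from the working string in step 5, so look for it at the head of the ordered list of parenthesised fields, ahead of the date, and remove it from the list before step 9.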
  9. Parse parenthesised fields in order:
    • Field 1: date
    • Field 2: publisher
    • Field 3+: system, video, country, language, copyright, devstatus, media type, media label — identify by content pattern
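The steps above can be sketched as one function. This is a minimal illustration under stated assumptions, not the seeding utility itself: `ParsedName` and `parse_tosec_filename` are hypothetical names, titles are assumed to contain no parentheses or square brackets, compilations return `None`, and fields from position 3 onwards are returned unclassified.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_EXTENSION = re.compile(r"\.[A-Za-z0-9]{1,4}$")
_BRACKETS = re.compile(r"\[([^\]]*)\]")
_PARENS = re.compile(r"\(([^)]*)\)")
_VERSION = re.compile(r"^(.+?) (v\d[\w.]*|Rev \w+)$")


@dataclass
class ParsedName:
    title: str
    version: Optional[str]
    demo: Optional[str]
    date: str
    publisher: str
    other_fields: List[str]    # field 3 onwards, still unclassified
    bracket_fields: List[str]  # dump flags and more info, in order


def parse_tosec_filename(filename: str) -> Optional[ParsedName]:
    work = _EXTENSION.sub("", filename)          # step 1
    if " & " in work:                            # step 2: compilation
        return None
    brackets = _BRACKETS.findall(work)           # steps 3 and 4
    work = _BRACKETS.sub("", work)
    parens = _PARENS.findall(work)               # step 5
    work = _PARENS.sub("", work)
    title = work.strip()                         # step 6
    version = None
    match = _VERSION.match(title)                # step 7
    if match:
        title, version = match.group(1), match.group(2)
    demo = None
    if parens and parens[0].startswith("demo"):  # step 8
        demo = parens.pop(0)
    if len(parens) < 2:
        raise ValueError(f"missing date or publisher: {filename!r}")
    date, publisher, *rest = parens              # step 9
    return ParsedName(title, version, demo, date, publisher, rest, brackets)
```

For the worked example, this yields the title "Last Ninja, The", date "1987", publisher "System 3", one remaining field "PAL", and the bracket fields "cr Steve" and "!".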

Parenthesis field identification rules (field 3 onwards)

Match each (value) token against these patterns in order — use the first match:

| Pattern | Field |
| --- | --- |
| Matches date pattern (\d{4}, 19xx, 200x, etc.) | date (already consumed as field 1) |
| Matches known system token (C64, C128, VIC-20, +4, etc.) | system |
| Matches known video token (PAL, NTSC, PAL-60, etc.) | video |
| Matches a known 2-letter uppercase country or region code (EU, US, etc.), optionally hyphenated | country |
| Matches 2-letter lowercase language code, optionally hyphenated, or M\d | language |
| Matches copyright token (PD, SW, FW, GW, LW, CW, SW-R, GW-R, CW-R) | copyright |
| Matches devstatus token (alpha, beta, preview, pre-release, proto) | devstatus |
| Matches media type pattern (Disk \d+ of \d+, Tape \d+ of \d+, Side [AB], etc.) | media type |
| Any remaining token | media label |

Country codes should be checked against a known list rather than against any two uppercase letters, because copyright tokens such as PD and SW have the same shape and are tested later.
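A first-match classifier for these rules might look like the sketch below. It is illustrative only: `classify_field` is a hypothetical name, the token sets contain just the values listed on this page, and countries are matched against the selected codes from the Country Field section so that two-letter copyright tokens such as PD are not mistaken for countries.

```python
import re

SYSTEMS = frozenset({"C64", "C128", "+4", "VIC-20"})
VIDEO = frozenset({"NTSC", "PAL", "PAL-60", "NTSC-PAL", "PAL-NTSC"})
COUNTRIES = frozenset(
    "AT AU BE CA DE DK ES EU FI FR GB IT JP NL NO NZ PL PT SE US ZA".split()
)
COPYRIGHT = frozenset({"CW", "CW-R", "FW", "GW", "GW-R", "LW", "PD", "SW", "SW-R"})
DEVSTATUS = frozenset({"alpha", "beta", "preview", "pre-release", "proto"})

_DATE = re.compile(r"^\d{2}[\dx]{2}(?:-\w{2}){0,2}$")
_LANGUAGE = re.compile(r"^(?:[a-z]{2}(?:-[a-z]{2})*|M\d+)$")
_MEDIA_TYPE = re.compile(
    r"^(?:(?:Disk|Disc|Tape|File|Part) \d+ of \d+|Side [A-Z])$"
)


def classify_field(value: str) -> str:
    """Classify one parenthesised value (without its parentheses)."""
    if _DATE.match(value):
        return "date"
    if value in SYSTEMS:
        return "system"
    if value in VIDEO:
        return "video"
    if all(code in COUNTRIES for code in value.split("-")):
        return "country"
    if _LANGUAGE.match(value):
        return "language"
    if value in COPYRIGHT:
        return "copyright"
    if value in DEVSTATUS:
        return "devstatus"
    if _MEDIA_TYPE.match(value):
        return "media type"
    return "media label"  # anything left over
```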

Mapping to LoadC64 Catalogue Fields

| TOSEC field | LoadC64 field | Notes |
| --- | --- | --- |
| Title | title | Reverse article normalisation (", The" → "The ") |
| Date | year | Extract 4-digit year; omit if contains x |
| Publisher | publisher | Strip parentheses; set null if (-) |
| Video | colour_encoding | See video mapping table above |
| Country | colour_encoding | Fallback when video absent; see country inference table above |
| Devstatus | romsets[].label | Append to label: e.g. "Beta (1986)" |
| Media type | roms[].label | e.g. "Disk 1", "Side A" |
| Media label | roms[].label | Append for disambiguation |
| Dump flags | romsets[].label | Summarise notable flags in label: e.g. "PAL release [cr]", "Trained +3" |
| SHA-1 | roms[].sha1 | Taken from DAT file, not computed from filename |
| TOSEC filename | third_party_ids.tosec | Store the full filename (without extension) as the TOSEC ID |

colour_encoding derivation priority

  1. Video field if present
  2. Country field inference if video absent
  3. unknown if neither is present or inference is ambiguous
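The priority order can be sketched as follows. The function and parameter names are hypothetical; `country_encoding` is assumed to be the result of the country inference ("pal", "ntsc", "unknown") or `None` when no country field exists, and an unrecognised video token is assumed to map to unknown.

```python
from typing import Optional

VIDEO_TO_ENCODING = {
    "PAL": "pal",
    "PAL-60": "pal",
    "NTSC": "ntsc",
    "NTSC-PAL": "unknown",
    "PAL-NTSC": "unknown",
}


def derive_colour_encoding(
    video: Optional[str], country_encoding: Optional[str] = None
) -> str:
    """Apply the priority: video field, then country inference, then unknown."""
    if video is not None:
        # The video field wins even when it maps to "unknown" (dual standard).
        return VIDEO_TO_ENCODING.get(video, "unknown")
    return country_encoding or "unknown"
```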

true_drive_emulation default

TOSEC provides no TDE flag. Default to false; allow manual override in the catalogue.

multiplayer_mode default

TOSEC provides no multiplayer data. Default to unknown; allow manual override in the catalogue.

ROMSet label construction

Build a human-readable romsets[].label from available context:

<video or country> <devstatus> <notable dump flags>

Examples:

| Filename flags | Label |
| --- | --- |
| (PAL)[!] | PAL |
| (NTSC)[cr] | NTSC [cr] |
| (EU)(beta) | EU beta |
| (US)[t +3 Trainer] | US [t] |
| (GB)[f NTSC] | GB [f] |
| [a] | Alternate |
| [a2] | Alternate 2 |
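A sketch that reproduces the example labels is shown below. It follows the examples table only: [!] is treated as not notable, [a] becomes "Alternate", and other flags are reduced to their bare code. `build_romset_label` is a hypothetical name, and dump flags are passed without their square brackets.

```python
import re
from typing import Iterable, Optional

_DUMP_FLAG = re.compile(r"^(cr|tr|[fhmptouvba!])(\d*)(?: .+)?$")


def build_romset_label(
    region: Optional[str] = None,
    devstatus: Optional[str] = None,
    dump_flags: Iterable[str] = (),
) -> str:
    """Join <video or country>, <devstatus> and notable dump flags."""
    parts = [part for part in (region, devstatus) if part]
    for flag in dump_flags:
        match = _DUMP_FLAG.match(flag)
        if match is None:
            continue  # not a dump flag (e.g. a more info field)
        code, number = match.groups()
        if code == "!":
            continue  # a verified good dump needs no annotation
        if code == "a":
            parts.append(f"Alternate {number}" if number else "Alternate")
        else:
            parts.append(f"[{code}]")
    return " ".join(parts)
```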

Exclusion Rules Summary

The seeding utility should skip a TOSEC entry if any of the following apply:

  • The filename contains & (compilation)
  • The demo field is present (demo version)
  • Any [more info] flag contains data disk, docs, or source code
  • Dump flags include [b], [v], [o], or [u]
  • The system field is present and is not C64 (or the expected target platform)
  • The [m] (modified) flag is present
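These rules can be collapsed into a single predicate over the raw filename. The sketch below is an approximation under assumptions: `should_skip` is a hypothetical name, the target platform is C64, and "not C64" is detected by matching the other system tokens listed on this page in any field after date and publisher.

```python
import re

_BRACKETS = re.compile(r"\[([^\]]*)\]")
_PARENS = re.compile(r"\(([^)]*)\)")
# [b], [v], [o], [u] and [m], optionally numbered or with a description.
_EXCLUDED_FLAG = re.compile(r"^[bvoum]\d*(?: .+)?$")
_EXCLUDED_INFO = ("data disk", "docs", "source code")
OTHER_SYSTEMS = frozenset({"C128", "+4", "VIC-20"})


def _is_demo(field: str) -> bool:
    return field == "demo" or field.startswith("demo-")


def should_skip(filename: str) -> bool:
    """Return True when any exclusion rule applies to the filename."""
    if " & " in filename:  # compilation
        return True
    parens = _PARENS.findall(filename)
    fields = [p for p in parens if not _is_demo(p)]
    if len(fields) != len(parens):  # a demo field was present
        return True
    # Skip date and publisher, then look for a non-C64 system token.
    if any(field in OTHER_SYSTEMS for field in fields[2:]):
        return True
    for token in _BRACKETS.findall(filename):
        if _EXCLUDED_FLAG.match(token):
            return True
        if any(term in token for term in _EXCLUDED_INFO):
            return True
    return False
```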