Audio and Localization Formats - OpenKotOR/PyKotor GitHub Wiki

Audio and Localization Formats

The KotOR engine handles text, voice, and character sound effects through a set of interconnected formats. TLK (Talk Table) files store every localized string the game displays or speaks — dialogue lines, item names, journal entries, feedback messages [tlk_data.py]. SSF (Sound Set File) files map character combat and movement sounds to TLK entries. LIP files drive facial animation timing to match spoken dialogue. WAV files provide the raw audio data. Together, these formats implement the localization and voice-over pipeline that runs from authored content through to in-game playback. xoreos-tools mirrors that workflow from the tooling side with dedicated TLK conversion utilities and an explicit language-ID/encoding matrix for Aurora/Odyssey talk tables, which makes it a useful external corroboration source for KotOR localization behavior [Running xoreos-tools, Tlk2xml, TLK language IDs and encodings].

Contents


TLK — Talk Table

The Talk Table is the game's central string database. Every piece of text the player sees — dialogue, item descriptions, journal entries, feedback messages, character names — is stored in dialog.tlk and accessed by a numeric index called a StrRef [tlk_data.py L1–L35]. This design makes localization straightforward: translating the game means replacing one file rather than hunting through thousands of individual resources. Each entry can also reference a voice-over sound file (ResRef), so the engine can play spoken audio alongside the displayed text [TLKEntry]. xoreos's tlk2xml page is also useful here because it describes why TLK tooling cannot rely on the header alone to infer text encoding: the format family reuses language IDs differently across games, and tools often need either an explicit game target or an explicit code page [Tlk2xml, TLK language IDs and encodings].

To modify TLK entries in a mod, use TSLPatcher/HoloPatcher TLKList syntax — this appends new entries or patches existing ones without replacing the entire file, which is critical for mod compatibility. TLK entries are referenced from GFF resources (especially DLG dialogue files), 2DA tables, and SSF sound sets.

Table of Contents


File structure overview

TLK files store localized strings in a binary format. The game loads dialog.tlk at startup and references strings throughout the game using StrRef numbers (array indices). String lookups use the same resource resolution order as other resources (override, then module/SAV, then KEY/BIF), so custom TLKs in override or in a MOD take precedence over the base game dialog.tlk. External tooling reflects the same constraints: xoreos's tlk2xml can read the classic V3.0/V4.0 talk-table families and exposes game-specific switches such as --kotor and --kotor2 precisely because the encoding cannot be safely autodetected from TLK content alone [Tlk2xml].

Implementation: Libraries/PyKotor/src/pykotor/resource/formats/tlk/

Cross-reference implementations (line anchors are against master and may drift):

See also


Binary format

File Header

The file header is 20 bytes in size:

Name type offset size Description
file type char 0 (0x00) 4 Always "TLK " (space-padded)
file Version char 4 (0x04) 4 "V3.0" for KotOR, "V4.0" for Jade Empire
Language ID int32 8 (0x08) 4 Language identifier (see Concepts; TLK-specific notes)
string count int32 12 (0x0C) 4 Number of string entries in the file
string Entries offset int32 16 (0x10) 4 offset to string entries array (typically 20)

References:

String data table

The string data table contains metadata for each string entry. Each entry is 40 bytes:

Name type offset size Description
flags UInt32 0 (0x00) 4 bit flags: bit 0=text present, bit 1=sound present, bit 2=sound length present
Sound ResRef char 4 (0x04) 16 Voice-over audio filename (null-terminated, max 16 chars)
Volume Variance UInt32 20 (0x14) 4 Unused in KotOR (always 0) [tlk_data.py L27, io_tlk.py L214]
Pitch Variance UInt32 24 (0x18) 4 Unused in KotOR (always 0) [tlk_data.py L28, io_tlk.py L215]
offset to string UInt32 28 (0x1C) 4 offset to string text (relative to string Entries offset)
string size UInt32 32 (0x20) 4 Length of string text in bytes
Sound Length float 36 (0x24) 4 Duration of voice-over audio in seconds

References:

flag bits:

  • bit 0 (0x0001): Text present - string has text content
  • bit 1 (0x0002): Sound present - string has associated voice-over audio
  • bit 2 (0x0004): Sound length present - sound length field is valid

flag Combinations:

Common flag patterns in KotOR TLK files:

flags Hex Description Usage
0x0001 0x01 Text only Menu options, item descriptions, non-voiced dialog
0x0003 0x03 Text + Sound Voiced dialog lines (most common for party/NPC speech)
0x0007 0x07 Text + Sound + Length Fully voiced with duration data (cutscenes, important dialog)
0x0000 0x00 Empty entry Unused StrRef slots

The engine uses these flags to decide:

The soundlength-present flag (bit 2) is parsed into soundlength_present [io_tlk.py L110] but is noted as "Unused by KOTOR1 and 2" in PyKotor's writer [io_tlk.py L210]. The sound_length float stores duration in seconds but the engine does not appear to require the flag to be set to read that value.

Missing flags are treated as false by PyKotor's reader [io_tlk.py L108–L110] — if Text present is not set, the string is treated as empty even if text data exists.

String entries

String entries follow the string data table:

Name Type Description
string Text char[] null-terminated string data (UTF-8 or Windows-1252 encoded)

string text is stored at the offset specified in the string data table entry. The encoding depends on the language ID; see:


TLKEntry structure

Each TLK entry contains:

References:

Attribute Type Description
text str Localized text string
voiceover ResRef Voice-over audio filename (WAV file)
text_present bool Whether text content exists
sound_present bool Whether voice-over audio exists
soundlength_present bool Whether sound length is valid
sound_length float Duration of voice-over audio in seconds

string References (StrRef)

string references (StrRef) are integer indices into the TLK file's entry array:

  • StrRef 0: First entry in the TLK file
  • StrRef -1: No string reference (used to indicate missing/empty strings)
  • StrRef N: Nth entry (0-based indexing)

The game uses StrRef values throughout GFF files, scripts, and other resources to reference localized text. When displaying text, the game looks up the StrRef in dialog.tlk and displays the corresponding text.

Custom TLK files

Mods can add custom TLK files to extend available strings:

dialog.tlk structure:

  • Base game: dialog.tlk (read-only, ~50,000-100,000 entries)
  • Custom content: dialogf.tlk or custom TLK files placed in override

StrRef Ranges:

  • 0 to ~50,000: Base game strings (varies by language)
  • 16,777,216 (0x01000000) and above: Custom TLK range (TSLPatcher convention)
  • Negative values: Invalid/missing references

Mod Tools Approach:

TSLPatcher and similar tools use high StrRef ranges for custom strings:

Base [StrRef](Audio-and-Localization-Formats#string-references-strref):   0 - 50,000 (dialog.tlk)
Custom Range:  16777216+ (custom TLK files)

This avoids conflicts with base game strings and allows mods to add thousands of custom text entries without overwriting existing content.

Multiple TLK files:

The game can load multiple TLK files:

  1. dialog.tlk - Primary game text
  2. dialogf.tlk - Female-specific variants (polish K1 only)

Priority: Custom TLKs --> dialogf.tlk --> dialog.tlk


Localization

Numeric language IDs and typical encodings (0–8) are defined on Concepts. The notes below focus on TLK-specific behavior.

KotOR: Retail builds usually install one dialog.tlk per language; the file header’s Language ID is often ignored at runtime and is mainly useful for tools tagging which language a TLK was built for.

Encoding: Western languages (IDs 0–4) normally use Windows-1252 in legacy tooling; Polish (5) uses Windows-1250. IDs 6–8 use UTF-8 in the wider Aurora spec. String payloads may still be UTF-8 or Windows-1252 depending on the tool and target game—see the string entry description above.

Windows-1252 is a single-byte encoding (256 code points) and is often loosely called "ISO-8859-1" or cp1252.


Cross-reference: implementations

Layer PyKotor (master)
Layout spec (docstring) tlk_data.py L1–L39
TLK / TLKEntry tlk_data.py L54–L420
Binary read io_tlk.py L32–L156 (TLKBinaryReader L123–L156)
Binary write io_tlk.py L159–L219

Kotor.NET: TLKBinaryStructure.cs L16–L130.

See Cross-reference implementations under File structure overview for reone, xoreos, KotOR.js, and KotOR-Unity.

See also



SSF — Sound Set File

SSF files map 28 creature sound events — battle cries, selection acknowledgements, pain grunts, death screams — to StrRef indices into dialog.tlk. Each creature template references an SSF by ResRef, and the engine plays the associated voice-over line whenever the corresponding event fires. Slots that store -1 (0xFFFFFFFF) produce no sound.

To modify SSF files in mods, see the TSLPatcher SSFList syntax guide. SSF StrRefs resolve through the same TLK string table used for dialogue and UI text.

Table of Contents


File structure overview

SSF files define 28 logical sound slots (indices 027) that creatures use for battle cries, selection lines, grunts, UI feedback, etc. Each slot holds a StrRef into dialog.tlk (or -1 / 0xFFFFFFFF for “no sound”). SSF files load through the same resource resolution order as other resources (override, MOD/SAV, KEY/BIF).

On-disk size:

  • The header is 12 bytes; the semantic KotOR table is 28 × 4 = 112 bytes.
  • Some writers emit 40 × 4 = 160 bytes after the header (28 mapped slots plus 12 extra 0xFFFFFFFF words).
  • PyKotor’s writer does this (io_ssf.py L177–L181).
  • Kotor.NET models a 40-entry table in SSFBinaryStructure.cs SoundTable L61–L77.
  • Readers that only consume the first 28 entries after offset still match vanilla behavior.
  • KotOR.js SSFObject.Open derives soundCount = (length - 12) / 4 and therefore accepts either width.

Implementation (PyKotor): Libraries/PyKotor/src/pykotor/resource/formats/ssf/

Cross-reference implementations (line anchors are against master and may drift):

See also


Binary format

File Header

The file header is 12 bytes in size:

Name type offset size Description
file type char 0 (0x00) 4 Always "SSF " (space-padded)
file Version char 4 (0x04) 4 Always "V1.1"
offset to Sound Table UInt32 8 (0x08) 4 Byte offset to the first StrRef (almost always 12)

Sound Table

After the header, the file contains a contiguous array of little-endian int32 StrRefs. KotOR uses the first 28 entries (indices 027) as in the Sound event types table below. -1 or 0xFFFFFFFF means “no sound” for that slot.

Some files and tools use 40 uint32 entries (extra trailing -1 words). Treat anything beyond index 27 as padding unless you have a specific toolchain that assigns meaning to it.


Sound event types

Indices are fixed; do not reorder. PyKotor names are authoritative for this repo; KotOR.js SSFType uses the same numeric values with different spellings on a few rows (noted in the third column).

Index PyKotor SSFSound KotOR.js SSFType (if different) Role
0–5 BATTLE_CRY_1BATTLE_CRY_6 (same) Combat entry / battle cries
6–8 SELECT_1SELECT_3 (same) Creature selected
9–11 ATTACK_GRUNT_1ATTACK_GRUNT_3 ATTACK_1ATTACK_3 Attack animation grunts
12–13 PAIN_GRUNT_1PAIN_GRUNT_2 PAIN_1PAIN_2 Damage reactions
14 LOW_HEALTH (same) Low HP warning
15 DEAD (same) Death
16 CRITICAL_HIT (same) Critical hit feedback
17 TARGET_IMMUNE (same) Immune target
18 LAY_MINE (same) Place mine
19 DISARM_MINE (same) Disarm mine
20 BEGIN_STEALTH STEALTH Enter stealth
21 BEGIN_SEARCH SEARCH Search mode
22 BEGIN_UNLOCK UNLOCK Start unlock
23 UNLOCK_FAILED UNLOCK_FAIL Unlock failed
24 UNLOCK_SUCCESS (same) Unlock succeeded
25 SEPARATED_FROM_PARTY SOLO_MODE Left party / solo
26 REJOINED_PARTY PARTY_MODE Rejoined party
27 POISONED (same) Poisoned

Primary references:


Cross-reference: implementations

Component Location
SSF data model (SSF) ssf_data.py SSF L26–L121
SSF data model (SSFSound) SSFSound L123–L234
Binary I/O io_ssf.py (see File structure overview for line-level anchors)


LIP — Lip Synchronization

LIP files drive mouth animation for voiced dialogue. Each file stores a compact series of keyframes that pair a timestamp (in seconds) with a viseme index (0–15), and the engine interpolates between them while playing the companion WAV voice line. The LIP length field must match the WAV playback duration exactly, or the mouth animation will desynchronize. LIP files load through the same resource resolution order as other game assets.

LIP is always paired with a WAV of matching ResRef. Dialogue nodes in DLG trigger the voice line, which in turn triggers the LIP animation on the speaking creature model.

Table of Contents


File structure overview

  • LIP files are always binary ("LIP V1.0" signature) and contain only animation data.

  • They are paired with WAV voice-over resources of identical duration; the LIP length field must match the WAV data playback time for glitch-free animation.

  • keyframes are sorted chronologically and store a timestamp (float seconds) plus a 1-byte viseme index (0–15).

  • The layout is identical across these implementations (header / keyframe offsets below are cross-confirmed):

Implementation (PyKotor):

Comparable LIP implementations include reone's C++ parser (lipreader.cpp L27+), xoreos's Aurora reader (lipfile.cpp), KotOR.js's LIP object and binary reader (LIPObject.ts L26+, readBinary L112-L136), and Kotor.NET's in-memory DTO layer (LIP.cs L11+).

See also


Binary format

Header

Name type offset size Description
file type char 0 (0x00) 4 Always "LIP "
file Version char 4 (0x04) 4 Always "V1.0"
Sound Length float32 8 (0x08) 4 Duration in seconds (must equal WAV length)
Entry count UInt32 12 (0x0C) 4 Number of keyframes immediately following

The LIP header layout is corroborated by reone's parser (lipreader.cpp L27-L42).

keyframe Table

keyframes follow immediately after the header; there is no padding.

Name type Offset (per entry) size Description
Timestamp float32 0 (0x00) 4 Seconds from animation start
Shape uint8 4 (0x04) 1 Viseme index (0–15)
  • Entries are stored sequentially and must be sorted ascending by timestamp.

  • Libraries average multiple implementations to validate this layout:

KotOR.js's binary reader shows the same contiguous keyframe-table decode (LIPObject.ts readBinary L112-L136).


Mouth Shapes (Viseme Table)

KotOR reuses the 16-shape Preston Blair phoneme set. Every implementation agrees on the byte value assignments; KotOR.js only renames a few labels but the indices match.

Value Shape Description
0 NEUTRAL Rest/closed mouth
1 EE Teeth apart, wide smile (long "ee")
2 EH Relaxed mouth ("eh")
3 AH Mouth open ("ah/aa")
4 OH Rounded lips ("oh")
5 OOH Pursed lips ("oo", "w")
6 Y Slight smile ("y")
7 STS Teeth touching ("s", "z", "ts")
8 FV Lower LIP touches teeth ("f", "v")
9 NG Tongue raised ("n", "ng")
10 TH Tongue between teeth ("th")
11 MPB Lips closed ("m", "p", "b")
12 TD Tongue up ("t", "d")
13 SH Rounded relaxed ("sh", "ch", "j")
14 L Tongue forward ("l", "r")
15 KG Back of tongue raised ("k", "g", "h")

PyKotor's LIPShape table and related phoneme helpers define the same core viseme set used throughout the surrounding LIP tooling ecosystem.


animation Rules


PyKotor's LIP reader and data model (io_lip.py, lip_data.py) stay aligned with the same header and keyframe encoding used by reone (lipreader.cpp L27+), xoreos (lipfile.cpp), KotOR.js (LIPObject.ts L26+), and Kotor.NET (LIP.cs L11+).


KotOR WAV file format Documentation

KotOR stores both standard WAV voice-over lines and Bioware-obfuscated sound-effect files. Voice-over assets are regular RIFF containers with PCM headers, while SFX assets prepend a 470-byte custom block before the RIFF data. PyKotor handles both variants transparently. WAV files are resolved using the same resource resolution order as other resources:

Hex type id 0x0004 is listed under Resource type identifiers.

Implementation (PyKotor)

Cross-reference implementations (line anchors are against master and may drift):

For mod developers: WAV files are referenced from:

See HoloPatcher README for Mod Developers.

Related formats:

Table of Contents


File types

Type Usage Description
VO (Voice-over) Dialogue lines (*.wav referenced by TLK StrRefs). Plain RIFF/WAVE PCM files readable by any media player.
SFX (Sound effects) Combat, UI, ambience, .wav files under StreamSounds/SFX. Contains a Bioware 470-byte obfuscation header followed by the same RIFF data.

PyKotor exposes these via the WAVType enum (VO vs. SFX) so tools know whether to insert/remove the proprietary header (io_wav.py:52-121).


Standard RIFF/WAVE structure

KotOR sticks to the canonical RIFF chunk order:

Offset Field Description
0 (0x00) "RIFF" Chunk ID
4 (0x04) <uint32> file size minus 8
8 (0x08) "WAVE" format tag
12 (0x0C) "fmt " format chunk ID
16 (0x10) <uint32> format chunk size (usually 0x10)
See below

Format chunk

Field Type Description
audio_format uint16 0x0001 for PCM, 0x0011 for IMA ADPCM.
channels uint16 1 (mono) or 2 (stereo).
sample_rate uint32 Typically 22050 Hz (SFX) or 44100 Hz (VO).
bytes_per_sec uint32 sample_rate × block_align.
block_align uint16 Bytes per sample frame.
bits_per_sample uint16 8 or 16 for PCM.
extra_bytes Present only when fmt_size > 0x10 (e.g., ADPCM coefficients).

Data chunk

After the fmt chunk (and any optional fact chunk), the "data" chunk begins:

Field Description
"data" Chunk ID.
<uint32> Number of bytes of raw audio.
<byte[]> PCM/ADPCM sample data.

KotOR voice-over WAVs add a "fact" chunk with a 32-bit sample count, which PyKotor writes for compatibility (io_wav.py:182-186).


KotOR SFX header

  • SFX assets start with 470 bytes of obfuscated metadata (magic numbers plus filler 0x55).
  • After this header, the file resumes at the "RIFF" signature described above.
  • When exporting SFX, PyKotor recreates the header verbatim so the game recognizes the asset (io_wav.py:150-163).

(See Cross-reference implementations above for reone / KotOR.js / Kotor.NET line anchors.)


Encoding Details

  • PCM (audio_format = 0x0001): Most dialogue is 16-bit mono PCM, which streams directly through the engine mixer.
  • IMA ADPCM (audio_format = 0x0011): Some ambient SFX use compressed ADPCM frames; when present, the fmt chunk includes the extra coefficient block defined by the WAV spec.
  • KotOR requires block_align and bytes_per_sec to match the values implied by the codec; mismatched headers can crash the in-engine decoder.

External tooling such as SithCodec and SWKotOR-Audio-Encoder implement the same formats; PyKotor simply exposes the metadata so conversions stay lossless.

Community context (workflow)

  • Deadly Stream — SithCodec (Windows tool for KotOR audio header strip/add; pair with PyKotor io_wav when verifying bytes).
  • Deadly Stream — Extracting dialoguecommunity-reported layout notes (e.g. some assets under StreamWaves / StreamVoice); treat thread as workflow hints, not normative RIFF layout—verify against PyKotor / reone / KotOR.js above.

Historical context (LucasForums Archive)

TSLPatcher-era threads discuss Streamsounds / Streamwaves paths, codec issues in Miles-based workflows, and DLG-related StreamVoice layout (e.g. alienvo.2da -> StreamVoice\AVO\). Treat these as dated player/modder reports, not engine specifications—pair with SithCodec, PyKotor io_wav, and the cross-refs at the top of this page.


Cross-reference: implementations

  • Reference implementations (engines): same as Cross-reference implementations at the top of this page (PyKotor io_wav.py, reone wavreader.cpp, KotOR.js AudioFile.ts).
  • Community tooling (not normative): SithCodec — encode/decode helper; SWKotOR-Audio-Encoder — batch-friendly encoder. Prefer verifying headers against PyKotor / reone / KotOR.js when debugging engine mismatches.

With this structure, WAV assets authored in PyKotor will play identically in the base game and in the other vendor tools.

See also



⚠️ **GitHub.com Fallback** ⚠️