USTX file format - stakira/OpenUtau GitHub Wiki
USTX is the project file format for OpenUtau. This document outlines the structure and semantics of the USTX format for developers building tools that process USTX files.
- Tick: the time unit in midi. 1 quarter note equals to 480 ticks.
- Midi number: tone representation in midi. C4 = 60. Unit is semitone
A USTX file is a YAML-based (YAML 1.2) text format in UTF-8 encoding, with the following top-level sections:
Field | Type | Description |
---|---|---|
name |
string | Project name |
ustx_version |
string | USTX format version, currently 0.6 |
resolution |
int | Ticks per quarter note, always 480 |
key |
int | Musical key of the project, used for scale degree displaying. 0 means C major or A minor |
time_signature |
list | List of time signature changes |
tempos |
list | List of tempo changes |
tracks |
list | List of tracks |
voice_parts |
list | List of voice parts (part containing notes for synthesis) |
wave_parts |
list | List of wave parts (external audio file, usually instrumental of the song) |
bpm
, beat_per_bar
and beat_unit
under the project root are deprecated. To get the tempo and time signatures, please use tempos
and time_signature
.
Defines vocal parameters (e.g., dynamics, vibrato) as curves or numerical values.
Located under expressions
as key-value pairs. Each expression has:
Field | Type | Description |
---|---|---|
name |
string | Human-readable name (e.g., "dynamics (curve)" ) |
abbr |
string | Short identifier (e.g., "dyn" ) |
type |
string |
Curve , Options , or Numerical
|
min /max
|
int | Value range |
default_value |
int | Default if not explicitly set |
is_flag |
bool | Whether the parameter is a UTAU resampler flag |
flag |
string | UTAU resampler flag symbol (e.g., "g" for gender) |
options |
list | For Options type: list of possible values (e.g., ["off", "on"] ) |
Example:
dyn:
name: dynamics (curve)
abbr: dyn
type: Curve
min: -240
max: 120
default_value: 0
-
Time Signatures (
time_signatures
):
Defines changes in time signature. Each entry includes:-
bar_position
: Bar number where the change applies. -
beat_per_bar
: Numerator (e.g.,4
for 4/4). -
beat_unit
: Denominator (e.g.,4
for 4/4).
-
-
Tempos (
tempos
):
Defines BPM changes. Each entry includes:-
position
: Tick position where the tempo change occurs. -
bpm
: BPM value.
-
Defines audio tracks under tracks
. Each track includes:
Field | Type | Description |
---|---|---|
singer |
string | Voicebank identifier |
phonemizer |
string | Phonemizer plugin |
track_name |
string | Track label |
track_color |
string | Display color (e.g., "Blue" ) |
mute /solo
|
bool | Mute/solo state |
volume |
float | Track volume (dB) |
voice_color_names |
list | List of available voice colors in the voicebank |
Contains musical notes and phrases under voice_parts
. Each part includes:
Field | Type | Description |
---|---|---|
duration |
int | Total duration in ticks |
name |
string | Name of the part |
track_no |
int | Index of the track that the part belongs to |
position |
int | Starting tick position relative to the project |
notes |
list | List of notes (see below) |
curves |
list | Curve expressions (see below) |
Each note under notes
has:
Field | Type | Description |
---|---|---|
position |
int | Start tick relative to the starting position voice part |
duration |
int | Note length in ticks |
tone |
int | MIDI note number (e.g., 60 = C4) |
lyric |
string | Lyric text (e.g., "san" ) |
pitch |
object | Pitch control points |
vibrato |
object | Vibrato parameters (depth, period, fade-in/out) |
phoneme_expressions |
list | Numerical and option expressions edited per phoneme by the user |
phoneme_overrides |
list | Phoneme name and position (tick offset relative to the original phonemizer result) edited by the user |
Example:
- position: 1920
duration: 720
tone: 70
lyric: san
pitch:
data:
- {x: -274.03845, y: 0, shape: io}
- {x: -158.65384, y: -32.46212, shape: io}
- {x: 40, y: 0, shape: io}
snap_first: true
vibrato: {length: 0, period: 175, depth: 25, in: 10, out: 10, shift: 0, drift: 0, vol_link: 0}
phoneme_expressions:
- {index: 0, abbr: clr, value: 1}
- {index: 1, abbr: clr, value: 2}
phoneme_overrides:
- index: 1
phoneme: a
- index: 0
offset: 65
Defines automation curves (e.g., pitch, dynamics) under curves
.
Field | Type | Description |
---|---|---|
xs |
List | Tick position of each sample point in the curve |
ys |
List | Value at each sample point |
abbr |
string | Expression abbr |
External audio files imported to the project, under wave_parts
Field | Type | Description |
---|---|---|
name |
string | Name of the part |
track_no |
int | Index of the track that the part belongs to |
position |
int | Starting tick position relative to the project |
relative_path |
string | Path to the audio file relative to the .ustx file, absolute path if on another drive |
file_duration_ms |
float | Duration of the audio file in milliseconds |
For python developers, please use ruamel.yaml
when working with .ustx files, because it supports yaml 1.2 . Don't use pyyaml
.
The reading and writing of USTX files is officially implemented in OpenUtau.Core/Ustx
Below are third-party implementations of USTX reading and writing:
- Libresvip, in python
- Utaformatix, in kotlin