Usage

Precautions

If you run into a bug, feel free to report it. Please include a full screenshot of your OpenUtau window, your ustx file, and the OpenUtau log file.

Developer notices

Please DON'T create pull requests towards this repo. This repo is for building and releasing only. If you want to add new features to OpenUtau for Diffsinger, please fork and create pull requests towards stakira/OpenUtau.

Download OpenUtau for Diffsinger.

Windows: Download OpenUtau-win-<version>-DiffsingerPack.zip (which is bundled with the vocoder). Extract it and run OpenUtau.exe.
macOS: Download OpenUtau-osx-<version>.zip, launch OpenUtau, and drag the vocoder into the OpenUtau window to install it.

Download a voicebank. Drag it into OpenUtau to install.
Choose the singer you installed.
Input lyrics. Use + for slur notes. Use AP for breath.

See the OpenUtau documentation for a detailed guide.

Expressions

Here are the expressions supported by DiffSinger:

PITD (pitch curve)
DYN (volume curve)
GENC (gender curve; requires voicebank support. See here for details. The default range of -100 to +100 corresponds to shifting the formant by +12 to -12 semitones.)
VELC (velocity curve; requires voicebank support. See here for details. This expression affects the speed of the head and tail of the vowels. Every increase of 100 in this expression multiplies the speed by 2.)

VELC is a custom expression defined by DiffSinger and isn't included in new projects. To add this expression to your project, click “Expressions → Add all expressions suggested by renderers”.

You can also adjust the range of each expression in this menu.
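
The GENC and VELC figures above can be worked through with a little arithmetic. The Python sketch below is only an illustration, not code from OpenUtau or DiffSinger: the function names are hypothetical, and the GENC mapping is assumed to be linear between the two endpoints given above.

```python
def genc_to_semitones(genc: float) -> float:
    """Formant shift in semitones for a GENC value.

    Assumption: the mapping is linear across the default range,
    so -100 -> +12, 0 -> 0 and +100 -> -12 semitones.
    """
    if not -100 <= genc <= 100:
        raise ValueError("GENC is expected to lie in the default range -100..+100")
    return (0 - genc) * 12.0 / 100.0


def velc_speed_ratio(delta: float) -> float:
    """Speed multiplier that results from changing VELC by `delta`.

    Every increase of 100 doubles the speed, so the ratio is 2 ** (delta / 100).
    """
    return 2.0 ** (delta / 100.0)


if __name__ == "__main__":
    # Print a few sample points for each expression.
    for genc in (-100, -50, 0, 50, 100):
        print(f"GENC {genc:+4d} -> {genc_to_semitones(genc):+.1f} semitones")
    for delta in (-100, 0, 100, 200):
        print(f"VELC change {delta:+4d} -> speed x{velc_speed_ratio(delta):g}")
```

The VELC helper works on the change in value, not the absolute value, because the text above only states how the speed scales per 100 units.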

Notes

Pre-render is on by default. If OpenUtau lags, you can turn pre-render off in Preferences.
Preferences related to Diffsinger:
- Diffsinger Render steps is 20 by default. A higher step count may improve audio quality, but slows down rendering.
- If you use Windows and have a discrete graphics card, you can use DirectML to make rendering faster. Set "Machine Learning Runner" to "directml", choose your discrete graphics card in the "GPU" menu, and restart OpenUtau.

Phonemizers

DiffSinger phonemizers have names that start with "DIFFS" and are located in their corresponding language category. There are DiffSinger phonemizers for various languages. For example, DIFFS ZH is for Chinese, and DIFFS EN is for English.

A DiffSinger voicebank may support one or more languages. In most cases you don't need to choose a phonemizer manually: OpenUtau automatically chooses the phonemizer suitable for your voicebank. To find out which languages a voicebank supports, check its readme.txt or website, or ask its developer.

See here for details.

Old Phonemizers

A few old voicebanks may use one of the phonemizers below. Most of the time you don't need to use them.

DIFFS RHY is located in the ZH category. It is based on the Diffsinger rhythmizer timing model, which produces better results. Before using it, please download the model and drag it into OpenUtau to install it.
ENUNU X is located in the General category. It is based on the NNSVS timing model. It supports any language supported by ENUNU/NNSVS. Usage
ENUNU X EN is located in the EN category. It is based on the NNSVS timing model. It supports English voicebanks using CMUDict and is compatible with the EN VCCV phonemizer. Usage

FAQ

What is the relationship between DiffSinger and Diff-SVC? Can I use Diff-SVC models on DiffSinger?

Though they have similar names, DiffSinger has nothing to do with Diff-SVC. DiffSinger is singing voice synthesis (SVS) software that takes sheet music as input and produces a singing voice. Diff-SVC is a voice changer. Both names start with "diff" because both are based on diffusion models.

DiffSinger doesn't support Diff-SVC models. You can train a DiffSinger model separately with the original audio data.

What is the future of this project?

The official build of OpenUtau now supports DiffSinger. This repo will continue to exist to provide the "DiffsingerPack" build. New features will also be tested here.