.seg creation - LuciozUndiezz/VocaloidDBTool-SAK GitHub Wiki

.seg file creation!

.seg files are Vocaloid's rough equivalent of an oto.ini file (Vocaloid also has as0, as1, as2, etc. files, but I'll touch on those later). They hold the configuration needed to segment your Vocaloid voicebank audio.

The first step to getting .seg files (without the torture of doing them all by hand) is converting your oto.ini to lab.

lab creation

You would use Genon2DB (or Genon2NNSVS, whatever it's called. I made a mod for it, so I get to call it Genon2DB, shh).

You'd usually use what I linked above. However, the labs made from that are full context labs, which look like this:

0 550000 p@xx^xx-pau+aHP=a1HP_xx%xx^xx_xxxx-1!1[xx$xx]xx/A:xx-xx-xx@xxxx/B:xx_xx_xx@xx|xx/C:1+1+1@xx&xx/D:xx!xx#xx$xx%xx|xx&xx;xx-xx/E:xx]xx^xx=xx100!1@8#3+xx]xx$xx|xx[xx&xx]xx=xx^xxxx#xx_xx;xx$xx&xx%xx[xx|xx]xx-xx^xx+xxxx=xx@xx$xx!xx%xx#xx|xx|xx-xx&xx&xx+xx[xx;xx]xx;xxxxxx^xx^xx@xx[xx#xx=xx!xxxx+xx!xx^xx/F:B3#xx#xx-xx$100$1+60%24;xx/G:xx_xx/H:xx_xx/I:xx_xx/J:xx~xx@1

550000 550000 c@xx^pau-aHP+a1HP=あHP_xx%xx^xx_xxxx-1!1[xx$xx]xx/A:xx-xx-xx@xxxx/B:1_1_1@xx|xx/C:1+1+1@xx&xx/D:xx!xx#xx$xx%100|1&8;3-xx/E:B3]xx^xx=xx100!1@60#24+xx]xx$xx|xx[xx&xx]xx=xx^xx1#20_0;110$0&441%0[100|xx]xx-xx^xx+xxxx=xx@xx$xx!xx%xx#xx|xx|xx-xx&xx&xx+xx[xx;xx]xx;xxxxxx^xx^xx@xx[xx#xx=xx!xxxx+p2!xx^xx/F:Db4#xx#xx-xx$100$1+60%24;xx/G:xx_xx/H:xx_xx/I:xx_xx/J:xx~xx@1

550000 550000 c@pau^aHP-a1HP+あHP=aHP_xx%xx^xx_xxxx-1!1[xx$xx]xx/A:1-1-1@xxxx/B:1_1_1@xx|xx/C:1+1+1@xx&xx/D:B3!xx#xx$xx%100|1&60;24-xx/E:Db4]xx^xx=xx100!1@60#24+xx]xx$xx|xx[xx&xx]xx=xx^xx2#19_6;104$24&417%5[95|xx]xx-xx^xx+xxxx=xx@xx$xx!xx%xx#xx|xx|xx-xx&xx&xx+xx[xx;xx]xx;xxxxxx^xx^xx@xx[xx#xx=xx!xxm2+m2!xx^xx/F:B3#xx#xx-xx$100$1+60%24;xx/G:xx_xx/H:xx_xx/I:xx_xx/J:xx~xx@1

550000 3950000 c@aHP^a1HP-あHP+aHP=a1HP_xx%xx^xx_xxxx-1!1[xx$xx]xx/A:1-1-1@xxxx/B:1_1_1@xx|xx/C:1+1+1@xx&xx/D:Db4!xx#xx$xx%100|1&60;24-xx/E:B3]xx^xx=xx100!1@60#24+xx]xx$xx|xx[xx&xx]xx=xx^xx3#18_12;98$48&393%11[89|xx]xx-xx^xx+xxxx=xx@xx$xx!xx%xx#xx|xx|xx-xx&xx&xx+xx[xx;xx]xx;xxxxxx^xx^xx@xx[xx#xx=xx!xxp2+p2!xx^xx/F:Db4#xx#xx-xx$100$1+60%24;xx/G:xx_xx/H:xx_xx/I:xx_xx/J:xx~xx@1

3950000 3950000 c@a1HP^あHP-aHP+a1HP=あHP_xx%xx^xx_xxxx-1!1[xx$xx]xx/A:1-1-1@xxxx/B:1_1_1@xx|xx/C:1+1+1@xx&xx/D:B3!xx#xx$xx%100|1&60;24-xx/E:Db4]xx^xx=xx100!1@60#24+xx]xx$xx|xx[xx&xx]xx=xx^xx4#17_18;92$72&369%16[84|xx]xx-xx^xx+xxxx=xx@xx$xx!xx%xx#xx|xx|xx-xx&xx&xx+xx[xx;xx]xx;xxxxxx^xx^xx@xx[xx#xx=xx!xxm2+m2!xx^xx/F:B3#xx#xx-xx$100$1+60%24;xx/G:xx_xx/H:xx_xx/I:xx_xx/J:xx~xx@1

This is the wrong kind of lab for making .seg files. My mod makes labs that look like this instead:

2960000 5160000 a

5160000 7660000 i

7660000 11310000 a

11310000 11810000 i

11810000 15000000 pau

This is correct! There's none of that garbage text telling you what pitch is what, or any of that other junk.

Your oto should be PHONEMES ONLY, i.e. generated by Moresampler in phonemes-only mode.
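If you want to sanity-check your labs before converting them, here is a minimal Python sketch that reads the short format shown above. It assumes the times are in the usual HTK/HTS label unit of 100 nanoseconds (so `2960000` is 0.296 seconds), and the names `LabEntry`, `parse_lab`, and `is_full_context` are made up for this example; they are not part of Genon2DB or the GUI.

```python
from typing import List, NamedTuple

# Assumption: lab times use the HTK/HTS convention of 100 ns per unit.
UNITS_PER_SECOND = 10_000_000


class LabEntry(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # phoneme, or a whole full context string


def parse_lab(text: str) -> List[LabEntry]:
    """Parse 'start end label' lines, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split(maxsplit=2)
        entries.append(LabEntry(int(start) / UNITS_PER_SECOND,
                                int(end) / UNITS_PER_SECOND,
                                label))
    return entries


def is_full_context(entry: LabEntry) -> bool:
    """Heuristic only: full context labels carry '/A:' style fields."""
    return "/A:" in entry.label


sample = """2960000 5160000 a
5160000 7660000 i
11810000 15000000 pau"""

for entry in parse_lab(sample):
    kind = "full context" if is_full_context(entry) else "phoneme only"
    print(f"{entry.label}: {entry.start:.3f}s to {entry.end:.3f}s ({kind})")
```

If any line is reported as full context, the lab came from the unmodded converter and is not the kind you want here.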

An oto should look like this for optimal conversion:

_ああいあうえあ.wav=- a,578,248,3202,160,80

_ああいあうえあ.wav=a a,1018,260,2702,160,80

_ああいあうえあ.wav=a i,1518,260,2152,160,80

_ああいあうえあ.wav=i a,2068,254,1682,160,80

_ああいあうえあ.wav=a u,2538,255,1207,160,80

_ああいあうえあ.wav=u e,3013,255,732,160,80

_ああいあうえあ.wav=e a,3488,260,147,160,80

_ああいあうえあ.wav=a -,4083,167,12,160,80

_いいうあえいえ.wav=- i,558,251,3052,160,80

_いいうあえいえ.wav=i i,1013,260,2542,160,80

_いいうあえいえ.wav=i u,1523,258,2052,160,80

_いいうあえいえ.wav=u a,2013,258,1562,160,80

_いいうあえいえ.wav=a e,2503,260,1032,160,80

_いいうあえいえ.wav=e i,3033,260,502,160,80

_いいうあえいえ.wav=i e,3563,235,127,160,80

_いいうあえいえ.wav=e -,3953,162,4,160,80
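For reference, each oto line follows the standard UTAU layout `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with the numbers in milliseconds. The sketch below is a rough Python parser for one such line. The names `OtoEntry`, `parse_oto_line`, and `boundary_seconds` are hypothetical, and it assumes every numeric field is filled in and the alias contains no commas. The `offset + preutterance` boundary is only an observation: it happens to match the begin times in the .seg example further down this page, but it is not confirmed to be how the tools compute them.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str
    alias: str
    offset: float        # ms from the start of the wav
    consonant: float     # ms
    cutoff: float        # ms
    preutterance: float  # ms
    overlap: float       # ms


def parse_oto_line(line: str) -> OtoEntry:
    """Split 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap'."""
    wav, rest = line.strip().split("=", 1)
    alias, *numbers = rest.split(",")
    offset, consonant, cutoff, preutterance, overlap = (float(n) for n in numbers)
    return OtoEntry(wav, alias, offset, consonant, cutoff, preutterance, overlap)


def boundary_seconds(entry: OtoEntry) -> float:
    """Assumed phoneme boundary: offset + preutterance, converted to seconds."""
    return (entry.offset + entry.preutterance) / 1000


entry = parse_oto_line("_ああいあうえあ.wav=- a,578,248,3202,160,80")
print(f"{entry.alias!r} boundary at {boundary_seconds(entry):.3f} s")
```

For the first line of the example oto this gives 0.738 s, which is the begin time of the first row in the .seg example.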

OK, so step one: open the GUI, click on oto -> seg tools, and click on Run Genon2DB (or open cannedbread_genon2db_GUI.py by itself if you want).

You should see this:

image

To select your oto.ini file, click browse next to original configuration file, and select your oto.ini

To select a table file, first download one from intunist's GitHub, https://github.com/intunist (see the pinned repositories).


image

I use their English (blank.table) and their Japanese (intunist_jp_compatibility.table)

Once you have one downloaded, select it using the browse button next to table file

Select an output directory (my mod lets you choose one; the old version just puts everything in a folder called data)

Put in the tempo of the recordings (needed for UST creation)

Put in auto for auto estimation of the initial pause length

Put the recorded note in the note box (use things like C4 G#5 etc.)
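If you are not sure whether a note name is written the right way, this small Python sketch checks the `C4` / `G#5` style and converts it to a MIDI note number. It assumes the common convention where C4 is MIDI note 60; whether the note box also accepts flats like `Db4` is not stated here, so the flat handling is an assumption. `note_to_midi` is a made-up helper, not something from the GUI.

```python
import re

# Semitone offset of each natural note above C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}  # flats are an assumption
NOTE_PATTERN = re.compile(r"^([A-G])([#b]?)(-?\d+)$")


def note_to_midi(name: str) -> int:
    """Convert a name like 'C4' or 'G#5' to a MIDI number (assumes C4 = 60)."""
    match = NOTE_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + ACCIDENTALS[accidental]
    return 12 * (int(octave) + 1) + semitone


for name in ("C4", "G#5"):
    print(name, note_to_midi(name))
```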

Uta vcv mode turns VCV labs into CV labs. You usually don't have to use it; the labs generate just fine in either mode.

image

Once it's done you should see this:

image

image

The only thing you need is the lab folder, you can delete the wav and the ust folder (unless you're making an NNSVS or Diffsinger voicebank, but that's not what we're doing here).

lab to seg conversion

Take your labs, open the GUI, click on oto -> seg tools, and click on Run lab2seg (or open lab2seg_GUI.py by itself if you'd like).

You should see this:

image

Select the lab directory using the browse button

Select the output directory using the browse button

Press convert

You should see a window pop up saying this:

image

Each of your .seg files should look like this:

nPhonemes 8

articulationsAreStationaries 0

phoneme BeginTime EndTime

=================================================

a 0.738000 1.178000

a 1.178000 1.678000

i 1.678000 2.228000

a 2.228000 2.698000

u 2.698000 3.173000

e 3.173000 3.648000

a 3.648000 4.243000

Sil 4.243000 4.269814

This is technically wrong, since the phonemes aren't correct, but it works as an example of the format.
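To make the layout concrete, here is a rough Python sketch that turns phoneme-only lab text into text shaped like the example above. It is not the actual lab2seg code. It assumes lab times are in 100 ns units, that `nPhonemes` is simply the number of rows, and that `pau` should be written as `Sil`; that last mapping in particular is a guess based on the two examples on this page. The separator line is copied from the example, and the exact whitespace the real tool writes has not been verified.

```python
SEPARATOR = "================================================="


def lab_to_seg(lab_text: str) -> str:
    """Build .seg-style text from 'start end phoneme' lab lines (sketch only)."""
    rows = []
    for line in lab_text.splitlines():
        if not line.strip():
            continue
        start, end, phoneme = line.split(maxsplit=2)
        if phoneme == "pau":
            phoneme = "Sil"  # assumption: pauses are written as Sil
        # Assumption: lab times are 100 ns units, .seg times are seconds.
        rows.append(f"{phoneme} {int(start) / 1e7:.6f} {int(end) / 1e7:.6f}")
    header = [
        f"nPhonemes {len(rows)}",
        "articulationsAreStationaries 0",
        "phoneme BeginTime EndTime",
        SEPARATOR,
    ]
    return "\n".join(header + rows) + "\n"


lab = """2960000 5160000 a
5160000 7660000 i
11810000 15000000 pau"""

print(lab_to_seg(lab))
```

Comparing this output against a file made by the real lab2seg is a quick way to spot a lab that did not convert the way you expected.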

You can go back to the home page to find the Sil adder script description and details, and where to go from there.