auto space by FST - beyondnlp/nlp GitHub Wiki

autospace by FST

  • With a working understanding of Rouzeta, you can build a word-spacing (automatic spacing) module using a WFST.
  • This page gives a brief description of that approach.

[example] The grammar below (written in lexc) draws the syllable-to-syllable transition graph.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                     HPS by FST                    !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!noun : /n
!josa : /j


Multichar_Symbols   /j /n


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

LEXICON Root
        noun ;! common noun



LEXICON noun ! common noun
이성친구/n  nNext;
이성동료/n  nNext;

LEXICON josa ! particle (josa)
을/j    jNext;
이/j    jNext;
는/j    jNext;
로/j    jNext;
만/j    jNext;
은/j    jNext;
가/j    jNext;
과/j    jNext;
와/j    jNext;
도/j    jNext;
들/j    jNext;

LEXICON nNext
        josa;
        finLexicon;

LEXICON jNext
        finLexicon;
LEXICON finLexicon
     # ;

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! End of Document !!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
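
The continuation classes in the grammar above can be read as a small graph walk: each LEXICON lists entries, and each entry names the LEXICON to visit next. The following Python sketch is only an illustration of that reading, not part of foma or Rouzeta; the `LEXICONS` table and the `generate` helper are hypothetical names that mirror the lexc file by hand.

```python
# Hand-written mirror of the lexc grammar above (an assumption for illustration):
# each lexicon maps to a list of (entry, continuation class) pairs.
LEXICONS = {
    "Root": [("", "noun")],
    "noun": [("이성친구/n", "nNext"), ("이성동료/n", "nNext")],
    "josa": [(j + "/j", "jNext") for j in "을이는로만은가과와도들"],
    "nNext": [("", "josa"), ("", "finLexicon")],
    "jNext": [("", "finLexicon")],
    "finLexicon": [("", "#")],  # "#" marks the end of a word, as in lexc
}


def generate(lexicon="Root", prefix=""):
    """Yield every tagged string reachable from `lexicon`."""
    for entry, continuation in LEXICONS[lexicon]:
        if continuation == "#":
            yield prefix + entry
        else:
            yield from generate(continuation, prefix + entry)


words = list(generate())
print(len(words))   # 2 nouns x (11 particles + bare noun) = 24
print(words[0])     # 이성친구/n을/j
```

Because `nNext` offers both `josa` and `finLexicon`, every noun appears once on its own and once with each particle.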

[example] The grammar above produces the graph shown below.

hps1


[test] This grammar is a rewrite based on the example above.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                     HPS by FST                    !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!noun : /nc
!josa : /js


Multichar_Symbols   /js /nc /b


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

LEXICON Root
        noun ;! common noun

LEXICON noun ! common noun
        죽음/nc nNext;
        의도/nc nNext;
        도로/nc nNext;

LEXICON josa ! particle (josa)
        의/js   jNext;
        로/js   jNext;


LEXICON nNext
        noun;
        josa;
        finLexicon;

LEXICON jNext
        noun;
        finLexicon;

LEXICON finLexicon
        # ;

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! End of Document !!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
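
Because `nNext` and `jNext` now loop back to `noun`, this grammar accepts sequences of words, and a single unspaced string can have more than one analysis. The Python sketch below enumerates the analyses of 죽음의도로 under a hand-copied version of the grammar. The tables `ENTRIES`, `START`, `NEXT`, `FINAL` and the function `segmentations` are hypothetical names introduced for illustration only.

```python
# Hand-copied mirror of the rewritten grammar (an assumption for illustration).
ENTRIES = {"nc": ["죽음", "의도", "도로"], "js": ["의", "로"]}
START = ["nc"]                              # Root -> noun
NEXT = {"nc": ["nc", "js"], "js": ["nc"]}   # nNext and jNext
FINAL = {"nc", "js"}                        # both can reach finLexicon


def segmentations(text, allowed=START, last=None):
    """Yield each way to split `text` into tagged lexicon entries."""
    if not text:
        if last in FINAL:
            yield []
        return
    for tag in allowed:
        for form in ENTRIES[tag]:
            if text.startswith(form):
                for rest in segmentations(text[len(form):], NEXT[tag], tag):
                    yield [f"{form}/{tag}"] + rest


results = list(segmentations("죽음의도로"))
for seg in results:
    print(" ".join(seg))
# 죽음/nc 의도/nc 로/js
# 죽음/nc 의/js 도로/nc
```

Both analyses are valid under the grammar, so an unweighted transducer has no basis for preferring one over the other.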

[test] Tags are replaced with epsilon.

!!!!!!!!!!!!!!!!!!!!!!!
! Read hps Lexicon !
!!!!!!!!!!!!!!!!!!!!!!!

read lexc hps.lexc
define Lexicon ;
define josa_lex [ 이 | 가 ];
define josa_tag [ %/nc | %/js ];
define Filter1 %/nc -> %/w  || _ [ 이 | 가 ];
define Filter2 %/nc -> 0;
define Filter3 %/js -> 0;
define Filter4 %/w -> 0;
define test Lexicon
        .o. Filter1
        .o. Filter2
        .o. Filter3
        .o. Filter4
        ;


regex test ;
invert net
att > hps.att
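
The effect of the four filters on the tagged side can be approximated with plain string substitutions. This is only a rough Python analogy to the composed replace rules, not how foma evaluates them; `strip_tags` is a hypothetical helper name.

```python
import re


def strip_tags(tagged):
    """Approximate Filter1..Filter4 from the script above on one tagged string."""
    # Filter1: /nc becomes /w only when the next symbol is 이 or 가.
    s = re.sub(r"/nc(?=[이가])", "/w", tagged)
    # Filter2..Filter4: every remaining tag is rewritten to epsilon (deleted).
    for tag in ("/nc", "/js", "/w"):
        s = s.replace(tag, "")
    return s


print(strip_tags("죽음/nc의/js도로/nc"))  # 죽음의도로
```

After `invert net`, the untagged string is on the input side and the tagged analysis is on the output side, which is the direction a spacing module needs.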

[test] The graph produced when the script above is applied.

hps2

How to add weights in OpenFst

hps.png

  • The automaton above contains both paths: '죽음/의/도로' (death + of + road) and '죽음/의도/로' (death + intention + by).
  • At node 7 two choices are possible: node 4 (the noun reading) or node 8 (the particle reading).
  • Because the weights are currently all equal, '죽음/의도/로' comes out as the result,
  • but raising the weight of 의/js makes it possible to select '죽음/의/도로'.
0   1   의  의
0   2   도  도
0   3   죽  죽
3   4   음  음
4   5   @0@ /nc
5   6   로  로
5   7   의  의
5   2   도  도
5   3   죽  죽
7   4   도  도
7   8   @0@ /js
8   1   의  의
8   2   도  도
8   3   죽  죽
6   8   @0@ /js
2   4   로  로
1   4   도  도
5
8

  • Looking at the AT&T format above, it is not obvious how to assign a weight.
7   4   도  도
7   8   @0@ /js
  • Node 7 can move along two paths, to node 4 or to node 8.
  • Raising the weight on one of these two arcs makes that path more likely to be taken.
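
One possible way to do this: the OpenFst text format accepts an optional weight as an extra last column on an arc line, so a weight can be attached by appending a number to the two arcs leaving node 7. Note that in OpenFst's default tropical semiring a weight is a cost and the lowest total cost wins, so "raising the preference" for one arc means adding cost to its competitor. The Python sketch below is a toy illustration under those assumptions. `add_weights` and `best_path` are hypothetical helpers, the cost 1.0 is arbitrary, and the search is a plain Dijkstra rather than OpenFst itself. Converting foma's `@0@` epsilon to the epsilon entry of an OpenFst symbol table is not covered here.

```python
import heapq

# Arcs copied from the AT&T listing above: src dst input output.
ARCS = """0 1 의 의
0 2 도 도
0 3 죽 죽
3 4 음 음
4 5 @0@ /nc
5 6 로 로
5 7 의 의
5 2 도 도
5 3 죽 죽
7 4 도 도
7 8 @0@ /js
8 1 의 의
8 2 도 도
8 3 죽 죽
6 8 @0@ /js
2 4 로 로
1 4 도 도
5
8"""


def add_weights(att_text, weights):
    """Append a cost column to the arcs listed in `weights` ({(src, dst): cost})."""
    lines = []
    for line in att_text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            key = (int(fields[0]), int(fields[1]))
            if key in weights:
                fields.append(str(weights[key]))
        lines.append("\t".join(fields))
    return "\n".join(lines)


def best_path(att_text, text):
    """Return (cost, tagged output) of the cheapest accepting path for `text`."""
    arcs, finals = {}, set()
    for line in att_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            cost = float(fields[4]) if len(fields) == 5 else 0.0
            arcs.setdefault(int(fields[0]), []).append(
                (int(fields[1]), fields[2], fields[3], cost))
        elif fields:
            finals.add(int(fields[0]))
    heap = [(0.0, 0, 0, "")]  # (cost, state, characters consumed, output)
    seen = set()
    while heap:
        cost, state, pos, out = heapq.heappop(heap)
        if pos == len(text) and state in finals:
            return cost, out
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        for dst, isym, osym, w in arcs.get(state, []):
            if isym == "@0@":  # epsilon input: emit the tag, consume nothing
                heapq.heappush(heap, (cost + w, dst, pos, out + osym))
            elif pos < len(text) and text[pos] == isym:
                heapq.heappush(heap, (cost + w, dst, pos + 1, out + osym))
    return None


# Penalising the noun continuation 7 -> 4 favours the particle reading.
print(best_path(add_weights(ARCS, {(7, 4): 1.0}), "죽음의도로"))
# Penalising the particle arc 7 -> 8 favours the noun reading.
print(best_path(add_weights(ARCS, {(7, 8): 1.0}), "죽음의도로"))
```

With the cost on arc 7 → 4 the sketch returns 죽음/nc의/js도로/nc, and with the cost on arc 7 → 8 it returns 죽음/nc의도/nc로/js. In a real system the costs would typically come from corpus statistics rather than being set by hand.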

update weight

  • hps.lexc
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                     HPS by FST                    !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Multichar_Symbols   /js /nc /vv /ncp /xsp /ma

Definitions
        NUM       = %0|1|2|3|4|5|6|7|8|9 ;
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

LEXICON Root
        Hnc ;! common noun
        Hvv ;! predicates (verbs and adjectives)
        Hncp;! hada-type noun (noun that combines with 하다)
        Hma;! adverb

LEXICON Hnc ! common noun
        죽음/nc nNext;
        의도/nc nNext;
        도로/nc nNext;

LEXICON Hncp ! hada-type noun
        공부/ncp    ncpNext;
        노력/ncp    ncpNext;
        사랑/ncp    ncpNext;

LEXICON Hxsp ! 하다 (hada) forms
        하다/xsp    xspNext;
        하고/xsp    xspNext;
        하면/xsp    xspNext;
        해서/xsp    xspNext;


LEXICON Hma ! adverb
        잘/ma   maNext;
        엄청/ma   maNext;
        아주/ma   maNext;
        매우/ma   maNext;

LEXICON maNext
        Hma;
        Hvv;
        final;
LEXICON Hvv ! predicates (verbs and adjectives)
        행복한/vv   vNext;
        즐거운/vv   vNext;

LEXICON Hjs ! particle (josa)
        의/js   jNext;
        로/js   jNext;

LEXICON vNext
        Hnc;
        final;

LEXICON nNext
        Hnc;
        Hjs;
        final;

LEXICON xspNext
        Hnc;
        Hjs;
        final;

LEXICON ncpNext
        Hxsp;
        Hma;
        final;

LEXICON jNext
        Hnc;
        final;

LEXICON final
        # ;

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! End of Document !!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

  • hps.script
!!!!!!!!!!!!!!!!!!!!!!!
! Read hps Lexicon !
!!!!!!!!!!!!!!!!!!!!!!!

read lexc hps.lexc
define Lexicon ;
define Filter1 %/nc -> 0;
define Filter2 %/js -> 0;
define Filter3 %/vv -> 0;
define Filter4 %/nn -> 0;
define Filter5 %/ncp -> 0;
define Filter6 %/xsp -> 0;
define Filter7 %/ma -> 0;
define test Lexicon
        .o. Filter1
        .o. Filter2
        .o. Filter3
        .o. Filter4
        .o. Filter5
        .o. Filter6
        .o. Filter7
        ;


regex test ;
invert net
att > hps.att