Rouzeta - beyondnlp/nlp GitHub Wiki

Rouzeta

  • Rouzeta is the project name of a WFST-based morphological analyzer recently released by Dr. Lee Sang-ho (이상호).
  • Even just using Rouzeta requires knowing a number of things, and analyzing its internals is a task on another level entirely.
  • Downloading and unpacking Rouzeta yields a Rouzeta directory and a Tagger directory; the Rouzeta directory contains the following 4 files.
 korean.lexc      : expresses transitions between morphemes as an automaton; essentially every morpheme is listed, and transitions between parts of speech are described as well.
 morphrules.foma  : rule file that removes unnecessary edges from the morpheme automaton once it is built
 splithangul.foma : rule file that converts precomposed syllables into their decomposed jamo form
 kormoran.script  : script file that builds the final automaton
  • Analyzing these files requires basic knowledge of morphological analysis as well as of lexc, foma, and xfst.
  • The notes below list whatever was needed at the time, in no particular order.

  • Rouzeta Home
    • https://shleekr.github.io/
    • Only KSC5601 Hangul can be processed (splithangul.foma, 2352 characters = ksc5601 + 렛, 앴).
    • Every entry in the Rouzeta dictionary consists only of syllables inside ksc5601 (2350 syllables).
    • The output below shows how an irregular conjugation and characters outside ksc5601 are handled.

apply up> ๊ณ ๋งˆ์› ๋‹ค ๊ณ ๋ง™/irrb/vj์—ˆ/ep๋‹ค/ef ๊ณ ๋ง™/irrb/vj์—ˆ/ep๋‹ค/ex apply up> ๋˜ ๋ฐฉ๊ฐํ•˜ #ksc5601์„ ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ์€ ???๋กœ ๋‚˜ํƒ€๋‚œ๋‹ค. ??? apply up> ํŽฒ์‹œ ??? apply up> ๊ธฐ์šค๊นŒ๋„ค ๊ธฐ์šค/nr๊นŒ๋„ค/nc ๊ธฐ์šค/nr๊นŒ/nc๋„ค/nc ๊ธฐ์šค/nr๊นŒ/nc๋„ค/np ๊ธฐ์šค/nr๊นŒ/nc๋„ค/xn ๊ธฐ์šค/nr๊นŒ/nc๋„ค/dn ๊ธฐ์šค/nr๊นŒ/nc์ด/pp๋„ค/ef ๊ธฐ์šค/nr๊นŒ/nc์ด/pp๋„ค/ex apply up> ๋ท€ ???
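The KS C 5601 limit can be checked outside foma. The following is a minimal Python sketch, not part of Rouzeta: it assumes that Python's `euc_kr` codec is an acceptable stand-in for the KS C 5601 (KS X 1001) syllable set, and the helper names `in_ksx1001` and `uncovered` are made up for illustration.

```python
def in_ksx1001(ch: str) -> bool:
    """Return True if `ch` has a plain two-byte EUC-KR (KS X 1001) code."""
    try:
        encoded = ch.encode("euc_kr")
    except UnicodeEncodeError:
        return False
    # CPython may encode syllables outside the 2,350-syllable set as a longer
    # "make-up sequence", so only a two-byte result counts as covered here.
    return len(encoded) == 2


def uncovered(word: str) -> list[str]:
    """List the characters of `word` that fall outside KS X 1001."""
    return [ch for ch in word if not in_ksx1001(ch)]


for word in ["고마웠다", "똠방각하", "펲시"]:
    print(word, uncovered(word))
```

A word with a non-empty `uncovered` list is one that the session above answers with `???`.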

# Background knowledge

* foma download url 
* http://slideplayer.com/slide/11006062/
* http://foma.googlecode.com/

* foma notation

  * `[ ]` grouping
  * `?` any symbol
  * `?*` any sequence
  * `a` a single symbol
  * `\a` any symbol except a
  * `\C` any symbol except a consonant, C presumably defined with "define"
  * `.#.` word edge in rule contexts
  * `[a|b]` a or b
  * `[C|.#.]` a consonant or word edge
  * `a*` any number of a symbols
  * `(a)` optionally a
  * `.o.` compose

* http://foma.sourceforge.net/lrec2010/lrec2010handout.pdf
* http://udel.edu/~heinz/classes/2015/608/materials/foma/foma.pdf
* https://www.cs.jhu.edu/~jason/465/PDFSlides/lect17-fsmbuild.pdf 
* http://foma.sourceforge.net/dokuwiki/doku.php?id=wiki:interfacereference
* http://slideplayer.com/slide/10878857/
* FAQ: When to use lexc vs. xfst?
* lexc is optimized for the union operator, so it is fast when a large number of entries have to be unioned.

* explain Rouzeta 
* https://github.com/dsindex/rouzeta

## Kyoto Fst Decoder
* After the automaton is built, it is composed with the input sentence, and kyfd is used to find the minimum distance.
* kyfd : http://www.phontron.com/kyfd/
* kyfd requires OpenFst and Xerces-C++ to be installed.
* A detailed description of kyfd will be written later.

## Standard: easy to follow if you think of regular expressions
```
A B    Concatenation
A | B  Union
A & B  Intersection
A*     Kleene star
A+     Kleene plus
$A     "Contains" a string from A
A-B    Subtraction
~A     Complement of A
A.r    Reverse of A
(A)    Optionally A (same as A | 0)
```

## Transducer-related:
```
A:B      Cross-product of A and B
A .o. B  Composition of A and B           # <--- the most frequently used operation
A.i      Invert A                         # <--- reverses the direction of application
A.u      Extract upper side (domain) of A
A.l      Extract lower side (range) of A
A .P. B  Priority union of A and B
```

## Rewrite operations:
```
A -> B                Rewrite strings in A as B
A (->) B              Optionally rewrite A as B
A -> B || C _ D       Conditional rewrite of A as B (between C and D)
[..] -> B || C _ D    Insert a single B between C and D
A -> B , C -> D ,...  Multiple simultaneous rewrites (w/ or w/o contexts)
A -> B ... C          Markup: insert B before and C after A (w/ or w/o contexts)
```
## https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki
## etc
```
optional replacement (->)
longest-leftmost @->
shortest-leftmost @>
```

## Special symbols:
```
0 or []   Epsilon (the empty string)
?         The "any" symbol
.#.       Word boundary in rewrite rules
[ and ]   Grouping symbols for forcing precedence
" "       Reserved symbols need to be escaped by quotes
```

## example
## Rouzeta top-level ROOT declaration
* ROOT has the following vertices:
```
LEXICON Root
     ncLexicon ; ! common noun
     nbLexicon ; ! number
     nrLexicon ; ! proper noun
     ...
```
* acLexicon == conjunctive adverbs
* 고로/ac acNext
* 고로/ac has an edge to acNext, and acNext is defined as follows.


```
LEXICON acLexicon ! conjunctive adverb
고로/ac acNext ;
곧/ac   acNext ;
그나/ac acNext ;
...

LEXICON acNext
   finLexicon ;
   srLexicon ;
   ...
LEXICON srLexicon ! closing quotation mark
%"/sr   srNext ;
%'/sr   srNext ;
%)/sr   srNext ;
%>/sr   srNext ;
%]/sr   srNext ;
%}/sr   srNext ;
’/sr   srNext ;
”/sr   srNext ;
≫/sr   srNext ;
〉/sr   srNext ;
》/sr   srNext ;
」/sr   srNext ;
』/sr   srNext ;
】/sr   srNext ;
〕/sr   srNext ;
＂/sr   srNext ;
＞/sr   srNext ;


LEXICON xvLexicon ! verb-deriving suffix
거리/xv xvNext ;
당하/xv xvNext ;
당허/xv xvNext ;
...

LEXICON naNext
   xvLexicon ;

LEXICON naLexicon ! action common noun
가가대소/na naNext ;
가감/na naNext ;
가격/na naNext ;
가격인하/na naNext ;
...
```
* 가격당하다 => 가격 + 당하다 => naLexicon -> naNext -> xvLexicon


## korean.lexc : describes the relations (edges) between morphemes (vertices)

## morphrules.foma
* NounStringSet 
  * define NounStringSet [ NounSet | %/xn ] ;
  * define NounSet     [ %/na | %/nc | %/nd | %/ni | %/nm | %/nn | %/np | %/nr | %/ns | %/nu ] ;
* Definition of FilterPT0
* Removes the edge between a noun without a coda, such as '사과', and the particle '은'
* That is, it removes: no coda + noun tag + 은/pt
```
! 은/는
! Filter0 : 사과는
define FilterPT0 ~$[ FILLC NounStringSet ㅇ ㅡ %_ㄴ %/pt ] ;
```
* Definition of FilterPT1
* Removes the edge between a noun with a coda, such as '사람', and the particle '는'
* That is, it removes: coda + noun tag + 는/pt
```
! Filter1 : 사람은
define FilterPT1 ~$[ [Coda - FILLC] NounStringSet ㄴ ㅡ %_ㄴ %/pt ] ;

```
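The `~$[ ... ]` idiom means "does not contain this sequence anywhere". A small Python sketch of that idea follows. It assumes an analysis can be viewed as a list of symbols in which `__` stands for FILLC and `_ㄴ` for the bare coda; the symbol lists and function names are illustrative and not taken from Rouzeta.

```python
FILLC = "__"
NOUN_STRING_SET = {"/na", "/nc", "/nd", "/ni", "/nm", "/nn",
                   "/np", "/nr", "/ns", "/nu", "/xn"}

# Each position of the pattern is the set of symbols allowed at that position.
FILTER_PT0 = [{FILLC}, NOUN_STRING_SET, {"ㅇ"}, {"ㅡ"}, {"_ㄴ"}, {"/pt"}]


def contains(symbols, pattern):
    """True if some contiguous window of `symbols` matches `pattern` ($[...])."""
    n = len(pattern)
    return any(
        all(sym in allowed for sym, allowed in zip(symbols[i:i + n], pattern))
        for i in range(len(symbols) - n + 1)
    )


def passes_pt0(symbols):
    """Complement of 'contains' (~$[...]): keep only sequences without the pattern."""
    return not contains(symbols, FILTER_PT0)


sagwa_eun = ["ㅅ", "ㅏ", "__", "ㄱ", "ㅘ", "__", "/nc", "ㅇ", "ㅡ", "_ㄴ", "/pt"]
sagwa_neun = ["ㅅ", "ㅏ", "__", "ㄱ", "ㅘ", "__", "/nc", "ㄴ", "ㅡ", "_ㄴ", "/pt"]
print(passes_pt0(sagwa_eun), passes_pt0(sagwa_neun))
```

Composing the lexicon with such a filter drops every path that contains the banned sequence, which is how 사과 + 은 disappears while 사과 + 는 survives.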



## splithangul.foma : jamo decomposition
![split1](https://github.com/beyondnlp/nlp/raw/master/split1)
```
define split        가 -> ㄱ ㅏ %_%_ .o.
                  각 -> ㄱ ㅏ %_ㄱ .o.
                  간 -> ㄱ ㅏ %_ㄴ .o.
                  갇 -> ㄱ ㅏ %_ㄷ ;

regex split;
apply down> 가
ㄱㅏ__
apply down> 나
나
apply down> 다
다
apply down> 각
ㄱㅏ_ㄱ
apply down> 간
ㄱㅏ_ㄴ
```
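The same decomposition can be computed arithmetically from the Unicode code point instead of listing one rule per syllable. The sketch below is only an illustration in Python: it assumes the output convention seen above (`__` for no coda, `_` plus the jamo for a coda), and it decomposes every precomposed syllable, whereas the four-rule sample above only knows 가, 각, 간 and 갇.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
        "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 codas


def split_syllable(ch: str) -> str:
    """Decompose one precomposed syllable; leave any other character as is."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    cho, rest = divmod(code, 21 * 28)
    jung, jong = divmod(rest, 28)
    coda = "_" + JONG[jong] if jong else "__"   # "__" plays the role of FILLC
    return CHO[cho] + JUNG[jung] + coda


def split_hangul(text: str) -> str:
    return "".join(split_syllable(ch) for ch in text)


print(split_hangul("가"), split_hangul("각"), split_hangul("사람"))
```

How Rouzeta spells compound codas such as ㄳ is not shown in these notes, so the `_ㄳ` form produced here is an assumption.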


## Ending representation
```
%_ㅇ게/ec   ecNext ;
%_ㅇ깨/ec   ecNext ;
%_ㅇ께/ec   ecNext ;
가/ec   ecNext ;
가가/ec ecNext ;
```

* Entries written with `%` (here `%_ㅇ`) start with a bare coda, that is, a final consonant that does not form a complete syllable by itself.


## To verify 1
 * Meaning of '<'
```
LEXICON neLexicon ! English
< Alphabet+ %/ne > neNext ;

```

## FILLC
  * Meaning of %_%_
  * It appears to mark the absence of a coda (e.g. 바보: '보' has no coda)
```
# morphrules.foma
define FILLC   %_%_   ; ! No-coda

```

## Handling particles that change depending on whether there is a coda
* '사과' has no coda, so only '가' is allowed (that is, '이' cannot attach).
* '사람' has a coda, so only '이' is allowed (that is, '가' cannot attach).
* FILLC NounStringSet means a noun tag that follows a noun without a coda (e.g. 사과)
* [Coda - FILLC] NounStringSet means a noun tag that follows a noun with a coda (e.g. 사람)
  * /ps  <- tag for the subject particle (이/가)
```
! 이/가
! FilterPS0 : 사과가
define FilterPS0 ~$[ FILLC NounStringSet ㅇ ㅣ FILLC %/ps ] ;

! FilterPS1 : 사람이
define FilterPS1 ~$[ [Coda - FILLC] NounStringSet ㄱ ㅏ FILLC %/ps ] ;

```
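What these filters enforce can be seen from the generation side. The following Python sketch picks the particle form from the coda of the noun's last syllable; `has_coda` and `attach` are hypothetical helper names, and Rouzeta itself works by filtering paths rather than by choosing a form like this.

```python
def has_coda(word: str) -> bool:
    """True if the last syllable of `word` ends in a final consonant."""
    index = ord(word[-1]) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("last character is not a precomposed Hangul syllable")
    return index % 28 != 0   # coda index 0 means "no coda" (FILLC)


def attach(noun: str, after_coda: str, after_vowel: str) -> str:
    """Attach the particle variant that matches the noun's final syllable."""
    return noun + (after_coda if has_coda(noun) else after_vowel)


print(attach("사과", "이", "가"), attach("사람", "이", "가"))   # subject particle
print(attach("사과", "은", "는"), attach("사람", "은", "는"))   # topic particle
```

FilterPS0 and FilterPS1 remove exactly the two combinations this function never produces: 사과 + 이 and 사람 + 가.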

## apply up && apply down

```
foma[0]: define test 가 나 -> 다 라;
defined test: 578 bytes. 3 states, 12 arcs, Cyclic.
foma[0]: regex test;
578 bytes. 3 states, 12 arcs, Cyclic.
foma[1]: up
apply up> 가나
???
apply up> 다라
다라
...
foma[1]: down
apply down> 가나
다라
apply down> 가 나
가 나

```
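Why `apply up> 가나` fails can be reasoned about with a toy model of the relation. The sketch below is Python rather than foma and makes simplifying assumptions: the rule is treated as a plain string replacement, and `up` enumerates candidate inputs by undoing the replacement at each position, which is adequate for this rule but is not a general transducer inversion.

```python
from itertools import product


def down(s: str, src: str = "가나", dst: str = "다라") -> str:
    """Apply the obligatory rewrite src -> dst (upper side to lower side)."""
    return s.replace(src, dst)


def up(t: str, src: str = "가나", dst: str = "다라") -> list[str]:
    """Return every input string that `down` maps to `t`."""
    parts = t.split(dst)
    results = set()
    for choice in product([dst, src], repeat=len(parts) - 1):
        candidate = parts[0] + "".join(c + p for c, p in zip(choice, parts[1:]))
        if down(candidate, src, dst) == t:
            results.add(candidate)
    return sorted(results)


print(down("가나"))    # rewritten
print(down("가 나"))   # the space breaks the match, so nothing changes
print(up("가나"))      # empty: no input can produce an unrewritten 가나
print(up("다라"))
```

Because the rewrite is obligatory, 가나 can never appear on the output side, so looking it up from below finds nothing, which is what foma reports as `???`.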
![test8](https://github.com/beyondnlp/nlp/raw/master/test8)


## test1
```
foma[0]: define A %_ㄷ -> %_ㄹ || _ %/irrd %/vb ㅇ ;
defined A: 898 bytes. 7 states, 30 arcs, Cyclic.
foma[0]: regex A ;
898 bytes. 7 states, 30 arcs, Cyclic.
foma[1]: view
```
![test1](https://github.com/beyondnlp/nlp/raw/master/test1)

## test2
```
foma[0]: define A [하나:one] ;
defined A: 235 bytes. 2 states, 1 arc, 1 path.
foma[0]: define B [one:いち] ;
defined B: 235 bytes. 2 states, 1 arc, 1 path.
foma[0]: define C [いち:一] ;
defined C: 235 bytes. 2 states, 1 arc, 1 path.
foma[1]: regex A .o. B .o. C ;
294 bytes. 2 states, 1 arc, 1 path.
foma[1]: view
```
* ![test3](https://github.com/beyondnlp/nlp/raw/master/test3) .o.
* ![test5](https://github.com/beyondnlp/nlp/raw/master/test5) .o. 
* ![test6](https://github.com/beyondnlp/nlp/raw/master/test6)

![test2](https://github.com/beyondnlp/nlp/raw/master/test2)
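Composition chains the output of one mapping into the input of the next. As a rough Python analogy, assuming each transducer is a simple one-to-one mapping (real transducers can be one-to-many and work symbol by symbol), the three definitions above behave like dictionaries:

```python
def compose(first: dict, second: dict) -> dict:
    """Feed every output of `first` into `second`; drop pairs with no continuation."""
    return {a: second[b] for a, b in first.items() if b in second}


A = {"하나": "one"}
B = {"one": "いち"}
C = {"いち": "一"}

ABC = compose(compose(A, B), C)
print(ABC)
```

The intermediate symbols `one` and `いち` vanish from the result, which matches the single remaining arc in the composed automaton.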

## notation
* define A a -> b || c _ d ;
* define NAME CURRENT -> REPLACEMENT || LEFT-CONTEXT _ RIGHT-CONTEXT
* m+1	다/EC	다/EF	SF	207832	4444
```
foma[0]: define test  다%/EC -> 다%/EF || _@%/SF;
defined test: 524 bytes. 3 states, 10 arcs, Cyclic.
foma[0]: regex test;
524 bytes. 3 states, 10 arcs, Cyclic.

```
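A conditional rewrite `a -> b || c _ d` changes `a` only when it sits between `c` and `d`, and the contexts themselves are left untouched. One way to get a feel for this is a regular expression with lookbehind and lookahead. The Python sketch below is an analogy, not foma semantics in full: it handles literal, fixed-length contexts only, and `conditional_rewrite` is a made-up name.

```python
import re


def conditional_rewrite(text: str, a: str, b: str, left: str = "", right: str = "") -> str:
    """Rewrite `a` as `b` only between `left` and `right` (contexts are not consumed)."""
    pattern = re.escape(a)
    if left:
        pattern = f"(?<={re.escape(left)})" + pattern
    if right:
        pattern = pattern + f"(?={re.escape(right)})"
    return re.sub(pattern, lambda _m: b, text)


print(conditional_rewrite("cad", "a", "b", left="c", right="d"))
print(conditional_rewrite("xad", "a", "b", left="c", right="d"))
```

Only the first call changes its input, because only there is `a` preceded by `c` and followed by `d`.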

![test4](https://github.com/beyondnlp/nlp/raw/master/test4)

## test3
foma> read lexc test3.lexc;

foma> view;
```
Multichar_Symbols   /ac /ad /ai /am         ! adverbs
                  /di /dm /dn /du         ! determiners
                  /ec /ed /ef /en /ep /ex ! endings
                  /it                     ! interjection
                  /na /nc /nd /ni /nm /nn /np /nr /ns /nu /nb ! nominals
                  /pa /pc /pd /po /pp /ps /pt /pv /px /pq /pm ! particles
                  /vb /vi /vj /vx /vn     ! predicates
                  /xa /xj /xn /xv         ! affixes
                  /sc /se /sf /sl /sr /sd /su /sy /so ! symbols
                  /nh /ne /un ! Hanja (cHinese, English)
                  /irrL /irrb /irrd /irrh /irrl /irrs /irru ! irregular conjugation codes
                  %_ㄴ %_ㄹ %_ㅁ %_ㅂ %_ㅅ %_ㅇ ! forms starting with a coda (endings, particles)

Definitions
      Digit       = %0|1|2|3|4|5|6|7|8|9 ;
      Alphabet    = a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|
                    A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z ;
      Hanja       = ๏จ‹ ;

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

LEXICON Root
     ncLexicon ; ! ๋ณดํ†ต๋ช…์‚ฌ

LEXICON ncLexicon ! ๋ณดํ†ต๋ช…์‚ฌ
๊ฐ€๊ฑด๋ฌผ/nc   ncNext ;
๋„ค์ด๋ฒ„/nc   ncNext ;

LEXICON ncNext
   pxpxLexicon ;
   finLexicon ;

LEXICON pxpxLexicon ! ๋ณด์กฐ์‚ฌ-๋ณด์กฐ์‚ฌ
๊นŒ์ง€/px๋งŒ/px    pxpxNext ;
๋งŒ/px๋„/px  pxpxNext ;
๋ฐ–์—/px๋„/px    pxpxNext ;
๊นŒ์ง€/px๋„/px    pxpxNext ;
์กฐ์ฐจ/px๋„/px    pxpxNext ;
๊นŒ์ง€/px๋‚˜/px    pxpxNext ;
๋ฟ/px๋งŒ/px  pxpxNext ;
๋งˆ์ €/px๋„/px    pxpxNext ;
๋ฐ–์—/px%_ใ„ด/px  pxpxNext ;
๋งŒ/px์œผ๋กœ/pa๋„/px   pxpxNext ;

LEXICON finLexicon
   # ;

```
![test7](https://github.com/beyondnlp/nlp/raw/master/test7)


## apply ordered rule
* Handling of \*: define a symbol for it and use that symbol
 * 다/EC 다/EF || _ \*/SF
* Understand how foma decomposes syllables into jamo


## lexc & xfst
* Entries written in lexc are by default turned into one edge per syllable.
* To represent several syllables as a single edge, they have to be declared separately in Multichar_Symbols.
![test7](https://github.com/beyondnlp/nlp/raw/master/test7)
* In xfst, by contrast, edges are created per whitespace-separated token.
![test8](https://github.com/beyondnlp/nlp/raw/master/test8)


* @ stands for the complement set: every symbol that does not otherwise appear in the automaton.


# sample1
![sample1](https://github.com/beyondnlp/nlp/raw/master/sample1)
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 [ ㄷ | ㅏ ];

define n4 n1 .o. n2 ;
regex n4;

```


# sample2
![sample2](https://github.com/beyondnlp/nlp/raw/master/sample2)
## ~$[n3] added
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 [ ㄷ | ㅏ ];

define n4 n1 .o. n2 .o. ~$[n3];
regex n4;

```


# sample3
![sample3](https://github.com/beyondnlp/nlp/raw/master/sample3)
## ~$[n3] added
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 [ ㄷ  ];

define n4 n1 .o. n2 .o. ~$[n3];
regex n4;

```

# sample4
![sample4](https://github.com/beyondnlp/nlp/raw/master/sample4)
## ~$[n3] added
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 [ ㅏ ];

define n4 n1 .o. n2 .o. ~$[n3];
regex n4;

```



# complement 
![comp1](https://github.com/beyondnlp/nlp/raw/master/comp1)
## The complement operation applies only to the output string.
## The result depends on the order of the operations.
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 음 -> ㅇㅡㅁ;


define test n1 .o. n2 .o. n3;
regex test;

```

## Automaton after the complement operation
![comp2](https://github.com/beyondnlp/nlp/raw/master/comp2)
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 음 -> ㅇㅡㅁ;
define n4 [ ㄷ | ㅏ | ㅇ | %_ | ㅁ ];

define test n1 .o. n2 .o. n3 .o. ~$[n4];
regex test;

```

## Writing the automaton above to a file (write att > output.txt)
```
0   0   @_IDENTITY_SYMBOL_@ @_IDENTITY_SYMBOL_@
0   0   ㅇㅡㅁ  ㅇㅡㅁ
0   0   daum    daum
0   0   음  ㅇㅡㅁ
0   1   다  daum
1   0   음  @0@
0

```

## Changing the position of the complement operation
![comp3](https://github.com/beyondnlp/nlp/raw/master/comp3)
```
define n1 다 음 -> daum;
define n2 다 -> ㄷ ㅏ;
define n3 음 -> ㅇㅡㅁ;
define n4 [ ㄷ | ㅏ | ㅇ | %_ | ㅁ ];

define test ~$[n4] .o. n1 .o. n2 .o. n3;
regex test;

```

# insert operation test
![insert1](https://github.com/beyondnlp/nlp/raw/master/insert1)
* Use the form below to insert a symbol between two other symbols under a given condition.
## [..] -> B || C _ D Insert a single B between C and D
```
define test [..] -> B || C _ D ;
regex test;
apply down> CDA
CBDA
```


# optional insert operation test
![op_insert1](https://github.com/beyondnlp/nlp/raw/master/op_insert1)
* Optionally produces symbol B from symbol A (both the unchanged and the rewritten form are output)
```
define test A (->) B;
regex test;
apply down> A
A
B
```
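The two-line answer above reflects that an optional rule relates each input to every combination of "rewritten" and "left alone". A small Python sketch of that enumeration follows, with the simplifying assumption that `A` and `B` are plain substrings; `optional_rewrite` is an illustrative name only.

```python
from itertools import product


def optional_rewrite(text: str, a: str, b: str) -> list[str]:
    """Return every string obtained by rewriting any subset of the `a` occurrences."""
    parts = text.split(a)
    outputs = set()
    for choice in product([a, b], repeat=len(parts) - 1):
        outputs.add(parts[0] + "".join(c + p for c, p in zip(choice, parts[1:])))
    return sorted(outputs)


print(optional_rewrite("A", "A", "B"))
print(optional_rewrite("AA", "A", "B"))
```

With n occurrences of `A` there are 2^n outputs, which is why optional rules can make an automaton noticeably more ambiguous than the obligatory `->` form.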


# replace operation test
![replace1](https://github.com/beyondnlp/nlp/raw/master/replace1)
* Changes symbol A to symbol B
```
define test A -> B;
regex test;
foma[1]: down
apply down> A
B
apply down> AAA
BBB
apply down> BBB
BBB
apply down> BAB
BBB
```