usage CRFSUITE - beyondnlp/nlp GitHub Wiki
crfsuite
* http://www.chokkan.org/software/crfsuite/
crfsuite learn [options] [training data]
- -m model : name of the model file to write
- -g N : split the training data into N groups
- -x : run cross-validation
- -p key=val : set an internal training property
( e.g. to set a minimum count such as feature.minfreq )
- crfsuite learn -H prints the list of properties that can be set
float feature.minfreq = 0.000000;
The minimum frequency of features.
int feature.possible_states = 0;
Force to generate possible state features.
int feature.possible_transitions = 0;
Force to generate possible transition features.
float c1 = 0.000000;
Coefficient for L1 regularization.
float c2 = 1.000000;
Coefficient for L2 regularization.
int max_iterations = 2147483647;
The maximum number of iterations for L-BFGS optimization.
int num_memories = 6;
The number of limited memories for approximating the inverse hessian matrix.
float epsilon = 0.000010;
Epsilon for testing the convergence of the objective.
int period = 10;
The duration of iterations to test the stopping criterion.
float delta = 0.000010;
The threshold for the stopping criterion; an L-BFGS iteration stops when the
improvement of the log likelihood over the last ${period} iterations is no
greater than this threshold.
string linesearch = MoreThuente;
The line search algorithm used in L-BFGS updates:
{ 'MoreThuente': More and Thuente's method,
'Backtracking': Backtracking method with regular Wolfe condition,
'StrongBacktracking': Backtracking method with strong Wolfe condition
}
int max_linesearch = 20;
The maximum number of trials for the line search algorithm.
- -a : training algorithm ( lbfgs, l2sgd, ap, pa, arow )
- -e M : use only the M-th group for testing and the remaining groups for training; the output shows per-class precision, recall, and f1-score
While cross-validation is running, no model file is written.
- - : given in place of the data file name, reads the data from stdin
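The options above can be assembled into a command line programmatically. The following is a minimal sketch that only builds the argument list and does not run anything. The helper name build_learn_command and the file names train.txt and chunk.model are hypothetical; the flags are the ones listed above, with -a taken as the algorithm flag as printed by crfsuite learn -h.

```python
import shlex


def build_learn_command(train_file, model=None, algorithm=None, params=None,
                        split=None, holdout=None, cross_validate=False):
    """Build an argument list for `crfsuite learn` (hypothetical helper)."""
    cmd = ["crfsuite", "learn"]
    if model is not None:
        cmd += ["-m", model]            # model file to write
    if algorithm is not None:
        cmd += ["-a", algorithm]        # lbfgs, l2sgd, ap, pa, arow
    for key, value in (params or {}).items():
        cmd += ["-p", f"{key}={value}"]  # one -p per training property
    if split is not None:
        cmd += ["-g", str(split)]       # split the data into N groups
    if holdout is not None:
        cmd += ["-e", str(holdout)]     # hold out the M-th group for testing
    if cross_validate:
        cmd.append("-x")                # cross-validation, no model file
    cmd.append(train_file)              # "-" reads the data from stdin
    return cmd


# File names here are placeholders, not files from this wiki.
cmd = build_learn_command(
    "train.txt",
    model="chunk.model",
    algorithm="lbfgs",
    params={"c2": 1.0, "feature.minfreq": 2},
)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once crfsuite is on the PATH. Keeping the arguments as a list avoids shell quoting problems with property values.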
crfsuite tag [options] [test data]
- -m model : model to use for tagging
- -t : report the evaluation results of the model
- -r : also output the reference labels found in the test data
- -p : output the probability of the predicted label sequence
- -i : output the marginal probability of each item
- -q : suppress the tagging results in test mode
crfsuite dump [model]
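When crfsuite tag is run with -r, each reference label is printed next to the predicted label, so the output can be scored without -t. The sketch below assumes one item per line in the form reference, tab, prediction, with a blank line between sequences; that layout is an assumption to verify against your crfsuite version, and it does not cover the extra fields added by -p or -i. The sample text is made up for illustration and is not real crfsuite output.

```python
def parse_tagged(text):
    """Parse assumed `crfsuite tag -r` output into sequences of (ref, pred)."""
    sequences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sequence.
            if current:
                sequences.append(current)
                current = []
            continue
        ref, pred = line.split("\t")[:2]
        current.append((ref, pred))
    if current:
        sequences.append(current)
    return sequences


def item_accuracy(sequences):
    """Fraction of items whose predicted label equals the reference label."""
    total = sum(len(seq) for seq in sequences)
    correct = sum(ref == pred for seq in sequences for ref, pred in seq)
    return correct / total if total else 0.0


# Hypothetical sample: two sequences, one wrong prediction out of four items.
SAMPLE_OUTPUT = "B-NP\tB-NP\nI-NP\tI-NP\nB-VP\tB-NP\n\nB-NP\tB-NP\n"
sequences = parse_tagged(SAMPLE_OUTPUT)
print(len(sequences), item_accuracy(sequences))
```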
chunking.py input format
Rockwell NNP B-NP
International NNP I-NP
Corp. NNP I-NP
's POS B-NP
Tulsa NNP I-NP
unit NN I-NP
said VBD B-VP
it PRP B-NP
signed VBD B-VP
a DT B-NP
chunking.py output
B-NP w[0]=Rockwell w[1]=International w[2]=Corp. w[0]|w[1]=Rockwell|International pos[0]=NNP pos[1]=NNP pos[2]=NNP pos[0]|pos[1]=NNP|NNP pos[1]|pos[2]=NNP|NNP pos[0]|pos[1]|pos[2]=NNP|NNP|NNP __BOS__
I-NP w[-1]=Rockwell w[0]=International w[1]=Corp. w[2]='s w[-1]|w[0]=Rockwell|International w[0]|w[1]=International|Corp. pos[-1]=NNP pos[0]=NNP pos[1]=NNP pos[2]=POS pos[-1]|pos[0]=NNP|NNP pos[0]|pos[1]=NNP|NNP pos[1]|pos[2]=NNP|POS pos[-1]|pos[0]|pos[1]=NNP|NNP|NNP pos[0]|pos[1]|pos[2]=NNP|NNP|POS
I-NP w[-2]=Rockwell w[-1]=International w[0]=Corp. w[1]='s w[2]=Tulsa w[-1]|w[0]=International|Corp. w[0]|w[1]=Corp.|'s pos[-2]=NNP pos[-1]=NNP pos[0]=NNP pos[1]=POS pos[2]=NNP pos[-2]|pos[-1]=NNP|NNP pos[-1]|pos[0]=NNP|NNP pos[0]|pos[1]=NNP|POS pos[1]|pos[2]=POS|NNP pos[-2]|pos[-1]|pos[0]=NNP|NNP|NNP pos[-1]|pos[0]|pos[1]=NNP|NNP|POS pos[0]|pos[1]|pos[2]=NNP|POS|NNP
B-NP w[-2]=International w[-1]=Corp. w[0]='s w[1]=Tulsa w[2]=unit w[-1]|w[0]=Corp.|'s w[0]|w[1]='s|Tulsa pos[-2]=NNP pos[-1]=NNP pos[0]=POS pos[1]=NNP pos[2]=NN pos[-2]|pos[-1]=NNP|NNP pos[-1]|pos[0]=NNP|POS pos[0]|pos[1]=POS|NNP pos[1]|pos[2]=NNP|NN pos[-2]|pos[-1]|pos[0]=NNP|NNP|POS pos[-1]|pos[0]|pos[1]=NNP|POS|NNP pos[0]|pos[1]|pos[2]=POS|NNP|NN
I-NP w[-2]=Corp. w[-1]='s w[0]=Tulsa w[1]=unit w[2]=said w[-1]|w[0]='s|Tulsa w[0]|w[1]=Tulsa|unit pos[-2]=NNP pos[-1]=POS pos[0]=NNP pos[1]=NN pos[2]=VBD pos[-2]|pos[-1]=NNP|POS pos[-1]|pos[0]=POS|NNP pos[0]|pos[1]=NNP|NN pos[1]|pos[2]=NN|VBD pos[-2]|pos[-1]|pos[0]=NNP|POS|NNP pos[-1]|pos[0]|pos[1]=POS|NNP|NN pos[0]|pos[1]|pos[2]=NNP|NN|VBD
I-NP w[-2]='s w[-1]=Tulsa w[0]=unit w[1]=said w[2]=it w[-1]|w[0]=Tulsa|unit w[0]|w[1]=unit|said pos[-2]=POS pos[-1]=NNP pos[0]=NN pos[1]=VBD pos[2]=PRP pos[-2]|pos[-1]=POS|NNP pos[-1]|pos[0]=NNP|NN pos[0]|pos[1]=NN|VBD pos[1]|pos[2]=VBD|PRP pos[-2]|pos[-1]|pos[0]=POS|NNP|NN pos[-1]|pos[0]|pos[1]=NNP|NN|VBD pos[0]|pos[1]|pos[2]=NN|VBD|PRP
B-VP w[-2]=Tulsa w[-1]=unit w[0]=said w[1]=it w[2]=signed w[-1]|w[0]=unit|said w[0]|w[1]=said|it pos[-2]=NNP pos[-1]=NN pos[0]=VBD pos[1]=PRP pos[2]=VBD pos[-2]|pos[-1]=NNP|NN pos[-1]|pos[0]=NN|VBD pos[0]|pos[1]=VBD|PRP pos[1]|pos[2]=PRP|VBD pos[-2]|pos[-1]|pos[0]=NNP|NN|VBD pos[-1]|pos[0]|pos[1]=NN|VBD|PRP pos[0]|pos[1]|pos[2]=VBD|PRP|VBD
B-NP w[-2]=unit w[-1]=said w[0]=it w[1]=signed w[2]=a w[-1]|w[0]=said|it w[0]|w[1]=it|signed pos[-2]=NN pos[-1]=VBD pos[0]=PRP pos[1]=VBD pos[2]=DT pos[-2]|pos[-1]=NN|VBD pos[-1]|pos[0]=VBD|PRP pos[0]|pos[1]=PRP|VBD pos[1]|pos[2]=VBD|DT pos[-2]|pos[-1]|pos[0]=NN|VBD|PRP pos[-1]|pos[0]|pos[1]=VBD|PRP|VBD pos[0]|pos[1]|pos[2]=PRP|VBD|DT
B-VP w[-2]=said w[-1]=it w[0]=signed w[1]=a w[2]=tentative w[-1]|w[0]=it|signed w[0]|w[1]=signed|a pos[-2]=VBD pos[-1]=PRP pos[0]=VBD pos[1]=DT pos[2]=JJ pos[-2]|pos[-1]=VBD|PRP pos[-1]|pos[0]=PRP|VBD pos[0]|pos[1]=VBD|DT pos[1]|pos[2]=DT|JJ pos[-2]|pos[-1]|pos[0]=VBD|PRP|VBD pos[-1]|pos[0]|pos[1]=PRP|VBD|DT pos[0]|pos[1]|pos[2]=VBD|DT|JJ
B-NP w[-2]=it w[-1]=signed w[0]=a w[1]=tentative w[2]=agreement w[-1]|w[0]=signed|a w[0]|w[1]=a|tentative pos[-2]=PRP pos[-1]=VBD pos[0]=DT pos[1]=JJ pos[2]=NN pos[-2]|pos[-1]=PRP|VBD pos[-1]|pos[0]=VBD|DT pos[0]|pos[1]=DT|JJ pos[1]|pos[2]=JJ|NN pos[-2]|pos[-1]|pos[0]=PRP|VBD|DT pos[-1]|pos[0]|pos[1]=VBD|DT|JJ pos[0]|pos[1]|pos[2]=DT|JJ|NN
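The mapping from the input format to the output above can be sketched as a small set of window templates over the word and POS columns. This is a minimal reimplementation for illustration, not the code of chunking.py itself: the template list is inferred from the output shown here, the __EOS__ marker on the last token is an assumption (the excerpt stops before the sentence ends), fields are joined with tabs, and escaping of special characters in values is not handled.

```python
# Column index of each field in an input token (word, POS, chunk label).
FIELD_INDEX = {"w": 0, "pos": 1}

# Each template is a tuple of (field, offset) pairs, in the order they
# appear in the output above.
TEMPLATES = [
    (("w", -2),), (("w", -1),), (("w", 0),), (("w", 1),), (("w", 2),),
    (("w", -1), ("w", 0)), (("w", 0), ("w", 1)),
    (("pos", -2),), (("pos", -1),), (("pos", 0),), (("pos", 1),), (("pos", 2),),
    (("pos", -2), ("pos", -1)), (("pos", -1), ("pos", 0)),
    (("pos", 0), ("pos", 1)), (("pos", 1), ("pos", 2)),
    (("pos", -2), ("pos", -1), ("pos", 0)),
    (("pos", -1), ("pos", 0), ("pos", 1)),
    (("pos", 0), ("pos", 1), ("pos", 2)),
]


def extract_features(tokens):
    """tokens: list of (word, pos, label) for one sentence; returns one line per token."""
    n = len(tokens)
    lines = []
    for t, (_, _, label) in enumerate(tokens):
        fields = [label]
        for template in TEMPLATES:
            # Skip templates that reach outside the sentence.
            if any(not 0 <= t + off < n for _, off in template):
                continue
            name = "|".join(f"{field}[{off}]" for field, off in template)
            value = "|".join(tokens[t + off][FIELD_INDEX[field]]
                             for field, off in template)
            fields.append(f"{name}={value}")
        if t == 0:
            fields.append("__BOS__")
        if t == n - 1:
            fields.append("__EOS__")  # assumed end-of-sentence marker
        lines.append("\t".join(fields))
    return lines


# The ten tokens from the input excerpt above, treated as one short sentence.
SAMPLE = [
    ("Rockwell", "NNP", "B-NP"), ("International", "NNP", "I-NP"),
    ("Corp.", "NNP", "I-NP"), ("'s", "POS", "B-NP"), ("Tulsa", "NNP", "I-NP"),
    ("unit", "NN", "I-NP"), ("said", "VBD", "B-VP"), ("it", "PRP", "B-NP"),
    ("signed", "VBD", "B-VP"), ("a", "DT", "B-NP"),
]
for line in extract_features(SAMPLE)[:2]:
    print(line)
```

Because the sample is cut off after ten tokens, only the first eight lines match the output above; the last two tokens lack the right-hand context that the full sentence provides.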
- At first I compiled CRFsuite from source, but training failed with the following errors:
- L-BFGS terminated with error code (-1001)
- L-BFGS terminated with error code (-998)
- I spent a long time on these errors. They appear to be caused by a problem in the installation, and I did not find a fix.
- In this case, download and use the prebuilt binary instead.