Configurations

Configuration parameters

YATO is a PyTorch-based framework with flexible choices of input features and output structures. The design of neural sequence models with YATO is fully configurable through a configuration file, with no code changes required.

The following are the configuration parameters.

Train Configuration Parameters

Dataloader

train_dir=string. The path of the train file
dev_dir=string. The path of the validation file
test_dir=string. The path of the test file
sentence_classification=boolean. Whether it is a sentence classification task
MAX_SENTENCE_LENGTH=int. The max sentence length
MAX_WORD_LENGTH=int. The max word length of a sentence
feature=[POS] emb_size=20 emb_dir=%your_pretrained_POS_embedding
Feature configuration. It includes the feature prefix [POS], the pretrained feature embedding file, and the embedding size.
feature=[Cap] emb_size=20 emb_dir=%your_pretrained_Cap_embedding
Feature configuration. It includes the feature prefix [Cap], the pretrained feature embedding file, and the embedding size.
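For illustration, a dataloader block in a training configuration could look like the sketch below (all paths and sizes are placeholders, not recommended values):

```
train_dir=data/demo.train.bmes
dev_dir=data/demo.dev.bmes
test_dir=data/demo.test.bmes
sentence_classification=False
MAX_SENTENCE_LENGTH=250
feature=[POS] emb_size=20
feature=[Cap] emb_size=20
```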

Weight Location

model_dir=string. The path to save model weights
dset_dir=string. The path of the encoded configuration (dset) file

Be careful: if model weights or a dset file with the same name already exist, YATO keeps the old files rather than overwriting them. This can sometimes cause errors.
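For example (placeholder paths; choose fresh names so you do not collide with existing weight or dset files):

```
model_dir=outputs/demo_model
dset_dir=outputs/demo_model.dset
```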

Char/Word Embedding

word_emb_dir=string. The path of the word embedding file
char_emb_dir=string. The path of the char embedding file
norm_word_emb=boolean. Whether to normalize the pretrained word embedding
norm_char_emb=boolean. Whether to normalize the pretrained character embedding
number_normalized=boolean. Whether to normalize digits to 0 in the input files
word_emb_dim=int. Word embedding dimension
char_emb_dim=int. Char embedding dimension
If the model uses a pretrained char/word embedding, char_emb_dim/word_emb_dim is reset to the dimension of the pretrained embedding.
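A hypothetical embedding block, assuming a 100-dimensional GloVe file at a placeholder path:

```
word_emb_dir=embeddings/glove.6B.100d.txt
norm_word_emb=False
norm_char_emb=False
number_normalized=True
word_emb_dim=100
char_emb_dim=30
```

Since word_emb_dir here points to a 100-dimensional pretrained file, word_emb_dim would be reset to 100 even if set differently.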

Metric

seg=boolean. Whether the task is segmentation-like. Tasks evaluated with token accuracy (e.g., POS, CCG) use False; tasks evaluated with F-score (e.g., word segmentation, NER, chunking) use True.
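For example, an NER model evaluated with F-score would set:

```
seg=True
```

while a POS tagger evaluated with token accuracy would set seg=False.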

Model Design

use_crf=boolean. Whether to use a CRF inference layer. If set to False, Softmax is used in the inference layer
use_char=boolean. Whether to use a character sequence layer
char_seq_feature=string. Neural structure for the character sequence layer; only used when use_char=True. Options: GRU/LSTM/CNN
use_word_seq=boolean. Whether to use a word sequence layer
use_word_emb=boolean. Whether to use word embeddings
word_seq_feature=string. Neural structure for the word sequence layer; only used when use_word_seq=True. Options: GRU/LSTM/CNN/FeedForward
low_level_transformer=string. Pretrained language model from Hugging Face (TNN+PLM)
low_level_transformer_finetune=boolean. Whether to fine-tune the low-level transformer on your own dataset
high_level_transformer=string. Pretrained language model from Hugging Face (pure PLM/hierarchical PLM)
high_level_transformer_finetune=boolean. Whether to fine-tune the high-level transformer on your own dataset
customTokenizer=string. Use a tokenizer different from the PLM's
customModel=string. Use a model different from the PLM
customConfig=string. Use a model config different from the PLM's
cnn_layer=int. CNN layer number for the word sequence layer
char_hidden_dim=int. Character hidden vector dimension for the character sequence layer
hidden_dim=int. Word hidden vector dimension for the word sequence layer
lstm_layer=int. LSTM layer number for the word sequence layer
bilstm=boolean. Whether to use a bidirectional LSTM for the word sequence layer
words2sent=string. How the words in a sentence are pooled into a sentence representation. Options: attention/minpooling/maxpooling/avgpooling/None
classifier=boolean. Create a classifier head instead of plain softmax classification (similar to the classification head in Hugging Face models)
classifier_activation=string. Activation function of the classifier head
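As an illustrative sketch (all dimensions are placeholder values), a common character-CNN + word-BiLSTM + CRF tagger could be configured as:

```
use_crf=True
use_char=True
char_seq_feature=CNN
use_word_seq=True
use_word_emb=True
word_seq_feature=LSTM
char_hidden_dim=50
hidden_dim=200
lstm_layer=1
bilstm=True
```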

Hyperparameters

optimizer=string. Select the optimizer: SGD/Adagrad/adadelta/rmsprop/adam/adamw
scheduler=string. Select the learning rate scheduler: get_linear_schedule_with_warmup/get_cosine_schedule_with_warmup
warmup_step_rate=float. Learning rate warmup step rate
ave_batch_loss=boolean. Whether to average the batch loss during training
iteration=int. The number of training iterations
batch_size=int. The batch size for training or decoding
dropout=float. Dropout probability
classifier_dropout=float. Classification head dropout probability
learning_rate=float. Learning rate
clip=float. Clip gradients larger than the given value
momentum=float. Momentum
l2=float. L2 regularization
gpu=boolean. Whether to use the GPU
device=string. Select the GPU device
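A hypothetical hyperparameter block; the values are placeholders rather than tuned recommendations, and the device string is assumed to follow the usual PyTorch form:

```
optimizer=SGD
scheduler=get_linear_schedule_with_warmup
warmup_step_rate=0.1
iteration=100
batch_size=10
ave_batch_loss=False
learning_rate=0.015
dropout=0.5
clip=5.0
momentum=0.9
l2=1e-8
gpu=True
device=cuda:0
```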

Decode Configuration Parameters

status=string. Set to decode
raw_dir=string. The path of the file to decode
nbest=int. N-best decoding size: 0 (NER)/1 (sentence classification). Use a CRF in NER tasks for n-best decoding
decode_dir=string. The path of the decode result file
load_model_dir=string. The path of the model weights
sentence_classification=boolean. Whether it is a sentence classification task
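Putting these together, a minimal decode configuration for an NER task might look like the sketch below (paths and the model file name are placeholders):

```
status=decode
raw_dir=data/raw.bmes
nbest=0
decode_dir=outputs/raw.out
load_model_dir=outputs/demo_model.model
sentence_classification=False
```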