# Configuration parameters

**YATO** is a PyTorch-based framework with flexible choices of input features and output structures. The design of neural sequence models with **YATO** is fully configurable through a configuration file, without requiring any code work. The configuration parameters are listed below.

## Train Configuration Parameters

### Dataloader

train_dir=string. The path of the training file
dev_dir=string. The path of the validation file
test_dir=string. The path of the test file
sentence_classification=boolean. Whether the task is sentence classification
MAX_SENTENCE_LENGTH=int. The maximum sentence length
MAX_WORD_LENGTH=int. The maximum word length in a sentence
feature=[POS] emb_size=20 emb_dir=%your_pretrained_POS_embedding
*Feature configuration. It includes the feature prefix [POS], the pretrained feature embedding file, and the embedding size.*
feature=[Cap] emb_size=20 emb_dir=%your_pretrained_Cap_embedding
*Feature configuration. It includes the feature prefix [Cap], the pretrained feature embedding file, and the embedding size.*

### Weight Location

model_dir=string. The path for saving model weights
dset_dir=string. The path of the configuration encode file
*Be careful: if model weights or a dset file with the same name already exist, the old files are kept rather than updated, which can cause errors.*

### Char/Word Embedding

word_emb_dir=string. The path of the word embedding file
char_emb_dir=string. The path of the char embedding file
norm_word_emb=boolean. Whether to normalize the pretrained word embeddings
norm_char_emb=boolean. Whether to normalize the pretrained character embeddings
number_normalized=boolean. Whether to normalize digits to 0 in input files
word_emb_dim=int. Word embedding dimension
char_emb_dim=int. Char embedding dimension
*If the model uses pretrained char/word embeddings, char_emb_dim/word_emb_dim will be reset to the dimension of the pretrained embeddings.*

### Metric

seg=boolean. Whether the task is segmentation-like. Tasks with token-accuracy evaluation (e.g., POS, CCG) use False; tasks with F-value evaluation (e.g., Word Segmentation, NER, Chunking) use True.

### Model Design

use_crf=boolean. Whether to use a CRF inference layer. If set to False, Softmax is used in the inference layer
use_char=boolean. Whether to use a character sequence layer
char_seq_feature=string. Neural structure selection for the character sequence; only used when use_char=True. Options: GRU/LSTM/CNN
use_word_seq=boolean. Whether to use a word sequence layer
use_word_emb=boolean. Whether to use word embeddings
word_seq_feature=string. Neural structure selection for the word sequence; only used when use_word_seq=True. Options: GRU/LSTM/CNN/FeedForward
low_level_transformer=string. Pretrained language model from Hugging Face (TNN+PLM)
low_level_transformer_finetune=boolean. Whether to fine-tune the low-level transformer on your own dataset
high_level_transformer=string. Pretrained language model from Hugging Face (pure PLM/hierarchical PLM)
high_level_transformer_finetune=boolean. Whether to fine-tune the high-level transformer on your own dataset
customTokenizer=string. Use a different tokenizer than the PLM's
customModel=string. Use a different model than the PLM
customConfig=string. Use a different model config than the PLM's
cnn_layer=int. Number of CNN layers for the word sequence layer
char_hidden_dim=int. Character hidden vector dimension for the character sequence layer
hidden_dim=int. Word hidden vector dimension for the word sequence layer
lstm_layer=int. Number of LSTM layers for the word sequence layer
bilstm=boolean. Whether to use a bidirectional LSTM for the word sequence layer
words2sent=string. How the word representations in a sentence are pooled into a sentence representation. Options: attention/minpooling/maxpooling/avgpooling/None
classifier=boolean. Create a classifier head instead of plain softmax classification (similar to the classification head in Hugging Face)
classifier_activation=string. Activation function of the classifier head

### Hyperparameters

optimizer=string. Select the optimizer: SGD/Adagrad/adadelta/rmsprop/adam/adamw
scheduler=string. Select the scheduler: get_linear_schedule_with_warmup/get_cosine_schedule_with_warmup
warmup_step_rate=float. Learning rate warmup step rate
ave_batch_loss=boolean. Whether to average the batched loss during training
iteration=int. The number of training iterations
batch_size=int. The batch size for training or decoding
dropout=float. Dropout probability
classifier_dropout=float. Classification head dropout probability
learning_rate=float. Learning rate
clip=float. Clip gradients larger than the set value
momentum=float. Momentum
l2=float. L2 regularization
gpu=boolean. Whether to use the GPU
device=string. Select the GPU device
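Putting the training parameters together, the sketch below is a minimal training configuration for a word-level BiLSTM-CRF NER model with a character CNN. All paths, dimensions, and hyperparameter values are illustrative placeholders, not recommended settings:

```
train_dir=data/conll03/train.bmes
dev_dir=data/conll03/dev.bmes
test_dir=data/conll03/test.bmes
model_dir=outs/conll03_model
dset_dir=outs/conll03_model.dset
word_emb_dir=embeddings/glove.6B.100d.txt

sentence_classification=False
seg=True
norm_word_emb=False
number_normalized=True
word_emb_dim=100
char_emb_dim=30

use_crf=True
use_char=True
char_seq_feature=CNN
use_word_seq=True
use_word_emb=True
word_seq_feature=LSTM
char_hidden_dim=50
hidden_dim=200
lstm_layer=1
bilstm=True

optimizer=SGD
iteration=100
batch_size=10
learning_rate=0.015
dropout=0.5
l2=1e-8
gpu=True
device=cuda:0
```

For a pure-PLM setup, one would instead point high_level_transformer at a Hugging Face model name (e.g., bert-base-cased) and set high_level_transformer_finetune=True.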
## Decode Configuration Parameters

status=string. Set to decode
raw_dir=string. The path of the file to decode
nbest=int. 0 (NER)/1 (sentence classification). Use the CRF in NER tasks for nbest decoding
decode_dir=string. The path of the decoded result file
load_model_dir=string. The path of the model weights
sentence_classification=boolean. Whether the task is sentence classification
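As a matching sketch for the decoding side, the configuration below reuses the illustrative paths from the training example above:

```
status=decode
raw_dir=data/conll03/raw.bmes
decode_dir=outs/conll03_raw.out
load_model_dir=outs/conll03.model
nbest=0
sentence_classification=False
```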
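Finally, the configuration files are consumed through YATO's Python interface. The snippet below is a sketch, assuming the quick-start API in which a YATO object is constructed from a configuration file path and exposes train() and decode(); the file names are placeholders:

```python
from yato import YATO

# Training: build the model described by the train configuration
# and save weights to model_dir. (API assumed from the quick start.)
model = YATO("demo.train.config")
model.train()

# Decoding: load load_model_dir, label the file given in raw_dir,
# and write the results to decode_dir.
decode_model = YATO("demo.decode.config")
result_dict = decode_model.decode()
```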