# Configuration parameters

**YATO** is a PyTorch-based framework with flexible choices of input features and output structures. The design of neural sequence models with **YATO** is fully configurable through a configuration file, without requiring any code work. The configuration parameters are listed below.

## Train Configuration Parameters

### Dataloader

train_dir=string. The path of the training file
dev_dir=string. The path of the validation file
test_dir=string. The path of the test file
sentence_classification=boolean. Whether the task is sentence classification
MAX_SENTENCE_LENGTH=int. The maximum sentence length
MAX_WORD_LENGTH=int. The maximum word length in a sentence
feature=[POS] emb_size=20 emb_dir=%your_pretrained_POS_embedding
*Feature configuration. It includes the feature prefix [POS], the pretrained feature embedding file, and the embedding size.*
feature=[Cap] emb_size=20 emb_dir=%your_pretrained_Cap_embedding
*Feature configuration. It includes the feature prefix [Cap], the pretrained feature embedding file, and the embedding size.*

### Weight Location

model_dir=string. The path for saving model weights
dset_dir=string. The path of the configuration encode file
*Be careful: if model weights or a dset file with the same name already exist, the old files are kept rather than updated, which can cause errors.*

### Char/Word Embedding

word_emb_dir=string. The path of the word embedding file
char_emb_dir=string. The path of the char embedding file
norm_word_emb=boolean. Whether to normalize the pretrained word embeddings
norm_char_emb=boolean. Whether to normalize the pretrained character embeddings
number_normalized=boolean. Whether to normalize digits to 0 in input files
word_emb_dim=int. Word embedding dimension
char_emb_dim=int. Char embedding dimension
*If the model uses pretrained char/word embeddings, char_emb_dim/word_emb_dim will be reset to the dimension of the pretrained embeddings.*

### Metric

seg=boolean. Whether the task is segmentation-like. Tasks with token-accuracy evaluation (e.g., POS, CCG) use False; tasks with F-value evaluation (e.g., Word Segmentation, NER, Chunking) use True.

### Model Design

use_crf=boolean. Whether to use a CRF inference layer. If set to False, Softmax is used in the inference layer
use_char=boolean. Whether to use a character sequence layer
char_seq_feature=string. Neural structure selection for the character sequence; only used when use_char=True. Options: GRU/LSTM/CNN
use_word_seq=boolean. Whether to use a word sequence layer
use_word_emb=boolean. Whether to use word embeddings
word_seq_feature=string. Neural structure selection for the word sequence; only used when use_word_seq=True. Options: GRU/LSTM/CNN/FeedForward
low_level_transformer=string. Pretrained language model from Hugging Face (TNN+PLM)
low_level_transformer_finetune=boolean. Whether to fine-tune the low-level transformer on your own dataset
high_level_transformer=string. Pretrained language model from Hugging Face (pure PLM/hierarchical PLM)
high_level_transformer_finetune=boolean. Whether to fine-tune the high-level transformer on your own dataset
customTokenizer=string. Use a different tokenizer than the PLM's
customModel=string. Use a different model than the PLM
customConfig=string. Use a different model config than the PLM's
cnn_layer=int. Number of CNN layers for the word sequence layer
char_hidden_dim=int. Character hidden vector dimension for the character sequence layer
hidden_dim=int. Word hidden vector dimension for the word sequence layer
lstm_layer=int. Number of LSTM layers for the word sequence layer
bilstm=boolean. Whether to use a bidirectional LSTM for the word sequence layer
words2sent=string. How the word representations in a sentence are pooled into a sentence representation. Options: attention/minpooling/maxpooling/avgpooling/None
classifier=boolean. Create a classifier head instead of plain softmax classification (similar to the classification head in Hugging Face)
classifier_activation=string. Activation function of the classifier head

### Hyperparameters

optimizer=string. Select the optimizer: SGD/Adagrad/adadelta/rmsprop/adam/adamw
scheduler=string. Select the scheduler: get_linear_schedule_with_warmup/get_cosine_schedule_with_warmup
warmup_step_rate=float. Learning rate warmup step rate
ave_batch_loss=boolean. Whether to average the batched loss during training
iteration=int. The number of training iterations
batch_size=int. The batch size for training or decoding
dropout=float. Dropout probability
classifier_dropout=float. Classification head dropout probability
learning_rate=float. Learning rate
clip=float. Clip gradients larger than the set value
momentum=float. Momentum
l2=float. L2 regularization
gpu=boolean. Whether to use the GPU
device=string. Select the GPU device
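Putting the training parameters together, the sketch below is a minimal training configuration for a word-level BiLSTM-CRF NER model with a character CNN. All paths, dimensions, and hyperparameter values are illustrative placeholders, not recommended settings:

```
train_dir=data/conll03/train.bmes
dev_dir=data/conll03/dev.bmes
test_dir=data/conll03/test.bmes
model_dir=outs/conll03_model
dset_dir=outs/conll03_model.dset
word_emb_dir=embeddings/glove.6B.100d.txt

sentence_classification=False
seg=True
norm_word_emb=False
number_normalized=True
word_emb_dim=100
char_emb_dim=30

use_crf=True
use_char=True
char_seq_feature=CNN
use_word_seq=True
use_word_emb=True
word_seq_feature=LSTM
char_hidden_dim=50
hidden_dim=200
lstm_layer=1
bilstm=True

optimizer=SGD
iteration=100
batch_size=10
learning_rate=0.015
dropout=0.5
l2=1e-8
gpu=True
device=cuda:0
```

For a pure-PLM setup, one would instead point high_level_transformer at a Hugging Face model name (e.g., bert-base-cased) and set high_level_transformer_finetune=True.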
## Decode Configuration Parameters

status=string. Set to decode
raw_dir=string. The path of the file to decode
nbest=int. 0 (NER)/1 (sentence classification). Use the CRF in NER tasks for nbest decoding
decode_dir=string. The path of the decoded result file
load_model_dir=string. The path of the model weights
sentence_classification=boolean. Whether the task is sentence classification
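As a matching sketch for the decoding side, the configuration below reuses the illustrative paths from the training example above:

```
status=decode
raw_dir=data/conll03/raw.bmes
decode_dir=outs/conll03_raw.out
load_model_dir=outs/conll03.model
nbest=0
sentence_classification=False
```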
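Finally, the configuration files are consumed through YATO's Python interface. The snippet below is a sketch, assuming the quick-start API in which a YATO object is constructed from a configuration file path and exposes train() and decode(); the file names are placeholders:

```python
from yato import YATO

# Training: build the model described by the train configuration
# and save weights to model_dir. (API assumed from the quick start.)
model = YATO("demo.train.config")
model.train()

# Decoding: load load_model_dir, label the file given in raw_dir,
# and write the results to decode_dir.
decode_model = YATO("demo.decode.config")
result_dict = decode_model.decode()
```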