Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches
Conference Paper
Ievgen Iosifov, Olena Iosifova, Volodymyr Sokolov
This research is devoted to the natural language processing problem of sentence segmentation from raw text. The focus is on segmenting auto-generated video transcripts that lack punctuation and sentence boundaries. Two general approaches to sentence segmentation, language modeling and sequence labeling, are proposed, and experiments compare the results of pre-trained transformer-based models under each. The study examines how the choice of approach affects the results. The sequence labeling approach turned out to be the most suitable.
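The sequence labeling framing mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label names `EOS` and `O` and the helpers `to_labels` and `segment` are hypothetical, and the labels here come from already punctuated text, whereas in the paper's setting a fine-tuned transformer would predict them for unpunctuated transcripts.

```python
from typing import List, Tuple

# Hypothetical label names: EOS marks the last token of a sentence.
EOS, INSIDE = "EOS", "O"

def to_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten punctuated sentences into lowercase tokens with one label each."""
    tokens, labels = [], []
    for sentence in sentences:
        # Strip punctuation and case to mimic a raw auto-generated transcript.
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        words = [w for w in words if w]
        for i, word in enumerate(words):
            tokens.append(word)
            labels.append(EOS if i == len(words) - 1 else INSIDE)
    return tokens, labels

def segment(tokens: List[str], labels: List[str]) -> List[str]:
    """Rebuild sentences by cutting after every token labeled EOS."""
    sentences, current = [], []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == EOS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing tokens that never received an EOS label
        sentences.append(" ".join(current))
    return sentences

tokens, labels = to_labels(["Hello everyone.", "Today we talk about NLP."])
restored = segment(tokens, labels)
```

Under this framing, segmentation reduces to a per-token classification task, so any token classifier that outputs the `EOS` label could be plugged in ahead of `segment`.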
https://ieeexplore.ieee.org/document/9468084 | 10.1109/PICST51311.2020.9468084
fine-tuning; natural language processing; NLP; sentence segmentation component; transformer
Computational Linguistics; Natural Language Processing Systems; Language Modeling
6–9 October 2020 Kharkiv, Ukraine
First Online: 2 July 2021
- ISBN: 978-1-7281-9178-2, 978-1-7281-9177-5
- EID: 2-s2.0-85114396341
- INSPEC: 20778104
- KUBG: 37097
I. Iosifov, O. Iosifova, V. Sokolov, Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches, in: IEEE 7th International Scientific and Practical Conference Problems of Infocommunications. Science and Technology (2020) 335–337. doi:10.1109/PICST51311.2020.9468084.